February 14, 2008

Parsing JSON into XQuery

Doug Crockford stirred things up at XML 2007 with his comments on JSON and XML. He was wrong, mainly because he only considers structured data transfer and ignores the other things XML is very good at, like documents and semi-structured data. However, he got me thinking about how easy it would be to process JSON in XQuery, which, as DataDirect has shown, is very good at manipulating all sorts of data formats.

I headed over to json.org to take a look at the work that had already been done on converting JSON to and from XML. Then I googled, and read around. I even had an email conversation with Dimitre Novatchev of FXSL fame, who's written a JSON parser entirely in XSLT!

All of the designs I looked at had at least one of the following problems:

  1. They could only convert a subset of JSON - things like map keys that weren't valid NCNames would cause problems.
  2. They didn't specify a 1-1 mapping - in other cases map keys were munged to be valid NCNames using a function with no inverse. This would mean that I wouldn't be able to convert back to JSON from its XML representation.
  3. They lost the JSON type information, like whether a value was null or an empty string - which is also a way in which the mapping is not 1-1.

Since nothing fitted, I chose to come up with my own mapping from JSON to XML - which is, after all, the JSON way. So here it is, in all its simplicity:

JSON                                type(JSON)  toXML(JSON)

JSON                                N/A         <json type="type(JSON)">toXML(JSON)</json>

{ "key1": value1, "key2": value2 }  object      <pair name="key1" type="type(value1)">toXML(value1)</pair>
                                                <pair name="key2" type="type(value2)">toXML(value2)</pair>

[ value1, value2 ]                  array       <item type="type(value1)">toXML(value1)</item>
                                                <item type="type(value2)">toXML(value2)</item>

"value"                             string      value

number                              number      number

true / false                        boolean     true / false

null                                null        empty

The table defines two abstract functions, "type" and "toXML", which are recursively defined on the structure of the input JSON. The extension functions to parse and serialize JSON are called xqilla:parse-json() and xqilla:serialize-json(), and will be available in the next release of XQilla.

xqilla:parse-json($xml as xs:string?) as element()?
xqilla:serialize-json($json-xml as element()?) as xs:string?
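
To make the table concrete, here's a rough sketch of "type" and "toXML" in Python. It's only an illustration of the mapping, not how XQilla implements it: the names json_type, fill and parse_json are made up for this example, and number formatting simply follows Python's str().

```python
import json
import xml.etree.ElementTree as ET

def json_type(value):
    """The table's type() function: name the JSON type of a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):          # test bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

def fill(element, value):
    """The table's toXML() function: write value into element and tag its type."""
    kind = json_type(value)
    element.set("type", kind)
    if kind == "object":
        # Keys go into an attribute, so any string is allowed as a key.
        for key, child in value.items():
            fill(ET.SubElement(element, "pair", {"name": key}), child)
    elif kind == "array":
        for child in value:
            fill(ET.SubElement(element, "item"), child)
    elif kind == "boolean":
        element.text = "true" if value else "false"
    elif kind in ("number", "string"):
        element.text = str(value)
    # null: the element stays empty; only the type attribute marks it
    return element

def parse_json(text):
    """Hypothetical stand-in for xqilla:parse-json(): JSON text in, <json> element out."""
    return fill(ET.Element("json"), json.loads(text))

root = parse_json('{"My Dog": null, "name": "", "tags": [1, true]}')
print(ET.tostring(root, encoding="unicode"))
```

Note that the null value and the empty string both come out as empty elements, and it's the type attribute that keeps them apart.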

The translation produces a simple generic XML document - as an example, here's a query to parse a sample of JSON (swiped from Wikipedia):

xqilla:parse-json('{
     "firstName": "John",
     "lastName": "Smith",
     "address": {
         "streetAddress": "21 2nd Street",
         "city": "New York",
         "state": "NY",
         "postalCode": 10021
     },
     "phoneNumbers": [
         "212 732-1234",
         "646 123-4567"
     ]
 }')

And here's its translation, the result of the query:

<json type='object'>
  <pair name='firstName' type='string'>John</pair>
  <pair name='lastName' type='string'>Smith</pair>
  <pair name='address' type='object'>
    <pair name='streetAddress' type='string'>21 2nd Street</pair>
    <pair name='city' type='string'>New York</pair>
    <pair name='state' type='string'>NY</pair>
    <pair name='postalCode' type='number'>10021</pair>
  </pair>
  <pair name='phoneNumbers' type='array'>
    <item type='string'>212 732-1234</item>
    <item type='string'>646 123-4567</item>
  </pair>
</json>
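
Because every element carries its type, going back to JSON is mechanical. Here's a sketch of the reverse direction in Python, again only as an illustration: from_xml and serialize_json are hypothetical names, not XQilla's code, and the sample document is made up to show a key with a space in it alongside an empty string and a null.

```python
import json
import xml.etree.ElementTree as ET

def from_xml(element):
    """Rebuild the decoded JSON value that an element of the mapping stands for."""
    kind = element.get("type")
    if kind == "object":
        return {pair.get("name"): from_xml(pair) for pair in element.findall("pair")}
    if kind == "array":
        return [from_xml(item) for item in element.findall("item")]
    if kind == "string":
        return element.text or ""        # ElementTree reports empty content as None
    if kind == "number":
        text = element.text
        # Keep integers as integers; anything with a fraction or exponent is a float.
        return float(text) if any(c in text for c in ".eE") else int(text)
    if kind == "boolean":
        return element.text == "true"
    if kind == "null":
        return None
    raise ValueError(f"unknown type attribute: {kind!r}")

def serialize_json(xml_text):
    """Hypothetical stand-in for xqilla:serialize-json(): XML text in, JSON text out."""
    return json.dumps(from_xml(ET.fromstring(xml_text)))

doc = """<json type='object'>
  <pair name='My Dog' type='string'>Rex</pair>
  <pair name='nickname' type='string'></pair>
  <pair name='owner' type='null'/>
  <pair name='postalCode' type='number'>10021</pair>
</json>"""

print(serialize_json(doc))
# prints {"My Dog": "Rex", "nickname": "", "owner": null, "postalCode": 10021}
```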

One of the nice things about the XML format for JSON is that it's easy to navigate. If I want to get the city from the JSON object above, the XQuery for it would be:

xqilla:parse-json("...")/pair[@name="address"]/pair[@name="city"]

Or if I want to get both the phone numbers I could use:

xqilla:parse-json("...")/pair[@name="phoneNumbers"]/item
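
The same regularity helps outside XQuery too. As a sketch, assuming the result has been serialized and handed to Python, the limited XPath support in the standard xml.etree.ElementTree module is enough to follow both paths (the XML below is a cut-down copy of the result above):

```python
import xml.etree.ElementTree as ET

result = ET.fromstring("""<json type='object'>
  <pair name='address' type='object'>
    <pair name='city' type='string'>New York</pair>
  </pair>
  <pair name='phoneNumbers' type='array'>
    <item type='string'>212 732-1234</item>
    <item type='string'>646 123-4567</item>
  </pair>
</json>""")

# Same steps as the XQuery paths, relative to the <json> root element.
city = result.find("pair[@name='address']/pair[@name='city']")
phones = result.findall("pair[@name='phoneNumbers']/item")

print(city.text)
print([phone.text for phone in phones])
```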

Now I probably have to do a follow up post about all the cool things you can do in XQuery when you can parse JSON...

Posted by john at February 14, 2008 05:21 PM

Comments

Hello John!

I think it would be nice to go a little bit further and achieve something like this:
http://code.google.com/apis/gdata/json.html

Instead of writing xqilla:parse-json("...")/pair[@name="address"]/pair[@name="city"] we would be able to write xqilla:parse-json("...")/address/@city, closer to what we would write using JSON.

Using what you have already done, node constructors and fn:QName(), it seems doable.

Cheers,
Rémi

Posted by: Rémi at February 20, 2008 12:26 PM

Hi Rémi,

I'd love to do that, since I can certainly see that it's easier to navigate. However it wouldn't be a 1-1 mapping if I did, since I can't translate a JSON key of "My Dog" into an XML NCName without some kind of lossy kludge.

John

Posted by: John Snelson at February 20, 2008 03:52 PM

John,

Some conventions, like $ to convey namespaces and $t for text nodes from the Google example, plus another one for spaces, would give a fully reversible XML JSON transformation.

Cheers,
Rémi

Posted by: Rémi at February 21, 2008 09:06 AM

Hum, not fully reversible, at least because of mixed content. But something workable...

Posted by: Rémi at February 21, 2008 09:33 AM

In this instance I'm looking specifically at mapping any JSON instance into XML. Lots of people (including the Google website you linked) have also looked at converting any XML document into JSON - with varying degrees of success.

A solution to one of these problems does not solve the other, so Google's mapping for (a subset of) XML documents into JSON cannot work to map an arbitrary JSON instance into XML.

John

Posted by: John Snelson at February 21, 2008 11:48 PM

You're right. What you've done on xqilla:parse-json is good stuff since it allows some further transformations for specific mappings without taking care of the JSON parsing.

Rémi

Posted by: Rémi at February 22, 2008 10:01 AM