Parsing JSON into XQuery
Doug Crockford stirred things up at XML 2007 with his comments on JSON and XML. He was wrong, mainly because he was only looking at structured data transfer rather than at everything else XML is very good at, like documents and semi-structured data. However, he got me thinking about how easy it would be to process JSON in XQuery, which, as DataDirect has shown, is very good at manipulating all sorts of data formats.
I headed over to json.org to take a look at the work that had already been done on converting JSON to and from XML. Then I googled and read around. I even had an email conversation with Dimitre Novatchev of FXSL fame, who's written a JSON parser entirely in XSLT!
All of the designs I looked at had at least one of the following problems:
- They could only convert a subset of JSON: map keys that weren't valid NCNames, for example, would cause problems.
- They didn't specify a 1-1 mapping: map keys were munged into valid NCNames using a function with no inverse, so I wouldn't be able to convert the XML representation back to JSON.
- They lost the JSON type information, such as whether a value was null or an empty string, which is another way in which the mapping fails to be 1-1.
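A quick Python sketch makes the second problem concrete. The `munge_key` function below is a hypothetical munging scheme invented for illustration (it is not taken from any of the designs mentioned above), and it uses a simplified, ASCII-only notion of an NCName. Distinct keys collapse onto the same name, so there is no way to recover the original key:

```python
import re

def munge_key(key: str) -> str:
    """Hypothetical key-munging scheme: replace characters not allowed in a
    (simplified, ASCII-only) NCName with '_' and prefix '_' when the result
    doesn't start with a letter or underscore."""
    name = re.sub(r"[^A-Za-z0-9_.-]", "_", key)
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name

keys = ["first name", "first_name", "1st", "_1st"]
munged = [munge_key(k) for k in keys]
print(munged)                        # ['first_name', 'first_name', '_1st', '_1st']
print(len(set(munged)) < len(keys))  # True: distinct keys collide, so no inverse
```

Any scheme that squeezes arbitrary strings into the smaller set of NCNames must collide somewhere; keeping the key in an attribute value avoids the problem entirely.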
So, given that nothing fitted, I came up with my own mapping from JSON to XML (which is, after all, the JSON way). Here it is, in all its simplicity:
| JSON | type(JSON) | toXML(JSON) |
|---|---|---|
| JSON | N/A | `<json type="type(JSON)">toXML(JSON)</json>` |
| `{ "key1": value1, "key2": value2 }` | object | `<pair name="key1" type="type(value1)">toXML(value1)</pair> <pair name="key2" type="type(value2)">toXML(value2)</pair>` |
| `[ value1, value2 ]` | array | `<item type="type(value1)">toXML(value1)</item> <item type="type(value2)">toXML(value2)</item>` |
| `"value"` | string | value |
| number | number | number |
| `true` / `false` | boolean | true / false |
| `null` | null | empty |
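The mapping is small enough to sketch outside XQuery. Below is a Python version of type() and toXML() built on the standard library's `json` and `xml.etree.ElementTree` modules. The names `json_type`, `to_xml` and `parse_json` are illustrative only; this is not XQilla's implementation. One assumption to note: numbers are written back out with Python's `json` module, so the original lexical form may change (`1E2` comes out as `100.0`).

```python
import json
import xml.etree.ElementTree as ET

def json_type(value) -> str:
    """type(JSON): the value of the type attribute."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # test before numbers: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"not a JSON value: {value!r}")

def to_xml(value, elem: ET.Element) -> None:
    """toXML(JSON): fill elem with the content for value."""
    kind = json_type(value)
    if kind == "object":
        for key, child in value.items():
            # Any string is a legal attribute value, so keys are kept as-is.
            pair = ET.SubElement(elem, "pair", name=key, type=json_type(child))
            to_xml(child, pair)
    elif kind == "array":
        for child in value:
            item = ET.SubElement(elem, "item", type=json_type(child))
            to_xml(child, item)
    elif kind == "boolean":
        elem.text = "true" if value else "false"
    elif kind == "number":
        elem.text = json.dumps(value)
    elif kind == "string":
        elem.text = value
    # null: the element is left empty; its type attribute says "null"

def parse_json(text: str) -> ET.Element:
    """Wrap the whole value in <json type="..."> as in the first table row."""
    value = json.loads(text)
    root = ET.Element("json", type=json_type(value))
    to_xml(value, root)
    return root

print(ET.tostring(parse_json('{"first name": null, "s": "", "n": [1.5, true]}'),
                  encoding="unicode"))
```

Note that a key with a space in it survives untouched, and `null` and `""` both produce an element with no text but stay distinguishable through their `type` attributes.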
The table defines two abstract functions, "type" and "toXML", which are defined recursively on the structure of the input JSON. The extension functions to parse and serialize JSON are called xqilla:parse-json() and xqilla:serialize-json(), and will be available in the next release of XQilla. Their signatures are:
```xquery
xqilla:parse-json($xml as xs:string?) as element()?
xqilla:serialize-json($json-xml as element()?) as xs:string?
```
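Because every element carries a `type` attribute, toXML can be inverted, which is what makes a serialize-json function possible at all. Here is a minimal Python sketch of that inverse, assuming the element layout from the table. `from_xml` and `serialize_json` are hypothetical names, and the sketch ignores the empty-sequence case that the `?` in the signatures allows.

```python
import json
import xml.etree.ElementTree as ET

def from_xml(elem: ET.Element):
    """Invert toXML, dispatching on the element's type attribute."""
    kind = elem.get("type")
    if kind == "object":
        return {pair.get("name"): from_xml(pair) for pair in elem.findall("pair")}
    if kind == "array":
        return [from_xml(item) for item in elem.findall("item")]
    if kind == "string":
        return elem.text or ""  # an empty string has no text node
    if kind == "number":
        return json.loads(elem.text)  # reuse JSON's number grammar
    if kind == "boolean":
        return elem.text == "true"
    if kind == "null":
        return None
    raise ValueError(f"unknown type: {kind!r}")

def serialize_json(xml_text: str) -> str:
    """Parse the XML form and write the equivalent JSON text."""
    return json.dumps(from_xml(ET.fromstring(xml_text)))

print(serialize_json(
    "<json type='object'><pair name='a b' type='null'/>"
    "<pair name='s' type='string'/></json>"
))
```

The `null` and empty-string pairs look identical apart from their `type` attributes, and that attribute is exactly what lets them come back as different JSON values.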
The translation produces a simple, generic XML document. As an example, here's a query to parse a sample of JSON (swiped from Wikipedia):
```xquery
xqilla:parse-json('{
  "firstName": "John",
  "lastName": "Smith",
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumbers": [
    "212 732-1234",
    "646 123-4567"
  ]
}')
```
And here's its translation, the result of the query:
```xml
<json type='object'>
  <pair name='firstName' type='string'>John</pair>
  <pair name='lastName' type='string'>Smith</pair>
  <pair name='address' type='object'>
    <pair name='streetAddress' type='string'>21 2nd Street</pair>
    <pair name='city' type='string'>New York</pair>
    <pair name='state' type='string'>NY</pair>
    <pair name='postalCode' type='number'>10021</pair>
  </pair>
  <pair name='phoneNumbers' type='array'>
    <item type='string'>212 732-1234</item>
    <item type='string'>646 123-4567</item>
  </pair>
</json>
```
One of the nice things about the XML format for JSON is that it's easy to navigate. If I want to get the city from the JSON object above, the XQuery for it would be:
```xquery
xqilla:parse-json("...")/pair[@name="address"]/pair[@name="city"]
```
Or if I want to get both the phone numbers I could use:
```xquery
xqilla:parse-json("...")/pair[@name="phoneNumbers"]/item
```
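As a cross-check on the structure, the same two paths can be run against the result document with Python's `xml.etree.ElementTree`, whose `find()`/`findall()` support a small XPath subset that includes `[@name='value']` predicates. This only mirrors the paths; it says nothing about how XQilla evaluates them.

```python
import xml.etree.ElementTree as ET

# The result document from the example query, copied from above.
RESULT = """<json type='object'>
  <pair name='firstName' type='string'>John</pair>
  <pair name='lastName' type='string'>Smith</pair>
  <pair name='address' type='object'>
    <pair name='streetAddress' type='string'>21 2nd Street</pair>
    <pair name='city' type='string'>New York</pair>
    <pair name='state' type='string'>NY</pair>
    <pair name='postalCode' type='number'>10021</pair>
  </pair>
  <pair name='phoneNumbers' type='array'>
    <item type='string'>212 732-1234</item>
    <item type='string'>646 123-4567</item>
  </pair>
</json>"""

root = ET.fromstring(RESULT)  # root is the <json> element, like parse-json's result

# Equivalent of .../pair[@name="address"]/pair[@name="city"]
city = root.find("pair[@name='address']/pair[@name='city']").text

# Equivalent of .../pair[@name="phoneNumbers"]/item
phones = [item.text for item in root.findall("pair[@name='phoneNumbers']/item")]

print(city)    # New York
print(phones)  # ['212 732-1234', '646 123-4567']
```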
Now I'll probably have to do a follow-up post about all the cool things you can do in XQuery once you can parse JSON...
Posted by john at February 14, 2008 05:21 PM
Hello John!
I think it would be nice to go a little further and achieve something like this:
http://code.google.com/apis/gdata/json.html
Instead of writing xqilla:parse-json("...")/pair[@name="address"]/pair[@name="city"], we would be able to write xqilla:parse-json("...")/address/@city, which is closer to what we would write using JSON.
Building on what you have already done, with node constructors and fn:QName(), it seems doable.
Cheers,
Rémi
Posted by: Rémi at February 20, 2008 12:26 PM
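Rémi's suggestion can be layered on top of the generic format rather than replacing it. The Python sketch below is hypothetical (the `friendly` function and its rules are not XQilla behaviour, and `NCNAME` is a simplified, ASCII-only approximation): it turns object pairs into elements named after their keys and scalar pairs into attributes, so the address lookup becomes a plain path. It also shows where the catch lies: a key that isn't an NCName has no element name, and arrays and nulls need rules of their own, which this sketch simply skips.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of the XML NCName production.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.-]*\Z")

def friendly(elem: ET.Element) -> ET.Element:
    """Hypothetical 'friendlier' view of a <json> or object <pair> element:
    object pairs become child elements named after their keys, scalar pairs
    become attributes. Arrays and nulls are skipped in this sketch."""
    out = ET.Element(elem.get("name", elem.tag))
    for pair in elem.findall("pair"):
        key = pair.get("name")
        if not NCNAME.match(key):
            # The generic format copes with any key; this view cannot.
            raise ValueError(f"key {key!r} is not a valid NCName")
        kind = pair.get("type")
        if kind == "object":
            out.append(friendly(pair))
        elif kind in ("string", "number", "boolean"):
            out.set(key, pair.text or "")
    return out

doc = ET.fromstring(
    "<json type='object'><pair name='address' type='object'>"
    "<pair name='city' type='string'>New York</pair></pair></json>"
)
print(friendly(doc).find("address").get("city"))  # New York
```

Keeping the generic `pair`/`item` form as the canonical result, and deriving a view like this only when the keys allow it, preserves the 1-1 mapping.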