I recently needed to add version control information to a file that stores its data in JSON format. As it turns out, I’m not the only one who has run into the issues surrounding JSON and comments. I wanted an HTML-style comment at the end of the file. The solution turned out to be really simple with pyparsing. Below is an example so you can judge for yourself.
import pyparsing
import simplejson


class JsonAndHtmlCommentDecoder(simplejson.JSONDecoder):
    def raw_decode(self, s, idx=0):
        try:
            obj, end = self.scan_once(s, idx)
        except StopIteration:
            raise ValueError("No JSON object could be decoded")
        except Exception, e:
            # Report anything unexpected, then re-raise: obj and end are
            # undefined at this point, so continuing would fail anyway.
            print e
            raise
        # The calling method will raise an error when the value of end is
        # less than the length of the input string.
        try:
            pyparsing.htmlComment.parseString(s[end:], parseAll=True)
            end = len(s)
        except pyparsing.ParseException:
            # The remainder is not an HTML comment: leave end untouched so
            # the calling method reports the extra data.
            pass
        return obj, end


if __name__ == '__main__':
    # input string was copied from:
    # http://json.org/example.html
    json_input = '''{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
<!-- Source control information -->
'''
    print simplejson.loads(json_input, cls=JsonAndHtmlCommentDecoder)
    # The statement below will fail with a simplejson.decoder.JSONDecodeError
    print simplejson.loads(json_input)
When executed, it will output this:
{u'menu': {u'popup': {u'menuitem': [{u'onclick': u'CreateNewDoc()', u'value': u'New'}, {u'onclick': u'OpenDoc()', u'value': u'Open'}, {u'onclick': u'CloseDoc()', u'value': u'Close'}]}, u'id': u'file', u'value': u'File'}}
Traceback (most recent call last):
  File "...projecten\dx\aa.py", line 45, in ?
    print simplejson.loads(json_input)
  File "...\python2.4\lib\simplejson\__init__.py", line 413, in loads
    return _default_decoder.decode(s)
  File "...\python2.4\lib\simplejson\decoder.py", line 405, in decode
    raise JSONDecodeError("Extra data", s, end, len(s))
simplejson.decoder.JSONDecodeError: Extra data: line 12 column 1 - line 13 column 1 (char 242 - 256)
The first line is the output of a successful attempt to decode the JSON. The stack trace is what you get when the default decoder attempts to decode the same string and fails. I’ll describe what I did in the next paragraph.
The design of the simplejson.loads method allows for an alternate JSONDecoder. In the example above the custom decoder is named JsonAndHtmlCommentDecoder. Its raw_decode method kicks in when scan_once finishes without raising an exception, and parses whatever remains of the input after the JSON value. No problem when there is no remainder, and no problem when the remainder contains an HTML-style comment, but there is a problem when the remainder contains something else. The calling method uses the end index returned by raw_decode to determine the decode state: there is no error when it matches the length of the input string, and there is an error otherwise.
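To make that contract a bit more concrete, here is a minimal sketch of the same idea using only the standard library. It rests on a few assumptions that are not part of the example above: it targets Python 3, it uses the standard json module in place of simplejson, and a regular expression stands in for pyparsing.htmlComment. The names TRAILING_COMMENT and CommentTolerantDecoder are made up for this illustration.

```python
import json
import re

# Hypothetical stand-in for pyparsing.htmlComment: exactly one HTML comment,
# optionally surrounded by whitespace.
TRAILING_COMMENT = re.compile(r"\s*<!--(?:(?!-->).)*-->\s*", re.DOTALL)


class CommentTolerantDecoder(json.JSONDecoder):
    """Hypothetical decoder that accepts one HTML comment after the JSON value."""

    def raw_decode(self, s, idx=0):
        # Let the stock implementation find the JSON value and where it ends.
        obj, end = super().raw_decode(s, idx)
        # decode() skips trailing whitespace and then raises "Extra data"
        # unless end has reached len(s), so claim the rest of the input only
        # when it is nothing but a comment.
        if TRAILING_COMMENT.fullmatch(s, end):
            end = len(s)
        return obj, end


text = '{"id": "file"}\n<!-- Source control information -->\n'

# The stock raw_decode stops right after the closing brace.
obj, end = json.JSONDecoder().raw_decode(text)
print(obj, end, repr(text[end:]))

# The subclass reports the whole string as consumed, so loads() succeeds.
print(json.loads(text, cls=CommentTolerantDecoder))

# Anything other than a comment after the JSON value is still rejected.
try:
    json.loads('{"id": "file"} oops', cls=CommentTolerantDecoder)
except json.JSONDecodeError as exc:
    print("rejected:", exc.msg)
```

The only thing the subclass changes is the end index it hands back. Everything else, including the error reporting for unexpected trailing data, is left to the calling decode method.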
This, of course, is just a simple example. You could do more by interpreting the results of parseString, or by using scanString instead. Check the documentation at packages.python.org to see for yourself. When you take a minute to review the pyparsing documentation, be sure to take a look at the Variables section. You’ll see it’s not limited to htmlComment.
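As a rough illustration of the scanString idea, the sketch below collects every HTML comment in a string together with its position, instead of only checking that one is present. It assumes pyparsing is installed and is written for Python 3. The helper name find_html_comments is hypothetical, and newer pyparsing releases also offer PEP 8 style names for the same calls.

```python
import pyparsing


def find_html_comments(text):
    """Return a (start, end, comment) tuple for each HTML comment in text."""
    return [
        (start, end, tokens[0])
        for tokens, start, end in pyparsing.htmlComment.scanString(text)
    ]


sample = '{"id": "file"}\n<!-- Source control information -->\n'

for start, end, comment in find_html_comments(sample):
    # Each match carries the comment text and where it sits in the input.
    print(start, end, comment)
```

With the comment text in hand, a decoder could go one step further and, say, extract a revision number from it rather than just skipping over it.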
Comments, improvements or praise? Let me know.