Is it possible to make lxml use hex instead of decimal for unicode entities?
I am porting a Perl/SAX tool to Python/lxml. Ideally, given the same input, the new tool should produce the same output as the old tool; it causes a number of problems for me if it does not. One annoying difference I am running into is that SAX seems to write Unicode character references in hex, whereas lxml writes them in decimal, regardless of which form is used in the input:
>>> import lxml.etree as etree
>>> example_sax_output = "<foo>Copyright &#xA9; 2009 Foocorp, Inc</foo>" # Note: xA9
>>> e = etree.fromstring(example_sax_output)
>>> etree.tostring(e)
<foo>Copyright &#169; 2009 Foocorp, Inc</foo> # Note: 169
Is it possible to avoid this without doing something horribly kludgey like going through the output with a regex search and manually converting the values to hex?
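One possible regex-free approach, offered as a sketch and not as the answer given in this thread: serialize to text so that no decimal references are emitted, then encode to ASCII with a custom codec error handler that writes hex references. This assumes Python 3. The handler name `xmlcharrefreplace_hex` and the helper `to_hex_entities` are made up for illustration; `codecs.register_error` is part of the standard library, and `etree.tostring(e, encoding="unicode")` is the lxml call that returns text instead of bytes.

```python
import codecs


def _hex_charref(error):
    """Codec error handler: emit &#xHH; for each character ASCII cannot encode."""
    if not isinstance(error, UnicodeEncodeError):
        raise error
    bad = error.object[error.start:error.end]
    replacement = "".join("&#x%X;" % ord(ch) for ch in bad)
    # Return the replacement text and the position at which encoding resumes.
    return replacement, error.end


# Hypothetical handler name, registered once per process.
codecs.register_error("xmlcharrefreplace_hex", _hex_charref)


def to_hex_entities(text):
    """Encode text as ASCII bytes, using hex character references for the rest."""
    return text.encode("ascii", errors="xmlcharrefreplace_hex")


# With lxml, the text would come from:
#   text = etree.tostring(e, encoding="unicode")
# A literal stands in here so the sketch runs without lxml installed.
text = "<foo>Copyright \u00a9 2009 Foocorp, Inc</foo>"
print(to_hex_entities(text))
# b'<foo>Copyright &#xA9; 2009 Foocorp, Inc</foo>'
```

The built-in `xmlcharrefreplace` error handler does the same thing with decimal references, so the custom handler only changes the number format. Markup characters such as `&` and `<` are already escaped by lxml during serialization and pass through untouched because they are plain ASCII.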
Question information
- Language: English
- Status: Solved
- For: lxml
- Assignee: No assignee
- Solved by: usernamenumber