Also see the list of articles, none to be taken seriously.
HTMLFilter, in its first public standalone release, is a module for Python programs. It parses an HTML 4 document, allowing subclasses to pass through or modify the the text and tags as they go by. The resulting copy will be an otherwise exact replica of the original, including whitespace and comments. ASP, PHP, JSP, or other server-side code will generally survive the round trip. (The only exception is if the code is embedded inside an HTML tag you’re actually modifying, not just passing through, and in most cases any tag attributes not explicitly modified are safe.)
The use can be as simple as adding a <meta> tag to an existing web page without disturbing the rest, or as complex as merging two HTML pages (as it’s used in ShearerSite, which intelligently merges content pages into template pages).
You can also use it to generate HTML from scratch, with HTMLFilter taking care of the attribute encoding for tags.
HTMLFilter. Python-licensed. Unicode and encoding-savvy. Tested with Python 1.5.2 through 2.3.
I just released version 1.1 of XMLFilter, which marks the first public standalone release. XMLFilter is an open-source Python module you can include with your programs to provide XML parsing even if the target system lacks a working xml.sax package. You can use it to quickly adapt existing xml.sax-compatible scripts to work out of the box, for example, on Jaguar (Mac OS X 10.2), which lacks expat.
It works by using the older xmllib module as a fallback for xml.sax. A test suite verifies call-by-call compatibility no matter which module ends up being used.
Other features include XML event-stream filtering, writing, and creation, with support for writing CDATA sections. (Using these classes also avoids bugs in some versions of xml.sax.)
Generally, the newer your version of Python, the faster it goes. For example, if xml.sax and expat are working, they give a factor-of-3 speedup over the pure-Python xmllib, and on Python 2.3, Unicode encoding conversions will use xmlcharrefreplace for faster writing of XML numeric entities.
Python-licensed. Tested all the way down to Python 1.5.2 and up to Python 2.3. xml.sax-compatible, Unicode-savvy (wherever Python is), and optionally namespace-aware.