HTMLFilter, in its first public standalone release, is a module for Python programs. It parses an HTML 4 document, allowing subclasses to pass through or modify the the text and tags as they go by. The resulting copy will be an otherwise exact replica of the original, including whitespace and comments. ASP, PHP, JSP, or other server-side code will generally survive the round trip. (The only exception is if the code is embedded inside an HTML tag you’re actually modifying, not just passing through, and in most cases any tag attributes not explicitly modified are safe.)

The use can be as simple as adding a <meta> tag to an existing web page without disturbing the rest, or as complex as merging two HTML pages (as it’s used in ShearerSite, which intelligently merges content pages into template pages).

You can also use it to generate HTML from scratch, with HTMLFilter taking care of the attribute encoding for tags.

HTMLFilter. Python-licensed. Unicode and encoding-savvy. Tested with Python 1.5.2 through 2.3.

blog comments powered by Disqus