RSSFilter is a Python module that builds on XMLFilter to parse and modify RSS feeds. It exposes a high-level interface for adding, listing, editing, and removing posts, and it can fetch all posts matching given criteria (category, date range, and/or post id). It can also treat blogBrowser date-based archives as one big file, with optimizations to find posts quickly by date.
It is distributed under a Python license.
[Download rssweblog.py; 48K]
class RSSFilter(XMLFilter.XMLFilter):
    """XMLFilter that (optionally) parses each item into an RSSItem
    instance instead of passing the xml code through. At the start of
    the item, self.shouldParseItem() returns a boolean, which if true
    causes all XML to be diverted to a new post object stored as
    self._currentitem. While self._currentitem is None, the XML is
    passed through as usual."""
    def __init__(self, nextFilter): ...

    def shouldParseItem(self):
        """overrideable"""
        return 1

    def itemFinished(self, item):
        """overrideable"""
        pass

class RSSAdder(XMLFilter.XMLFilter):
    """Prepend a post to an RSS XML stream. Not necessary to inherit
    from RSSFilter because we don't need to parse any RSS items."""
    def __init__(self, out, newPost): ...

class RSSEditor(RSSFilter):
    """Filter an XML RSS stream, replacing a particular post with an
    updated version. The new post is substituted when a target postid
    comes along.
    """
    def __init__(self, out, postid, newPost): ...

class RSSReplacer(XMLFilter.XMLFilter):
    """Filter an XML RSS stream, dropping all posts and replacing them
    with the given posts, if any. The channel info is preserved, making
    this useful for making a new empty file from a 'sample' RSS file.
    """
    def __init__(self, nextFilter, items = []): ...

class RSSLister(RSSFilter):
    """Accumulate the parsed RSS items into a big Python list, up to an
    optional maximum number of items."""
    def __init__(self, maxposts = None): ...

    def getResult(self):
        """Return the list of accumulated posts."""
        ...

class RSSFilteredLister(RSSFilter):
    """Accumulate the parsed RSS items into a big Python list, up to an
    optional maximum number of items."""
    def __init__(self, minDate = None, maxDate = None, minNumber = None,
                 maxNumber = None, category = None): ...

    def getResult(self):
        """Return the list of accumulated posts."""
        ...
class RSSGetPostID(RSSFilter):
    def __init__(self, postid = None, guid = None):
        """postid is a string that looks like an integer, for Blogger
        API clients, which may not handle anything more. guid is the
        actual value from the RSS file, which may happen to contain the
        postid. Only specify one."""
        ...

    def getResult(self): ...

class RSSPostIDChecker(RSSFilter):
    """count the number of occurrences of the given postid in an RSS file"""
    def __init__(self, postid): ...

    def getResult(self): ...

class RSSException(Exception): pass
class NoPostIDException(RSSException): pass
class NoChannelForNewItemException(RSSException): pass
class MissingRSSFileException(RSSException): pass
class NoSampleRSSFileException(RSSException): pass
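The RSSFilter docstring describes a divert-or-pass-through pattern: while an item is being parsed, its XML is captured in an object; otherwise events flow on to the next filter. The following is a minimal sketch of that pattern only, not the rssweblog implementation. It is written for Python 3 and uses the standard library's xml.sax.saxutils.XMLFilterBase in place of XMLFilter; the class ItemCollector, the helper parseRSS, and the use of a plain dict in place of RSSItem are all hypothetical. It handles only flat child elements inside each item.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class ItemCollector(XMLFilterBase):
    """Hypothetical stand-in for an RSSFilter-style class (not rssweblog code).

    While an <item> is being parsed, its events are diverted into a dict;
    all other events are passed to the downstream handler unchanged.
    """

    def __init__(self, parent=None):
        XMLFilterBase.__init__(self, parent)
        self._currentitem = None  # dict for the item in progress, or None
        self._field = None        # child element currently being read
        self.items = []

    def shouldParseItem(self):
        """Overridable hook: return True to divert the next item."""
        return True

    def itemFinished(self, item):
        """Overridable hook: called once per fully parsed item."""
        self.items.append(item)

    def startElement(self, name, attrs):
        if self._currentitem is not None:
            # Inside a diverted item: start collecting this child's text.
            self._field = name
            self._currentitem[name] = ""
        elif name == "item" and self.shouldParseItem():
            self._currentitem = {}
        else:
            XMLFilterBase.startElement(self, name, attrs)

    def characters(self, content):
        if self._currentitem is None:
            XMLFilterBase.characters(self, content)
        elif self._field is not None:
            self._currentitem[self._field] += content

    def endElement(self, name):
        if self._currentitem is None:
            XMLFilterBase.endElement(self, name)
        elif name == "item":
            item, self._currentitem = self._currentitem, None
            self.itemFinished(item)
        else:
            self._field = None


def parseRSS(xmlText):
    """Run xmlText through ItemCollector; return (items, passed-through XML)."""
    out = io.StringIO()
    collector = ItemCollector(xml.sax.make_parser())
    collector.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    collector.parse(io.BytesIO(xmlText.encode("utf-8")))
    return collector.items, out.getvalue()


SAMPLE = (
    '<rss version="2.0"><channel><title>Test</title>'
    "<item><title>First</title><guid>1</guid></item>"
    "<item><title>Second</title><guid>2</guid></item>"
    "</channel></rss>"
)
items, passedThrough = parseRSS(SAMPLE)
```

In this sketch the channel markup reaches the output stream while the items end up in the list, which is roughly the division of labor the lister and editor classes above rely on. Overriding shouldParseItem to return False would let items pass through untouched.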
import rssweblog

weblogInfo = {
    'path': r'/Users/testuser/Sites/rss.xml',
    'permaLinkFormat': 'http://www.example.com/%Y/%m/%d#p%m%d%H%M%S',
    'categories': [
        {'name': ''},
        {'name': 'Software', 'description': ''},
        {'name': 'Technology'},
        {'name': 'Mac OS X'},
        {'name': 'Politics'},
        {'name': 'Outdoors'},
        {'name': 'Pictures'},
    ]}

# weblogInfo dict can contain 'path' (string) or 'stream' (file-like object)
# as well as optional 'categories', 'guidFormat', and 'permaLinkFormat' members
# and, for blogBrowser archives, 'recent-file' and 'recent-max'.
# The permaLinkFormat, guidFormat and categories items are purely for the use
# of clients; rssweblog only provides accessors.
# The most important item is 'path', which can be a folder or a file.
# (Or, use 'stream' instead and assign it a file-like object.)
# When 'path' points to a folder, 'recent-file' and 'recent-max' are
# also used. If 'recent-max' is set to, say, 15, the last 15 posts
# in the blogBrowser archive are also saved to the file specified by
# 'recent-file', after any operation which modifies the archive.

testWeblog = rssweblog.WeblogFactory(weblogInfo)

count = testWeblog.readRSS(rssweblog.RSSPostIDChecker, (postid,), postid = postid)

# For this example, content is a variable containing a MetaWeblog post,
# as passed directly from xmlrpclib. postid is the post id to change.
item = rssweblog.RSSItem(testWeblog)
item.setFromMetaWeblogFormat(content)
item.setModificationDate()
item.setPostID(postid)
success = testWeblog.modifyRSS(rssweblog.RSSEditor, (postid, item))
if not success:
    # PostNotFoundError is an exception class supplied by the calling code.
    raise PostNotFoundError(
        "The edited post could not be saved, because the original post with"
        " the specified ID could not be found.")
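Since rssweblog only provides accessors for permaLinkFormat, expanding it is left to the client. The sketch below shows one way a client might do that, assuming the format uses strftime-style directives, as the sample value suggests. The function name makePermalink is hypothetical and not part of rssweblog.

```python
from datetime import datetime

# Format string copied from the weblogInfo example above.
PERMALINK_FORMAT = 'http://www.example.com/%Y/%m/%d#p%m%d%H%M%S'


def makePermalink(fmt, posted):
    """Expand the strftime directives in fmt using the post's datetime."""
    return posted.strftime(fmt)


posted = datetime(2003, 7, 16, 9, 30, 5)
link = makePermalink(PERMALINK_FORMAT, posted)
# link == 'http://www.example.com/2003/07/16#p0716093005'
```

Under that assumption, the fragment after '#' encodes month, day, hour, minute, and second, so two posts made within the same second would share a permalink.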
1.5.2  2003-07-16  Andrew Shearer
    Prepend filename (minus dir path) to UnicodeError messages when
    parsing a weblogArchive; xmllib sometimes throws UTF-8 errors and
    otherwise there's no way to tell which file they came from.

1.5.1  2003-07-15  Andrew Shearer
    Accept W3CDate, not just xmlrpclib.DateTime, in the MetaWeblog
    API's dateCreated member. This allows the caller (the XML-RPC
    newPost handler) to pass the current date with timezone and DST
    intact.

1.5  2003-07-07  Andrew Shearer
    Read-only support for RSS 1.0, as well as RSS 2.0 in a namespace.
    flNotOnHomePage support.
    Editing a post to remove all categories now works.
    Renamed ISO8601Date to W3CDate; moved W3CDate and XMLFilter to
    their own modules.