Andrew Shearer: RSSFilter

RSSFilter

Perform in-place operations on RSS files

RSSFilter is a Python module that builds on XMLFilter to parse and modify RSS feeds. It exposes an interface for filtering RSS files with high-level operations such as adding, listing, editing, and removing posts. Listing can be restricted to posts matching given criteria (category, date range, and/or post id). It can also treat blogBrowser date-based archives as one big file, with optimizations to find posts quickly based on date.
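
The criteria mentioned above (category, date range, post id) can be pictured with a small standalone sketch. It does not use RSSFilter itself: posts are plain dicts here, and the function name `matches` and the dict keys are assumptions made for illustration only. The module's own item objects and selection logic may differ.

```python
from datetime import datetime


def matches(post, category=None, min_date=None, max_date=None, postid=None):
    """Return True if `post` (a plain dict in this sketch) meets every given criterion."""
    if category is not None and category not in post.get("categories", []):
        return False
    if min_date is not None and post["date"] < min_date:
        return False
    if max_date is not None and post["date"] > max_date:
        return False
    if postid is not None and post["postid"] != postid:
        return False
    return True


# Hypothetical sample data, not taken from any real feed.
posts = [
    {"postid": "1", "date": datetime(2003, 6, 30, 8, 0), "categories": ["Software"]},
    {"postid": "2", "date": datetime(2003, 7, 7, 12, 0), "categories": ["Politics"]},
    {"postid": "3", "date": datetime(2003, 7, 16, 9, 30), "categories": ["Software", "Mac OS X"]},
]

# Posts in the "Software" category dated 1 July 2003 or later.
july_software = [p["postid"] for p in posts
                 if matches(p, category="Software", min_date=datetime(2003, 7, 1))]
```

Criteria left as `None` are ignored, so the same function covers "all posts", "one category", or "one post id" without separate code paths.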

Specifications

Designed to be compatible with all versions of RSS (including 0.94, 1.0, 2.0, and 2.0 in a namespace) for reading. Modified RSS files will be saved in RSS 2.0 format.
Passing a path to a folder instead of a file activates blogBrowser support. The folder is expected to hold one subfolder for each year, each containing up to 12 monthly RSS files.
Designed to be compatible with all versions of Python from 1.5.2 to 2.3, though the current version has only been tested with Python 2.2 and 2.3.
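
The year/month layout is what makes date-based lookups cheap: only the monthly files that overlap a requested date range need to be opened. The sketch below shows that idea in isolation. The file naming scheme (`<root>/<YYYY>/<MM>.xml`) and the function names are assumptions for illustration; the actual names used by blogBrowser archives are not stated here.

```python
from datetime import date


def months_in_range(min_date, max_date):
    """Return (year, month) pairs covering min_date..max_date, inclusive."""
    months = []
    year, month = min_date.year, min_date.month
    while (year, month) <= (max_date.year, max_date.month):
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def candidate_files(root, min_date, max_date):
    """List the monthly files that could contain posts in the date range.

    Assumed (hypothetical) naming scheme: <root>/<YYYY>/<MM>.xml
    """
    return ["%s/%04d/%02d.xml" % (root, y, m)
            for (y, m) in months_in_range(min_date, max_date)]


files = candidate_files("archive", date(2002, 11, 20), date(2003, 2, 3))
```

For the range above, four monthly files are candidates; every other file in the archive can be skipped without parsing.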

It is distributed under a Python license.

Dependencies

Uses W3CDate and XMLFilter.

RSSFilter Download

[Download rssweblog.py; 48K]

Included Classes

class RSSFilter(XMLFilter.XMLFilter):
    """XMLFilter that (optionally) parses each item into an RSSItem instance
    instead of passing the xml code through. At the start of the item,
    self.shouldParseItem() returns a boolean, which if true causes
    all XML to be diverted to a new post object stored as self._currentitem.
    While self._currentitem is None, the XML is passed through as usual."""
    def __init__(self, nextFilter):
        ...
        
    def shouldParseItem(self):
        """overrideable"""
        return 1
        
    def itemFinished(self, item):
        """overrideable"""
        pass
        
class RSSAdder(XMLFilter.XMLFilter):
    """Prepend a post to an RSS XML stream. Not necessary to inherit from
    RSSFilter because we don't need to parse any RSS items."""
    def __init__(self, out, newPost):
        ...
        
class RSSEditor(RSSFilter):
    """Filter an XML RSS stream, replacing a particular post with an updated version.
    The new post is substituted when a target postid comes along.
    """
    
    def __init__(self, out, postid, newPost):
        ...
        
class RSSReplacer(XMLFilter.XMLFilter):
    """Filter an XML RSS stream, dropping all posts and replacing them with
    the given posts, if any.
    The channel info is preserved, making this useful for making a new empty file
    from a 'sample' RSS file.
    """
    
    def __init__(self, nextFilter, items = []):
        ...
        
class RSSLister(RSSFilter):
    """Accumulate the parsed RSS items into a big Python list, up to an optional
    maximum number of items."""
    
    def __init__(self, maxposts = None):
        ...
        
    def getResult(self):
        """Return the list of accumulated posts."""
        ...

class RSSFilteredLister(RSSFilter):
    """Accumulate the parsed RSS items into a big Python list, up to an optional
    maximum number of items."""
    
    def __init__(self, minDate = None, maxDate = None,
        minNumber = None, maxNumber = None,
        category = None):
        ...

    def getResult(self):
        """Return the list of accumulated posts."""
        ...
    

class RSSGetPostID(RSSFilter):
    def __init__(self, postid = None, guid = None):
        """postid is a string that looks like an integer, for Blogger API clients,
        which may not handle anything more. guid is the actual value from the RSS
        file, which may happen to contain the postid. Only specify one."""
        ...
            
    def getResult(self):
        ...
    
class RSSPostIDChecker(RSSFilter):
    """count the number of occurrences of the given postid in an RSS file"""
    def __init__(self, postid):
        ...
    
    def getResult(self):
        ...
    

class RSSException(Exception):
    pass

class NoPostIDException(RSSException):
    pass

class NoChannelForNewItemException(RSSException):
    pass

class MissingRSSFileException(RSSException):
    pass

class NoSampleRSSFileException(RSSException):
    pass
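
The RSSFilter docstring describes a divert-or-pass-through pattern: at the start of each item, shouldParseItem() decides whether the item's XML is collected into an object or passed along unchanged. The following is a rough standalone sketch of that pattern, not the module's code. It uses the standard library's xml.sax instead of XMLFilter, stores each item as a plain dict instead of an RSSItem, and the class names ItemDivertingFilter and SketchLister are hypothetical.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator


class ItemDivertingFilter(xml.sax.ContentHandler):
    """Hypothetical stand-in for the RSSFilter pattern, built on xml.sax."""

    def __init__(self, out):
        xml.sax.ContentHandler.__init__(self)
        self._out = XMLGenerator(out, encoding="utf-8")  # pass-through writer
        self._currentitem = None  # a dict while an <item> is being diverted
        self._field = None        # child element currently being read

    def shouldParseItem(self):
        """Overrideable: return true to divert the next <item>."""
        return True

    def itemFinished(self, item):
        """Overrideable: called once per diverted item."""
        pass

    def startElement(self, name, attrs):
        if self._currentitem is not None:
            # Inside a diverted item: start collecting this child's text.
            self._field = name
            self._currentitem[name] = ""
        elif name == "item" and self.shouldParseItem():
            self._currentitem = {}
        else:
            self._out.startElement(name, attrs)

    def characters(self, content):
        if self._currentitem is None:
            self._out.characters(content)
        elif self._field is not None:
            self._currentitem[self._field] += content

    def endElement(self, name):
        if self._currentitem is None:
            self._out.endElement(name)
        elif name == "item":
            item, self._currentitem = self._currentitem, None
            self.itemFinished(item)
        else:
            self._field = None


class SketchLister(ItemDivertingFilter):
    """Collect up to maxposts items; later items pass through untouched."""

    def __init__(self, out, maxposts=None):
        ItemDivertingFilter.__init__(self, out)
        self.maxposts = maxposts
        self.posts = []

    def shouldParseItem(self):
        return self.maxposts is None or len(self.posts) < self.maxposts

    def itemFinished(self, item):
        self.posts.append(item)

    def getResult(self):
        return self.posts


# Hypothetical sample feed.
FEED = (b"<rss version='2.0'><channel><title>Example</title>"
        b"<item><title>First</title><guid>1</guid></item>"
        b"<item><title>Second</title><guid>2</guid></item>"
        b"<item><title>Third</title><guid>3</guid></item>"
        b"</channel></rss>")

passthrough = io.StringIO()
lister = SketchLister(passthrough, maxposts=2)
xml.sax.parseString(FEED, lister)
titles = [post["title"] for post in lister.getResult()]
remaining_xml = passthrough.getvalue()
```

In this sketch the first two items are diverted into the result list, while the channel information and the third item are written to the output stream. An editing filter in the same style would instead write a replacement item from itemFinished when the target post comes along.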

Usage

import rssweblog

weblogInfo = {
      'path': r'/Users/testuser/Sites/rss.xml',
      'permaLinkFormat': 'http://www.example.com/%Y/%m/%d#p%m%d%H%M%S',
      'categories': [
            {'name': ''},
            {'name': 'Software', 'description': ''},
            {'name': 'Technology'},
            {'name': 'Mac OS X'},
            {'name': 'Politics'},
            {'name': 'Outdoors'},
            {'name': 'Pictures'},
       ]}
       
# weblogInfo dict can contain 'path' (string) or 'stream' (file-like object)
# as well as optional 'categories', 'guidFormat', and 'permaLinkFormat' members
# and, for blogBrowser archives, 'recent-file' and 'recent-max'.

# The permaLinkFormat, guidFormat and categories items are purely for the use
# of clients; rssweblog only provides accessors.

# The most important item is 'path', which can be a folder or a file.
# (Or, use 'stream' instead and assign it a file-like object.)

# When 'path' points to a folder, 'recent-file' and 'recent-max' are
# also used. If 'recent-max' is set to, say, 15, the last 15 posts
# in the blogBrowser archive are also saved to the file specified by
# 'recent-file', after any operation which modifies the archive.

testWeblog = rssweblog.WeblogFactory(weblogInfo)

# Count how many posts in the feed carry the given postid.
count = testWeblog.readRSS(rssweblog.RSSPostIDChecker, (postid,), postid = postid)

# For this example, content is a variable containing a MetaWeblog post, 
# as passed directly from xmlrpclib. postid is the post id to change.
item = rssweblog.RSSItem(testWeblog)
item.setFromMetaWeblogFormat(content)
item.setModificationDate()
item.setPostID(postid)
success = testWeblog.modifyRSS(rssweblog.RSSEditor, (postid, item))
if not success:
    raise PostNotFoundError(
        "The edited post could not be saved, because the original post with"
        " the specified ID could not be found.")
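
The permaLinkFormat value in the weblogInfo dict above looks like a strftime-style pattern. Since rssweblog only provides accessors for it, building the actual link is left to the client. A minimal sketch of how a client might do that, assuming the pattern is meant for strftime (the helper name make_permalink is hypothetical):

```python
from datetime import datetime

# Same pattern as in the weblogInfo example.
perma_link_format = 'http://www.example.com/%Y/%m/%d#p%m%d%H%M%S'


def make_permalink(fmt, when):
    """Expand a strftime-style permalink pattern for a post's date."""
    return when.strftime(fmt)


link = make_permalink(perma_link_format, datetime(2003, 7, 16, 9, 30, 5))
# link == 'http://www.example.com/2003/07/16#p0716093005'
```

The fragment after `#` encodes month, day, hour, minute, and second, so two posts on the same day get distinct anchors as long as they are created at least one second apart.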

Revision History


1.5.2 2003-07-16 Andrew Shearer

Prepend filename (minus dir path) to UnicodeError messages when parsing a
weblogArchive; xmllib sometimes throws UTF-8 errors and otherwise there's no
way to tell which file they came from.

1.5.1 2003-07-15 Andrew Shearer

Accept W3CDate, not just xmlrpclib.DateTime, in the MetaWeblog API's
dateCreated member. This allows the caller (the XML-RPC newPost handler)
to pass the current date with timezone and DST intact.

1.5   2003-07-07  Andrew Shearer

Read-only support for RSS 1.0, as well as RSS 2.0 in a namespace.
flNotOnHomePage support. Editing a post to remove all categories now
works. Renamed ISO8601Date to W3CDate; moved W3CDate and XMLFilter to
their own modules.