WordPress RSS Import
XML-compliant RSS 0.92, 1.0, 2.0 batch import and synchronization
This is an alternative to the RSS importer in WordPress 1.2, providing
several additional features.
What It Does
- It imports RSS files into your WordPress weblog.
- It handles all major RSS variants, including 0.92, 1.0, and 2.0.
- It imports single files from either your local drive or a URL
you specify.
- It imports entire folder hierarchies of RSS files (blogBrowser-style:
one folder per year, one file per month),
making it a general-purpose weblog batch import tool using RSS
as the exchange format.
- It aggregates RSS feeds, if you point one or more copies of it at
feeds on the web and set it to run regularly. (Even when run frequently, it won’t import the
same item twice.) You can use this to maintain more than one WordPress
site that shares the same content, such as a test site and a production site.
- It handles time zones carefully, preserving each item's
time zone offset so that the item can appear on your weblog under the
author’s original local time, while using GMT for all date comparisons.
- It respects and stores modification dates if given in the RSS file.
- If modification dates are given in the RSS file, it can optionally
import only new or changed posts, leaving posts alone that haven’t been
changed or that have been changed more recently on the local machine.
- Using the above feature and two or more copies of WordPress, it can
synchronize weblogs bidirectionally or multi-directionally. New and
changed posts on any one weblog will automatically show up on the others.
- It complies with the XML specification, for correct behavior with XML
namespaces with arbitrary prefixes and CDATA sections in arbitrary locations,
both of which can trip up a regular-expression-based parser.
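The date handling and the new-or-changed test described above can be sketched roughly as follows. This is a minimal illustration in Python, not the script's own PHP code. The function names parse_feed_date and should_import are hypothetical, and treating a date that carries no offset as GMT is an assumption made for the example.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_feed_date(text):
    """Parse an RFC 822 or W3CDTF date string into a timezone-aware datetime.

    The original offset stays attached to the result, so the item can
    still be displayed in the author's local time.
    """
    text = text.strip()
    try:
        # W3CDTF, e.g. 2004-06-01T09:30:00-07:00 (a trailing Z means UTC).
        parsed = datetime.fromisoformat(text.replace("Z", "+00:00"))
    except ValueError:
        # RFC 822, e.g. Tue, 01 Jun 2004 09:30:00 -0700
        parsed = parsedate_to_datetime(text)
    if parsed.tzinfo is None:
        # Assumption: a date with no offset is treated as GMT.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def should_import(incoming_modified, local_modified):
    """Import when the post is new or the incoming copy is strictly newer.

    Both dates are converted to GMT (UTC) before they are compared.
    """
    if local_modified is None:
        return True  # no local copy yet
    incoming_gmt = incoming_modified.astimezone(timezone.utc)
    local_gmt = local_modified.astimezone(timezone.utc)
    return incoming_gmt > local_gmt


if __name__ == "__main__":
    remote = parse_feed_date("Tue, 01 Jun 2004 12:00:00 -0700")
    local = parse_feed_date("2004-06-01T16:30:00Z")
    print("offset kept for display:", remote.utcoffset())
    print("import remote copy:", should_import(remote, local))
```

Because the comparison is strict, a post whose local copy is the same age or newer is left alone, which is what lets two sites run the import against each other without overwriting fresh edits.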
What It Doesn’t Do
- It doesn’t handle malformed XML. Because the standard
XML parser it uses accepts only well-formed XML, RSS files that are not well-formed are rejected. If your RSS files are not well-formed, you must use a regexp-based parser instead. In practice, since this script is intended to be used with RSS feeds over which you have control, this is not expected to be a significant limitation.
Note that the RSS file does not have to be strictly valid (according to the Feed Validator) to be parsed; most of the ways in which an RSS file could be invalid would still get past the much less stringent baseline test of XML well-formedness.
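To make the distinction concrete, here is a small sketch of a well-formedness check. It uses Python's expat bindings rather than the PHP parser the script itself relies on, and the helper name is_well_formed is hypothetical. The first sample is not valid RSS (its channel lacks the required title, link, and description) but is well-formed, so it parses. The second contains a bare ampersand and is rejected.

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return (True, None) if the text parses, else (False, error message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, None


# Invalid as RSS, but well-formed XML.
INVALID_BUT_WELL_FORMED = (
    '<rss version="2.0"><channel>'
    "<item><title>Only a title</title></item>"
    "</channel></rss>"
)

# Not well-formed: the ampersand should have been written as &amp;
MALFORMED = "<rss><channel><title>Tom & Jerry</title></channel></rss>"

if __name__ == "__main__":
    for sample in (INVALID_BUT_WELL_FORMED, MALFORMED):
        ok, error = is_well_formed(sample)
        print("well-formed" if ok else "rejected: " + error)
```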
The script is called “Bootleg RSS Import”, for lack of a better name.
Downloads
Version 1.2a1
view | download
Updated for WordPress 1.2. Put this in your wp-admin folder. This
version adds support for time zones parsed out of the RSS file, in
either RFC 822 or W3CDTF (ISO 8601) formats. It also removes code for
adding the modification date field, since WordPress 1.2 already
includes it, and replaces all legacy use of addslashes() with
mysql_escape_string().
Version 1.2a1 RSS exporter
view | download
Replaces WordPress’s RSS generator (wp-rss.php), adding support for
modification dates and time zones.
Version 1.0
view | download
For older versions of WordPress. Works with WordPress 0.9, 1.0, and
possibly 1.1 (not tested). Optionally adds a modification date field
to the WordPress database if one is missing. This is the version I
originally contributed to the WordPress project.
History
When I was evaluating WordPress 0.9, I needed to write an import
filter to handle my older posts. (For the personal and work sites I
maintain, I use a combination of WordPress and my own weblogging tool,
and still need to bridge the two. My own tool uses an archive folder of RSS
files as its native data format.)
I contributed it to the project before WordPress 1.0, but the RSS
import feature that eventually appeared in WordPress 1.2 used a
different approach: a regexp-based parser, rather than a SAX-based one.
The regexp-based parser has the advantage of working with more types of
broken XML feeds, but at the cost of correctness when parsing
well-formed XML feeds.
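The trade-off can be illustrated with a short sketch in Python. It is not the code of either importer: the handler class, the sample feed, and the deliberately naive regular expression are all made up for the example. The sample binds the Dublin Core namespace to an arbitrary prefix ("when" instead of the customary "dc") and places a CDATA section in the middle of a title. A namespace-aware SAX parser handles both, while a pattern that looks for a literal dc:date tag finds nothing.

```python
import io
import re
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

DC_NS = "http://purl.org/dc/elements/1.1/"

SAMPLE = """<rss version="2.0" xmlns:when="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Parsing <![CDATA[<b>RSS</b> & friends]]></title>
      <when:date>2004-06-01T09:30:00-07:00</when:date>
    </item>
  </channel>
</rss>"""


class ItemHandler(ContentHandler):
    """Collect the title and Dublin Core date of each <item>."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None  # dict for the item being read, if any
        self._field = None    # key in _current that receives text

    def startElementNS(self, name, qname, attrs):
        uri, local = name
        if local == "item":
            self._current = {"title": "", "date": ""}
        elif self._current is not None:
            if uri is None and local == "title":
                self._field = "title"
            elif uri == DC_NS and local == "date":
                # Matched by namespace URI, so the prefix does not matter.
                self._field = "date"

    def characters(self, content):
        # CDATA content arrives here as ordinary text.
        if self._current is not None and self._field:
            self._current[self._field] += content

    def endElementNS(self, name, qname):
        if name[1] == "item":
            self.items.append(self._current)
            self._current = None
        self._field = None


def parse_items(xml_text):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    handler = ItemHandler()
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.items


if __name__ == "__main__":
    print(parse_items(SAMPLE))
    # A naive pattern tied to the usual prefix misses the date entirely.
    print(re.search(r"<dc:date>(.*?)</dc:date>", SAMPLE))
```

A more careful regular expression could cope with this particular feed, but each such case needs its own special handling, whereas the XML parser gets them right by construction.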
So I continued to use this SAX-based tool. It also has some other
features I needed, including the ability to synchronize weblogs, parse
folders full of RSS files, parse modification dates, and parse and
preserve time zones.