Thanks to Noosphere Networks, I’m releasing a script that helps developers of Drupal-based web sites maintain separate development/test and production sites, pushing changes from test to production as needed. This is challenging with a stock Drupal installation. Changes to PHP code are no problem, because the code lives in the filesystem and can be copied or committed to a revision-control system like Subversion. But much of Drupal’s configuration work takes place within its web administration interface and is saved to the database, where production content such as user accounts and comments is also stored.
The desire to do this frequently comes up on Drupal’s forums, and the typical workarounds have some large drawbacks (involving some combination of extended downtime on the production site, duplication of work, and the loss of content, comments, and user account changes made in the interim).
This small script attempts to solve that by categorizing Drupal’s tables and moving only the right ones at the right time, while handling details such as merging sequence numbers. It also dumps Drupal’s databases to disk in a format that works well for checkin to a revision control system.
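Here’s a rough, hypothetical sketch of the dump step, just to show the idea (this is not Migraine’s actual code; the table lists are only examples, and the real categorization depends on your Drupal version and modules):

import os

# Illustrative categories only -- real lists vary by Drupal version and modules.
CONFIG_TABLES = ['variable', 'system', 'blocks', 'filters']   # push from test to production
CONTENT_TABLES = ['users', 'comments', 'node', 'watchdog']    # leave production's copy alone

def dump_tables(database, tables, outfile):
    # --skip-opt plus explicitly re-enabled options yields one INSERT per row,
    # a stable, diff-friendly format for revision control.
    os.system('mysqldump --skip-opt --add-drop-table --complete-insert %s %s > %s'
              % (database, ' '.join(tables), outfile))

dump_tables('drupal_test', CONFIG_TABLES, 'config_tables.sql')
dump_tables('drupal_prod', CONTENT_TABLES, 'content_tables.sql')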
This is free software, licensed under the GPL.
There’s a more ambitious project called AutoPilot that aims to do this and more in the future, but its ability to merge test sites into production without losing production content isn’t available yet, and I needed something now.
Be warned, though, that this is an alpha release, intended for those familiar with MySQL and Drupal’s table layout. If you have CCK fields, some manual work may be required when you modify your field layout, because CCK tends to change your database schema, and Migraine does not currently attempt to automate all of those changes. It will detect them and warn of the problem, however.
See more information at the Migraine project page.
fs2svn is a new, free, open-source tool that converts a bunch of archive folders into a Subversion repository.
If you’ve kept a series of historical snapshots of your work in folders, fs2svn can help you upgrade to a full-fledged version control system.
fs2svn goes through all the folders under a given parent folder (in filesystem order) and creates a Subversion revision for each one, backdated to the most recent file’s last modified date. The log message is set to the folder name.
Additions, changes, and deletions between one folder and the next are all recorded in the repository.
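The mechanics are simple enough to sketch in Python (a simplification, not fs2svn itself; it assumes rsync and the svn command-line client are available, and it glosses over scheduling deletions and the revision backdating):

import os, subprocess

def import_snapshots(parent, working_copy):
    # Replay each snapshot folder, in order, as one Subversion revision.
    for snapshot in sorted(os.listdir(parent)):
        src = os.path.join(parent, snapshot)
        if not os.path.isdir(src):
            continue
        # Mirror the snapshot into the working copy; a real tool would also
        # run 'svn delete' on files that disappeared since the last snapshot.
        subprocess.call(['rsync', '-a', '--delete', '--exclude=.svn',
                         src + '/', working_copy])
        subprocess.call(['svn', 'add', '--force', '.'], cwd=working_copy)
        # The folder name becomes the log message. Backdating the revision
        # means setting its svn:date property afterward, which requires the
        # repository's pre-revprop-change hook to be enabled.
        subprocess.call(['svn', 'commit', '-m', snapshot], cwd=working_copy)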
The input format is very simple. It only covers the mainline trunk, not any tags or branches (though tags for major versions could be manually created later, if your folder names carry enough information).
The format is so simple it could be used as a common intermediary. If you wanted to migrate a mainline trunk from some exotic version control system to Subversion, you could write a script to export it to regular folders, then use this script to import the result into Subversion.
See the main fs2svn page for information, examples, and to download.
I had a problem where my scripted FTP uploads through ftplib in Python 2.3 would experience long (6- or 7-second) delays before transferring each file. Other FTP programs were fine, except for a similar delay on connect. It turned out to be an interaction between ftplib’s IPv6 support in Python 2.3 and the Mac OS X name resolver, and it finally appears to be fixed in the recently released Mac OS X 10.3.8, which noted speed improvements in certain network applications.
In case the delay bites anyone else (or in case it’s not really fixed, and some other network change is just fooling me), here’s the workaround I’ve been using until now.
With IPv6 support in Python 2.3 / Mac OS X 10.3, ftplib’s ntransfercmd method now calls getaddrinfo for every single file transferred, and the name resolver does a slow timeout each time. Making a local copy of ftplib and replacing the call to getaddrinfo with constants may be ugly, but it worked around the problem.
Original line (multi-second delay), at ftplib.py line 233:
af, socktype, proto, canon, sa = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)[0]
Changed line (assumes IPv4 addresses):
af, socktype, proto, canon, sa = (2, 1, 6, '', (host, port))
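# (2, 1, 6 are socket.AF_INET, socket.SOCK_STREAM, and socket.IPPROTO_TCP)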
This change speeds up multi-file FTP transfers immensely (at least to my FTP server) under Mac OS X 10.3.0 through 10.3.7, but early results indicate it’s not necessary on 10.3.8.
Bill Bumgarner’s useful Dupinator script, for removing duplicate files, recently hit Python-URL. However, it has a logic bug that ends up deleting too many files.
If you have several sets of duplicates that happen to share the same file size, all but one of the sets will be wiped out completely. The problem is that within each group of files of identical size, there’s at most a single generated "duplicates" list. The first file on the list is spared; the rest are deleted.
The net effect, when I tested the script on a large corpus of text files, was that the program reported it would delete many files that were clearly not identical. (I had commented out the os.remove call for testing.)
There was an additional problem with iPhoto: the posted script follows symbolic links. iPhoto stores its albums as collections of symbolic links, so all photos in albums are flagged as duplicates of the original photos. An islink() test fixes this.
Here’s a modified version of the script. It has only been lightly tested, though the changes did successfully eliminate the false positives. Uncomment the os.remove() line only when you are satisfied with the list of redundant files generated.
Minor optimizations: all files <= 1024 bytes go directly into the dupes list, not potentialDupes, since the whole file has already been checked. Also, Mac OS X’s pesky .DS_Store files are skipped.
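To make the fix concrete, here’s a rough sketch of the corrected core logic (not the modified script itself, which also does the 1024-byte prefix check first): within each size group, files are re-grouped by a full-content hash, and only the first member of each hash group is spared.

import os, md5

def findDupes(root):
    bySize = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name == '.DS_Store':
                continue                       # skip Finder metadata
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue                       # don't follow iPhoto's album symlinks
            bySize.setdefault(os.path.getsize(path), []).append(path)
    dupes = []
    for paths in bySize.values():
        if len(paths) < 2:
            continue
        # Re-group by full-content hash, so distinct duplicate sets that
        # merely share a file size never land on one combined list.
        byHash = {}
        for path in paths:
            digest = md5.new(open(path, 'rb').read()).hexdigest()
            byHash.setdefault(digest, []).append(path)
        for group in byHash.values():
            dupes.extend(group[1:])            # spare the first file in each group
    return dupes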
(I haven’t heard back from Bill yet on incorporating the fixes into his code, so I’m posting here.)
Now available: RSSFilter, an open source Python module for modifying RSS files and blogBrowser-format RSS archives in place. It builds on XMLFilter. (Speaking of which, thanks to Mark Pilgrim for its recent mention in his b-links.)
The module can also be used as an RSS parser for valid XML feeds, though it trades ultra-liberal parsing for the ability to modify files safely.
Operations such as inserting, modifying, or deleting a post are designed to cause minimal disruption to the rest of the file.
I bought the upgrade to Apple’s iLife suite, released on Friday. Here’s a gotcha for developers who parse iPhoto’s AlbumData.xml file, though it doesn’t directly affect most users. It affects me, because my own code parses AlbumData.xml to generate my web-based photo albums (such as the England trip pictures I just posted).
Though the overall format of iPhoto’s XML file stays the same (and my script had no trouble reading it), the Comments and Date fields are gone! The Date field is renamed and in a different format, which is no problem to work around, because the image file’s embedded EXIF data contains the date as well.
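For instance, PIL can pull the date out of a JPEG along these lines (a sketch; the filename is hypothetical, and _getexif is PIL’s underscored, JPEG-only EXIF accessor):

import PIL.Image

im = PIL.Image.open('IMG_0001.JPG')   # hypothetical filename
exif = im._getexif() or {}            # may be None if the file has no EXIF data
dateTaken = exif.get(36867)           # tag 36867 = DateTimeOriginal, e.g. '2005:01:17 14:30:52'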
The missing Comments field is a different story. From my quick inspection, the comment data seems to be stored only in a newly introduced iPhoto.db file, which is in some binary format. The rationale for this is presumably performance, but that doesn’t completely make sense, since the photo title is still stored in the XML file, and it may be changed just as often.
In any case, here’s a workaround that uses AppleScript to write a parallel folder structure holding just the comments, one per text file. Paste the following into a Script Editor window and run. Use this anytime you’d like to protect your comments from the vagaries of software or platform transitions or upgrades. (The parallel folder structure helps this; the script could have used iPhoto’s internal IDs and generated all the files in a single folder, but that wouldn’t have been as forward-compatible.) GPL-licensed.
On the reading side, here’s the Python my album script uses to pull a cached comment back out for a given image:

import os

commentCommonBaseDir = os.path.expanduser("~/Pictures/")
commentOrigDir = os.path.join(commentCommonBaseDir, "iPhoto Library")
commentParallelDir = os.path.join(commentCommonBaseDir,
                                  "iPhoto Library - My Comments Cache")
commentFileSuffix = ".comment.txt"

def getCommentForFile(imagePath):
    if not imagePath.lower().startswith(commentOrigDir.lower()):
        raise ValueError(('Error: image does not appear to be in iPhoto Library; '
                          'cannot compute comment path. Image: "%s". Library: "%s".')
                         % (imagePath, commentOrigDir))
    commentPath = os.path.join(commentParallelDir,
                               imagePath[len(commentOrigDir)+1:]) + commentFileSuffix
    if os.path.isfile(commentPath):
        print "Read comment for " + imagePath
        return open(commentPath, 'r').read()
    return ''
To continue on the recent image resizing theme (probably of interest to Python scripters only), I made some changes as a result of upgrading to Panther last week. I wanted to use the new built-in Mac OS X version of Python 2.3 (plus the MacPython Extras from Jack Jansen—thanks, Jack!). But a problem with the initial Package Manager distribution of the Python Imaging Library made me look at a new Panther feature that lets Python scripts use the native Quartz graphics library directly. (The hitch with PIL was that it was built to require a Fink install of libjpeg for full JPEG support. A quick compile of libjpeg and placement of it and its headers into Fink’s preferred locations didn’t work, and either installing Fink or compiling PIL from source would have taken a while.)
That was as good a reason as any to explore Panther’s new Quartz scripting feature. So I read what I could find on Quartz, and modified my photo album code to use Quartz if available. It still uses PIL to gather EXIF and size information, which works even without libjpeg, but then it uses Quartz to manipulate the actual image content.
The results were terrific, mostly. In real-world testing on an 800 MHz PowerBook G4, the PIL-only version spat out 8 JPEGs per minute, and the Quartz version spat out 65 JPEGs per minute. That’s a welcome improvement, especially when you multiply my typical batch of 100 photos by 3 sizes apiece.
The one problem is that I don’t yet know how to set the quality level. There’s a parameter that should contain this number, but as far as I can tell it isn’t documented anywhere. All of the supplied examples save as PNG or PDF, rather than JPEG, and the function isn’t documented along with the rest of Quartz because it’s not a real Quartz function—the release notes say that image export is actually handled through QuickTime. (This will be the first public mention in the history of the world, as far as Google is concerned, of the Core Graphics function that the API summary says it calls: CGBitmapContextWriteToFile. The last parameter, vaguely named “params” and defaulting to a zero-length string, is where a data structure including the quality level would obviously go.)
So for now it’s using a default JPEG quality level, which, whatever it is, is noticeably worse than the quality=90 setting I used with PIL, especially on thumbnails. Though I haven’t done a controlled side-by-side test, Quartz’s lower quality levels seemed to produce some low-frequency blurriness, which looked much less objectionable than the high-frequency ringing (making macroblock boundaries visible) that PIL tended to show. That ringing looked bad enough that I couldn’t really run PIL with anything below quality=90. And because of the lower quality setting, the file sizes on the Quartz side were half those of the PIL versions.
Here’s all the code that deals with Quartz in the new photo album. newImagesInfo holds a list of destination file paths and pre-calculated pixel dimensions.
def resizeImagesQuartz(origFilename, newImagesInfo):
    # newImagesInfo is a list of (newFilename, newWidth, newHeight) tuples
    if not newImagesInfo:
        return
    import CoreGraphics
    origImage = CoreGraphics.CGImageCreateWithJPEGDataProvider(
        CoreGraphics.CGDataProviderCreateWithFilename(origFilename),
        [0, 1, 0, 1, 0, 1], 1, CoreGraphics.kCGRenderingIntentDefault)
    for newFilename, newWidth, newHeight in newImagesInfo:
        print "Resizing image with Quartz: ", newFilename, newWidth, newHeight
        cs = CoreGraphics.CGColorSpaceCreateDeviceRGB()
        c = CoreGraphics.CGBitmapContextCreateWithColor(
            newWidth, newHeight, cs, (0, 0, 0, 0))
        c.setInterpolationQuality(CoreGraphics.kCGInterpolationHigh)
        newRect = CoreGraphics.CGRectMake(0, 0, newWidth, newHeight)
        c.drawImage(newRect, origImage)
        c.writeToFile(newFilename, CoreGraphics.kCGImageFormatJPEG)  # final params parameter?
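For instance, a hypothetical call (paths and dimensions made up) that generates two sizes from one original:

resizeImagesQuartz('/path/to/original.jpg',
                   [('thumb.jpg', 160, 120),
                    ('medium.jpg', 640, 480)])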
If you’re on a Panther machine with the Developer Tools installed, you can find the examples I started with in:
/Developer/Examples/Quartz/Python/
Seems obvious where they would be in retrospect. Thanks to the folks on the MacPython channel in iChat for pointing me to them.
My XMLFilter package was mentioned in Uche Ogbuji’s latest Python XML article on xml.com:
XMLFilter is one of those great examples of an unglamorous but extremely valuable program. Based on its description (and I expect to try it out and report on it in this column soon), it is a must-have for anyone building SAX programs. It provides a fallback SAX parser/driver to avoid SAXReaderNotAvailable errors that users encounter on some platforms. It also offers a safety net against the XMLGenerator bug that bit me earlier in this series. Its main feature, however, is a framework for SAX filters. See Andrew Shearer’s announcement.
Thanks, Uche!
A few days ago, I made changes to my photo album software. Now all current and past photo albums have an optional “large” size with double the pixel count, preserving more detail for users with large screens.
(There are also some other minor improvements, such as a photo count for each album, links to the next and previous albums by date, and more links to related sites.)
Tim Bray is looking for a better way to post photos to his web site. To judge from the sample photo, his current method doesn’t antialias the image, so sharp edges in the original look jagged when reduced in size.
I went through the same thing with iPhoto, which has an HTML Export feature that is similarly broken—it doesn’t antialias at all. It’s a strange limitation, considering that the Mac OS X graphics system has fast, high-quality antialiasing everywhere else, including fonts and Dock icons. It’s as if Apple turned off a global switch in iPhoto for better performance when displaying large numbers of images onscreen, but forgot to turn it back on for HTML exporting, where quality should count for much more.
In any case, the quality of iPhoto’s exports was poor, so I wrote a Python script to handle the export using the Python Imaging Library. (Contact me if you’d like the code. So far, I’ve publicly released only the general-purpose plist parser that I wrote to handle the AlbumData.xml file.)
The script reads the titles and comments assigned in iPhoto, and parses them for category and other tagging information I’ve appended to the comments. Then it generates date-based and category-based HTML page hierarchies for all the albums whose names start with "Web-", and generates any thumbnails or medium-sized images that are missing.
The Python Imaging Library, or PIL, is very easy to install with MacPython 2.3’s Package Manager.
There are some drawbacks, though:
On the positive side, the antialiasing looks good, and PIL can also read embedded EXIF data. Images that I’ve tagged as deserving more info automatically get the aperture and shutter speed printed on the page.
The code for actually reducing and saving the image, ignoring the EXIF and album manipulations for now, is as simple as this:
if not os.path.exists(newPath):
    shrunkImage = im.resize(size, resample=PIL.Image.ANTIALIAS)
    shrunkImage.save(newPath, 'JPEG', quality=90)
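The size tuple is precomputed elsewhere in the script; for illustration, a helper along these lines could derive it while preserving the aspect ratio (a sketch, not necessarily the script’s exact logic):

def scaledSize(im, maxDimension):
    # Scale the longer edge down to maxDimension, keeping the aspect
    # ratio and never enlarging the original.
    w, h = im.size
    scale = min(1.0, float(maxDimension) / max(w, h))
    return (max(1, int(w * scale)), max(1, int(h * scale)))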
You can see samples in my Pictures section. Check out the first batch of Providence photos for some night examples with shutter speeds and apertures shown, and the Providence and Boston kayaking photos for examples of pictures with lots of edges that would have looked much worse without antialiasing.