Also see the list of articles, none to be taken seriously.
I had a problem where my scripted FTP uploads through ftplib in Python 2.3 would experience long (6 or 7-second) delays before transferring each file. Other FTP programs were fine, except for a similar delay on connect. It turned out to be an interaction with ftplib’s IPv6 support in Python 2.3 and the Mac OS X name resolver, and it finally appears to be fixed in the recently-released Mac OS X 10.3.8, which noted speed improvements in certain network applications.
In case the delay bites anyone else (or in case it’s not really fixed, and some other network change is just fooling me) here’s the workaround I’ve been using until now.
With IPv6 support in Python 2.3 / Mac OS X 10.3, ftplib’s ntransfer function now calls getaddrinfo for every single file tranferred, and the name resolver does a slow timeout each time. Making a local copy of ftplib and replacing the call to getaddrinfo with constants may be ugly, but it worked around the problem.
Original line (multi-second delay), at ftplib.py line 233:
af, socktype, proto, canon, sa = socket.getaddrinfo(host, port, 0, socket.SOCK_STREAM)[0]
Changed line (assumes IPv4 addresses):
af, socktype, proto, canon, sa = (2, 1, 6, '', (host, port))
This change speeds up multi-file FTP transfers immensely (at least to my FTP server) under Mac OS X 10.3.0 through 10.3.7, but early results indicate it’s not necessary on 10.3.8.
Bill Bumgarner’s useful Dupinator script, for removing duplicate files, recently hit Python-URL. However, it has a logic bug that end up deleting too many files.
If you have several sets of duplicates that happen to share the same file size, all but one of the sets will be wiped out completely. The problem is that within each group of files of identical size, there’s at most a single generated "duplicates" list. The first file on the list is spared; the rest are deleted.
The net effect, when I tested the script on a large corpus of text files, was the program reported it would delete many files that were clearly not identical. (I had commented out the os.remove call for testing.)
There was an additional problem with iPhoto: the posted script follows symbolic links. iPhoto stores its albums as collections of symbolic links, so all photos in albums are flagged as duplicates of the original photos. An islink() test fixes this.
Here’s a modified version of the script. It has only been lightly tested, though the changes did successfully eliminate the false positives. Uncomment the os.remove() line only when you are satisfied with the list of redundant files generated.
Minor optimizations: all files < = 1024 bytes go directly into the dupes list, not potentialDupes, since the whole file has already been checked. Also, Mac OS X’s pesky .DS_Store files are skipped.
(I haven’t heard back from Bill yet on incorporating the fixes into his code, so I’m posting here.)
And, like clockwork, here comes the latest Windows vulnerability:
Internet Explorer Carved Up By Zero-Day Hole:
“Two new vulnerabilities have been discovered in Internet Explorer which allow a complete bypass of security and provide system access to a computer, including the installation of files on someone’s hard disk without their knowledge, through a single click.
Worse, the holes have been discovered from analysis of an existing link on the Internet and a fully functional demonstration of the exploit have been produced and been shown to affect even fully patched versions of Explorer.
It has been rated ‘extremely critical’ by security company Secunia, and the only advice is to disable Active Scripting support for all but trusted websites.”
The article goes on to say that the code exploits three holes in Internet Explorer for Windows, including one that has been known since August 2003, and there’s no patch available for any of them. (You could turn off Active Scripting, which breaks functionality on many sites, or stop browsing web sites you don’t trust completely. If that’s not acceptable, you have to switch another browser such as Mozilla, or switch to a Mac.)
Here’s a way to back up iPhoto’s image comments into an easy-to-read flat directory structure. (Translation: one big folder.) You’d want to do this when archiving your photos to CD or DVD, or when trying to merge photo libraries, or when leaving iPhoto for another program, or at any other time you want your comments saved in a non-proprietary, easily readable format.
As you may have read last week, when I upgraded to iPhoto 4, all the image descriptions temporarily disappeared from my online photo albums. (I caught the problem on my own staging server before it appeared on this site.) The culprit was a change in the way iPhoto stores photo comments. Comments are now entirely gone from the easy-to-parse AlbumData.xml file; iPhoto now stores them in a binary format that appears to be proprietary.
AppleScript to the rescue. Last week’s script saved the comments to text files and generated a directory structure that exactly paralleled iPhoto’s library, with one text file for each comment. These files were in folders for each day, which were in turn inside folders for each month, etc., guaranteeing there would be no name conflicts. I had rejected using the internal ID of each picture (which would have allowed a flat conflict-free directory structure) because the ID wasn’t user-visible anywhere in the iPhoto interface, making comment files named for the ID difficult to map back to the original pictures.
One of the comments on that post asked for a version that generated the comment files in one folder, based on the image’s filename. That was a good idea. Though the filename is not guaranteed to be unique, it often is in practice. Most digital cameras save unique serial numbers for each picture as part of the filename. So this is enough for most people. (The exceptions would be if you have more than one digital camera using a similar naming convention, or if your camera is configured to reset its numbering between rolls.)
If you like guaranteed accuracy, use my original script; if you like simplicity, use the following alternate script. It will only save one of the conflicting comments if photo filenames are duplicated. Dropping the parallel folder structure simplified the script, since this version doesn’t need to employ any POSIX path manipulation.
Copy the following into Script Editor and run. Tested with iPhoto 4.0 on Mac OS X 10.3. (It may also work with earlier versions; drop me a comment below if you’ve tried it.)
I bought the upgrade to the Apple’s iLife suite, released on Friday. Here’s a gotcha for developers who parse iPhoto’s AlbumData.xml file, though it doesn’t directly affect most users. It affects me, because my own code parses AlbumData.xml to generate my web-based photo albums (such as the England trip pictures I just posted).
Though the overall format of iPhoto’s XML file stays the same (and my script had no trouble reading it), the Comments and Date fields are gone! The Date field is renamed and in a different format, which is no problem to work around because the image file’s embedded EXIF data contains the date as well. The missing Comments field is a different story.
From my quick inspection, the comment data seems to be only stored in a newly introduced iPhoto.db file, which is in some binary format. The rationale for this is presumably performance, but that doesn’t completely make sense, since the photo title is still stored in the XML file and it may be changed just as often.
In any case, here’s a workaround that uses AppleScript to write a parallel folder structure holding just the comments, one per text file. Paste the following into a Script Editor window and run. Use this anytime you’d like to protect your comments from the vagaries of software or platform transitions or upgrades. (The parallel folder structure helps this; the script could have used iPhoto’s internal IDs and generated all the files in a single folder, but that wouldn’t have been as forward-compatible.) GPL-licensed.
commentCommonBaseDir = os.path.expanduser("~/Pictures/") commentOrigDir = os.path.join(commentCommonBaseDir, "iPhoto Library") commentParallelDir = os.path.join(commentCommonBaseDir, "iPhoto Library - My Comments Cache") commentFileSuffix = ".comment.txt" def getCommentForFile(imagePath): if not imagePath.lower().startswith(commentOrigDir.lower()): raise ('Error: image does not appear to be in iPhoto Library; ' + 'cannot compute comment path. Image: "%s". Library: "%s".' ) \ % (imagePath, commentOrigDir) commentPath = os.path.join(commentParallelDir, imagePath[len(commentOrigDir)+1:]) + commentFileSuffix if os.path.isfile(commentPath): print "Read comment for " + imagePath return open(commentPath, 'r').read() return ''
To continue on the recent image resizing theme (probably of interest to Python scripters only), I made some changes as a result of upgrading to Panther last week. I wanted to use the new built-in Mac OS X version of Python 2.3 (plus the MacPython Extras from Jack Jansen—thanks, Jack!). But a problem with the initial Package Manger distribution of the Python Imaging Library made me look at a new Panther feature that let Python scripts use the native Quartz graphics library directly. (The hitch with PIL was that it was built to require a Fink install of libjpeg for full JPEG support. A quick compile of libjpeg and placement of it and its headers into Fink’s preferred locations didn’t work, and either installing Fink or compiling PIL from source would have taken a while.)
That was as good a reason as any to explore Panther’s new Quartz scripting feature. So I read what I could find on Quartz, and modified my photo album code to use Quartz if available. It still uses PIL to gather EXIF and size information, which works even without libjpeg, but then it uses Quartz to manipulate the actual image content.
The results were terrific, mostly. In real-world testing on an 800 MHz PowerBook G4, the PIL-only version spat out 8 JPEGs per minute, and the Quartz version spat out 65 JPEGs per minute. That’s a welcome improvement, especially when you multiply my typical batch of 100 photos by 3 sizes apiece.
The one problem is that I don’t yet know how to set the quality level. There’s a parameter that should contain this number, but as far as I can tell it isn’t documented anywhere. All of the supplied examples save as PNG or PDF, rather than JPEG, and the function isn’t documented along with the rest of Quartz because it’s not a real Quartz function—the release notes say that image export is actually handled through QuickTime. (This will be the first public mention in the history of the world, as far as Google is concerned, of the Core Graphics function that the API summary says it calls: CGBitmapContextWriteToFile. The last parameter, vaguely named “params” and defaulting to a zero-length string, is where a data structure including the quality level would obviously go.)
So for now it’s using a default JPEG quality level, which, whatever it is, is noticeably worse than the quality=90 setting I used with PIL, especially on thumbnails. Though I haven’t done a controlled side-by-side test, it seemed that lower quality levels resulted in some low-frequency blurriness, which looked much less objectionable than the high-frequency ringing (making macroblock boundaries visible) that PIL tended to show. It looked bad enough that I couldn’t really run PIL with anything below quality=90. And because of the lower quality setting, the file sizes on the Quartz side were half that of the PIL versions.
Here’s all the code the deals with Quartz in the new photo album. newImagesInfo holds a list of destination file paths and pre-calculated pixel dimensions.
def resizeImagesQuartz(origFilename, newImagesInfo): # newImagesInfo is a list of # (newFilename, newWidth, newHeight) tuples if not newImagesInfo: return import CoreGraphics origImage = CoreGraphics.CGImageCreateWithJPEGDataProvider( CoreGraphics.CGDataProviderCreateWithFilename(origFilename), [0,1,0,1,0,1], 1, CoreGraphics.kCGRenderingIntentDefault) for newFilename, newWidth, newHeight in newImagesInfo: print "Resizing image with Quartz: ", newFilename, \ newWidth, newHeight cs = CoreGraphics.CGColorSpaceCreateDeviceRGB() c = CoreGraphics.CGBitmapContextCreateWithColor( newWidth, newHeight, cs, (0,0,0,0)) c.setInterpolationQuality(CoreGraphics.kCGInterpolationHigh) newRect = CoreGraphics.CGRectMake(0, 0, newWidth, newHeight) c.drawImage(newRect, origImage) c.writeToFile(newFilename, CoreGraphics.kCGImageFormatJPEG) # final params parameter?
If you’re on a Panther machine with the Developer Tools installed, you can find the examples I started with in:
/Developer/Examples/Quartz/Python/
Seems obvious where they would be in retrospect. Thanks to the folks on the MacPython channel in iChat for pointing me to them.
Tim Bray is looking for a better way to post photos to his web site. To judge from the sample photo, his current method doesn’t antialias the image, so sharp edges in the original look jagged when reduced in size.
I went through the same thing with iPhoto, which has an HTML Export feature that is similarly broken—it doesn’t antialias at all. It’s a strange limitation, considering that the Mac OS X graphics system has fast, high-quality antialiasing everywhere else, including fonts and Dock icons. It’s as if Apple turned off a global switch in iPhoto for better performance when displaying large number of images onscreen, but forgot to turn it back on for HTML exporting, where quality should count for much more.
In any case, the quality of iPhoto’s exports was poor, so I wrote a Python script to handle the export using the Python Imaging Library. (Contact me if you’d like the code. So far, I’ve publicly released only the general-purpose plist parser that I wrote to handle the AlbumData.xml file.)
The script reads the titles and comments assigned in iPhoto, and parses them for category and other tagging information I’ve appended to the comments. Then it generates date-based and category-based HTML page hierarchies for all the albums whose names start with "Web-", and generates any thumbnails or medium-sized images that are missing.
The Python Imaging Library, or PIL, is very easy to install with MacPython 2.3’s Package Manager.
There are some drawbacks, though:
On the positive side, the antialiasing looks good, and PIL can also read embedded EXIF data. Images that I’ve tagged as deserving more info automatically get the aperture and shutter speed printed on the page.
The code for actually reducing and saving the image, ignoring the EXIF and album manipulations for now, is as simple as this:
if not os.path.exists(newPath): shrunkImage = im.resize(size, resample = PIL.Image.ANTIALIAS) shrunkImage.save(newPath, 'JPEG', quality = 90)
You can see samples in my Pictures section. Check out the first batch of Providence photos for some night examples with shutter speeds and apertures shown, and the Providence and Boston kayaking photos for examples of pictures with lots of edges that would have looked much worse without antialiasing.
I just released version 1.1 of XMLFilter, which marks the first public standalone release. XMLFilter is an open-source Python module you can include with your programs to provide XML parsing even if the target system lacks a working xml.sax package. You can use it to quickly adapt existing xml.sax-compatible scripts to work out of the box, for example, on Jaguar (Mac OS X 10.2), which lacks expat.
It works by using the older xmllib module as a fallback for xml.sax. A test suite verifies call-by-call compatibility no matter which module ends up being used.
Other features include XML event-stream filtering, writing, and creation, with support for writing CDATA sections. (Using these classes also avoids bugs in some versions of xml.sax.)
Generally, the newer your version of Python, the faster it goes. For example, if xml.sax and expat are working, they give a factor-of-3 speedup over the pure-Python xmllib, and on Python 2.3, Unicode encoding conversions will use xmlcharrefreplace for faster writing of XML numeric entities.
Python-licensed. Tested all the way down to Python 1.5.2 and up to Python 2.3. xml.sax-compatible, Unicode-savvy (wherever Python is), and optionally namespace-aware.