Also see the list of articles, none to be taken seriously.

I’m giving WordPress a spin, replacing my own experimental statically-generated weblog publishing tool. The homegrown system worked well, but I wanted to add more dynamic features such as comments and trackbacks, and there’s so much other work going on with weblogging tools that it wasn’t a good use of time to implement those myself.

So I made some changes to WordPress to make it fit my publishing system, all of which are to be contributed back to the project.

  1. There is now an importer for RSS files and blogBrowser-style archives of RSS feeds (one folder per year, one RSS file per month). This allowed me to import entries from my own weblogging tool, which uses blogBrowser archives as its native data format. The importer supports multiple categories as well as modification dates, which brings me to:
  2. Modification dates. They’re now stored per post, and exported in the RSS feed, using a dcterms:modified element.
  3. UTF-8 Unicode storage of all posts. Minimal changes to the code; it mainly sets the right HTTP headers on the edit and post display pages and passes through the results without messing them up. Though PHP is still ISO-8859-1-centric in its string handling (and, the mbstring module aside, some of the other Unicode "support" just ends up replacing characters outside this set with question marks), it passes UTF-8 through fine.
  4. A few minor fixes (permalink generation from a custom format now works when the weblog is hosted somewhere below the root of the site; it’s possible to have a post in no categories).
  5. Multi-weblog synchronization. This is a consequence of the RSS import and modification date features. I can tie my production and staging servers together and update posts in both directions. (I’m just starting to test this out.)
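
To make the import side of this concrete, here is a minimal sketch in Python of reading categories and modification dates out of an RSS file. It is not the WordPress importer (which is PHP); the function name `read_items` and the sample feed are made up for illustration. The only fixed detail is the Dublin Core terms namespace that `dcterms:modified` belongs to.

```python
import xml.etree.ElementTree as ET

# Namespace URI for Dublin Core terms (dcterms:modified lives here).
DCTERMS = "http://purl.org/dc/terms/"


def read_items(rss_text):
    """Return one dict per <item> in an RSS 2.0 document (hypothetical helper)."""
    channel = ET.fromstring(rss_text).find("channel")
    items = []
    for item in channel.findall("item"):
        items.append({
            "title": item.findtext("title", default=""),
            # An item may carry several <category> elements, or none at all.
            "categories": [c.text for c in item.findall("category")],
            "published": item.findtext("pubDate"),
            # None when the feed has no modification date for the post.
            "modified": item.findtext("{%s}modified" % DCTERMS),
        })
    return items


# Made-up sample feed, standing in for one monthly file of an archive.
SAMPLE = """<rss version="2.0" xmlns:dcterms="http://purl.org/dc/terms/">
  <channel>
    <title>Example weblog</title>
    <item>
      <title>First post</title>
      <category>Python</category>
      <category>Photos</category>
      <pubDate>Mon, 02 Jun 2003 10:00:00 GMT</pubDate>
      <dcterms:modified>2003-06-03T08:30:00Z</dcterms:modified>
    </item>
    <item>
      <title>Uncategorized post</title>
      <pubDate>Tue, 03 Jun 2003 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

for entry in read_items(SAMPLE):
    print(entry["title"], entry["categories"], entry["modified"])
```

For synchronization, comparing the `modified` value of the same post on two servers would be one way to decide which copy is newer; that step is not shown here.
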
Read and Post Comments

Tim Bray is looking for a better way to post photos to his web site. To judge from the sample photo, his current method doesn’t antialias the image, so sharp edges in the original look jagged when reduced in size.

I went through the same thing with iPhoto, which has an HTML Export feature that is similarly broken: it doesn’t antialias at all. It’s a strange limitation, considering that the Mac OS X graphics system has fast, high-quality antialiasing everywhere else, including fonts and Dock icons. It’s as if Apple turned off a global switch in iPhoto for better performance when displaying large numbers of images onscreen, but forgot to turn it back on for HTML exporting, where quality should count for much more.

In any case, the quality of iPhoto’s exports was poor, so I wrote a Python script to handle the export using the Python Imaging Library. (Contact me if you’d like the code. So far, I’ve publicly released only the general-purpose plist parser that I wrote to handle the AlbumData.xml file.)

The script reads the titles and comments assigned in iPhoto, and parses them for category and other tagging information I’ve appended to the comments. Then it generates date-based and category-based HTML page hierarchies for all the albums whose names start with "Web-", and generates any thumbnails or medium-sized images that are missing.
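
The album-selection step might look roughly like the sketch below. It uses the `plistlib` module from the current Python 3 standard library instead of my own parser, and the key names (`List of Albums`, `AlbumName`, `KeyList`, `Master Image List`, `Caption`, `Comment`) are assumptions about the layout of AlbumData.xml, not a documented schema. The function `web_albums` and the sample data are made up.

```python
import plistlib


def web_albums(album_data, prefix="Web-"):
    """Map each album whose name starts with prefix to its photo records.

    The key names used here are assumptions about AlbumData.xml.
    """
    photos = album_data.get("Master Image List", {})
    result = {}
    for album in album_data.get("List of Albums", []):
        name = album.get("AlbumName", "")
        if name.startswith(prefix):
            # Skip keys that have no matching photo record.
            result[name] = [photos[k] for k in album.get("KeyList", []) if k in photos]
    return result


# Tiny made-up stand-in for an AlbumData.xml file.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>List of Albums</key>
  <array>
    <dict>
      <key>AlbumName</key><string>Web-Kayaking</string>
      <key>KeyList</key><array><string>7</string></array>
    </dict>
    <dict>
      <key>AlbumName</key><string>Private</string>
      <key>KeyList</key><array><string>8</string></array>
    </dict>
  </array>
  <key>Master Image List</key>
  <dict>
    <key>7</key>
    <dict>
      <key>Caption</key><string>Providence River</string>
      <key>Comment</key><string>Dusk on the water</string>
    </dict>
    <key>8</key>
    <dict>
      <key>Caption</key><string>Receipt</string>
      <key>Comment</key><string></string>
    </dict>
  </dict>
</dict>
</plist>"""

albums = web_albums(plistlib.loads(SAMPLE))
for name, pictures in albums.items():
    print(name, [p["Caption"] for p in pictures])
```

Parsing the tagging information out of each `Comment` would follow from here; the tag syntax is specific to my comments, so it is left out.
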

The Python Imaging Library, or PIL, is very easy to install with MacPython 2.3’s Package Manager.

There are some drawbacks, though:

  • I had to push the JPEG quality setting very high to avoid obvious macro-blocking (squares showing up around detailed areas), and pushing the quality any higher caused PIL to fail by throwing an exception.
  • The BICUBIC setting for image reduction didn’t appear to work at all. The image ended up non-antialiased, the same as Photoshop’s "Nearest Neighbor" setting. Only ANTIALIAS had any effect. This may result in bilinear instead of bicubic interpolation, but the documentation isn’t clear.
  • The Thumbnail setting produces an image quickly, but the result is very low-quality.
  • The Progressive setting for JPEGs seemed to cause even more exceptions when trying to save at high quality levels, so I was forced not to use it.
  • It’s not nearly as fast as Mac OS X’s Core Graphics image reduction. But then again, I wouldn’t expect it to be.

On the positive side, the antialiasing looks good, and PIL can also read embedded EXIF data. Images that I’ve tagged as deserving more info automatically get the aperture and shutter speed printed on the page.

The code for actually reducing and saving the image, ignoring the EXIF and album manipulations for now, is as simple as this:

# Only generate the reduced image if it doesn't already exist.
if not os.path.exists(newPath):
    shrunkImage = im.resize(size, resample=PIL.Image.ANTIALIAS)
    shrunkImage.save(newPath, 'JPEG', quality=90)
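
The snippet above takes `size` as given. One way to compute it so that the aspect ratio is preserved is sketched here; `fit_within` and `max_side` are hypothetical names for illustration, not part of PIL or of my script.

```python
def fit_within(width, height, max_side):
    """Scale (width, height) so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return (width, height)  # never enlarge a small original
    # Round to whole pixels, but keep each side at least 1 pixel.
    new_width = max(1, round(width * max_side / longest))
    new_height = max(1, round(height * max_side / longest))
    return (new_width, new_height)


print(fit_within(1600, 1200, 640))  # landscape original
print(fit_within(1200, 1600, 640))  # portrait original
print(fit_within(320, 240, 640))    # already small enough
```

The returned tuple can be passed straight to `im.resize` as `size`.
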

You can see samples in my Pictures section. Check out the first batch of Providence photos for some night examples with shutter speeds and apertures shown, and the Providence and Boston kayaking photos for examples of pictures with lots of edges that would have looked much worse without antialiasing.


CNN: Microsoft to pay AOL $750M. Tech titans settle Netscape lawsuit, set seven-year licensing pact for AOL to use Internet Explorer.

CNET: Microsoft to abandon standalone IE.

So, AOL can now install IE with their product for no charge, just as MS terminates development of the installable IE. Great deal.

Jeffrey Zeldman has some great analysis.

Aside: does this mean AOL must use IE? From initial reports, the answer appears to be no. The alternative browser would just have to be free to AOL (or strategically valuable enough to justify its cost). However, AOL’s track record has it bundling IE for Windows even without a special agreement, even while it owned Netscape.

Presumably the termination of IE development means that users can only receive major browser updates by buying new versions of Windows. For the majority of users, who are likely to stick with their current OS version for a while, IE 6 SP1 is the end of the road. Its slow march toward compliance with CSS and other standards can go no further, no new web technologies will be added, and no more bugs will be fixed.

This is a huge problem for web programmers and designers. A large majority of web surfers—those using IE on probably all versions of Windows before Longhorn (scheduled for 2005)—have just had their browser orphaned, with no simple upgrade path. With all its warts, it’s going to stick around for a long time.

In other words, Internet Explorer 6 has been Netscape 4-ed.

Whopper of the day (from the CNET article): "Legacy OSes have reached their zenith with the addition of IE 6 SP1," [IE program manager] Countryman said. “Further improvements to IE will require enhancements to the underlying OS.” Is he trying to make us believe that bug fixes, CSS3, XForms, etc., are impossible without a new operating system, due to some technical limitation? Maybe the quote was meant to look like a statement of technical possibility, while it was really a marketing dictum. As in: for the users to get further improvements in IE, they must first buy and install an updated OS. (Because we want it that way.)

Tim Bray, wrestling with page layout in IE, puts it more strongly:

The problem isn’t that CSS is too hard. The problem isn’t browser incompatibilities in general. The problem is specifically that Microsoft Internet Explorer is a mouldering, out-of-date, amateurish, out-of-date pile of dung. Did I say it’s out-of-date? As in past its sell-by, seen better days, mutton dressed as lamb, superannuated, time-worn. It’s so, like, you know, so twentieth-century.

Ron Green raised the alarm, which echoed through Scripting News and then around the usual hallways: “All this has lead me to ask if IE is dead.”

Firebird [mozilla.org] and Safari [apple.com] are looking really good right now.


Excellent keynote. He started with a simple, obvious thing which we tend to get wrong because we’re blind to it: weblog item doctitles that show up properly in search engines. Then a bunch of specific things we can implement, and a look toward the future. Good, practical stuff.

Talked about the content side of content management. Importance of titles and topic sentences. Communication skills. Don’t hit.

Content is the expression of ideas, request for attention, or attempt to influence. Technologists don’t think hard enough about the effort & the reward of making content.

Showed an entry on Don Box's site that displayed its title perfectly in his aggregator NetNewsWire, but Google didn't see it, because it wasn't in the doctitle. Easy to make this mistake. (Reiterated point: Publishing is essentially engineering. We forget these issues because engineers think from the inside out.) What is the right unit of content? Radio Userland has the day’s posts on one page, with the date as doctitle; Movable Type has one post per page, so it can use the item's RSS title. Dave Winer's weblog comes in like an IV drip all day, but the audience for most weblogs isn't like that, and they need titles.

This affects how Jon Udell uses Radio Userland. Dave Winer interjected to ask if it would help to have a field to choose the day's title.

Brent’s Law of URLs: the more expensive the CMS, the crappier the URL. Showed a bunch of typical CMS & weblogging system URLs. Tim Bray’s homegrown site was best: example ended with 2002/02/13/NamingFinishing. Vignette’s > $200K product was worst with an awful, long numeric URL.

Structure in doctitles. Search results pages can parse & group the titles. Example: with doctitle like Magazine Name | Date | Dept | title, group search results by magazine issue. Showed good example of this on O'Reilly's site.
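
A rough sketch of how a search results page could use that structure, assuming the four pipe-separated fields from the example above. The function names and sample titles are made up; this is not how O'Reilly's site does it.

```python
from collections import defaultdict


def parse_doctitle(doctitle):
    """Split 'Magazine | Date | Dept | Title' into its four fields."""
    parts = [p.strip() for p in doctitle.split("|", 3)]
    if len(parts) != 4:
        return None  # not a structured title; leave it ungrouped
    return dict(zip(("magazine", "date", "dept", "title"), parts))


def group_by_issue(doctitles):
    """Group article titles under (magazine, date), preserving input order."""
    issues = defaultdict(list)
    for doctitle in doctitles:
        fields = parse_doctitle(doctitle)
        if fields:
            issues[(fields["magazine"], fields["date"])].append(fields["title"])
    return dict(issues)


# Made-up search results.
results = [
    "Example Monthly | 2003-05 | Features | Naming things",
    "Example Monthly | 2003-06 | Columns | URLs that last",
    "Example Monthly | 2003-05 | Reviews | A text editor",
    "An unstructured page title",
]
for issue, titles in group_by_issue(results).items():
    print(issue, titles)
```

Grouping this way also removes the repetition of the magazine name on every result line, since it only needs to appear once per group.
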

Great example of broken titles in just about every mailing list archive. All the titles are wrong—they are the same as the last message in the thread. Not scannable. Showed a mockup with meaningful titles.

A few of the examples had the common thread of repetition of data in the user interface. Search results kept repeating the site name in document titles. Discussion board forums kept repeating the same subject lines. The mailing list example he showed was pretty much wall-to-wall repetition of the same thing. Only difference between successive lines was indentation and author name. A better interface would strip it all out, summarize, whatever. I've run into all the things he mentioned and just gotten used to them. I have to look at them with new eyes.

Call to implement ThreadsML.

Discussion of SlideML. Showed his method of generating it, but it isn't usable by “civilians”. No help in writing the actual content apart from typing raw XHTML in Emacs.

CMS systems came from publishing & were ported to web. Weblogs are web-first.

Hypertextual writing is still stuck in 1995. In 1996, Netscape did as much as or more than we're doing today. We need a lightweight web-aware writing tool. Need to advance beyond emacs, TEXTAREAs or the shoddy Windows DHTML edit control. InfoPath still relies on a crummy XHTML editor.

Compound documents: tend to explode to meaningless names because the system has to add them (e.g. slide027.html). Discussion of old Netscape cid: protocol.

CMSs solve refactoring problems “in the large”: making consistent changes to many files, access, etc. Refactoring “in the small” sucks up a huge amount of time: reformatting email messages, etc.

Categorization is a heavyweight operation; there should be other lightweight ad-hoc ways. Example: All Consuming book aggregator finds book references in blogs.

Showed example of searching his SlideML markup with XPath for code examples.

Update: Here are the slides and notes from Bitflux: part one, part two.


10 Best Features from Commercial CMS

Browser-based image editing, pre-localized interfaces

Extra credit: In-context editing (Edit This Page), dependency reporting, semblance of autoclassification, relational viewing tools

Reporting: such as Never Logged In

Configurable, forms-based workflow (ingest Visio WFML?)

508/WA compliant output — accessibility. Table headings + row headings, alts, etc.

Browser-based content object development (schema, essentially)

OpenCourse educational site. opencourse.org. “It rhymes with open source!” (The presenter avoided saying this, but I'm sure he wanted to.) Slow-moving.

Dublin Core Metadata in CMS

On oscom.org presentation slide show, different DC formats for XHTML, HTML, RDF XML are linked.

Good reference impl.: DC-dot. Another: Reggie

Elements (such as DC.Subject.Keyword) appearing multiple times, yes. Comma-separated value lists, no.
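
To illustrate the repeated-element rule, here is a small sketch that emits one HTML `meta` element per value. The helper name `dc_meta_tags` and the sample keywords are made up; it only shows the shape of the output, not any particular CMS's implementation.

```python
from html import escape


def dc_meta_tags(name, values):
    """Emit one <meta> element per value instead of a comma-separated list."""
    return [
        '<meta name="%s" content="%s">' % (escape(name), escape(value))
        for value in values
    ]


for tag in dc_meta_tags("DC.Subject.Keyword", ["content management", "WebDAV", "R&D"]):
    print(tag)
```

One practical reason for repeating the element: a value that itself contains a comma stays intact.
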

Discussion on thesauri, search engines, etc. Overall, I didn't get a huge amount out of this session, at least not directly. I'll have to find the reference impls online.


Provides a standard way to place content on a web server, with metadata, file locking, versioning. Also can decouple filesystem layout from author's view. Uses HTTP for all logins, so no need to create full user accounts.

Very few clients support metadata so far. Cadaver does, but cmd-line based. Kcera? KExplorer? support properties.

To check out: Joe Orton's sitecopy. Twingle.

WebDAV for filesharing tested lighter than SMB on network traffic.

Question on ranged PUTs. WebDAV and mod_dav support it, but some servers don't. The Mac OS X WebDAV client can't use ranged PUTs for this reason, or it would risk replacing the entire file with the tiny part that was changed. They're working toward some kind of solution.

Servers include Apache mod_dav (which the speaker wrote) and Zope, Tomcat. Jakarta Slide requires a lot of work to connect its memory-based store to something. Can even handle WebDAV with CGI except for OPTIONS method.

Subversion supports DeltaV WebDAV. You can mount & copy files from vanilla Windows & Mac OS X. But you can't modify them, because the clients don't support DeltaV. (There is an experimental "autoversion" plugin for the server to allow this.)

Extensions: ACL. Remote management of ACLs; close to RFC status. DASL (DAV Searching & Locating). Yet another query language. Further off.

MS WebDAV does a little check for FrontPage first, but is pretty much straight WebDAV otherwise.

My question: best/simplest route to implement a change trigger for a WebDAV server, so I could run a script? Can I plug in easily to any of the existing servers?

A. Zope supports WebDAV and is programmable. It uses its own data store, though, not the filesystem. So the whole system would have to use Zope.

Best answer. Could look at logs / an Apache filter to implement change response. Great idea.
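
A minimal sketch of the log-watching idea, assuming the server writes Common Log Format lines. The list of write methods, the regular expression, and the sample log lines are my own illustration, not something shown in the session.

```python
import re

# WebDAV/HTTP methods that modify a resource.
WRITE_METHODS = {"PUT", "DELETE", "MKCOL", "MOVE", "COPY", "PROPPATCH"}

# Request and status fields of a Common Log Format line.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def changed_paths(log_lines):
    """Yield (method, path) for each successful write request in the log."""
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        if match["method"] in WRITE_METHODS and match["status"].startswith("2"):
            yield match["method"], match["path"]


# Made-up log excerpt.
SAMPLE_LOG = [
    '10.0.0.5 - alice [07/Jun/2003:10:01:12 -0400] "PROPFIND /dav/ HTTP/1.1" 207 1024',
    '10.0.0.5 - alice [07/Jun/2003:10:01:15 -0400] "PUT /dav/notes.html HTTP/1.1" 201 -',
    '10.0.0.5 - alice [07/Jun/2003:10:01:20 -0400] "PUT /dav/locked.html HTTP/1.1" 423 -',
    '10.0.0.5 - alice [07/Jun/2003:10:02:03 -0400] "DELETE /dav/old.html HTTP/1.1" 204 -',
]

for method, path in changed_paths(SAMPLE_LOG):
    print(method, path)  # a real trigger would run the script here
```

Failed writes (such as the 423 Locked response) and read-only requests are skipped, so the script would only fire on actual changes.
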

Alternative: Author of FS watch & notify utils suggested those. They only run on Unixes, though. (I need Windows support, so I could look into NT's APIs for filesystem notification too.)


Dave Winer (introduced as "King of the Blogging World") said that was a great introduction, and he didn't agree with anything in it. Call to open source & commercial software worlds to work with each other. Speaking as a commercial developer who has also released open source.

Q: "Proprietary" label used to be sold as a good word. Open source just used it to differentiate themselves.

"40-person company" is what he recommends would be best for customers. 2-3 people doesn't cut it. But those 40-person companies don't exist anymore. Users look at Unix-style OS and think it must be very difficult to write. But it's actually much harder to write software that's easy to use, while users won't recognize its complexity.

Halley Suitt: Is she missing the marketing for open source? What does Linux look like? There's something with a penguin. Someone helpfully brought up his laptop and opened it for her. "My Linux virginity is gone," she announced.

Internet Explorer: users are stranded. Has a development team, but they don't fix the bugs.

XML-RPC: Dave did design in 2 weeks, met with Don Box et al once. Secret of success: not overloaded with complexity. Extra features were aggressively not included. Has not changed since 1999.
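
For a sense of how little there is to the protocol, this sketch uses Python's standard `xmlrpc.client` module to build a request payload and decode it again. The method name `examples.add` is made up, and no server is contacted.

```python
import xmlrpc.client

# "examples.add" is a made-up method name; no server is contacted here.
request_xml = xmlrpc.client.dumps((2, 3), methodname="examples.add")
print(request_xml)

# Decoding the same payload recovers the parameters and the method name.
params, method = xmlrpc.client.loads(request_xml)
print(method, params)
```

The whole request is a method name plus a list of typed values wrapped in XML.
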

Audience member disputed the assertion that there were no 40-person software firms. Many CMS packages (shrinkwrapped) come from such companies.

What audience member wants: to be able to fix software. Even if developer goes bankrupt. Dave: What you want is not to be locked in. You want open file formats. Another audience member: retraining is high part of switching cost, not data conversion. Q: Source code escrow?

Q: With IE, doesn't want to be stranded. His weblog won't display properly in IE, and he can't fix it. Dave: Source code for IE should have been put in escrow and released already, because they're not working on it. He had strongly suggested that as a remedy in the MS antitrust trial.

Motivations for Open-Source Developers essay. To do: find link; it scrolled off my NetNewsWire aggregator before I read it.

Q: Audience member complained that Radio Userland has support issues, documentation issues.

Dave: They all do! There's no money in software! It's $39.95; that doesn't pay for a lot of support.

Sound bite about personally not liking Bill Gates or Richard Stallman. Neither of them take baths. This is quoted more accurately elsewhere.

Discussion of unifying variants of RSS.

And here we come to the climactic faceoff of the keynote. Apparently Dave Winer & Bill Kearney have never met in person before. I'll let the record speak for itself (search the web for both their names), but if you've ever seen their online mailing list discussions, you'd expect a matter vs. antimatter reaction if ever they were to meet.

Bill Kearney: I'm Bill Kearney, from Syndic8.

Dave: (no particular reaction) What's Syndic8?

Bill: (explains, happening to mention again that he's Bill Kearney)

Dave: Oh, you're Bill Kearney. My God.

[Bill starts talking about "democracy, rather than benevolent dictatorship"; discussion degenerates into shouting & swearing. Elapsed time: about 15 seconds. The play-by-play doesn't really matter, but if you want one, see Aaron's weblog. After the OSCOM organizer Charlie steps in after a few minutes, Dave is too rattled to move on and ends the session.]

I didn't get to ask my question.


To come.


Interesting panel discussion.

#1 - Sleepycat CEO
#2 - Lisa ?; lawyer
#3 - Aaron Swartz
#4 - Larry Rosen

Open Source (free because it's useful, strategic) vs. Free Software (everything should be free) vantage points.

Q. Creative commons vs. source license? Larry Rosen: Courts have confused the issue of software IP by applying both patents and copyright to it. [I'd wondered about this problem; software is kind of in the middle of both and neither is quite right.]

Q. W3C DTD & Schema copyrightable? W3C says yes. But would content using that schema be copyrighted by the W3C? Lisa: Functionality/methods can't be covered by copyright. --maybe that applies to this case.

OpenOffice person in audience. Teddy Ruxpin case—successful contributory copyright lawsuit. Bootleg cassettes made Ruxpin tell different stories, make different movements.

Q on "Infected" code (could open source contain stealth IP)? Topical; SCO lawsuit.

Aggregation. Aaron: It's obviously illegal to put scraped feed contents on your page without attribution, obviously legal to write a tool that scrapes to generate feeds. Dave Winer: case of someone who didn't know RSS was generated auto by Radio. Got mad when it appeared on someone else's site. After that was explained, problem kind of disappeared.

The RSS topic was starting to get too long and the moderator wanted to switch subjects, before I could get my question in, which was exactly along those lines. He said to defer those questions to Dave Winer’s keynote tomorrow.

