Andrew Shearer: Home

OSCOM Day 3: Jon Udell Keynote

Excellent keynote. He started with a simple, obvious thing which we tend to get wrong because we’re blind to it: weblog item doctitles that show up properly in search engines. Then a bunch of specific things we can implement, and a look toward the future. Good, practical stuff.

Talked about the content side of content management. Importance of titles and topic sentences. Communication skills. Don’t hit.

Content is the expression of ideas, request for attention, or attempt to influence. Technologists don’t think hard enough about the effort & the reward of making content.

Showed an entry on Don Box's site that displayed its title perfectly in his aggregator, NetNewsWire, but Google didn't see it, because it wasn't in the doctitle. Easy to make this mistake. (Reiterated point: Publishing is essentially engineering. We forget these issues because engineers think from the inside out.) What is the right unit of content? Radio Userland puts the day’s posts on one page, with the date as doctitle; Movable Type puts one item per page, so it can use the item's RSS title. Dave Winer's weblog comes in like an IV drip all day, but the audience for most weblogs isn't like that, and they need titles.
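A minimal sketch of the doctitle point, in Python. This is not how any of the tools mentioned actually build their pages; the function names and the "item title | site name" layout are my own assumptions. The idea is just that the item's title, when there is one, should lead the doctitle, with the date as a fallback for day-per-page weblogs.

```python
from html import escape


def page_title(site_name, item_title=None, day=None):
    """Build the text for a page's <title> (hypothetical helper).

    Prefers the item's own title, falls back to the day's date,
    and finally to the bare site name.
    """
    if item_title and item_title.strip():
        lead = item_title.strip()
    elif day:
        lead = day
    else:
        return escape(site_name)
    # Escape so that characters like & and < are safe inside <title>.
    return escape(f"{lead} | {site_name}")


def render_head(site_name, item_title=None, day=None):
    """Wrap the doctitle in a <title> element."""
    return f"<title>{page_title(site_name, item_title, day)}</title>"
```

With a one-item-per-page layout, each page gets a doctitle a search engine can show; with a day-per-page layout, only the date is available unless the author supplies a title for the day.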

This affects how Jon Udell uses Radio Userland. Dave Winer interjected to ask if it would help to have a field to choose the day's title.

Brent’s Law of URLs: the more expensive the CMS, the crappier the URL. Showed a bunch of typical CMS & weblogging system URLs. Tim Bray’s homegrown site was best: example ended with 2002/02/13/NamingFinishing. Vignette’s product, which costs more than $200K, was worst, with an awful, long numeric URL.
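For illustration only, here is one way a system could produce a readable, date-based path of that shape from a post's date and title. This is a guess at the general pattern, not a description of how Tim Bray's site works; `readable_path` is a made-up name.

```python
import re
from datetime import date


def readable_path(published, title):
    """Return a YYYY/MM/DD/CamelCasedTitle path (hypothetical scheme)."""
    # Keep only runs of letters and digits, dropping punctuation.
    words = re.findall(r"[A-Za-z0-9]+", title)
    # Capitalize the first character of each word and join them.
    slug = "".join(word[:1].upper() + word[1:] for word in words)
    return f"{published:%Y/%m/%d}/{slug}"
```

The point is that everything in the path means something to a person: when it was posted and what it is about.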

Structure in doctitles. Search results pages can parse & group the titles. Example: with doctitle like Magazine Name | Date | Dept | title, group search results by magazine issue. Showed good example of this on O'Reilly's site.
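A rough sketch of what "parse & group" could look like, assuming doctitles follow the four-field pattern above with `|` as the separator. The function names and the sample titles in the usage note are invented; a real search results page would work from its own index.

```python
from collections import defaultdict


def parse_doctitle(doctitle, sep="|"):
    """Split 'Magazine | Date | Dept | Title' into a dict, or None."""
    # maxsplit=3 keeps any further separators inside the title itself.
    parts = [part.strip() for part in doctitle.split(sep, 3)]
    if len(parts) != 4:
        return None
    magazine, issue_date, dept, title = parts
    return {"magazine": magazine, "date": issue_date,
            "dept": dept, "title": title}


def group_by_issue(doctitles):
    """Group titles by (magazine, date); unparseable ones go together."""
    groups = defaultdict(list)
    for doctitle in doctitles:
        parsed = parse_doctitle(doctitle)
        if parsed is None:
            groups[("(unstructured)", "")].append(doctitle)
        else:
            groups[(parsed["magazine"], parsed["date"])].append(parsed["title"])
    return dict(groups)
```

Given titles like "Example Magazine | 2003-05 | Features | Naming things", the results for one issue collapse under a single heading, and the magazine name no longer has to be repeated on every line.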

Great example of broken titles in just about every mailing list archive. All the titles are wrong—they are the same as the last message in the thread. Not scannable. Showed a mockup with meaningful titles.

A few of the examples had the common thread of repetition of data in the user interface. Search results kept repeating the site name in document titles. Discussion board forums kept repeating the same subject lines. The mailing list example he showed was pretty much wall-to-wall repetition of the same thing. Only difference between successive lines was indentation and author name. A better interface would strip it all out, summarize, whatever. I've run into all the things he mentioned and just gotten used to them. I have to look at them with new eyes.
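To make the "strip it all out" idea concrete, here is a small sketch of my own (not Udell's mockup, which used meaningful hand-written titles). It assumes each message is a `(depth, subject, author)` tuple and shows the subject only when it actually changes, ignoring "Re:" and "Fwd:" prefixes.

```python
import re

# Matches one or more leading reply/forward prefixes such as "Re: RE: ".
_REPLY_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def strip_prefix(subject):
    """Remove leading 'Re:' / 'Fwd:' prefixes from a subject line."""
    return _REPLY_PREFIX.sub("", subject).strip()


def compact_thread(messages):
    """Render a thread, repeating the subject only when it changes."""
    lines = []
    previous = None
    for depth, subject, author in messages:
        bare = strip_prefix(subject)
        key = bare.lower()
        indent = "  " * depth
        if key == previous:
            # Same subject as the message above: show only the author.
            lines.append(f"{indent}{author}")
        else:
            lines.append(f"{indent}{bare} ({author})")
        previous = key
    return lines
```

This only removes repetition. It does nothing to summarize what each message says, which is the harder part.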

Call to implement ThreadsML.

Discussion of SlideML. Showed his method of generating it, but it isn't usable by “civilians”. No help in writing the actual content apart from typing raw XHTML in Emacs.

CMSs came from publishing & were ported to the web. Weblogs are web-first.

Hypertextual writing is still stuck in 1995. In 1996, Netscape did as much as or more than we're doing today. We need a lightweight web-aware writing tool. Need to advance beyond Emacs, TEXTAREAs or the shoddy Windows DHTML edit control. InfoPath still relies on a crummy XHTML editor.

Compound documents: tend to explode to meaningless names because the system has to add them (e.g. slide027.html). Discussion of old Netscape cid: protocol.

CMSs solve refactoring problems “in the large”: making consistent changes to many files, access, etc. Refactoring “in the small” sucks up a huge amount of time: reformatting email messages, etc.

Categorization is a heavyweight operation; there should be other lightweight ad-hoc ways. Example: All Consuming book aggregator finds book references in blogs.

Showed example of searching his SlideML markup with XPath for code examples.
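I didn't capture his actual markup or query, so the following is only a sketch of the idea. It assumes slides are `div class="slide"` elements with code in `pre class="code"`, which is my invention and not the SlideML schema, and it uses the limited XPath subset in Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical slide markup, invented for this example.
SAMPLE = """<slides>
  <div class="slide" id="s1">
    <h1>Doctitles</h1>
    <p>Titles matter.</p>
  </div>
  <div class="slide" id="s2">
    <h1>XPath search</h1>
    <pre class="code">//pre[@class='code']</pre>
  </div>
</slides>"""


def slides_with_code(xml_text):
    """Return (slide id, slide heading, code text) for each code block."""
    root = ET.fromstring(xml_text)
    hits = []
    # Find every slide, then every code block anywhere inside it.
    for slide in root.findall(".//div[@class='slide']"):
        for pre in slide.findall(".//pre[@class='code']"):
            hits.append((slide.get("id"), slide.findtext("h1"), pre.text))
    return hits
```

Because the content is structured markup rather than a blob, a query like this can pull out just the code examples without any categorization step up front.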

Update: Here are the slides and notes from Bitflux: part one, part two.

Posted May 31, 2003 at 09:09 AM

Categories: Interface, Open Source, Software, OSCOM, General