What's That Noise?! [Ian Kallen's Weblog]


20050424 Sunday April 24, 2005

Thinking about Microformats and the Hi-Fi Web

There is meaning to be derived from the web. But The Semantic Web has little to do with the web itself; it's more about creating parallel universes. Descriptions of The Semantic Web take it as given that there must be a separate structure to identify meaning on the web:

HTML has limited ability to classify the blocks of text on a page, apart from the roles they play in a typical document's organization and in the desired visual layout.
OK, but those assumptions may be flawed. Yes, often markup is produced that only browsers "understand" to the extent that their responsibility is to render a visual layout. But it doesn't have to be that way.

For instance, right now, many web applications that display user profiles do so in a way that other applications can't understand. The data is flattened in a way that it can't be consumed and meaningfully reused. Perhaps the markup functions properly in web browsers: the layout elements are identified and therefore stylable for proper display. But if the markup can't be remarshalled into data, it's low-grade ore. The data becomes markup mojibake. The Semantic Websters say: RDF to the rescue! Just maintain a parallel universe of data! Sure, if the data is marked up in some random ad-hoc fashion without regard to the actual data relationships, it's a problem. Application developers seeking to mine that mis-HTML-ified data are forced to write custom parsers to grok that data. Usually, the remarshalling can't be done losslessly; it's a low-fidelity round trip.

Web applications typically do this:

Inside the markup, there is structure and embedded bits of meaning, microformats.

But the round trip is hard. Taking markup and deriving semantic meaning from document elements usually requires understanding a lot about specific implementations of data renderings.

The one-way web is easy. The two-way web is hard.

When I talk about the one-way web, I'm not referring to protocols, HTTP methods or the "web two dot oh" read-write web. I'm referring to how code handles data to produce pages.

The microformats efforts aim to make the data on the web more understandable, more reusable and therefore more valuable without all of the complexities and problems that pervade The Semantic Web's RDF-centricity. By employing some basic XHTML norms, this data no longer needs to be flattened and lost. A microformat can be embedded in a web page's markup and be remarshalled as data. This is the high fidelity web.
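
As a rough sketch of that round trip, assuming well-formed XHTML and hCard-style class names (`vcard`, `fn`, `org`, `url`), a few lines of Java can pull a user profile back out of the same markup a browser renders. The class name `ProfileRoundTrip`, the sample markup and the example.com URL are made up for illustration; this is not a full hCard parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ProfileRoundTrip {

    // Hypothetical profile markup; the class names follow hCard conventions.
    static final String PROFILE_XHTML =
        "<div class=\"vcard\">"
        + "<a class=\"url fn\" href=\"http://example.com/jane\">Jane Example</a>"
        + " works at <span class=\"org\">Example Widgets</span>"
        + "</div>";

    /** Walks every element and records a field for each recognized class name. */
    static Map<String, String> remarshal(String xhtml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> fields = new LinkedHashMap<>();
        NodeList elements = doc.getElementsByTagName("*");
        for (int i = 0; i < elements.getLength(); i++) {
            Element el = (Element) elements.item(i);
            // class is a space-separated list, so one element can carry several fields.
            for (String cls : el.getAttribute("class").trim().split("\\s+")) {
                if (cls.equals("url")) {
                    fields.put("url", el.getAttribute("href"));
                } else if (cls.equals("fn") || cls.equals("org")) {
                    fields.put(cls, el.getTextContent().trim());
                }
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(remarshal(PROFILE_XHTML));
    }
}
```

The point of the sketch is that no parallel document is needed: the page that a browser displays is also the data source.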

The value of microformats is that your application could already be generating them and you're not even aware of it; there may be data that can be parsed, understood and reused waiting to have value unlocked. The microformat evangelism seeks to make your use of understandable markup intentional (disclaimer: I don't speak for Tantek but I speak with him frequently and I'm just purveying my current interpretation). Whereas microformats are about making the web natively understandable, The Semantic Web is about alternate formats.

When I've read others speak of microformats and alternate formats, I've seen discussion of RSS and Atom thrown in. By definition, these are not microformats, they are alternate formats. Not that there's anything wrong with the existing parallel universes, I just don't want to build more of them. How many goofy XSLT tricks does the world need to go from structured data with yet-another-vocabulary to renderable markup? The microformats answer is zero. Structured blogging looks like more markup mangling to get around, instead of fixing, the crappy user interface tiers of applications; it just doesn't seem necessary.

There are also a lot of interesting things that could be done to specify the intention of links. We'll have to call these nanoformats. They don't refer to data structures or relationships but they can still ascribe more meaning to links.

Vote Links
Attempts to indicate whether your reference to something is negative, positive or neutral. I have mixed feelings about this specifically, so I 'spose I should <a rel="vote-abstain"> abstain </a> from further commentary but in general, I like the idea of embellishing a link with intentions.
Being able to distinguish between intentional links and accidental (i.e. placed not by the page author but some tool or untrusted third party) links is an important element of making the web more meaningful.
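
To make the vote link idea concrete, here is a minimal sketch of how a consumer might read the intention off each link in a well-formed XHTML fragment. Only `vote-abstain` appears in this post; `vote-for` and `vote-against` are assumed to be the other values, and the class name `VoteLinkScan`, the sample markup and the example.com URLs are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class VoteLinkScan {

    // Assumed rel values: only vote-abstain appears in the post itself.
    static final List<String> VOTE_RELS =
        Arrays.asList("vote-for", "vote-abstain", "vote-against");

    // Hypothetical page fragment with one vote link and one plain link.
    static final String SAMPLE =
        "<p>I <a rel=\"vote-abstain\" href=\"http://example.com/votelinks\">abstain</a>"
        + " but see <a href=\"http://example.com/other\">this</a>.</p>";

    /** Maps each link's href to its vote rel token, or "none" if it carries no vote. */
    static Map<String, String> scan(String xhtml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));

        Map<String, String> votes = new LinkedHashMap<>();
        NodeList anchors = doc.getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            String vote = "none";
            // rel is a space-separated token list, so check each token.
            for (String token : a.getAttribute("rel").trim().split("\\s+")) {
                if (VOTE_RELS.contains(token)) {
                    vote = token;
                }
            }
            votes.put(a.getAttribute("href"), vote);
        }
        return votes;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(scan(SAMPLE));
    }
}
```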

I think the adoption of hCalendar, hCard (returning to the user profile case above) and the maturation of other microformats hold out the promise of the high fidelity web.

Internal application communications should, of course, do what is expeditious for development and runtime efficiency. But for the web (i.e. the world wide one), the adoption of markup norms just makes sense. The diffusion of these formats means exercising patience while the web gets more coherent but I find it much more appealing to try solving the problem once in one rendering that the public can consume versus creating yet more parallel universes.

( Apr 24 2005, 01:41:24 PM PDT ) Permalink

20050419 Tuesday April 19, 2005

Technorati as Tech Support: MacOS X update vs. Java

This is a true story: an install of the recent MacOS X update bjorked the existing Java installation on a friend's powerbook. Just invoking "java -version" resulted in a lovely little "Segmentation fault." Thank you Cupertino!

A quick search on Technorati returned a pointer to the resolution from a post as the first result. The fix was to reinstall some security updates but the immediacy of the answer is what was really great.

All Hail The Real Time Web!

( Apr 19 2005, 03:01:57 PM PDT ) Permalink

20050417 Sunday April 17, 2005

NIN's Open Source Music versus Linus and Larry's metadata propriety

I was saddened to read of Larry McVoy's stand on Andrew Tridgell's BitKeeper client development (I like Larry, BitKeeper, etc... which is what makes this tough) and the attacks from Linus Torvalds that followed. Contrast this with Trent Reznor.

What's the connection? The source data for Nine Inch Nails' new single, "The Hand That Feeds" is available to download and muck with in GarageBand. This is a very different attitude about openness and derivative works.

From Make:

"For quite some time I've been interested in the idea of allowing you the ability to tinker around with my tracks -- to create remixes, experiment, embellish or destroy what's there," Reznor says. Here's a screenshot of it on my Mac (View image) and here's where to get it (70MB file). Here are a couple of the first remixes!
This came via Joi Ito and he aptly nails it (no pun will go unpunned!):
"Now if only they would put some kind of Creative Commons license on it, it would be perfect."
I'd prefer that BitMover focused more on innovating the platform (perhaps "application lifecycle management" is excessively highfalutin but it's not freakishly off-base); there's a lot of room for SCM products to add value or integrate with other pieces adding value elsewhere in the application development chain. Closing the door to third party client innovation is a failure of imagination. Larry is pretty much counting on his internal team (talented though they may be) to be wiser than the community at large about how clients should function, how product specification should interoperate with SCM, how bug and issue tracking should work with SCM, etc. Open client development and derivative works are where it's at. It seems like no new service these days is launched without providing some kind of REST API (I just started checking out the recently emerged Upcoming yesterday; the API issue is on page one). The ubiquity of Creative Commons is an undeniable force. Well, I'm not on Larry's case per se; I do admire the guy but the absence of cluetrain savvy is disappointing nonetheless.

Maybe in my copious spare time I'll figure out how to have some fun with GarageBand.

( Apr 17 2005, 09:21:07 AM PDT ) Permalink

20050416 Saturday April 16, 2005

Cherry Blossom Festival: The Japanese Equivalent of Oktoberfest

I hear from my colleagues in Tokyo that the Cherry Blossom Festival is a Big Deal.

If a picture is worth a thousand words, here's a volume.

( Apr 16 2005, 07:56:15 AM PDT ) Permalink

20050413 Wednesday April 13, 2005

Local Search Heating Up

While the web seemingly spans all geographical boundaries and has created unprecedented marketplaces, I'll bet that given a choice, you'd rather do business within your community. Looks like the search engines will facilitate that as well.

Yahoo! Local is ramping up their play. They're offering the moms-and-pops free storefront websites. The hosted site gets highlighted placement in their local listing. Doesn't look like it integrates with their domain registration service but that seems like a logical development to expect. Hopefully, they'll spiff up the listing's interface; Google's has a cleaner layout with a map front and center. It does the whole Ajax dance to bring up satellite photos of the area. But if there's a coup de grace from Google, it may be Google Local for Mobile Phones.

Blogging proximity will certainly play a role in the marketplaces of the future (though I'm not sure what). There are all kinds of ways for folksographies to be expressed: locations, venues and coordinates. I'm pondering ways to correlate venues with spatial tags because while virtual communities will continue to evolve free of the constraints of space, I'm convinced that sooner or later they will bring it all home.

( Apr 13 2005, 11:30:42 PM PDT ) Permalink

20050412 Tuesday April 12, 2005

Hello Berkeley DB

This morning, I wanted to get familiar with the Berkeley DB "Java Edition" API (that's a mouthful, can't I just call it "sleepycat"?). I was in a carpool and I don't think the dude driving realized I was hacking-in-traffic. While I've happily used DB_File in Perl for years-n-years, I haven't had time/opportunity to mess with the Java stuff from Sleepycat. However, at work we're cooking up a durable message buffer with an embedded servlet container, fun! As with poking into any new API, I like to start with Hello World.

Here's my Sleepycat Hello World

import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.DatabaseException;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;
import java.io.File;

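public class HelloBdb {

    public static void main(String[] args) throws Exception {
        String key = args[0];
        String value = args[1];

        // The environment home directory must exist before it is opened.
        File dir = new File("db");
        dir.mkdirs();

        // Allow the environment and database to be created on the first run.
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);

        Environment env = new Environment(dir, envConfig);
        Database database = env.openDatabase(null, "foobar", dbConfig);

        // Store the key/value pair given on the command line.
        database.put(null,
            new DatabaseEntry(key.getBytes()), new DatabaseEntry(value.getBytes()));

        DatabaseEntry foundKey = new DatabaseEntry();
        DatabaseEntry foundData = new DatabaseEntry();

        // Walk every record in the database and print it.
        Cursor cursor = database.openCursor(null, null);
        try {
            while (cursor.getNext(foundKey, foundData, LockMode.DEFAULT) ==
                    OperationStatus.SUCCESS) {
                String keyString = new String(foundKey.getData());
                String dataString = new String(foundData.getData());
                System.out.println("Key | Data : " + keyString + " | " +
                    dataString);
            }
        } finally {
            // Close in reverse order of opening.
            cursor.close();
            database.close();
            env.close();
        }
    }
}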

Of course, the real fun will be running this in a multi-threaded environment and the concurrency issues therein. With Hello World done, it's time to move on to see what else needs to be added to the cookbook.

( Apr 12 2005, 08:57:09 PM PDT ) Permalink

20050410 Sunday April 10, 2005

del.icio.us investment details

Joshua Schachter just announced the specifics of the del.icio.us investment. It's an intriguing cast of characters.

Here's the roster:

On the mound: Union Square Ventures
specializes in infomediation
Catcher: AMZN
Amazon is certainly interested in how you tag the products you're interested in
1st base: Marc Andreessen
what's he up to now?
2nd base: BV Capital
Bertelsmann Ventures are playahs
Shortstop: Esther Dyson
another leading light
3rd base: Seth Goldstein
Right field: Tim O'Reilly
book mogul
Center field: Bob Young
founder of Red Hat
Left field: Josh Kopelman
The half.com guy
DH: Howard Morgan
Not sure, IdeaLab?

I've been a del.icio.us fan for a long time; this seemed inevitable. Looks like a lot of good folks, congrats to del.icio.us!

( Apr 10 2005, 06:38:37 PM PDT ) Permalink

20050405 Tuesday April 05, 2005

Talk To The Blog Cause The Hand Isn't Typing

If you'd rather dictate audio than type a word composition, check out Audioblogger. It works with a Blogger account by posting audio that you call in to a number that Audioblogger provides.

With all of the phones these days supporting crude digital video features, it seems like just a matter of time before vblogging comes of age. Er, hold the phone, Google is archiving video clips now. In the meantime, I'm imagining James T. Kirk rambling philosophically to his blog about a recently visited corner of a previously uncharted galaxy and posting video mashups of interstellar foibles.

( Apr 05 2005, 07:55:31 AM PDT ) Permalink

20050404 Monday April 04, 2005

How Not to Learn Japanese

I've been collaborating with a team in Tokyo a lot over the previous few months. They've been very gracious about speaking English with me but I've been unable to reciprocate. Anyway, I've had my eye on the W3C conference in Chiba. I should probably learn to speak a little Japanese.
This is what I'm not going to be doing to learn Japanese:
Kanji Quiz Toilet Paper
This is an extremely extraordinary item, toilet paper with the power to teach! For those who prefer to "sit in the library" this great paper can provide... hours (?) of fun and prepare them for that pop quiz in Japanese class the next morning. Three different methods of learning are provided "multiple choice", "reading", and "philosophical fill-in-the-blank". Written in a soft blue color on white. A superb item for anyone interested in studying Japanese, it's really cool on many different levels.
- eBay blurb where this was offered for $4.80
Maybe I'll just take the phrase book or something.

( Apr 04 2005, 01:43:29 AM PDT ) Permalink