What's That Noise?! [Ian Kallen's Weblog]


20070629 Friday June 29, 2007

Power Bet

Last night I was among an invited group that Powerset brought in to witness how their natural language search sausage is made. It was actually kinduva cold cut platter: not exactly a meal but an interesting variety was offered for consumption.

When I was a kid, I thought that by 2007 we'd all have flying cars, rocket packs and computers would be all-seeing/all-knowing accoutrements on our wrists. I think all of us who ever watched Scotty verbally ask the Enterprise questions and get responsive answers in English sentences have had hunger pangs for satisfying natural language search. Powerset is trying to advance human-computer interfaces a little closer to that satisfaction, leapfrogging previous efforts, by licensing Xerox PARC's technology and hiring a buncha heavy hitters to make it real.

Powerset COO Steve Newcomb introduced some of the sluggers in their line-up, walked attendees through the thinking behind their PR and release strategy and provided a peek into their search capabilities.

Among the impressive powersetters are people who have been-there/done-that with scaled-up search such as x-Yahoo!'s Chad Walters and Tim Converse (read Tim's post the other day about term proximity and linguistics, great stuff), as well as experts in natural language search with backgrounds at PARC and Ask Jeeves. As a company, they're not just-another-web2.0 rails app built by 2 guys and trying to get to the next level. Powerset is more of a bold bottled-lightning science experiment embracing ruby n' rails as a way to get it in front of people.

Powerset has signed up 10K people since announcing the availability of updates and previews on PowerLabs a few weeks ago. Newcomb characterized their labs preview effort as a way to use social software to guide product management decisions, "a mashup of Digg, Facebook and Google apps." I'm a big fan of transparency and community inclusion, it will be interesting to see how inclusive/closed this effort is.

OK, so after all of that, the "Where's the beef?" moment arrived. A side-by-side comparison interface was demonstrated with Powerset results on the left and Google results on the right. Explaining that the test index was scoped to Wikipedia, the goog results were similarly scoped down. The Powerset use case was demonstrated with a query like "What politicians were killed by disease?" On goog, the results are matching terms (and variants on their stems), "politicians", "killed" and "disease". Powerset matches semantically similar tokens and their grammatical relationships.

So Powerset's top result for that query highlighted Sir Edward Heath died from pneumonia on Wikipedia's page for Edward Heath. Highlighting a completely different snippet (none of the query terms were matched but the semantics were) that accurately answers the query is very impressive. Powerset is using Freebase's ontology and WordNet's synonym mappings to connect indexed sentence structures to the query. They do all of this analysis and mapping at index time, which undoubtedly raises the cost of indexing tremendously. They're making a big bet that the raised search results quality will pay those costs back.

When asked about the computational horsepower required to index web documents with the sentence structure decomposition and semantics mappings, Newcomb hedged at first ("Barney's gonna kill me", referring to CEO Barney Pell). But alas, he convinced himself (or did a good job method-acting conviction) that it was safe to reveal that it takes them about a second to grammatically analyze and index a typical document. As he lamented his confession again, someone from the audience quipped the query, "Which CEO killed Steve Newcomb?" Yea, he didn't search their index for that.

On the subject of Google comparisons, Newcomb kinda squirmily described Powerset as reverent of ("not cocky about") what Google has accomplished but taking a different approach to web search. Doing side-by-side comparisons with Google as their demo does is pretty ballsy and it seems to get them in trouble; being positioned as a "Google killer" by their audience of search wonks and journalists when things are still very much at a proof-of-concept level seems rather premature. I think Powerset needs to reel that in lest they awaken a sleeping giant and fill him with a terrible resolve while they're still on the tarmac. If you've designed a new aircraft, you don't trumpet about revolutionizing aeronautics before the test pilots have taken off. Particularly if folks are proclaiming that Boeing is in trouble. When Powerset indexes a real web corpus, it will be interesting to see how successfully they can overlay web graph, clustering/disambiguation, time and other relevance components. I think that will provide a real moment-of-truth.

Powerset is making a big bet on natural language search as a transformative technology. They've got a lot of great people and a lot of great technology. All in all, the presentation felt a little dog-and-ponyish with the limited corpus but I'm looking forward to hearing more from them later this year when they release a major iteration. See also:


( Jun 29 2007, 10:41:46 AM PDT ) Permalink

20070616 Saturday June 16, 2007

Natural Hackasters

I'm reading with amusement and wonder the events that unfolded at the Yahoo! Hackday in London. Apparently the Alexandra Palace main hall (the BBC's venue for this) has a roof that opens up. And it did. This was precipitated by a lightning strike on the building as a storm blew over (precipitated, storm: no pun left shall be unpunned). Yes, audience member laptops are open, PA system all set up... and it's raining inside the hall. Not to worry, all Londoners are equipped with umbrellas at all times. That's a fact. "I thought a bomb went off", sez Chad of the lightning strike when he was on IM a few hours later. Is the roof there like Chase Field where the Diamondbacks play baseball in Phoenix? I dunno, I'm checking out pictures of "Ally Pally" to assess. Anyway, power and wifi are back and the show goes on.

Follow along with Hackday London Lightning on Technorati's hackdaylondon tag stream.

( Jun 16 2007, 10:45:59 AM PDT ) Permalink

20070609 Saturday June 09, 2007

Disappearance of the Desktop Interface

I was sick of various computer OS desktop metaphors 10-12 years ago. At the time, I thought virtual reality technologies were gonna take over (anybody else remember VRML?). I remember the Windows 95/98 releases, lauded by Microsoft as such great advancements, striking me as just laughable in their utter lack of imagination (even if they were big upgrades from the Windows 3.x mess). When that "innovation" made it to Windows XP, I realized that Microsoft was hopelessly lost as far as OS interface design.

Since then, I've seen a lot of technology changes that I view as the harbingers of the desktop metaphor's demise. Graphics card technology that was once only found on $15-50k SGI pizza box workstations is now cheap as pizza. Jeff Han's demonstration of high resolution multi-touch applications at eTech and TED last year was fantastic. At TED again this year, the Photosynth demonstration got a big round of "oohs" and "aahs" from a rapt audience (you must see the detail zooming, also check out this Photosynth demo reel).

So when are we gonna see these technologies in our everyday lives? Apparently, soon. It's funny how different Apple and Microsoft's foray into this is. In a few weeks, Apple is coming out with a $500 phone (the multi-touch usage is demonstrated at 3:55 into this MacWorld TV report from last January). By the end of the year, we will reportedly see Microsoft's $10k coffee table appearing in hotel lobbies. Can't wait? Fishing in your pocket for an extra $10k? Into starcraft? There are some folks working on a multi-touch DIY kit (Microsoft: 0, Hackers: 1).

Putting on my futurist hat: Five years from now, Intel's 80-cores-on-a-fingernail chip, voice recognition audio inputs and multi-touch screens on commodity devices will make the desktop metaphor seem like a quaint joke. Kids born today will shake their heads in disbelief that desktops were productive tools. I've yet to explain a command line interface to my kids, who are grade school age; as familiar and comfortable as those interfaces are to me, the youngins look at me typing in a shell window with puzzlement. In their youthful eyes, I may as well be composing Vulcan legal tracts (the reality is probably more frightful, it might really be Perl). Computing interfaces will fade away into our intuition.

I just wish the iPhone was coming out in time for father's day (yes, honey, that's a hint). In the meantime, I'm still putting up with Apple and Microsoft's OS interfaces, wincing at the trash cans, recycle bins, folder icons, etc. It'll be good riddance.


( Jun 09 2007, 10:23:46 AM PDT ) Permalink

20070601 Friday June 01, 2007

Web Spam As Signs of the Times

There was a time not long ago when Findory offered a credible value proposition for participants and consumers of the blogosphere. The idea of a blog recommendation and reader personalization service is a good one. I guess things didn't work out as planned at Findory. Earlier this year, Greg Linden announced that Findory was riding into the sunset.

The old Findory blog (@ http://findory.blogspot.com/) has been dormant for some time (the last posts from Greg were in 2005), now it's been taken over by a splogger who has been grabbing abandoned blogspot URLs (this one has PageRank of 3) and posting link farm links and German keywords to them. Sad.

I'd recommend holding on to your blogspot URLs forever; even if you're not using 'em anymore it's better to maintain the museum piece than contribute to the web spam problem.


( Jun 01 2007, 12:55:10 PM PDT ) Permalink

20070509 Wednesday May 09, 2007

No splogs, ay

I had to take a few days off of work last week because of my aching back, it was really a fog-of-pain for a few days but this week I'm on the mend and in beautiful Banff for the WWW 2007 conference. Actually, I'm mostly here for the AIRweb workshop but staying a few extra days to hear what folks are thinking about regarding the future of the web, online information retrieval, humanity, and so on.

The AIRweb submissions included a lot of web graph related research. Some of it makes quite intuitive sense: web spammers will link to their spam sites as well as legitimate sites (camouflage) but legitimate sites don't link to web spam sites. So some of the talks discussed the underlying linear algebra of these phenomena (Anti-TrustRank and BadRank) or their inapplicability to identifying spam (TrustRank). The presentations about temporal patterns, spam term density, the effects of on-the-fly re-ranking and javascript redirection were quite interesting.
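
For the curious, the "legitimate sites don't link to spam" intuition can be sketched as trust propagation: start with trust on a hand-picked seed set and let it flow along outbound links, much like a PageRank that only teleports back to the seeds. The following is a minimal illustration in that spirit, not code from any of the AIRweb papers; the function name, the toy graph and the damping value are all assumptions made up for the example.

```python
def propagate_trust(links, seeds, damping=0.85, iterations=50):
    """Seed-biased PageRank-style iteration (TrustRank-like sketch).

    links: dict mapping a page to the list of pages it links to.
    seeds: set of pages assumed to be trustworthy (hypothetical input).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    # All teleport mass goes back to the trusted seeds.
    seed_mass = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    trust = dict(seed_mass)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * seed_mass[n] for n in nodes}
        for src, targets in links.items():
            if not targets:
                continue  # pages with no outlinks simply leak their trust here
            share = damping * trust[src] / len(targets)
            for t in targets:
                nxt[t] += share
        trust = nxt
    return trust


# Toy graph: spam pages link to a good page (camouflage) and to each other,
# but no good page links to spam.
toy_links = {
    "good1": ["good2"],
    "good2": ["good1"],
    "spam1": ["good1", "spam2"],
    "spam2": ["spam1"],
}
scores = propagate_trust(toy_links, seeds={"good1"})
```

In this toy graph the spam pages never receive trust because nothing trusted points at them, while their camouflage links to good pages buy them nothing. Real web graphs are far messier, which is part of why the applicability of these methods was up for debate.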

A lot of these rank-demotion and web graph heuristics aren't really central to the efforts we have at Technorati for thwarting splogs. We instrument the data streams for baseline behaviors of various features. It's more like an intrusion detection system because fundamentally, web spammers can't behave like "normal" publishers and still succeed; they have to compensate for their absence of popularity with all kinds of abnormal behaviors and those behaviors are quite intrusive if you're listening for them. And so we are. This is by no means perfect but we're doing way better than 80-20. It's my belief that as the web becomes more participatory and there are incentives and opportunities to inject junk into it, intrusion detection will be as vital a capability as search relevance rank demotion to maintain a high quality experience. At the close of the workshop, I proposed that the web spam research community tell us what they want; what can we do to help? I can only imagine that Technorati's data streams could prove useful for the growing challenges of the participant-driven and temporally sensitive web.
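
To make the "baseline behaviors" idea concrete, here is a minimal sketch of flagging an observation that strays too far from a feature's history. It is a generic z-score check, not a description of what Technorati actually runs; the feature (pings per hour), the sample numbers and the threshold of three standard deviations are all assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, observed, threshold=3.0):
    """Return True when `observed` deviates from the baseline in `history`
    by more than `threshold` sample standard deviations."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical feature: pings per hour seen from one publisher.
typical_pings = [2, 3, 1, 2, 4, 3, 2]
burst_flagged = is_anomalous(typical_pings, 250)
normal_flagged = is_anomalous(typical_pings, 3)
```

A single feature like this would be easy to game; the point of the intrusion detection framing is that a spammer has to look abnormal on some feature to succeed, so in practice one would watch many of them at once.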

So that was yesterday.

This morning, Tim Berners-Lee kicked off with a keynote that touched on the successive innovations of email, the web, wikis and blogs. On the iterative nature of technological and social change, he drew a cycling diagram of the needs that emerge when changes occur and enjoy widespread adoption and the collaborative/creative forces that drive innovation. He laid out how the Semantic Web was the next iteration and complex meaning will be readily accessible on the web. OK, that's all well and good. However, I just don't buy this idea that the Semantic Web is ... the Web at all. We have a web for people (he acknowledged as much at the beginning of the talk) but the idea of having tons of detailed data representations for generalized browsers of really complex data... I just don't get why folks won't end up building domain specific apps anyway. Building UI's for "general data representation" means that you'll never really be able to represent the domain specific qualities within some part of The Ontology. At least, I've never seen those things work. Useful apps need domain experts (champions of the end-user e.g. product managers) and engineers to build something that works for that domain. Generic UI's break down when dealing with the nuances of specific domains. I want a data-rich web for humans that is machine consumable (microformats), not a parallel-universe web of machine-oriented RDF. Anyway, thanks for inventing the web TBL and good luck all you Semantic Webbers. I think you'll need it.

I almost fell out of my chair though when TBL said that blog spam isn't really a problem. I'll surmise that he has a set feed reader repertoire (or, old school bookmarks) and doesn't use blog search much. While I think we've done a pretty good job spam scrubbing Technorati, the fact remains that there is a veritable ocean of pinging rubbish mongers engaging in underhanded payola schemes, kleptotorial and other nefarious endeavors out there. What spam you do see on Technorati is the tip of the iceberg. Tim, use our site, despite the iceberg tip :)

Side notes: when in Canada going to "google.com" gets redirected to "google.ca" which includes a toggle to search "The Web"/"Pages from Canada" ... amusing, ergo the graphic in this post. Also, I can't believe how long the days are here; about 3 hours more daylight than the San Francisco bay area!

So thanks to Brian Davison, Carlos Castillo and Kumar Chellapilla for putting together a great AIRweb program, good work guys! I'm heading home tomorrow.


( May 09 2007, 09:44:35 PM PDT ) Permalink

20070430 Monday April 30, 2007

Blogging Upright

I've been asleep just about all day, the pain killers and muscle relaxants they gave me last night were that good.

It all started a few weeks ago when EBMUD sent me a water bill that indicated over three times our normal water usage (and three times the cost). Everything seemed fine with all of the household plumbing. I called for an inspection, their inspector didn't show up on the day I expected them. But we got a note left on the door saying that, while nobody is home, the water meter runs continuously and that our usage continues to be unusually high.

Over the weekend, I checked around the house more diligently. What I thought may have been a wet spot by the side of the garage (not far from a spigot) seemed like a good candidate, so I got the shovel and started digging. The soil didn't get much softer as I dug deeper. There was no specific motion or event that I recall being more vigorous than others but in the hours that followed, a pain in my lower back grew. And grew. And grew to a point of intensity that everything I did hurt in my lower back. Sitting down. Getting up from a sitting position. Lying down. Everything hurt, intensely! A doctor friend of mine told me that I musta skipped chapter 2 of the "You're over 40 now" manual where it is specified not to do any more shoveling. Doh!

At the emergency room, they gave me a cocktail of toradol, dilaudid and phenergan and a prescription for soma and percocet. The shot last night really knocked me out, I've been asleep off and on most of the day today. I'm gonna be doing a lot of lying down with ice on my back. A lot of walking around. But not a lot of sitting. So, I'm writing this post woozy from the drugs but standing upright with the pooter on the kitchen counter. Gonna go for a walk next. I need to resolve things with the water company and the plumbing on our premises.


( Apr 30 2007, 04:47:38 PM PDT ) Permalink

20070421 Saturday April 21, 2007

The Users of Your Service Are Your Best Friends

I try to keep my ride on the cluetrain rolling by listening to what users of the services I help maintain have to say. The Technorati support forums have provided me with a great opportunity to hear what problems Technorati's members are experiencing. For the uninitiated, Technorati's crawler analyzes web pages to identify blog posts, make them searchable and identify links that measure what the blogosphere is paying attention to. There are a fair number of blogs that get caught in our automated blog flagging; the service processes several million pings per day and amidst that throughput, there are going to be mistakes in the flagging heuristics (flagged blogs are, naturally, called "flogs"; sometimes they end up demoted as "splogs" but others turn out to be legit blogs). I'm trying to reduce the mistake rate; the indexing hazards that folks run into are a source of much grief (it doesn't take much to find folks who are very vocal about such lapses).

So, I've been on a tear over the last few weeks chasing down problems in Technorati's crawler and identifying its failure conditions. It's code that, until recently, I've not been too intimate with but inheriting responsibility for its functioning has forced me to study it more closely and grasp a firmer command of Python programming. A peculiar failure case that had me puzzled for a while involved blogs that had (sufficiently) well formed pages and feeds; there didn't seem to be anything wrong with the data that'd prevent us from indexing them and yet they consistently failed to get indexed. I first became aware of it in this topic.

The issue moved to a new topic where an initial diagnosis I offered (corrupted gzip encoding from Apache 2.2's mod_deflate, I thought) didn't quite pan out. But follow-ups from Technorati users KilRoY66 and wa7son helped clarify that the culprit was the gzip encoding that wordpress was configured to do. Apache 2.2/mod_deflate, you're off the hook. Their blogs (TNTVillage blog and justaddwater.dk | Instant Usability & Web Standards, respectively) both used Apache 2.2 but they both are also hosted on wordpress.org installations. For reasons yet to be explained, python's gzip library detects the encoding returned by wordpress as corrupted. Thank you, Technorati members, for helping identify this issue!

I'm going to patch the code (based on Mark Pilgrim's openanything) to recover from encoding errors and raise a proper exception if it's truly unrecoverable (as it is currently, the code catches any exceptions from decompressing the bytes, prints a message and moves along, essentially swallowing a fundamental error). In the meantime, if you're not getting indexed by Technorati and you have wordpress' compression on, try turning it off and see if that makes a difference.
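
Here's roughly the shape of fix I have in mind, as a sketch only. It is written against a current Python standard library rather than the crawler's actual code or openanything itself; `decode_body`, `UnrecoverableEncodingError` and the particular fallback order are names and choices made up for illustration. The idea: try the gzip decoder, fall back to zlib's more lenient decoders, and if nothing works raise an exception the caller can't miss instead of printing and moving along.

```python
import gzip
import io
import zlib


class UnrecoverableEncodingError(Exception):
    """Raised when a response body cannot be decompressed by any fallback."""


def decode_body(raw, content_encoding):
    """Decompress `raw` bytes according to the Content-Encoding header value."""
    if content_encoding != "gzip":
        return raw  # nothing to do for unencoded bodies in this sketch
    try:
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    except (OSError, EOFError, zlib.error):
        pass  # fall through to the more lenient decoders below
    # Fallbacks: zlib with automatic gzip/zlib header detection, then raw deflate.
    for wbits in (zlib.MAX_WBITS | 32, -zlib.MAX_WBITS):
        try:
            return zlib.decompress(raw, wbits)
        except zlib.error:
            continue
    raise UnrecoverableEncodingError(
        "body labeled %r could not be decompressed" % content_encoding
    )


payload = b"<html>hello</html>"
from_gzip = decode_body(gzip.compress(payload), "gzip")
from_zlib = decode_body(zlib.compress(payload), "gzip")  # mislabeled zlib stream
```

Whether these fallbacks would rescue the wordpress case is an open question since the root cause isn't understood yet; the part that matters is that a truly bad body now surfaces as an exception rather than a swallowed message.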


( Apr 21 2007, 02:15:01 PM PDT ) Permalink

20070413 Friday April 13, 2007

Throw Out The Bums

The Bush administration and their friends run the gamut from "that's fishy" (WMD's? In Iraq?) to "that's wrong" (poor judgement of intelligence) to "more corrupt than any Presidential regime in history" -- there's no salvaging this presidency, it's an unmitigated train wreck. The judgement is not just regarding the subterfuge of warring on Iraq premised on phoney al-Qaeda links, the misinformation around the "we're winning in Iraq" meme, nor the recently illuminated goose-stepping in the Justice Department. George Bush, with the aid of Dick Cheney and Karl Rove, will certainly be judged by future retrospectives as the worst president in American history. Let's pile on the recently divulged shenanigans of Paul Wolfowitz, who was one of the primary architects of the President's Big Lies of Foreign Policy. Wolfowitz has been perking up his personal dalliances on the tax payer's dime! The details emerging in the news this week (see Wolfowitz Apologizes For 'Mistake' - At World Bank, Boos Over Pay for Girlfriend) underscore what a buncha corrupt and loathsome creeps these hypocritical neo-con bozos are.

Impeachment proceedings? Criminal prosecution? Nuremberg trials? I'm not sure where it should stop but there clearly is much to be done. Throw out the bums!


( Apr 13 2007, 08:50:47 AM PDT ) Permalink

20070410 Tuesday April 10, 2007

Technorati's Blog Top Tags Widget

We've been working on technologies to support scalable widget publishing and serving over at Technorati. One product of that effort is the recently announced Blog Tag Cloud Widget. With this widget, your readers can browse your blog by pivoting on the tags in your posts.

Some geeky details: The widget technologies leverage Technorati's internal event distribution system to trigger content generation with asynchronous publishing. There's still much to be done but we're shaking out the kinks to enable a broader repertoire of useful, timely and easy to use widgets. We've got boatloads of ideas for more widgets, check out the Technorati Tools page for more information about the Blog Top Tags and the other widget goodies we've cooked up. And we love to see what independent developers such as Doug Karr come up with; if you DIY with the Technorati API, tell us about it. But even if you don't feel like rolling your own, let us know what grabs your fancy. If there's data on Technorati that you think would be great to embed in your blog, let us know; maybe we'll make a widget out of it!


( Apr 10 2007, 05:07:24 PM PDT ) Permalink

20070406 Friday April 06, 2007

Embracing Hats At Technorati

Wow! I'm sure the executive search announcement (Embracing Change) that Dave Sifry just posted will elicit a range of responses from the blogosphere. While I expect a lot of speculation and innuendo around it, I probably have a unique perspective owing to my three years working with Dave at Technorati (this week was my third anniversary). I'd like to share some of that perspective.

It started for me in the Spring of 2004, I was trying to figure out what to do next. At the time, I was frankly very skeptical of Technorati when a friend suggested that I make that my next stop. Most of the time that I visited, the website was unavailable or had PHP errors all over the place, what a trainwreck! Maybe I can fix it.

What I expected to be a short conversation with Dave that fateful day in March 2004 turned into one that lasted for hours. After much discussion about the impact technology developments have on publishing and social discourse, I was struck by Dave's insight, inspiration and passion. In turn, I committed myself to taking what I knew about scaling web sites, learning whatever I needed to scale for the blogosphere's unique requirements and fixing the technical problems that plagued Technorati; I joined Dave to make Technorati the real time engine that would provide micropublishers with the connective tissue of community.

In the years since then, I've worn many hats. Software developer, DBA, sysadmin or whatever-it-takes; I came to Technorati determined by-any-means-necessary to sustain and improve Technorati's state of the art. The company has grown (there were about five of us back then). The blogosphere has grown (there were only a few million blogs back then). And all of us working together at Technorati have grown as people. Today, I still collaborate closely with Dave and Adam. I lead the Core Services group and work with our fabulous front end (led by Dorion) and search engineering (led by Brian) teams as an architect of Technorati's evolution. In helping lead the reshaping of Technorati's infrastructure, we've sought the right path between oft conflicted goals of flexibility, economy and run time optimization. We haven't always gotten it right. Technorati's storied episodes of instability and poor performance were often the source of much sleepless grief and perhaps opportunity costs. There have been business directions that have led our attention up some blind alleys. There have been technical errors. Project execution errors. Hiring errors. And so on. But, if I may add without being too immodest, we've done a lot of things well; I think ultimately the right things have happened. It is an honor and privilege to work with these folks, except for how wonderful my kids are, I couldn't be prouder of them!

So that brings us to where we are today and my ongoing working relationship with Dave. Dave's not going anywhere else. He may do some hat trading. But I expect to enjoy the benefits of working with him... for as long as I can!

Through good times and bad, Dave provides vision and inspiration. Synthesizing new ideas, asking tough questions and supporting the creativity and enthusiasm of all of us working on Technorati, Dave is a catalyzing force. Among the principles Dave demonstrates that are important to me is internal transparency. Being open with factual matters about the company, the markets we address and the competitive landscape enables me and others at Technorati to contribute our smarts and creativity in ways that a closed environment would never benefit from. Because that openness isn't always extended to parties outside the company, Technorati has been accused of being secretive. Well, sure. We don't answer rumors or trumpet funding events. But being judicious about external transparency is part of life; we could spend all day fielding the probes and inquiries from outside parties but to what benefit?

Throughout my years working with Dave, we've made it a point to hire only great people at Technorati. "Good" isn't good enough and "can do the job" can't; we only hire great people, period (and we've passed on a lot of good people who could ostensibly do the job if there was doubt about their greatness). It seems entirely plausible that there may be great people other than Dave who can wear the CEO hat better than he can. There might not be. I support Dave's launch of this search to find out. I'm looking forward to meeting this person; if he or she exists, there are some incredible shoes to fill (and hats and a buncha good stuff in between)! However, I expect to continue collaborating with Dave irrespective of if or when there is another CEO. Regardless of what hat he's wearing and which one I am, I'm looking forward to working with Dave towards Technorati's continued success.

So, wow! I'm posting this as an embrace of Dave, a virtual hug and an assurance of my unflagging support.

( Apr 06 2007, 12:16:22 PM PDT ) Permalink

20070401 Sunday April 01, 2007

Test Driven Damnation

The long drive down to Visalia didn't quite go as planned. Since my daughter is doing a Shakespeare theater camp this summer, I figured listening to a "Romeo and Juliet" dramatization for three and a half hours on the road would provide enriching entertainment. But by the time we got to The Five, I sayeth unto myself, as if Yoda did Shakespeare, "Patience for lengthy dialog in British accents, thou hath not." Eject that CD. We ended up yacking and partaking in Dave Matthews, Green Day and Queen instead.

Besides the California state Odyssey of the Mind competition this weekend, Visalia seemed to be hosting a soccer event and a gathering of evangelical christians. I'm sorry, but really. The latter are very silly people, they take themselves and their world view way too seriously. But I guess if you're convinced you have a divine and ultimate truth, that could happen.

Odyssey of the Mind is a creative problem solving competition. It's an exercise in questioning boundaries.

Evangelicals seem to think their prophetic fantasy has to unfold, and I... have some problems with that. Having a grand Test Case scripted in advance for all validation to be held against is a set of boundaries that are probably only best applied to technology development. Test-driven damnation is just too wacky for moi. At least some of the evangelicals have a sense of humor, evidenced by the license plate frames and bumper stickers in a Visalia hotel parking lot:

Maybe those were actually Odyssey families who've tired of their "My Boss is a Jewish Carpenter" sloganed regalia. Whatever. Buick, Cadillac, Jaguar: Jahweh or the highway.
Um. Anyway.

On to the outcomes. My daughter's Odyssey team has, for the second year in a row, achieved victory in the state arena and are headed to the World Finals. I understand that the Michigan State campus will be much more of a class act than last year's in Ames, Iowa (there's really not far to go to make that claim).

OK, perhaps I'm not doing too good tonight. Since I've only written a few words but have already managed to offend my Iowan, British and Christian friends, perhaps I should exit my Highway to Hell and call it a night. Forgive me while I grasp at excuses: too much driving and sleep deprivation precipitates rambling snark.

But seriously folks, I'm still blown away by the kids' victory, they're a young team for division II (four 5th graders, 3 in 6th). I was keeping the faith but also expecting to quietly go home and have that be the end of the OM road. However, their long term scores dominated the competition despite a poor showing in spontaneous. East Lansing, here we come!


( Apr 01 2007, 10:06:07 PM PDT ) Permalink

20070328 Wednesday March 28, 2007

I'm calling "Bullshit!"

I've been using social software for a few decades now. It started with dial-up BBS' (300 baud modems worked pretty good when you had tweets of text) and compuserve (user ID's were numbers and commas, IIRC). In the 1990's I used usenet and The WeLL to great delight, chagrin and all sorts of other things. In all of these goings about, the motivations to participate have ranged from voyeuristic curiosity, to chit-chatting and connecting, to utilitarian knowledge sharing, to being of service to others, to showing off in front of others or something else altogether social.

Lately, it seems to me that the blogosphere is buckling under the corrosive forces of greed. You can't trust voices to be authentic. PR firms and marketing departments are hiring bloggers or ghost writing for pretenders. You have buffoons (who may describe themselves as "Dot Com Moguls") selling links to other blogs with prices based on their Technorati and Alexa ranks. There are all of these schemes like ReviewMe, Algoco and PayPerPost twisting our trust of the things we read. Is that blogger really talking about a great bike or was there some payola slipped in to pimp it? I don't know. My cynicism rises (further).

What's a "pro blogger" anyway? Are folks really discussing or dialing for dollars? SEO is increasingly turning publishing into a perverse sport. If I wanted to watch infomercials, Comcast is already piping a couple of hundred channels of crap through the coax in my living room, I could passively drink the nonsense all day if that's what I wanted.

But that's not what I want. It's a total cluetrain derailment. Instead of using social media to empower meme sharing and synthesis, as a way to expand the collective consciousness, I'm seeing it turned into another exploitation ploy.

Our country is in a brain-dead war, podcasts are published, urban public schools are in shambles, photos are being uploaded, illiteracy is rampant, a funny thing happened at the dog park, baseball season is starting soon, technology innovations are blossoming in our midst, pandemics threaten us every flu season, tuition is going up (again), omega-3 fatty acids, look at this video of a 4-way tricycle collision, Osama is still who-the-hell-knows-where, Dick Cheney is still satan, Jimi Hendrix is still god, god is still dead, lives are beginning and ending all around us... lives are being lived. The rhythmic thrum of life, that's what I want.

(Disclosure: there are links on this site from AdSense and Amazon. They don't do much for me but then, I don't think they should, that's not why I blog. If you click 'em: thanks, maybe I'll get pizza next week)

Putting a few ads or Amazon links to benefit from the accidental tourist that Google sends you from time to time is one thing. Superficially making it the object of your publishing is just, well, bullshit!

BTW, this is not necessarily the opinion of my employer. In fact, it's probably not. I may be a cynical bastard but this is my authentic voice. You can't buy it.

( Mar 28 2007, 09:52:17 PM PDT ) Permalink

20070314 Wednesday March 14, 2007


Clocks have received (or are overdue for) day-light savings time adjustments, flags with Matt Cain and Barry Bonds are on the lamp posts of 3rd Street, kids are at play; recreational softball/baseball on the weekend and Odyssey of the Mind state championships are coming up. The hills are green and blossoms and blooms abound. Spring has sprung.

Have a nice day!

( Mar 14 2007, 01:30:52 PM PDT ) Permalink

20070225 Sunday February 25, 2007

The OpenID Snowball

In case you haven't noticed, there's been an unmistakable groundswell around OpenID in recent months. The proliferation of new web 2.0 services and the resulting "password fatigue" (except for those who are using OpenID) are contributing mass to the movement but the adoption of OpenID by established services (Technorati, Digg) and big players (AOL, Microsoft) is catalyzing acceleration. I'll cite these headlines as evidence:

Tim Bray raised a lot of great questions about OpenID. I'll summarize a response by simply saying that one of the virtues of using OpenID for URL-based user-centric authentication is the granularity of control available, it's all "opt-in":

  1. relying parties can maintain their own white/black listing policies for identity providers, define what user attributes they require, etc
  2. users are in control of who they allow their identity provider to provide URL ownership verification to, which of their attributes are allowed to be shared, etc
  3. the identity provider can implement policies around which relying parties they'll authenticate for or send user attributes to
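To make the three-way opt-in concrete, here's a minimal Python sketch. It is not the OpenID protocol or any real library's API; the class names, fields and the `may_sign_in` helper are all hypothetical, made up just to show that a sign-in only goes through when the relying party, the user and the identity provider each agree.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class RelyingParty:
    # Hypothetical model of a site that accepts OpenID logins.
    name: str
    blocked_providers: set = field(default_factory=set)    # provider hosts it refuses
    required_attributes: set = field(default_factory=set)  # attributes it insists on


@dataclass
class UserPolicy:
    # Hypothetical model of what the user has told their provider to allow.
    identity_url: str
    # relying party name -> attributes the user agreed to share with it
    approved: dict = field(default_factory=dict)


@dataclass
class IdentityProvider:
    # Hypothetical model of the provider's own policy.
    host: str
    refused_relying_parties: set = field(default_factory=set)


def may_sign_in(rp, user, idp):
    """Return (allowed, reason); every party has to opt in."""
    if urlparse(user.identity_url).hostname != idp.host:
        return False, "identity URL is not hosted by this provider"
    if idp.host in rp.blocked_providers:
        return False, "relying party does not accept this provider"
    if rp.name in idp.refused_relying_parties:
        return False, "provider will not authenticate for this relying party"
    if rp.name not in user.approved:
        return False, "user has not approved this relying party"
    missing = rp.required_attributes - user.approved[rp.name]
    if missing:
        return False, "user withholds: " + ", ".join(sorted(missing))
    return True, "ok"


# Example: a blog comment form that only wants a nickname.
blog = RelyingParty("blog-comments", required_attributes={"nickname"})
me = UserPolicy("http://technorati.com/profile/spidaman",
                approved={"blog-comments": {"nickname"}})
provider = IdentityProvider("technorati.com")
print(may_sign_in(blog, me, provider))
```

Any one of the three can say no without the other two changing anything, which is the granularity I'm talking about.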

Given current implementations, I'm probably not ready to use OpenID for online banking or verification that I'm old enough to buy wine (but I'll be grateful if you ask, I miss getting carded). However, I see no reason why the standards and practices can't be advanced to support those activities. The potential for phishing and man-in-the-middle attacks is a concern but there are a lot of applications today where there are many benefits to the opting-in parties but few for the attacker.

Right now, if Tim's comment authentication system were OpenID-enabled, I'd be able to use my Technorati profile URL (http://technorati.com/profile/spidaman) to sign in to post a comment on his blog. For "low-gravity" authentication requirements (blog comments), OpenID works great, today. For more rigorous authentication, user-attribute verification and trust requirements (like credit score lookups) there's a lot of great discussion underway. What I find heartening is that there seems to be broad acceptance of the Laws of Identity and increasing understanding that there's a big difference between the identity requirements for uploading photos and trading stocks in your IRA account.

URL-based authentication will likely go through the same growing pains raised by using email addresses to identify people. Back in the day when there were email addresses that people paid for and those were distinguishable from free ones, mailing list policies against subscribers with, say, hotmail addresses were often implemented. We may be approaching an era where some URLs are more equal than others, I dunno. But in the meantime, there are a lot of useful services you can use with OpenID today. If you haven't tried OpenID, do so right now by logging in to your Technorati account and then using your profile URL to log into Zooomr; this stuff is easier to use than it is to explain. I wouldn't be surprised if Yahoo! and/or Google get on the bus in the next few months. As the snowball gains mass, you should know how and when to utilize user-centric authentication systems such as OpenID.


( Feb 25 2007, 08:17:17 AM PST ) Permalink

20061115 Wednesday November 15, 2006

Scaling PostgreSQL

Last night, Casey Duncan gave a nice presentation on how Pandora partitions and populates their PostgreSQL data warehouse. The San Francisco PostgreSQL Users Group meeting discussion, Horizontal Scalability with Postgresql: A Case Study, covered how Pandora segments their data set, uses pgpool connection aggregation and Slony for replication and log shipping. I also had a great conversation with Greenplum CTO Luke Lonergan; they're definitely pushing the envelope of what's possible with open source database technologies. Kudos to Pandora and Greenplum, great stuff.


( Nov 15 2006, 10:50:56 AM PST ) Permalink