Wednesday July 26, 2006 I missed the first keynotes (I arrived just in time for Tim O'Reilly's "what technologies are hot according to these slices of the data" bit that he does) but enjoyed the talk by Greenplum's Scott Yara, School of Rock. He highlighted the parallels between open source development and rock and roll. I'll paraphrase his points.
Open source, like rock and roll, has flourished simply because people enjoyed it. Like rock and roll, money has jumped into open source and an industry has swelled around it. Like rock and roll, open source threatens the establishment, but the two also coopt each other and open source becomes the establishment. Yara showed a funny "twins separated at birth?" photo pairing of Rick Rubin and Richard Stallman! What will sustain open source's integrity (like rock and roll's) are the intangibles, the real emotions and inspirations that drive innovation. The popularity game isn't a measure of quality: just because something is widely downloaded doesn't mean it's good, just as Britney Spears' and N'Sync's sales success aren't validations of "good" music. So beware the vogue of open source; people are starting to believe that open source is better, but don't let that undermine what's important. For those who are building their business on open source, go for the $$$ but keep your integrity. At that point Yara ran a little excerpt of Metallica goofing on a radio promo production (from Some Kind of Monster?). The ironies of choosing them to illustrate how money changes everything, given how they've been coopted and have become the music establishment, were high humor for me. Nonetheless, Metallica, like a lot of successful open source software projects, succeeded by being a little dangerous, by being genuine and not bothering with the constraints of the legacy establishment.
Anil Dash gave a talk, Trying to Suck Less: Making Web 2.0 Mean Something, basically outlining that beyond the technology stack (i.e. LAMP), there are higher level tools that developers can employ to suck less (yep, I confess, at Technorati when we can't quite kick the butt that we aspire to, we focus on sucking less). Citing the technologies that have grown out of SixApart's software plumbing, he highlighted that all successful Web 2.0 companies are using load balancing, messaging, caching, filesystems and other scalability and performance platform components. In SixApart's case, perlbal, memcached, mogilefs and djabberd are the core technologies that they build on ... and, so the pitch goes, so should you if you want to suck less.
Those are the high points of the morning (so far).
( Jul 26 2006, 10:00:31 AM PDT ) Permalink View blog reactions
Tuesday July 25, 2006 In case you hadn't heard, we've had a lot of things cooking at Technorati. Besides the engaging new look, the new features and the complete overhaul of URL search and link counts, we've been making great strides in our blog spam mitigation (you wouldn't believe the stuff we catch ... and the sheer quantity of it!), our internal caching and messaging infrastructure and our data center network. Of course, there's still much to do but we've been heads down on it; if you haven't checked us out lately I think you'll find that our efforts to improve the front end, the back end and all of the cogs and pulleys in between have been moving forward.
I'm really proud of the team I work with at Technorati! If you'd like to join the team, we have a lot of innovation ahead. Grab me this week at OSCON and tell me about how you'd like to materialize the real time web! I'll also be moderating a Microformats BOF; this will be a good opportunity to talk about the implementations for producing and consuming microformats. See ya in Portland!
( Jul 25 2006, 10:37:36 PM PDT ) Permalink View blog reactions
Saturday May 06, 2006 Blog publishing services typically propagate updates about new posts from blogs (ergo, new blogs too) by pinging or publishing a changes.xml file. But what none of the services provide is an "un-ping" -- blog indexing services such as Technorati don't know when a blog has been deleted from a service. I noticed this today when I found http://blogtrarian.blogspot.com/ participating in a link farm infesting Blogger's service. This can happen because Google's Blogger recycles URLs; when a blog is removed from the system, the URL is freed for reuse.
That particular URL dates back to 2004; it was dormant for several months but just came to life recently with spam. The historic posts (until August 2005) look like normal blogging fare but the recent posts are clearly just splog content. We'll have to work on "un-pinging" so it's easier to distinguish dormant blogs from dead ones.
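Until something like an "un-ping" exists, one rough stand-in is to look at a blog's own posting history for a long silence that was later broken. The following is a minimal sketch under stated assumptions: the post timestamps are already in hand, and the function names and the 90-day threshold are hypothetical choices for illustration, not anything Technorati uses. A hit only says "worth a closer look"; a blogger returning from a long break would trip it too.

```python
from datetime import datetime, timedelta

def longest_gap(post_times):
    """Return the longest silence between consecutive posts as a timedelta."""
    ordered = sorted(post_times)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=timedelta(0))

def looks_recycled(post_times, dormancy=timedelta(days=90)):
    """Flag a history containing a silence longer than `dormancy` (hypothetical
    threshold) that was later broken by new posts, a hint of URL reuse."""
    return longest_gap(post_times) > dormancy

# Illustrative history: posts through August 2005, then a revival in May 2006.
history = [datetime(2005, 7, 15), datetime(2005, 8, 1), datetime(2006, 5, 1)]
print(looks_recycled(history))
```

Combining a flag like this with a content check on the posts after the gap would be the obvious next step, since the gap alone can't tell spam from a returning author.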
spam splog web spam google blogger ping
( May 06 2006, 03:13:14 PM PDT ) Permalink View blog reactions
Friday May 05, 2006 So Google's CEO Eric Schmidt says his servers are full, hmm. Tying that to SEO'ers griping about their indexing, Andrew Orlowski speculates that it's web spam besetting big daddy. Could be, but the hard data isn't out in the wild. The numbers that we can see are that Google is spending several banana republics' worth of GDP on capital expenses:
Google continued to make substantial capital investments, mainly in computer servers, networking equipment and its data centers. It spent $345 million on such items in the first quarter, more than double the level of last year. Yahoo, its closest rival, spent $142 million on capital expenses in the first quarter.
Referring to the sheer volume of Web site information, video and e-mail that Google's servers hold, Schmidt said: "Those machines are full. We have a huge machine crisis." (read more)
If the problem is spam, then certainly it's Google's own doing. The elephant in the room is that the acceleration of web spam everyone's talking about is fueled by AdSense, often aided and abetted by Blogger splogs, Google Pages, Google Base, etc. The spam ecosystem is within Google's capacity to rein in but the don't-be-evil company is making too much money on click fraud with plausible deniability to do anything about it. Is Google having problems handling web spam and "filling up" their machines? Cry me a river, all the way to the bank.
spam google adsense splog web spam
( May 05 2006, 02:09:19 PM PDT ) Permalink View blog reactions
Thursday May 04, 2006 When I read the words
Microsoft yesterday reached a tentative $70 million deal to settle a California class-action antitrust lawsuit, according to a statement by the law firm representing the plaintiffs in the suit.
at http://www.satishlive.info/?p=27 I had the distinct sense of deja-vu. So I ran some queries against Technorati's index and sho-nuf, I found the exact same content had already been published by InfoWorld. Ah, there was an attribution at the bottom... but InfoWorld didn't publish under a Creative Commons license. Looks like blatant theft.
Then I checked the next post (http://www.satishlive.info/?p=28) on that blog and read:
I took a new blog search tool called Sphere for a little spin this morning and found it useful....
Hey, didn't I just see that somewhere else? Yep, this time it was PC World and no attribution.
It's safe to surmise that this is kleptotorial laden with AdSense and stuffed into the update stream. I've seen screenscrapes and feedscrapes on splogs before but they're usually easier to identify visually; I had to look more carefully at this to note its spamminess. Is there a market in alerting publishers to copyright infringement? Obviously this stuff should be removed from Technorati's index but is there a more valuable service to publishers that should be provided here? How much would you pay to find out about misappropriations of your content? Is there a market for Technorati to do something like Plagiarism.org to fingerprint blog content?
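To make "fingerprint blog content" a little more concrete, here is a minimal sketch of one common approach, word shingling: break each post into overlapping runs of words, hash them, and compare the sets. This is an illustration under my own assumptions (the function names, the five-word shingle size and the use of SHA-1 are arbitrary choices), not a description of how Plagiarism.org or Technorati works.

```python
import hashlib
import re

def shingles(text, size=5):
    """Split text into overlapping runs of `size` lowercase words."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def fingerprint(text, size=5):
    """Hash each shingle so posts can be compared without storing their text."""
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16]
            for s in shingles(text, size)}

def resemblance(a, b, size=5):
    """Jaccard similarity of two fingerprints: 0.0 (disjoint) to 1.0 (identical)."""
    fa, fb = fingerprint(a, size), fingerprint(b, size)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)

original = ("Microsoft yesterday reached a tentative $70 million deal to settle "
            "a California class-action antitrust lawsuit")
scraped = original + " according to a statement"
print(resemblance(original, scraped))
```

A verbatim scrape scores at or near 1.0 while unrelated posts share no shingles at all, so even a crude threshold would have caught both of the lifted posts above.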
splog creativecommons copyright spam creative commons plagiarism adsense
( May 04 2006, 09:34:17 PM PDT ) Permalink View blog reactions
Tuesday May 02, 2006
The chatter (even art work on flickr) about it is frantic. Thank You Stephen Colbert has 700 links right now (this is a blog that came into being less than 72 hours ago); it's getting about five or ten links per hour at the moment. The videos are the most linked-to YouTube reels on Technorati. How wonderful it is to have an administration that is so bad that the opportunities for high humor are so many. Why did we invade Iraq?
stephencolbert colbert flickr youtube technorati bush cspan whitehousecorrespondentsdinner colbertreport comedycentral politicalhumor iraq blogs blogging
( May 02 2006, 09:27:30 PM PDT ) Permalink View blog reactions
Sunday April 30, 2006 I have developed a great deal of respect for those who do fund raising full time as a profession; it's a tough business. The Happy Valley Odyssey of the Mind teams are trying to raise money to send themselves (the kids) and their coaches to the World Finals and, so far, it's been tough moving that along. With basically three weeks left before the big trip to Ames, Iowa, the thermometer still has quite a ways to go. If you can't donate today, how about linking to their site? Sure, links won't pay the bills directly but if getting the word out means that someone who can help with the bills will find out about it, maybe it can help indirectly.
Put a badge on your site with this code:
A study cited by the pros found that donors say they have more money than time. In this case, the teams are putting in all of the time (that's the point of Odyssey of the Mind, it's all of the kids' creativity and intellect applied to problem solving); now they just need to pay some bills. If you can't donate cash and donating your time won't impact their endeavor, what can you do? Donate attention! OK, admittedly badges aren't the most attractive things, but you can take this one down after the World competition. So for the month of May, if you can't send money, send 'em some links!
fundraising odysseyofthemind ames iowa fund raising
( Apr 30 2006, 09:42:34 PM PDT ) Permalink View blog reactions
Friday April 28, 2006 There are so many weird and wonderful things on the big search services, you need cheat sheets to keep track of the specialized types of search that they provide. The Yahoo! Shortcuts page has a bunch o' tricks for searching Yahoo! The Google Cheat Sheet has coverage on the search operators and parameters that can be fed to their query systems. Well, we don't have crib notes or hacks books about us (yet) at Technorati, but we're working on being that cool, too <g>
( Apr 28 2006, 10:51:05 PM PDT ) Permalink View blog reactions
Thursday April 27, 2006 There are lots of ways to ping Technorati. Your blogging platform may do it already. You could use a slick editing tool like ecto that will do it for you. You can even roll it yourself in C#. But however you do it, it's important that you let Technorati include you in the distributed conversation by notifying it that you've posted.
Recently, there have been some problems with Ping-o-Matic that I worked with Matt to unravel. If you use a ping relaying service and are having difficulty getting indexed by Technorati, please ping directly! Of course Technorati will continue working with ping relayers such as Ping-o-Matic, Pingoat and so forth; they're providing a valuable service to the blogging communities that we are grateful for. However, when in doubt take the direct route!
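For anyone who'd rather see what "pinging directly" amounts to, here's a minimal sketch using the conventional weblogUpdates.ping XML-RPC call and Python's standard library. The function names are mine, and the endpoint is deliberately left as a parameter: use the ping URL your indexing service documents rather than anything assumed here.

```python
import xmlrpc.client

def build_ping_request(blog_name, blog_url):
    """Serialize a weblogUpdates.ping call as an XML-RPC request body."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")

def send_ping(endpoint, blog_name, blog_url):
    """Send the ping to `endpoint` and return whatever the server answers."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)

# Inspect the payload without touching the network.
print(build_ping_request("My Blog", "http://example.com/"))
```

Calling `send_ping` with the service's documented endpoint, your blog's name and its URL is all a direct ping is; a relayer just makes the same call on your behalf to several services.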
More soon to come on the Technorati Weblog!
technorati ping pingomatic pingoat ecto blogging
( Apr 27 2006, 11:27:31 PM PDT ) Permalink View blog reactions
Tuesday April 25, 2006 Since Niall mentioned GData, I've been meaning to look into it further. Today Otis mentioned that one of the Apache/Summer of Code projects proposed is a lucene-based GData server implementation.
I took a look at the docs and realized that this is actually a really old spec, as old as the epoch as a matter of fact. Check it out:

But seriously folks, the G-man and his crue have done a fine job providing client implementations (as long as you're not waiting on Ruby or one of the P-languages; no Perl, Python or PHP yet). There's even a nice set of examples for the Java implementation. Thanks, G!
google gdata lucene summerofcode ruby perl python php
( Apr 25 2006, 08:17:45 PM PDT ) Permalink View blog reactions
Over the last year and a half, I've spoken extensively to friends, colleagues and audiences about web spam. At eTech I showed how spam blogs behave in statistically atypical ways: as soon as you start looking at the publishing characteristics (such as linking and posting rates), the spam comes percolating up to the top. For instance, this chart is a sample of linked-to domains from blogs hosted by Google's blogspot service (the y-axis is in thousands of links per day):

The highlighted domains are sites that the spammers are trying to put in front of mouse cursors by making them look important. Besides being a nuisance, this is part of a larger hazard to the whole web advertising market.
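To illustrate how spam "percolates up" once you count things, here is a deliberately crude sketch: tally the links pointing at each domain and flag the ones far above the typical count. It is not the analysis behind the chart; the function names and the ten-times-the-median cutoff are assumptions made purely for illustration.

```python
from collections import Counter
from statistics import median
from urllib.parse import urlparse

def links_per_domain(link_urls):
    """Count how many links point at each hostname."""
    hosts = (urlparse(url).hostname for url in link_urls)
    return Counter(host for host in hosts if host)

def suspicious_domains(link_urls, factor=10):
    """Return domains linked more than `factor` times the median domain's
    count (a hypothetical cutoff), most-linked first."""
    counts = links_per_domain(link_urls)
    if not counts:
        return []
    typical = median(counts.values())
    flagged = [domain for domain, n in counts.items() if n > factor * typical]
    return sorted(flagged, key=lambda domain: -counts[domain])

# One day's worth of outbound links, made up for the example.
links = (["http://a.example/post"] * 2 + ["http://b.example/x"] * 3 +
         ["http://c.example/"] + ["http://pills.example/buy"] * 500)
print(suspicious_domains(links))
```

Popular legitimate sites would also clear a cutoff like this, so in practice a count is only the first filter; the posting rates of the blogs doing the linking are the other half of the picture.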
While Technology Evangelist pointed out Google News' role in the spam ecosystem, Niall has previously noted that Google provides lots of tools for perpetrators of web spam to employ. AdSense is the prime object and subject of spam. A video posted yesterday on YouTube details the anatomy of some typical AdSense abuse:
There has been a recent explosion in abuse of AdWords, Google's PPC (pay per click) advertising platform. It is apparent that the techniques do not follow Google's own clear guidelines. Unfortunately, users and legitimate advertisers pay the price, while Google and the unscrupulous advertisers profit.
Web spam isn't new; this has been going on for years and at this point, one must conclude that Google's not serious about doing anything about their spam problems. There used to be a qualification step for AdSense and they'd reject sites that didn't meet some basic criteria as legitimate sources of content. But that stopped about a year and a half ago and the spam has been pouring on ever since. The video is a call to action to complain to Google; they have a fiduciary responsibility to move against the abuse. The video shows a search for forklifts to illustrate rampant ad policy violations. The video narration asks, "Why is google allowing this to happen?" and answers
Google has always put its users first and one would hope that they will continue to abide by their charter. My aim with this video is to urge them to do something about these issues sooner rather than later.
This is the dirty little secret about these tactics: Google profits from every click on its network. Be it on ads that are clicked on google.com or ads that are clicked on websites that are running google ads through the adsense program. This is a very difficult thing for them to self police because doing something about it will affect their short term profits.
I'm not condemning AdSense per se; it's a great service and revitalized web advertising after the flame-out a few years ago. But black-hat SEOs have definitely cranked up the game over the last year or so and are putting the whole market at risk again. Google may not be particularly motivated to go out and find the abuse but they have to act against it when you bring it to their attention. Watch this video:
web spam splog google spam adsense adwords splogs
( Apr 25 2006, 12:28:25 PM PDT ) Permalink View blog reactions
Monday April 24, 2006 A few months ago, I mused that we should be able to abandon FastCGI (with extreme prejudice) and use AJP13 with Ruby on Rails instead. Well, unbeknownst to me at the time, someone was hatching just such a plot, the Ruby/AJP Project! I'd heard last month that David Andersen was tinkering on installing it... well, he not only got it online but he blogged how he did it. Take a look at his compile time and run time configuration details using Apache 2.2's native AJP13 protocol plugin for mod_proxy (i.e. no mod_jk, good riddance), it's really cool! Way to go, David!
rubyonrails ruby ajp13 apache mod_proxy mod_jk
( Apr 24 2006, 08:48:30 PM PDT ) Permalink View blog reactions
Sunday April 23, 2006 Just for giggles, I fiddled around with the Google Maps API on the Happy Valley Odyssey of the Mind blog by putting up a map of Ames, Iowa, where the Odyssey of the Mind World Finals take place.
I tried setting polylines to outline the perimeter of the Iowa State University campus but it seems like the Google Maps API is pretty brittle; if you get it wrong, there's no debugging output apparent (I checked the JavaScript console), you just get a blank map. I'll have to poke at it a bit more some other day. Doing the basic stuff is easy though; check out How to add a Google Map to any web page.
Be sure to stop by the Happy Valley Odyssey of the Mind blog and make a donation, it's a great cause!
googlemaps odysseyofthemind ames iowa
( Apr 23 2006, 07:36:08 PM PDT ) Permalink View blog reactions
Saturday April 22, 2006 If there's anything to be said for the innovations in the tools of creation and distribution of our present day, it's that contemporary political humor has gotten so much funnier!
Thanks, Adam!
Sitting on my own brain, waiting for the end of days
Corporation profits, Bloody oil money
I'm above the law and I'll decide what's right or wrong
I am the egg head, I'm the Commander, I'm the Decider
(check it out)
Koo-Koo-Kachoo
humor political humor bush beatles parody decider
( Apr 22 2006, 05:26:39 PM PDT ) Permalink View blog reactions
Friday April 21, 2006 The Java backlash that began a few years ago was mostly a J2EE backlash, not against the Java language per se. Too many people took the blueprints too seriously, too literally or just too damned religiously. Too many applications that didn't need EJBs were using them; letting the container manage low level application plumbing invited slow and buggy behaviors that were painful to debug. The backlash has made a lot of Perl/Python/PHP enthusiasts express self-righteous vindication and has helped morph the J2EE backlash into a broader Java backlash. Geez, even IBM is getting all spun up on PHP, whodathunk? But I think the dismissal of Java is premature. None of the P languages or Java are without hazards. These days a lot of developers are over the blueprint kool-aid and are standardizing on a simplified and productive stack:
To really bring rapid development and prototyping to a Java environment, there are a lot of options to look at such as dynamic JVM languages:
I expect in the months ahead to be writing applications with plugin support and that the big win for the dynamic JVM languages for me will be in easing the rapid development of plugins. In other words, I probably wouldn't write an end to end application with them but given a set of interfaces for extension points that can be automatically tested, writing the extensions in JRuby or Groovy sounds compelling.
I actually haven't had the time and opportunity to substantially try half the things I've mentioned thus far. Surveying the number of tools, languages and frameworks, it's clear that there are a lot of things to consider and that a lot of people are concerned with (and working hard on) bringing down the high ceremony of Java. I'll still be using P languages in the future, too. Down the road, I suspect virtual machines (JVM? parrot? mono/CLR?) will make a lot of these issues fade away and the questions at hand will be around when to use closures and when to use objects, when to annotate and when to externally declare, when to explicitly type or auto-type and so forth. The languages will be incidental as they support shared constructs and virtual machines.
java rubyonrails groovy jruby jython maven eclipse xdoclet springframework programming
( Apr 21 2006, 09:03:42 PM PDT ) Permalink View blog reactions
Thursday April 20, 2006 I've been wondering how Lego will maintain a business around Mindstorms and at last, I think we have an answer: they'll hop on ye olde cluetrain. By enabling the community of Mindstorms enthusiasts to drive innovation openly, I finally feel confident that the Mindstorms technology will enjoy long term viability. From the Gizmodo post:
Jon Lund took some time out from liveblogging the CustomerMade conference in Copenhagen to email in and tell us that according to Soren Lund of Lego, the software behind the upcoming highly anticipated Mindstorms NXT will be published as open source; Lego is currently in the last stage, figuring out which public domain license to use before releasing it. Power to the people! (read on)
The dreaded EOL'ing scenario, such as that suffered by the Sony Aibo, would have been a really crappy outcome for Mindstorms. Instead, they're innovating and opening up. Thanks, Lego! Oh, and one thing: BSD/Apache style licenses, please!
lego mindstorms opensource sony aibo
( Apr 20 2006, 03:25:53 PM PDT ) Permalink View blog reactions
I suspect I'll be opting to casual carpool more often with BART eliminating free parking at the Contra Costa county stations I frequent. That could raise my already-not-inconsequential commute costs 15%. And how timely. Gas is already exceeding $3/gallon and the chatter on the radio is to expect $4/gallon! In that scenario, I wouldn't be surprised to see carpool drivers putting a cup out for the riders as their fuel prices put them in the squeeze. Meanwhile, the Big Oil Companies are ringing in record profits...
One possible ray of light for the BART ride option is the report of WiMax service coming:
WiMAX is similar to WiFi but can carry signals across greater distances. WiMax is also being considered by Silicon Valley public transportation officials (free registration) who want to let passengers browse the Internet on local train systems like BART. They want to run a test from July to December. WiMAX, they believe, might be a better technology to do hand-offs as the train rushes through various wireless coverage zones. (read on)
(via burtonator)
Maybe the price hikes will help them pay for a software test harness; BART's bugs have rendered the system unusable in the past.
( Apr 20 2006, 07:19:57 AM PDT ) Permalink View blog reactions
Wednesday April 19, 2006 This is totally amazing! With all of the gazillions of dollars and BTU's of hot air poured out over "homeland security," here comes Marc Ecko laughing in the face of the beast by tagging (as in, the spray paint kind, not folksonomy) Air Force One!
Coming next: "Marc Ecko In Gitmo"
airforceone graffiti homelandsecurity humor tagging
( Apr 19 2006, 09:29:21 AM PDT ) Permalink View blog reactions
Here comes a changing of the guard at the War House: McClellan Out as White House Press Secretary. Now I just wonder what the final parting words from Dubya to exiting Press Secretary Scott McClellan will be, let's see:
politics bush scottmcclellan whitehouse
( Apr 19 2006, 06:43:50 AM PDT ) Permalink View blog reactions
Tuesday April 18, 2006 I posted last month about how winning feels good. With the thrill of victory comes a new challenge: what's next? Well, my daughter's team's second place showing in the Odyssey of the Mind regionals was followed up by first place in the State competition, so next up: the World!
The World competition is in Ames, Iowa. I've never been there. I've flown over Iowa plenty of times, traveling to and from Chicago. But the corn fields, cattle corrals and pig pokes of Iowa ... will be new to me. All told, we're running up thousands of dollars to pull this off but I'm sure for the kids this will be one of life's great experiences, so it's all worth it. I have an alter-ego running a separate blog to track that endeavor and our challenges. We've got paypal links to accept donations (tax deductible, even) but simply talking about and linking to that blog will help, so please shine a little light on us.
( Apr 18 2006, 05:20:20 PM PDT ) Permalink View blog reactions
Tuesday April 11, 2006 Today, like many days, the phrase "user generated content" left my lips in the course of conversation. It's a habit. OK, maybe it's a bad habit. Since Tim Bray posted about his hatred for the label, I've been increasingly self-conscious about using those words. I agree, it's laden with exploitative connotations. Derek Powazek adeptly decomposed the nastiness further. Yes, not long ago editorial, movie editing, audio mixing and other tools of creation were only accessible to the pros. Yes, the burst of creativity that has accompanied the mass-amateurization of media of all kinds begs for an improvement of the vernacular. However, Scott Rosenberg, lamenting the absence of a credible replacement, reminds us that content from the pros still has value (Seymour Hersh didn't blog the latest plan of attack, now did he?). Breaking habits often requires conscious adoption of an alternative. So, what? People Contributed Media? Individual Creations? Actually, I'm more intrigued by "user distributed content" but maybe I'll post about that later and then I'll have to wring my hands over a better name for it.
( Apr 11 2006, 09:57:02 PM PDT ) Permalink View blog reactions
I've always been fond of public libraries; they can be a great resource and they're so... non-web (2.0 or 1.0). I've used 'em for youth sports coaching materials, current non-fiction, jazz CDs and such. They can also be real funny; sometimes they have contemporary titles but other times there's no hope of getting what you want without going out to the bookstore to buy it. I did a search for a title, "What Would Buddha Do at Work" on the Contra Costa Library catalog and the first hit was for What would Buffy do? : the vampire slayer as spiritual guide. Buh!
( Apr 11 2006, 10:52:37 AM PDT ) Permalink View blog reactions
Saturday March 25, 2006 A common question from newcomers to the blogosphere is "how do I get my blog read?" There are all kinds of ways to gather attention to yourself but there seems to be a set of best practices that come out whenever this topic comes up. So here's a rough swipe at a blogosphere visibility FAQ.
Take a look at the top 10 blogs and you'll notice that many of them post dozens of times a day. No, you don't have to be that prolific but if you have something interesting to say, say it early and often. On the other hand, don't prattle. Frequent posters who talk about nothing aren't doing themselves a favor in the overall data stream.
technorati feeds linking ping blogging tagging validators ecto endo
( Mar 25 2006, 09:26:08 AM PST ) Permalink View blog reactions
Thursday March 16, 2006 Years ago I thought AvantGo was sooo cool. I'd sync up my Palm Pilot, get on the bus and read the web sites I'd subscribed to. Ah, how times have changed. Lately, I'd been using Sage to read feeds in Firefox but the interface has always seemed inconvenient and Firefox is kinda slow and leaky under Mac OS X.
Last week I installed Endo (brought to you by the maker of ecto). While I was a little thrown by the way the feed group bar shifts focus, my feed reading has definitely been enhanced. The floating window notifications when it's updating are cool. The way it shows post tags right at the top (under the blog post title) is also very nice. I could imagine improving the feed focus on the left hand side (I'm hitting the scroll bar too much). I should be able to reorder the feeds so I can read them in order of importance (or, better, Endo could calc importance on the fly by watching which feeds I go to first and mapping that against their update rates). But really no substantial complaints. It integrates nicely with ecto and has hooks to integrate with mail and chat applications but my favorite thing about Endo is: its cache! I spend a few hours every day commuting, so reading feeds while I'm on the go is great! I basically let the laptop load up feeds before I hit the road and catch up on stuff while in transit.
Thanks Ado!
( Mar 16 2006, 01:14:00 PM PST ) Permalink View blog reactions
Sunday March 12, 2006 A few victories to report:
technorati odysseyofthemind soccer sxsw
( Mar 12 2006, 09:12:32 PM PST ) Permalink View blog reactions