Wednesday September 27, 2006
Colonel Jessup has assumed control of Newsweek:
Ignorance is bliss
How meta:
See ya at the gulag.
media newsweek ministry of truth afghanistan taliban bush
( Sep 27 2006, 04:16:31 PM PDT ) Permalink View blog reactions
Tuesday September 26, 2006
At today's Intel Developer Forum, Google is presenting a paper that argues that the power supply standards built into today's PCs are anachronistic, inefficient and costly. With the maturing of the PC industry and horizontal scaling becoming a standard practice in data center deployments, it's time to say good-bye to these standards from the 1980s.
John Markoff reported in the NY Times today (Google to Push for More Electrical Efficiency in PCs):
The Google white paper argues that the opportunity for power savings is immense, by deploying the new power supplies in 100 million desktop PC's running eight hours a day, it will be possible to save 40 billion kilowatt-hours over three years, or more than $5 billion at California's energy rates.
Nice to see Google taking leadership on the inefficiencies of the PC commodity hardware architectures.
( Sep 26 2006, 06:02:09 PM PDT ) Permalink View blog reactions
Monday September 25, 2006
The other week I reflected on the scaling-web-2.0 theme of The Future of Web Apps workshop. Another major theme there was how social software is different, how transformative architectures of participation are. One talk that stood out was Tom Coates's Greater than the sum of its parts. A few days ago the slides were posted; I've poked through 'em since and they jogged some memories loose, so I thought I'd share Tom's message, late though it is, and embellish it with my spin.
Tom's basic thesis is that social software enables us to do "more together than we could apart" by "enhancing our social and collaborative abilities through structured mediation." Thinking about that, isn't web 1.0 about structured mediation? Centralized services, editors & producers, editorial staff & workflow, bean counting eyeballs, customer relationship management, demographic surveys and all of that crap? Yes, but what's different is that web 2.0 structured mediation is about bare sufficiency: it's better to have too little than too much. The software should get out of the way of the user and make him/her a participant, not lead him/her around by the nose.
Next, Tom highlighted that valuable social software should serve
Citing The Success of Open Source, he likened social software participants' motivations to this ranked list of open source contributors' motivations
Here are some social software "best practices":
futureofwebapps-sf06 social software virtual community
( Sep 25 2006, 10:24:40 AM PDT ) Permalink View blog reactions
Thursday September 21, 2006
I mused about people-powered topic classification for blogs after playing with the Google Image Labeler the other week. It seems like a doable feature for Technorati because the incentives to game topic classification are low.
That same week, Rafe posed a question about community driven spam classification:
Why couldn't Blogger or Six Apart or a firm like Technorati add all of the new blogs they register to a queue to be examined using Amazon's Mechanical Turk service? I'd love to see someone at least do an experiment in this vein. The only catch is that you'd want to have each blog checked more than once to prevent spiteful reviewers from disqualifying blogs that they didn't agree with.
The catch indeed is that the incentive is high for a system like this to be gamed. Shortly after Blogger implemented their flag, spammers
(read the rest)
"Bloggerbowling" - the practice of having robots flag multiple random blogs as splogs regardless of content to degrade the accuracy of the policing service.
As previously cited from Cory, all complex ecosystems have parasites. So I've been thinking about what it would take to do this effectively: what would it take to overcome the bloggerbowling efforts of the blogosphere's parasites? The things that come to mind for any system of community policing are about rewards and obstacles. For example
At the end of the day, I don't have the answers. But I think Rafe, Doc and so many others concerned with splog proliferation are asking great questions. Technorati is currently keeping a tremendous volume of spam out of its search results but, at the end of the day, there's still much to do. And this post is the end of my day, today.
spam splog splogs technorati virtual community blogs web spam
( Sep 21 2006, 11:06:22 PM PDT ) Permalink View blog reactions
Wednesday September 13, 2006
A few weeks ago, Adam mentioned some of the shuffling going on at Technorati's data centers. Yep, we've had our share of operational instability lately; when you have systems that expect consistent network topologies and the topology has to change, I suppose these things will happen. It's a common theme I keep hearing in conversations and presentations about web based services: the growing pains.
This morning, Kevin Rose discussed The digg story: from one idea to nine million page views at The Future of Web Apps workshop. Digg has had to overcome a lot of the "normal" problems (MySQL concurrency, data set growth, etc.) that growing web services face and has turned to some of the usual remedies: rethinking the data constructs (they hired DBAs) and memcached. This afternoon, Tantek was in fine form discussing web development practices with microformats, where he announced updates to the search system Technorati's been cooking, again a growth induced revision. Shortly thereafter, I enjoyed the stats and facts that Steve Olechowski presented in his 10 things you didn't know about RSS talk. And so it goes; this evening it was Feedburner having an episode. "me" time -- heh, know how ya feel <g>
While Feedburner gets "me" time, Flickr gets massages when they have system troubles. Speaking of Flickr, I'm looking forward to Cal Henderson's talk, Taking Flickr from Beta to Gamma at tomorrow's session of The Future of Web Apps. I caught a bit of Scaling Fast and Cheap - How We Built Flickr last spring, Cal knows the business. I've been meaning to check out his book, Building Scalable Web Sites.
Perhaps everybody needs a therapeutic massage for the times of choppy seas. When Technorati hurts, it just seems to hurt. Should it be getting meditation and tiger balm (hrm, smelly)? Some tickling and laughter (don't operate heavy machinery)? Animal petting (could be smelly)? Aromatherapy (definitely smelly)? Data center feng shui? Gregorian chants? R.E.M. samples?
futureofwebapps-sf06 palaceoffinearts flickr feedburner digg technorati microformats memcached
( Sep 13 2006, 09:26:42 PM PDT ) Permalink View blog reactions
Monday September 04, 2006
Hey, I'm in Wired! The current Wired has an article about blog spam by Charles Mann that includes a little bit of my conversation with him. Spam + Blogs = Trouble covers a lot of the issues facing blog publishers (and, in a broader sense, participant created artifacts, AKA user generated content, in general). There are some particular challenges faced by services like Technorati that index these goods in real time; not only must our indices have very fast cycles, so must our abilities to keep the junk out. I was in good company amongst Mann's sources; he talked to a variety of folks from many sides of the blog spam problem: Dave Sifry, Jason Goldman, Anil Dash, Matt Mullenweg, Natalie Glance and even some blog spam perps.
I've also had a lot of conversations with Doc lately about blog spam and the problems he's been having with kleptotorial. A University of Maryland study of December 2005 pings on weblogs.com determined that 75% of the pings are spam AKA spings. By excluding the non-English speaking blogosphere and not taking into account the large portions of the blogosphere that don't ping weblogs.com, that study ignored a larger blogosphere but overall, that assessment of the ping stream coming from weblogs.com seemed pretty accurate. As Dave reported last month, by last July we were finding over 70% of the pings coming into Technorati to be spam.
Technorati has deployed a number of anti-spam measures (such as targeting specific Blogger profiles, as Mitesh Vasa has. Of course there's more that we've done but if I told you I'd have to kill you, sorry). There are popular theories in circulation on how to combat web spam involving blacklists of URLs and text analysis but those are just little pieces of the picture. Of the things I've seen from the anti-splog crusader websites, I think the fighting splog blog has hit one of the key vulnerabilities of splogs: they're just in it to get paid. So, hit 'em in the wallet. In particular, splog fighter's (who is that masked ranger?) targeting of AdSense's Terms of Service violators sounds most promising. Of course, there's more to blog spam than AdSense, Blogger and pings. The thing gnawing at me about all of these measures is their reactiveness. The web is a living organism of events; the tactics for keeping trashy intrusions out should be event driven too.
Intrusion detection is a proven tool in the computer security practice. System changes are a disturbance in the force, significant events that should trigger attention. Number one in the list of The Six Dumbest Ideas in Computer Security is "Default Permit." I remember the days when you'd take a host out of the box from Sun or SGI (uh, who?) and it would come up in "rape me" mode. Accounts with default passwords, vulnerability laden printing daemons, rsh, telnet and FTP (this continued even long after the arrival of ssh and scp), all kinds of superfluous services in /etc/inetd.conf and so on. The first order of business was to "lock down" the host by overlaying a sensible configuration. The focus on selling big iron (well, bigger than a PC) into the enterprise prevented vendors from seeing the bigger opportunity in internet computing and the web. And so reads the epitaph of old-school Unix vendors (well, in Sun's case Jonathan Schwartz clearly gets it -- reckoning with the "adapt or die" options, he's made the obvious choice). Those of us building public facing internet services had to take the raw materials from the vendor and "fix them". The Unix vendors really blew it in so many ways, it's really too bad. The open source alternatives weren't necessarily doing it better; even the Linux distros of the day had a lot of stupid defaults. The BSDs did a better job but, unless you were Yahoo! or running an ISP, BSD didn't matter (well, I used FreeBSD very successfully in the 90's but then I do things differently). Turning on access to everything but keeping out the bad guys by selectively reacting to vulnerabilities is an unwinnable game. When it comes to security matters, the power of defaults can be the harbinger of doom.
The "Default Deny" approach is to explicitly prescribe what services to turn on. It's the obvious, sensible approach to putting hosts on a public network. By having very tightly defined criteria for what packets are allowed to pass, watching for adversarial connections is greatly simplified. I've been thinking a lot about how this could be applied to providing services such as web search while also keeping the bad guys (web spammers) out.
Amongst web indexers, the big search services try to cast the widest net to achieve the broadest coverage. Remember the mine is bigger than yours flap? Search indices seemingly follow a Default Permit policy. On the other extreme from "try to index everything" is "only index the things that I prescribe." This "size isn't everything" response is seen in services like Rollyo. You can even use Alexa Web Search Platform to cobble your own index. But unlike the case of computer security stances, with web search you want opportunities for serendipity; searching within a narrowly prescribed subset of the web greatly limits those opportunities. Administratively managed Default Deny policies will only get you so far. I suspect in the future effective web indexing is going to require more detailed classification, a Default Deny with algorithmic qualification to allow. Publishers will have to earn their way into the search indices through good behavior.
The blogosphere has thrived on openness and ease of entry but indeed, all complex ecosystems have parasites. So, while we're grateful to be in a successful ecosystem, we'd all agree that we have to be vigilant about keeping things tidy. The junk that the bad guys want to inject into the update stream has to be filtered out. I think the key to successful web indexing is to cast a wide net, keep tightly defined criteria for deciding what gets in and use event driven qualification to match the criteria. The attention hijackers need to be suppressed and the content that would be misappropriated has to be respected. This can be done by deciding that whatever doesn't meet the criteria for indexing should be kept out. Not that we have to bid adieu to the yellow brick road of real time open content but perhaps we do have to set up checkpoints and rough up the hooligans who soil the vistas.
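The "Default Deny with algorithmic qualification" idea can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not how Technorati or any other index actually works: the `PingEvent` fields, the three checks and their thresholds are all hypothetical, made up here to show the shape of the policy.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PingEvent:
    """A hypothetical update notification for one blog post."""
    url: str
    pings_last_hour: int      # how often this source pinged recently
    duplicate_ratio: float    # share of text also seen elsewhere (0.0 to 1.0)
    has_history: bool         # whether the source has a record of good behavior


# Each criterion returns True only when the event passes it.
Criterion = Callable[[PingEvent], bool]

# All thresholds below are assumptions chosen for illustration.
CRITERIA: List[Criterion] = [
    lambda e: e.has_history,              # publishers earn their way in
    lambda e: e.pings_last_hour <= 12,    # assumed limit on ping rate
    lambda e: e.duplicate_ratio < 0.5,    # assumed limit on copied content
]


def should_index(event: PingEvent) -> bool:
    """Default Deny: index only when every criterion explicitly passes."""
    return all(check(event) for check in CRITERIA)


def qualify(events: List[PingEvent]) -> List[str]:
    """Evaluate each incoming event as it arrives; return the URLs allowed in."""
    return [e.url for e in events if should_index(e)]
```

Note the cost of this stance: a brand new blog with no history is denied until it qualifies, which is exactly the tension with serendipity described above.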
spam web spam splog splogs adsense technorati wired
( Sep 04 2006, 11:10:15 PM PDT ) Permalink View blog reactions
Saturday September 02, 2006
I spent way too much time last night giving Google some free labor. The Google Image Labeler is kinda fun, in a peculiar way. In 90 second stretches that AJAX-ishly link you to someone else out there in the ether, you are shown images and a text box to enter tags ("labels" is apparently Google's preferred term, whatever). Each time you get a match with your anonymous partner, you get 100 points. The points are like the ones on Whose Line Is It Anyway, they don't matter. And yet it was strangely fun. The most I ever got in any one 90 second session was 300 points. Network latency was the biggest constraint; sometimes Google's image loading was slow. Also, the images are way too small on my Powerbook ... this is the kinda thing you want a Cinema Display for (the holidays are coming, now you know what to get me).
So what if Technorati did this? Suppose you and some anonymous cohort could be simultaneously shown a blog post and tag it. Most blogging platforms these days support categories. But there are a lot of blog posts out there that might benefit from further categorization. Authors are already tagging their posts and blog readers can already tag their favorite blogs but enabling an ESP game with blog posts sounds like an intriguing way to refine categorization of blogs and posts.
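The matching step of such a game is simple enough to sketch. What follows is a guess at the mechanics, not Google's implementation: the function names are hypothetical, and scoring every agreed tag at 100 points is an assumption extrapolated from the per-match score described above.

```python
from typing import Iterable, Set

POINTS_PER_MATCH = 100  # the per-match score mentioned above


def normalize(tag: str) -> str:
    """Lowercase and collapse whitespace so 'Web  Spam' matches 'web spam'."""
    return " ".join(tag.lower().split())


def agreed_tags(mine: Iterable[str], partners: Iterable[str]) -> Set[str]:
    """Return the tags both anonymous partners entered for the same post."""
    return {normalize(t) for t in mine} & {normalize(t) for t in partners}


def score(mine: Iterable[str], partners: Iterable[str]) -> int:
    """Score one round: a fixed number of points per agreed tag (assumed rule)."""
    return POINTS_PER_MATCH * len(agreed_tags(mine, partners))
```

Only the agreed tags would be worth keeping as categorization data; a tag that just one partner typed says little about the post.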
tagging esp game google image labeler mechanical turk
( Sep 02 2006, 12:31:26 PM PDT ) Permalink View blog reactions
Wednesday August 30, 2006
Last week I was in Albuquerque for some family time and relaxation. It was truly wonderful to see the desert in full bloom; the monsoonal flow of weather coming up from the Gulf of Mexico this time of year has brushed the whole landscape with lovely shades of green. The weather was mild, the raspberries luscious and abundant and, though the trout weren't biting, the rivers roared beautifully; it was really great. No, I didn't accept payola from the New Mexico visitors bureau, really, my gushing is legit.
Anyway, I also took the opportunity to do some geeky ogling at the Eclipse Aviation facility in Albuquerque. I'm not normally an airplane nerd but last Friday, I was. What interested me about this company is that they are producing a truly disruptive technology. Commercial aviation and metropolitan airports are high ceremony affairs: security lines, taking off your shoes and taking out your laptop, finding the right carousel to get your luggage... and praying that it shows up there intact. The Eclipse jets will commoditize high altitude cruising in a pressurized cabin at speeds that aren't too far behind the big boys (and twice the speed of propeller planes) and do so at a price point on par with the cost of many single family homes in the San Francisco Bay Area. What will that mean for you? What would it mean to you if 2 to 3 hour rides up and down the west coast of the United States are cheap and abundant? If a two day drive or commercial jetliner and airport rigamarole can be replaced with fast, low ceremony travel, then the world gets a lot smaller again. That would mean a lot to me! Welcome to reality, the smaller world where commoditization is good.
Eclipse is a newcomer, a startup (I know a bit about disruptive startups) in the aviation industry. As you'd expect, they're doing things differently. Which isn't a surprise given founder Vern Raburn's pedigree. Raburn is an early Microsoft and Lotus guy, has been a pilot since he was a teenager and has a passion for innovation (along with some cash to throw behind it). Eclipse's friction stir welding process for joining the aluminum shell results in a light but strong hull (without relying on composites); most planes are pieced together with rivets. Eclipse has extensive IT infrastructure that provides flight plans and collects metrics on the planes while they're in flight, detecting component failures, with a state-of-the-art operations center poised to assist. The avionics are displayed on redundant touch screens and the controls are vastly simplified over what you find in traditional aircraft. Here are some specs on the Eclipse 500:
So, do the math and this works out a lot cheaper than flying most piston engine planes (costs per hour may be lower but you're in the air twice as long with those). OK, I admit I don't have one and a half mill to drop for one of these babies (however, I have aspirations to be a "qualified buyer") but still, the potential to bring this kind of travel within easy reach is at hand. Even if you don't buy one of them, using one like you'd use a cab seems like a huge improvement over current modes of air travel. On your next trip, as you endure the TSA confiscating that toothpaste you forgot was in your carry-on luggage, imagine jet travel that operates more like a car service, like a cab. DayJet is going to provide exactly that using fleets of Eclipse jets. On the factory floor, I saw a few DayJet-logo'd planes getting prepped for delivery. Apparently a gaggle of "air taxi" services similar to DayJet are in the works; they'll also be powered by fleets of Eclipse 500s. We'll embark on the era of very light jets (VLJ) when the first customers start taking delivery of their aircraft within the next month. This may not be Kitty Hawk but I do think this will rank high in the list of significant aviation events.
aviation jets civil aviation eclipse aviation disruptive technologies
( Aug 30 2006, 09:09:06 PM PDT ) Permalink View blog reactions
Tuesday August 29, 2006
In this corner: Doc is going to attack kleptotorial splogs by employing cleaner living through better licensing (a creative commons flavor). And in this corner: Elliott Back says he is a victim. He has been slammed by Scoble (and Scoble was gracious enough to apologize). I have no sympathy for Elliott Back. Sure, he's just the gun maker, not the shooter. But weapon makers producing wares without safeties get sued for negligence. Basically, any tool that programmatically harvests and posts other people's feeds should at least have the common decency to not ping. If you re-inject something into the update stream that you've appropriated from someone else, you're scamming the update stream. This isn't about quoting or citing, this is about fraudulent pings, "I've updated my blog (never mind the fact it's with OPP)" -- keep your feed harvesting to yourself, please.
( Aug 29 2006, 09:51:57 AM PDT ) Permalink View blog reactions
Monday August 28, 2006
The MySQL query cache has rarely been of much use to me since it's pretty much just an optimization for read-heavy data. Furthermore, if you have a pool of query hosts (e.g. you're using MySQL replication to provide a pool of slaves to select from), each with its own query cache in a local silo, there's no "network effect" of benefitting from a shared cache. MySQL's heap tables are a neat trick for keeping tabular data in RAM but they don't work well for large data sets and suffer from the same siloization as the query cache. The standard solution for this case is to use memcached as an object cache. The elevator pitch for memcached: it's a thin distributed hash table in local RAM stores, accessible by a very lightweight network protocol and bereft of the featuritis that might make it slow; response times for reads and writes to memcached data stores typically clock in at single digits of milliseconds.
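The object-cache pattern described above usually looks something like the following sketch. It is deliberately not wired to a real memcached client: `InMemoryCache` is a hypothetical stand-in backed by a local dict so the example runs anywhere, and `cached_lookup` and `load_from_db` are names invented for illustration. In a real deployment every query host would talk to the same memcached pool, which is where the shared benefit comes from.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class InMemoryCache:
    """Stand-in for a memcached client (hypothetical; one local dict, no network)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]   # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def cached_lookup(cache: InMemoryCache, key: str,
                  load_from_db: Callable[[str], Any],
                  ttl_seconds: float = 60.0) -> Any:
    """Try the cache first; on a miss, load from the database and store it."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)          # the slow path, e.g. a SQL query
        cache.set(key, value, ttl_seconds)
    return value
```

The expiry time is the knob that trades freshness against database load; the 60 second default here is arbitrary.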
RDBMS-based caches are often a glorified hash table; a primary key'd column and value column. Using an RDBMS as a cache works but it's kinda overkill; you're not using the "R" in RDBMS. Anyway, transacting with a disk based storage engine that's concerned with ACID bookkeeping isn't an efficient cache. MySQL has the peculiar property of supporting pluggable storage backends. MyISAM, InnoDB and HEAP backends are the most commonly used ones. Today, Brian Aker (of Slashdot and MySQL AB fame) announced his first cut release of his memcache_engine backend.
Here's Brian's example usage:
mysql> INSTALL PLUGIN memcache SONAME 'libmemcache_engine.so';
mysql> create table foo1 (k varchar(128) NOT NULL, val blob, primary key(k)) ENGINE=memcache CONNECTION='localhost:6666';
mysql> insert into foo1 VALUES ("mine", "This is my dog");
Query OK, 1 row affected (0.01 sec)
mysql> select * from foo1 WHERE k="mine";
+------+----------------+
| k    | val            |
+------+----------------+
| mine | This is my dog |
+------+----------------+
1 row in set (0.01 sec)
mysql> delete from foo1 WHERE k="mine";
Query OK, 1 row affected (0.00 sec)
mysql> select * from foo1 WHERE k="mine";
Empty set (0.01 sec)
Brian's release is labelled a pre-alpha; some limitations apply, your mileage may vary, prices do not include taxes, customs or agriculture inspection fees.
Sunday August 27, 2006
When I wrote about OSCON last month, I mentioned Perrin Harkins's session on Low Maintenance Perl, which was a nice review of the do's and don'ts of programming with Perl, but I really didn't dig into the substance of his session. Citing Andy Hunt (from Practices of an Agile Developer):
When developing code you should always choose readability over convenience. Code will be read many, many more times than it is written. (see book site)
Perrin enumerated a lot of the basic rules of engagement for coding Perl that doesn't suck. Some of the do's and don'ts highlights:
package Foo;

sub new {
    my $class = shift;
    my $data  = shift || {};
    return bless $data, $class;
}

package main;

my $foo = Foo->new;
print ref $foo, "\n";   # prints the class name of $foo
bless $foo, 'Bar';      # re-bless the same reference into another class
print ref $foo, "\n";
For the non-Perl readers: create an instance of Foo ($foo), then change it to an instance of Bar, printing out the class names as you go. The output is:
Foo
Bar
Anyone caught doing this will certainly come back as a two headed cyclops in the next life.
I've been trying to increase my Python craftiness lately. I first used Python about 10 years ago (1996) at GameSpot; we used it for our homebrewed ad rotation system. I fiddled with Python some more at Salon as part of the maintenance of our Ultraseek search system. But basically, Python has always looked weird to me and I've avoided doing anything substantial with it. Well, my interest in it is renewed because there is a substantial amount of legacy code that I'm presently eyeballing and, anyway, I'm very intrigued by JVM scripting languages such as Jython (and JRuby). I'm looking for a best-of-both-worlds environment, things-are-what-you-expect static typing and compile time checking on the one hand and rapid development on the other. I was really astonished to learn that chameleon class assignment like Perl's is supported by Python. Python is strongly typed in that you have to explicitly cast and coerce to change types (very unlike Perl's squishy contextual operators, which do a lot of implicit magic). But Python is also dynamically typed; an object's type is a runtime assignment. This is gross:
class Foo:
    def print_type(self):
        # Python 2 print statement; shows the instance's current class
        print self.__class__

class Bar:
    def print_type(self):
        print self.__class__

if __name__ == "__main__":
    foo = Foo()
    foo.print_type()
    foo.__class__ = Bar  # reassign the instance's class at runtime
    foo.print_type()
In English: create an instance of Foo (foo), then change it to an instance of Bar, printing out the class names as you go. The output is:
__main__.Foo
__main__.Bar
(Python prefixes the class name with the current namespace, __main__.) Anyone caught doing this will certainly come back as a reptilian jackalope in the next life.
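The two properties at play, strong typing and dynamic typing, can be seen side by side in a short sketch. It is written for Python 3, which is an assumption on my part (the listing above uses Python 2 syntax), and the helper function names are made up for illustration.

```python
class Foo:
    pass


class Bar:
    pass


def mixing_types_fails() -> bool:
    """Strong typing: str + int raises instead of being silently coerced."""
    try:
        "1" + 1
    except TypeError:
        return True
    return False


def reassign_class() -> tuple:
    """Dynamic typing: an instance's class can be swapped at runtime."""
    foo = Foo()
    before = type(foo).__name__
    foo.__class__ = Bar          # legal, but surprising to anyone reading later
    after = type(foo).__name__
    return before, after, isinstance(foo, Bar)
```

So the interpreter refuses to guess what adding a string to a number means, yet happily lets an object change its class under your feet.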
Of course, Java doesn't tolerate any of these shenanigans. Compile time complaints of "what, are you crazy?!" would surely come hither from javac. There's no setClass(Class):void method in java.lang.Object, thank goodness, even though there is getClass():Class. One of the key characteristics of a language's usefulness for agile development has to be its minimization of astonishing results, quirky idioms and here-have-some-more-rope-to-hang-yourself behaviors. If you can't read your own code from last month without puzzling over it, how the hell are you going to refactor it quickly and easily next month? Will your collaborators have an easier time with it? Perl has rightly acquired the reputation of a "write once, puzzle forevermore" language. I haven't dug into whether Ruby permits runtime object type changing (that would be really disappointing). I'll dig into that next; clearly the Rails developers' emphasis on convention and configuration over code is aimed at reducing the surprises that coders can cook up. But that doesn't necessarily carry back to Ruby itself.
python perl ruby java code agile programming jython jruby oscon oscon06
( Aug 27 2006, 08:51:11 AM PDT ) Permalink View blog reactions
Wednesday August 02, 2006
...just once when a passenger wearing too much perfume or cologne boards the metro, it would prompt the driver (who would be Samuel L. Jackson) to stand up, turn to the passengers and demand, "Get those mother effin' stinks off this mother effin' train!"
Perhaps for once I'd get my money's worth from Muni.
san francisco muni samuel l jackson stinks on a train
( Aug 02 2006, 09:46:20 AM PDT ) Permalink View blog reactions
Sam Ruby's Teenagers on the go slide deck is an interesting prognosis on the future impact of the protocols, formats and form factors in our midst on publishing, sharing and participating on the web.
( Aug 02 2006, 07:13:37 AM PDT ) Permalink View blog reactions
Sunday July 30, 2006
Since the universally understood (at least among the intelligentsia) descriptor user generated content continues to nag at people (Tim raised it again during his OSCON session) and the alternatives have been difficult to pin down (was Tim suggesting people contributed experiences?), it's my caffeinated Sunday morning aspiration to consider the alternatives.
Having a label is important: we're making a distinction between published artifacts that are developed by editors and/or paid staff and the stuff created by Normals who are contributing the artifacts of their creative process to the web. Yes, the term user is definitely sterile, generated too mechanical and content seems so... vacuous. Does participant created artifacts work as a descriptor for all of the photos we're uploading, blog posts we're posting and so forth?
( Jul 30 2006, 08:44:40 AM PDT ) Permalink View blog reactions
Saturday July 29, 2006
Had a great time at OSCON! Besides the previously noted keynotes and sessions, my faves were Perrin Harkins' Low-Maintenance Perl (a good discussion of best practices in Perl as the simple practices; Perl as the sole domain of wizards is so old school), Moazam Raja's Troubleshooting the JVM and the Applications That Run Within It (a good survey of the built in runtime diagnostics available for Java), Tim Bray's The Atom Publishing Protocol as Universal Web Glue (a good example of using vi and curl for bare metal wire protocol demos as well as how slow JRuby's start-up time is!) and Damian Conway's suitably humorous Friday keynote! If there was anything that I wish I coulda rearranged it was the time slots when there was more than one session I wanted to be in. But the time slots with nothing interesting going on were good opportunities for hallway conversations, which are often the most important activities at these events, so I won't complain vigorously.
Enjoyed hanging out Friday afternoon for "OSCON decompression" at Urban Grind ("Coffee should be black as night, hot as hell, and strong as love.") with James, David, Josh, David and Scott. Heh, I got PostGIS running on my powerbook, which gave me something to play with on the flight home!
Portland is a really nice town: the Disneyland-like light rail system (complete with automaton announcements in English and Español), the neighborhood ambiance, the surrounding greenery... I dig it. For next year's OSCON trip, I'll be bringing the family along!
( Jul 29 2006, 11:20:55 AM PDT ) Permalink View blog reactions
Thursday July 27, 2006
At Greg Stein's talk, A Google Service for the Open Source Community, he outlined how Google's following up on its Summer of Code project with a new contribution to open source. No, it's not a dating service or personal trainer service for geeks (that'll be Google's 2007 contribution). And no, it's not source code search (does Krugle have that covered?). This is project hosting on Google Code Project Hosting.
Yes, there is nothing new about project hosting; there have long been things like SourceForge, Tigris, java.net and so forth. Things like SourceForge are doing a great job, but there are strengths that Google has that could be brought to bear on the project hosting space. Some of the unique features of Google's project hosting that Greg cited are
Creating a new hosted project is simple enough: you fill out a form with the project name, a summary description, a full description, select a license (Apache, Artistic + GPL, GPL v2, LGPL, BSD, MIT and Mozilla are choices ... dual licensing is not permitted) and apply some labels (tags). If you don't have one yet, a subversion password is created for you (it's *not* your gmail password). Your project will have a tabbed interface for the main page, issues, browsing the source and an administrative page. Project creators and administrators must use a GMail account. If not using GMail, bug reporters must have some Google account (Picasa, Groups, etc). The "Issues" screen provides a tabular view of bugs; the columns are ajax enabled for parameterization. The neat thing is that instead of using a big form with tons of check boxes and selectors, the issue tracking uses query expressions to refine issue search results. The status field for a bug can be free text; while a static vocabulary is defined and selectable in an ajax drop down, the vocabulary is unconstrained. Status isn't the only metadata that's open-ended: instead of having "release version", "milestone", "component", etc., the system uses labels. The issue list column repertoire is adaptable so that you can select labels you've defined as listing criteria. All of the open endedness may be an invitation to pandemonium but the focus is on having the user interface make it easy for the user to do the right thing.
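To make the "query expressions plus open-ended labels" idea concrete, here's a rough sketch of how such a search could work. To be clear, this is not Google's implementation or its query syntax: the `key:value` term format, the `Issue` class and every function name below are assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Issue:
    """A hypothetical issue: free-text status plus open-ended labels.

    Label keys are assumed to be stored in lowercase.
    """
    id: int
    status: str
    labels: Dict[str, str] = field(default_factory=dict)


def parse_query(query: str) -> Dict[str, str]:
    """Turn 'status:open milestone:1.0' into {'status': 'open', ...}."""
    terms = {}
    for term in query.split():
        key, _, value = term.partition(":")
        terms[key.lower()] = value.lower()
    return terms


def matches(issue: Issue, terms: Dict[str, str]) -> bool:
    """An issue matches when every term agrees with its status or a label."""
    for key, wanted in terms.items():
        actual = issue.status if key == "status" else issue.labels.get(key, "")
        if actual.lower() != wanted:
            return False
    return True


def search(issues: List[Issue], query: str) -> List[int]:
    """Return the ids of the issues that satisfy the whole query expression."""
    terms = parse_query(query)
    return [i.id for i in issues if matches(i, terms)]
```

Because labels are just key/value pairs, a project can start tracking a new dimension (say, a milestone) without anyone changing a schema or a form.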
Some of the administratively defined aspects of a project include the issue creation template (defines the prompts that issue creators will see), project links, project discussion groups (using Google Groups), project blogs and activity notification email addresses. The system will support issue tracking feeds. Most of the metadata will be visible on the project summary page that newcomers to the project will see.
There's currently no "tarball download" service and integration with other Google services is in the works. For the time being, any downloads made available must be done within the limit of the quotas on your subversion repository (100 MB). Plans for importing and exporting, creating APIs and so forth are underway (the issue tracking seems like a natural fit for Atom and Atom Publishing Protocol).
Congrats to Greg and the Google Code team on a great launch!
( Jul 27 2006, 03:46:06 PM PDT ) Permalink View blog reactions
Wednesday July 26, 2006 I missed the first keynotes (I arrived just in time for Tim O'Reilly's "what technologies are hot according to these slices on the data" bit that he does) but enjoyed the talk by Greenplum's Scott Yara, School of Rock. He highlighted the parallels between open source development and rock and roll. I'll paraphrase his points.
Open source, like rock and roll, has flourished simply because people enjoyed it. Like rock and roll, money has jumped into open source and an industry has swelled around it. Like rock and roll, open source threatens the establishment but also mutually coopts and becomes the establishment. Yara showed a funny "twins separated at birth?" photo pairing of Rick Rubin and Richard Stallman! What will sustain open source's integrity (like rock and roll's) are the intangibles, the real emotions and inspirations that drive innovation. The popularity game isn't a measure of quality... just because it's widely downloaded doesn't mean it's good, just as Britney Spears' and N'Sync's sales success aren't validations of "good" music. So, beware of the vogue of open source; people are starting to believe that open source is better but don't let that undermine what's important. For those who are building their business on open source, go for the $$$ but keep your integrity. At that point Yara ran a little excerpt of Metallica goofing on a radio promo production (from Some Kind of Monster?); the ironies of choosing them as illustrations of how money changes everything, given how they coopted and have become the music establishment, were high humor for me. Nonetheless, Metallica, like a lot of successful open source software projects, have succeeded by being a little dangerous, by being genuine and not bothering with the constraints of the legacy establishment.
Anil Dash gave a talk, Trying to Suck Less: Making Web 2.0 Mean Something, basically outlining that beyond the technology stack (i.e. LAMP), there are higher level tools that developers can employ to suck less (yep, I confess, at Technorati when we can't quite kick the butt that we aspire to, we focus on sucking less). Citing the technologies that have grown out of SixApart's software plumbing, he highlighted that all successful Web 2.0 companies are using load balancing, messaging, caching, filesystems and other scalability and performance platform components. In SixApart's case, perlbal, memcached, mogilefs and djabberd are the core technologies that they build on ... and, so the pitch goes, so should you if you want to suck less.
Those are the high points of the morning (so far).
( Jul 26 2006, 10:00:31 AM PDT ) Permalink View blog reactions
Tuesday July 25, 2006 In case you hadn't heard, we've had a lot of things cooking at Technorati. Besides the engaging new look, the new features and the complete overhaul of URL search and link counts, we've been making great strides in our blog spam mitigation (you wouldn't believe the stuff we catch ... and the sheer quantity of it!), our internal caching and messaging infrastructure and our data center network. Of course, there's still much to do but we've been heads down on it; if you haven't checked us out lately I think you'll find that our efforts to improve the front end, the back end and all of the cogs and pulleys in between have been moving forward.
I'm really proud of the team I work with at Technorati! If you'd like to join the team, we have a lot of innovation ahead. Grab me this week at OSCON and tell me about how you'd like to materialize the real time web! I'll also be moderating a Microformats BOF, this will be a good opportunity to talk about the implementations for producing and consuming microformats. See ya in Portland!
( Jul 25 2006, 10:37:36 PM PDT ) Permalink View blog reactions
Saturday May 06, 2006 Blog publishing services typically propagate updates about new posts from blogs (ergo, new blogs too) by pinging or publishing a changes.xml file. But what none of the services provide is an "un-ping" -- blog indexing services such as Technorati don't know when a blog has been deleted from a service. I noticed this today when I found http://blogtrarian.blogspot.com/ participating in a link farm infesting Blogger's service. This can happen because Google's Blogger recycles URLs; when a blog is removed from the system, the URL is freed for reuse.
That particular URL dates back to 2004; it was dormant for several months but just came to life recently with spam. The historic posts (until August 2005) look like normal blogging fare but the recent posts are clearly just splog content. We'll have to work on "un-pinging" so it's easier to distinguish dormant blogs from dead ones.
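To illustrate why the update stream alone can't tell us this, here's a minimal sketch that reads a changes.xml-style document and lists the blogs we know about that didn't show up in it. The sample document is modeled on the weblogs.com-style changes format; the element and attribute names, the example URLs and the helper names are assumptions for illustration, not a description of any particular service's feed.

```python
import xml.etree.ElementTree as ET

# Assumed sample, modeled on a weblogs.com-style changes file.
SAMPLE_CHANGES = """<weblogUpdates version="1" updated="Sat, 06 May 2006 22:00:00 GMT" count="2">
  <weblog name="Example Blog" url="http://blog.example.com/" when="12"/>
  <weblog name="Another Blog" url="http://another.example.org/" when="47"/>
</weblogUpdates>"""


def updated_urls(changes_xml):
    """Return the set of blog URLs announced as updated in the changes file."""
    root = ET.fromstring(changes_xml)
    return {weblog.get("url") for weblog in root.iter("weblog")}


def not_recently_updated(known_urls, changes_xml):
    """Known blogs that are absent from the changes file.

    Absence only means "no recent post": a dormant blog and a deleted
    (and possibly recycled) one look exactly the same from here.
    """
    return sorted(set(known_urls) - updated_urls(changes_xml))


known = ["http://blog.example.com/", "http://quiet.example.net/"]
print(not_recently_updated(known, SAMPLE_CHANGES))
```

The stream carries additions and updates only, so a deletion signal would have to come from the publishing service itself.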
spam splog web spam google blogger ping
( May 06 2006, 03:13:14 PM PDT ) Permalink View blog reactions
Friday May 05, 2006 So Google's CEO Eric Schmidt says his servers are full, hmm. Tying that to SEO'ers griping about their indexing, Andrew Orlowski speculates that it's web spam besetting big daddy. Could be, but the hard data isn't out in the wild. The numbers that we can see are that Google is spending several banana republics' worth of GDP on capital expenses:
Google continued to make substantial capital investments, mainly in computer servers, networking equipment and its data centers. It spent $345 million on such items in the first quarter, more than double the level of last year. Yahoo, its closest rival, spent $142 million on capital expenses in the first quarter.
Referring to the sheer volume of Web site information, video and e-mail that Google's servers hold, Schmidt said: "Those machines are full. We have a huge machine crisis." (read more)
If the problem is spam, then certainly it's Google's own doing. The elephant in the room is that the acceleration of web spam everyone's talking about is fueled by AdSense, often aided and abetted by Blogger splogs, Google Pages, Google Base, etc. The spam ecosystem is within Google's capacity to rein in but the don't-be-evil company is making too much money on click fraud with plausible deniability to do anything about it. Is Google having problems handling web spam and "filling up" their machines? Cry me a river, all the way to the bank.
spam google adsense splog web spam
( May 05 2006, 02:09:19 PM PDT ) Permalink View blog reactions
Thursday May 04, 2006 When I read the words on
Microsoft yesterday reached a tentative $70 million deal to settle a California class-action antitrust lawsuit, according to a statement by the law firm representing the plaintiffs in the suit.
at http://www.satishlive.info/?p=27 I had the distinct sense of deja-vu. So I ran some queries against Technorati's index and sho-nuf, I found the exact same content had already been published by InfoWorld. Ah, there was an attribution at the bottom... but InfoWorld didn't publish under a creative commons license. Looks like blatant theft.
Then I checked the next post (http://www.satishlive.info/?p=28) on that blog and read:
I took a new blog search tool called Sphere for a little spin this morning and found it useful.... hey, didn't I just see that somewhere else? Yep, this time it was PC World and no attribution.
It's safe to surmise that this is kleptotorial laden with AdSense and stuffed into the update stream. I've seen screenscrapes and feedscrapes on splogs before but they're usually easier to identify visually; I had to look more carefully at this one to note its spamminess. Is there a market in alerting publishers to copyright infringement? Obviously this stuff should be removed from Technorati's index but is there a more valuable service to publishers that should be provided here? How much would you pay to find out about misappropriations of your content? Is there a market for Technorati to do something like Plagiarism.org to fingerprint blog content?
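For the curious, content fingerprinting can be sketched in a few lines. This is a minimal illustration using hashed word shingles and set overlap (Jaccard resemblance), a technique I'm swapping in for the sake of example; it is not how Plagiarism.org or Technorati actually works, and the function names and the 5-word shingle size are assumptions.

```python
import hashlib
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, k=5):
    """Hash each shingle so only compact digests need to be stored."""
    return {hashlib.sha1(s.encode("utf-8")).hexdigest()[:16]
            for s in shingles(text, k)}


def resemblance(a, b, k=5):
    """Jaccard overlap of two fingerprints: 0.0 (unrelated) to 1.0 (identical)."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)


original = ("Microsoft yesterday reached a tentative $70 million deal to settle "
            "a California class-action antitrust lawsuit")
scraped = original + " (via InfoWorld)"
unrelated = "I took a new blog search tool called Sphere for a little spin this morning"

print(resemblance(original, scraped))
print(resemblance(original, unrelated))
```

A scrape that tacks an attribution onto the end still shares nearly all of its shingles with the source, while an unrelated post shares none, so a simple threshold on the score is enough to flag candidates for a human to review.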
splog creativecommons copyright spam creative commons plagiarism adsense
( May 04 2006, 09:34:17 PM PDT ) Permalink View blog reactions
Tuesday May 02, 2006
The chatter (even art work on flickr) about it is frantic. Thank You Stephen Colbert has 700 links right now (this is a blog that came into being less than 72 hours ago), it's getting about five or ten links per hour at the moment. The videos are the most linked-to youtube reels on Technorati. How wonderful it is to have an administration that is so bad, the opportunities for high humor are so many. Why did we invade Iraq?
stephencolbert colbert flickr youtube technorati bush cspan whitehousecorrespondentsdinner colbertreport comedycentral politicalhumor iraq blogs blogging
( May 02 2006, 09:27:30 PM PDT ) Permalink View blog reactions
Sunday April 30, 2006 I have developed a great deal of respect for those who do fund raising full time as a profession, it's a tough business. The Happy Valley Odyssey of the Mind teams are trying to raise money to send themselves (the kids) and their coaches to the World Finals and, so far, it's been tough moving that along. With basically three weeks left before the big trip to Ames, Iowa, the thermometer still has quite a ways to go. If you can't donate today, how about linking to their site? Sure links won't pay the bills directly but if getting the word out means that someone who can help with the bills will find out about it, maybe it can help indirectly.
Put a badge on your site with this code:
A study cited by the pros found that donors say they have more money than time. In this case, the teams are putting in all of the time (that's the point of Odyssey of the Mind, it's all of the kids' creativity and intellect applied to problem solving); now they just need to pay some bills. If you can't donate cash and donating your time won't impact their endeavor, what can you do? Donate attention! OK, admittedly badges aren't the most attractive things, but you can take this one down after the World competition. So for the month of May, if you can't send money, send 'em some links!
fundraising odysseyofthemind ames iowa fund raising
( Apr 30 2006, 09:42:34 PM PDT ) Permalink View blog reactions
Friday April 28, 2006 There are so many weird and wonderful things on the big search services, you need cheat sheets to keep track of the specialized types of search that they provide. The Yahoo! Shortcuts page has a bunch o' tricks for searching Yahoo! The Google Cheat Sheet has coverage on the search operators and parameters that can be fed to their query systems. Well, we don't have crib notes or hacks books about us (yet) at Technorati, but we're working on being that cool, too <g>
( Apr 28 2006, 10:51:05 PM PDT ) Permalink View blog reactions
Thursday April 27, 2006 There are lots of ways to ping Technorati. Your blogging platform may do it already. You could use a slick editing tool like ecto that will do it for you. You can even roll it yourself in c-sharp. But however you do it, it's important that you let Technorati include you in the distributed conversation by notifying us that you've posted.
Recently, there have been some problems with Ping-o-Matic that I worked with Matt to unravel. If you use a ping relaying service and are having difficulty getting indexed by Technorati, please ping directly! Of course Technorati will continue working with ping relayers such as Ping-o-Matic, Pingoat and so forth; they're providing a valuable service to the blogging communities that we are grateful for. However, when in doubt take the direct route!
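If you'd rather roll a direct ping yourself, here's a minimal sketch in Python using the standard library's XML-RPC client. It assumes the conventional `weblogUpdates.ping` call (blog name, blog URL) and assumes `http://rpc.technorati.com/rpc/ping` as the endpoint; check the current ping documentation before relying on either. The function names are my own.

```python
import xmlrpc.client

# Assumed endpoint; verify against the ping documentation before use.
TECHNORATI_PING_URL = "http://rpc.technorati.com/rpc/ping"


def build_ping(blog_name, blog_url):
    """Serialize a weblogUpdates.ping request body without sending it."""
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def send_ping(blog_name, blog_url, endpoint=TECHNORATI_PING_URL):
    """Send the ping directly to the endpoint and return the server's reply."""
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)


# Inspect the request locally; call send_ping(...) to actually notify the service.
payload = build_ping("My Example Blog", "http://blog.example.com/")
print(payload)
```

Building the payload separately from sending it makes it easy to see exactly what a relay would have forwarded on your behalf.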
More soon to come on the Technorati Weblog!
technorati ping pingomatic pingoat ecto blogging
( Apr 27 2006, 11:27:31 PM PDT ) Permalink View blog reactions