Thursday October 19, 2006
As I announced on the Technorati Weblog, we rolled out support for blog claiming with OpenID. I'm really proud of the work that Chris and the team have done to make this a reality. If you're not familiar with OpenID, here is one good place to start. Sure, I'm well aware of the concerns about phishy user interface vulnerabilities. The idea of logging in without a password may seem weird.
One weird thing, for new users, is that instead of logging into an OpenID-using site (like Zooomr) with a user name and password, you just give it your personal OpenID URL -- and no password. Then your browser pops over to your authenticating site (like myopenid.com) to verify that you want to use your persona on the new site. This is bound to initially confuse people, and since users may not be asked for a password, it can also appear to be less secure, although it is not.
ZDNet: OpenID has a potential cure for Website password overload - Rafe Needleman
Frankly, I'm not certain what the best resolutions are for those concerns. However, I'm more comfortable with adopting OpenID "as-is" and evolving as the technology advances than sitting around waiting for it to be perfected. Welcome to now.
Distributed identity ideas have been gestating for a long time while identity cathedrals have been built and fallen. If your blog is your voice, your URL can be your identity.
( Oct 19 2006, 11:42:04 PM PDT ) Permalink View blog reactions
Whenever I look at page to page, post to post, blog to blog and domain to domain relationship statistics (and permutations across them) interesting things often emerge. Microsoft's Live Search recently released a linkfromdomain operator that can help dig into these linking relationships. For instance, linkfromdomain:arachna.com ruby returns the pages that I've linked to that have ruby in the text. Combined with the site operator, I can do a search of the pages I've linked to on Technorati with linkfromdomain:arachna.com site:technorati.com.
Looks like the blogosphere is noticing: within the last two days Technorati has seen 57 links to the linkfromdomain announcement blog post. Kudos to MSN's search team for a cool innovation.
One apparent problem with their crawls is JavaScript/Flash-plugin handling. The site:youtube.com linkfromdomain:technorati.com SERP shows pages referenced from Technorati's most linked-to YouTube videos; however, all of the SERP items have the text
Hello, you either have JavaScript turned off or an old version of Macromedia's Flash Player. Click here to get the latest flash player.
heh!
search livesearch msn technorati
( Oct 19 2006, 06:56:16 AM PDT ) Permalink View blog reactions
Wednesday October 18, 2006
This was on NASA's Astronomy Picture Of The Day site a few days ago and I haven't been able to close the browser tab with it... I just keep gazing at the surreality of it.
In the shadow of Saturn, unexpected wonders appear. The robotic Cassini spacecraft now orbiting Saturn recently drifted in giant planet's shadow for about 12 hours and looked back toward the eclipsed Sun. Cassini saw a view unlike any other. First, the night side of Saturn is seen to be partly lit by light reflected from its own majestic ring system. read on
NASA goes on to explain that the eclipse revealed newly detected strata of rings around Saturn.
( Oct 18 2006, 10:45:16 AM PDT ) Permalink View blog reactions
Tuesday October 17, 2006
Between Google's extensive use of employee shuttles, their green data centers proposal last month and yesterday's announcement, Google to Convert HQ to Solar Power, I'm really impressed with the ecologically conscientious initiatives they're taking! Personal note: the solar installation will be led by Energy Innovations; EI president Andrew Beebe is a friend from years ago who I've long lost touch with, but I was very pleased to see his name associated with this project.
( Oct 17 2006, 06:52:10 AM PDT ) Permalink View blog reactions
Saturday October 07, 2006
It's broadly appreciated how scaling up is usually driven by business demand, but the requirements for scaling down are rarely as appreciated. Questions about how web 2.0 businesses scale up abound these days. As the challenges of service growth and business plans stress technical infrastructure, startups try to squeeze everything they can out of their architecture with a number of widely accepted practices. However, scaling considerations for the other direction are oft neglected.
End-to-end testing that doesn't require duplication of production infrastructure is a strategic advantage. I know of a financial analytics system run by a large institution that is untestable. This system has cron jobs, data feeds and query systems built on top of Perl code going back at least a decade. The inputs and outputs are so convoluted that the system is untestable. So if this code is making the bank that owns it tens of millions of dollars every day (it is!), what's wrong with that? Well, it could probably be more profitable if it could be changed and optimized safely. As it stands, the folks maintaining the code don't really know what modifications might break the system and with income produced at that scale, who wants to risk it? So look at the systems you're working on now, think about the "scaling up" considerations you've made and ask yourself: Is a system testable in a developer's environment? Can they unit test? Can they perform functional tests? Do the tests require access to resources only available at the data center? Is "now" hardcoded to the present in your code? Using scaled down database, messaging, caching and application runtimes that have no dependencies on a connected network and production infrastructure should be considered up front in your design considerations.
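To make the question about a hardcoded "now" concrete, here is a minimal Python sketch. The function name is_stale and the six hour freshness window are assumptions made up for illustration, not details of any system mentioned above. The point is only that the current time is passed in rather than read from the system clock inside the logic, so the check can be exercised on a laptop with no data center in sight.

```python
from datetime import datetime, timedelta

# Hypothetical freshness window, chosen only for illustration.
MAX_AGE = timedelta(hours=6)


def is_stale(fetched_at, now=None):
    """Return True when a feed fetched at `fetched_at` is older than MAX_AGE.

    `now` is injectable: production callers omit it and get the real clock,
    tests pass a fixed datetime so results never depend on when they run.
    """
    if now is None:
        now = datetime.now()
    return now - fetched_at > MAX_AGE


# A test can pin "now" to any moment without touching the system clock.
fixed_now = datetime(2006, 10, 7, 12, 0, 0)
fresh = is_stale(datetime(2006, 10, 7, 9, 0, 0), now=fixed_now)  # 3 hours old
stale = is_stale(datetime(2006, 10, 6, 9, 0, 0), now=fixed_now)  # 27 hours old
```

The same trick applies to anything else the environment normally supplies (hostnames, queues, database handles): accept it as a parameter and the scaled down stand-in becomes a one line substitution.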
If a system makes assumptions about the process space it runs in that allows for functionality to be accessed from other runtimes, bravo: you may be headed in the right direction of service oriented architecture and horizontal scaling. But can the application stack be collapsed? This is like the OMG-moment when folks first started running J2EE application tiers over remote interfaces and realized that they've ended up with so much complexity and overhead, they have no choice but to scale up. That complexity can have all kinds of expensive side effects with how effectively systems can be triaged when they ail.
Businesses are run by people. People make mistakes. Wetware is imperfect. When you buy a long term commitment to a data center, you may be assuming liabilities that will outlive the business proof. Make sure the hardware footprint you're signing up for is one you can sustain or get out of. When you build gratuitous tiers, the costs of taking them out when it's time to consolidate functionality can be stifling. So ask yourself: If systems scaled up to meet business objectives that aren't met, can you "retreat" from the scale-up offensive?
Every time I see a system that's hard to test, has sysadmins overwhelmed or is not meeting business objectives and has to be reeled in, I'm reminded of the importance of thinking about scaling in both directions. No, I haven't read the book yet but as someone burdened with too much stuff at home, I've got it on my list.
web 2.0 unit testing functional testing technical operations system architecture software
( Oct 07 2006, 03:53:58 PM PDT ) Permalink View blog reactions
Saturday September 30, 2006
I find it really fascinating to see the acceptance of a publishing paradigm that lies in between the micropublishing realm of blogging, posting podcasts and videos and "old school" megapublishing. There are of course magazines; your typical piece in the New Yorker is longer than a blog post but shorter than a traditional book. But there's something else on the spectrum; for lack of a better term I'll call it minipublishing.
If you want to access expertise on a narrow topic, wouldn't it be cool to just get that, nothing more, nothing less? For instance, if you want to learn about the user permissions on Mac OS X, buy Brian Tanaka's Take Control of Permissions in Mac OS X. TidBITS Publishing has a whole catalog of narrowly focused publications that are bigger than a magazine article but smaller than your typical book. O'Reilly has gotten into the act too with their Short Cuts series. You can buy just enough on Using Microformats to get started; for ten bucks you get 45 pages of focused discussion of what microformats are and how to use them. Nothing more, nothing less. That's cool!
What if you could buy books in part or in serial form? Buy the introductory part or a specific chapter and, if it seems well written, buy more. Many of us who've bought technical books are familiar with publishing bloat: dozens of chapters across hundreds of pages that you buy even though you were probably only interested in a few chapters. Sure, sometimes publishers put a few teaser chapters online hoping to entice you to buy the whole megilla. Works for me, I've definitely bought books after reading a downloaded PDF chapter. But I'm wondering now about buying just the chapters that I want.
publishing microformats macosx media micropublishing minipublishing
( Sep 30 2006, 07:04:31 PM PDT ) Permalink View blog reactions
Wednesday September 27, 2006
Colonel Jessup has assumed control of Newsweek:
Ignorance is bliss
How meta:
See ya at the gulag.
media newsweek ministry of truth afghanistan taliban bush
( Sep 27 2006, 04:16:31 PM PDT ) Permalink View blog reactions
Tuesday September 26, 2006
At today's Intel Developer Forum, Google is presenting a paper that argues that the power supply standards that are built into today's PCs are anachronistic, inefficient and costly. With the maturing of the PC industry and horizontal scaling becoming a standard practice in data center deployments, it's time to say good-bye to these standards from the 1980s.
John Markoff reported in the NY Times today (Google to Push for More Electrical Efficiency in PCs):
The Google white paper argues that the opportunity for power savings is immense, by deploying the new power supplies in 100 million desktop PC's running eight hours a day, it will be possible to save 40 billion kilowatt-hours over three years, or more than $5 billion at California's energy rates.
Nice to see Google taking leadership on the inefficiencies of the PC commodity hardware architectures.
( Sep 26 2006, 06:02:09 PM PDT ) Permalink View blog reactions
Monday September 25, 2006
The other week I reflected on the scaling-web-2.0 theme of The Future of Web Apps workshop. Another major theme there was how social software is different, how transformative architectures of participation are. There was one talk that stood out: Tom Coates's Greater than the sum of its parts. A few days ago the slides were posted; I've poked through 'em since and they jogged some memories loose, so I thought I'd share Tom's message, late though it is, and embellish with my spin.
Tom's basic thesis is that social software enables us to do "more together than we could apart" by "enhancing our social and collaborative abilities through structured mediation." Thinking about that, isn't web 1.0 about structured mediation? Centralized services, editors & producers, editorial staff & workflow, bean counting eyeballs, customer relationship management, demographic surveys and all of that crap? Yes, but what's different is that web 2.0 structured mediation is about bare sufficiency: it's better to have too little than too much. The software should get out of the way of the user and make him/her a participant, not lead him/her around by the nose.
Next, Tom highlighted that valuable social software should serve
Citing The Success of Open Source, he likened social software participants' motivations to this ranked list of open source contributors' motivations
Here are some social software "best practices":
futureofwebapps-sf06 social software virtual community
( Sep 25 2006, 10:24:40 AM PDT ) Permalink View blog reactions
Thursday September 21, 2006
I mused about people-powered topic classification for blogs after playing with the Google Image Labeler the other week. It seems like a doable feature for Technorati because the incentives to game topic classification are low.
That same week, Rafe posed a question about community driven spam classification:
Why couldn't Blogger or Six Apart or a firm like Technorati add all of the new blogs they register to a queue to be examined using Amazon's Mechanical Turk service? I'd love to see someone at least do an experiment in this vein. The only catch is that you'd want to have each blog checked more than once to prevent spiteful reviewers from disqualifying blogs that they didn't agree with.
The catch indeed is that the incentive is high for a system like this to be gamed. Shortly after Blogger implemented their flag, spammers
(read the rest)
"Bloggerbowling" - the practice of having robots flag multiple random blogs as splogs regardless of content to degrade the accuracy of the policing service.
As previously cited from Cory, all complex ecosystems have parasites. So I've been thinking about what it would take to do this effectively: what would it take to overcome the bloggerbowling efforts of the blogosphere's parasites? The things that come to mind for any system of community policing are about rewards and obstacles. For example
At the end of the day, I don't have the answers. But I think Rafe, Doc and so many others concerned with splog proliferation are asking great questions. Technorati is currently keeping a tremendous volume of spam out of its search results but, at the end of the day, there's still much to do. And this post is the end of my day, today.
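Rafe's suggestion of having each blog checked more than once can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the reviewer IDs, the minimum of three reviews and the two-thirds agreement threshold are all hypothetical, and nothing here describes how Technorati, Blogger or Mechanical Turk actually work.

```python
from collections import Counter

MIN_REVIEWS = 3      # hypothetical: each blog is checked at least three times
AGREEMENT = 2 / 3    # hypothetical: share of reviewers who must agree


def classify(votes):
    """Decide 'spam', 'ok' or 'undecided' from {reviewer_id: label} votes.

    Keying votes by reviewer means one reviewer (or one robot account)
    counts once per blog, however many times it flags.
    """
    if len(votes) < MIN_REVIEWS:
        return "undecided"
    label, count = Counter(votes.values()).most_common(1)[0]
    if count / len(votes) >= AGREEMENT:
        return label
    return "undecided"
```

Redundancy like this blunts a single spiteful reviewer, but it does nothing against bloggerbowling from many robot accounts at once; that is where the rewards and obstacles would have to come in.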
spam splog splogs technorati virtual community blogs web spam
( Sep 21 2006, 11:06:22 PM PDT ) Permalink View blog reactions
Wednesday September 13, 2006
A few weeks ago, Adam mentioned some of the shuffling going on at Technorati's data centers. Yep, we've had our share of operational instability lately; when you have systems that expect consistent network topologies and that has to change, I suppose these things will happen. It seems a common theme I keep hearing in conversations and presentations about web based services: the growing pains.
This morning, Kevin Rose discussed The digg story: from one idea to nine million page views at The Future of Web Apps workshop. Digg has had to overcome a lot of the "normal" problems (MySQL concurrency, data set growth, etc) that growing web services face and has turned to some of the usual remedies: rethinking the data constructs (they hired DBAs) and memcached. This afternoon, Tantek was in fine form discussing web development practices with microformats where he announced updates to the search system Technorati's been cooking, again a growth induced revision. Shortly thereafter, I enjoyed the stats and facts that Steve Olechowski presented in his 10 things you didn't know about RSS talk. And so it goes: this evening it was Feedburner having an episode. "me" time -- heh, know how ya feel <g>
While Feedburner gets "me" time, Flickr gets massages when they have system troubles. Speaking of Flickr, I'm looking forward to Cal Henderson's talk, Taking Flickr from Beta to Gamma at tomorrow's session of The Future of Web Apps. I caught a bit of Scaling Fast and Cheap - How We Built Flickr last spring; Cal knows the business. I've been meaning to check out his book, Building Scalable Web Sites.
Perhaps everybody needs a therapeutic massage for the times of choppy seas. When Technorati hurts, it just seems to hurt. Should it be getting meditation and tiger balm (hrm, smelly)? Some tickling and laughter (don't operate heavy machinery)? Animal petting (could be smelly)? Aromatherapy (definitely smelly)? Data center feng shui? Gregorian chants? R.E.M. samples?
futureofwebapps-sf06 palaceoffinearts flickr feedburner digg technorati microformats memcached
( Sep 13 2006, 09:26:42 PM PDT ) Permalink View blog reactions
Monday September 04, 2006
Hey, I'm in Wired! The current Wired has an article about blog spam by Charles Mann that includes a little bit of my conversation with him. Spam + Blogs = Trouble covers a lot of the issues facing blog publishers (and in a broader sense, user generated content participant created artifacts in general). There are some particular challenges faced by services like Technorati that index these goods in real time; not only must our indices have very fast cycles, so must our abilities to keep the junk out. I was in good company amongst Mann's sources, he talked to a variety of folks from many sides of the blog spam problem: Dave Sifry, Jason Goldman, Anil Dash, Matt Mullenweg, Natalie Glance and even some blog spam perps.
I've also had a lot of conversations with Doc lately about blog spam and the problems he's been having with kleptotorial. A University of Maryland study of December 2005 pings on weblogs.com determined that 75% of the pings are spam AKA spings. By excluding the non-English speaking blogosphere and not taking into account the large portions of the blogosphere that don't ping weblogs.com, that study ignored a larger blogosphere but overall, that assessment of the ping stream coming from weblogs.com seemed pretty accurate. As Dave reported last month, by last July we were finding over 70% of the pings coming into Technorati to be spam.
Technorati has deployed a number of anti-spam measures (such as targeting specific Blogger profiles, as Mitesh Vasa has. Of course there's more that we've done but if I told you I'd have to kill you, sorry). There are popular theories in circulation on how to combat web spam involving blacklists of URLs and text analysis but those are just little pieces of the picture. Of the things I've seen from the anti-splog crusader websites, I think the fighting splog blog has hit one of the key vulnerabilities of splogs: they're just in it to get paid. So, hit 'em in the wallet. In particular, splog fighter's (who is that masked ranger?) targeting of AdSense's Terms of Service violators sounds most promising. Of course, there's more to blog spam than AdSense, Blogger and pings. The thing gnawing at me about all of these measures is their reactiveness. The web is a living organism of events; the tactics for keeping trashy intrusions out should be event driven too.
Intrusion detection is a proven tool in the computer security practice. System changes are a disturbance in the force, significant events that should trigger attention. Number one in the list of The Six Dumbest Ideas in Computer Security is "Default Permit." I remember the days when you'd take a host out of the box from Sun or SGI (uh, who?) and it would come up in "rape me" mode. Accounts with default passwords, vulnerability laden printing daemons, rsh, telnet and FTP (this continued even long after the arrival of ssh and scp), all kinds of superfluous services in /etc/inetd.conf and so on. The first order of business was to "lock down" the host by overlaying a sensible configuration. The focus on selling big iron (well, bigger than a PC) into the enterprise prevented vendors from seeing the bigger opportunity in internet computing and the web. And so reads the epitaph of old-school Unix vendors (well, in Sun's case Jonathan Schwartz clearly gets it -- reckoning with the "adapt or die" options, he's made the obvious choice). Those of us building public facing internet services had to take the raw materials from the vendor and "fix them". The Unix vendors really blew it in so many ways, it's really too bad. The open source alternatives weren't necessarily doing it better; even the Linux distros of the day had a lot of stupid defaults. The BSDs did a better job but, unless you were Yahoo! or running an ISP, BSD didn't matter (well, I used FreeBSD very successfully in the '90s but then I do things differently). Turning on access to everything but keeping out the bad guys by selectively reacting to vulnerabilities is an unwinnable game. When it comes to security matters, the power of defaults can be the harbinger of doom.
The "Default Deny" approach is to explicitly prescribe what services to turn on. It's the obvious, sensible approach to putting hosts on a public network. By having very tightly defined criteria for what packets are allowed to pass, watching for adversarial connections is greatly simplified. I've been thinking a lot about how this could be applied to providing services such as web search while also keeping the bad guys (web spammers) out.
Amongst web indexers, the big search services try to cast the widest net to achieve the broadest coverage. Remember the mine is bigger than yours flap? Search indices seemingly follow a Default Permit policy. On the other extreme from "try to index everything" is "only index the things that I prescribe." This "size isn't everything" response is seen in services like Rollyo. You can even use Alexa Web Search Platform to cobble your own index. But unlike the case of computer security stances, with web search you want opportunities for serendipity; searching within a narrowly prescribed subset of the web greatly limits those opportunities. Administratively managed Default Deny policies will only get you so far. I suspect in the future effective web indexing is going to require more detailed classification, a Default Deny with algorithmic qualification to allow. Publishers will have to earn their way into the search indices through good behavior.
The blogosphere has thrived on openness and ease of entry but indeed, all complex ecosystems have parasites. So, while we're grateful to be in a successful ecosystem, we'd all agree that we have to be vigilant about keeping things tidy. The junk that the bad guys want to inject into the update stream has to be filtered out. I think the key to successful web indexing is to cast a wide net, keep tightly defined criteria for deciding what gets in and to use event driven qualification to match the criteria. The attention hi-jackers need to be suppressed and the content that would be misappropriated has to be respected. This can be done by deciding that whatever doesn't meet the criteria for indexing should be kept out. Not that we have to bid adieu to the yellow brick road of real time open content but perhaps we do have to set up checkpoints and rough up the hooligans who soil the vistas.
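As a thought experiment, "Default Deny with algorithmic qualification" applied to an update stream might look like the Python sketch below. Everything in it is an assumption for illustration: the Source and Indexer names, the three ping minimum and the duplicate text signal are invented here and say nothing about the measures Technorati actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Hypothetical per-blog record; the fields are illustrative signals only."""
    url: str
    pings_seen: int = 0
    duplicate_posts: int = 0   # posts whose text had already been seen
    qualified: bool = False    # Default Deny: nothing is indexed until this flips


@dataclass
class Indexer:
    min_pings: int = 3         # hypothetical threshold
    sources: dict = field(default_factory=dict)
    seen_texts: set = field(default_factory=set)
    index: list = field(default_factory=list)

    def on_ping(self, url, post_text):
        """Handle one update event and return True if the post was indexed."""
        src = self.sources.setdefault(url, Source(url))
        src.pings_seen += 1
        if post_text in self.seen_texts:
            src.duplicate_posts += 1
        self.seen_texts.add(post_text)
        # Re-evaluate the criteria on every event rather than in a batch sweep.
        src.qualified = (src.pings_seen >= self.min_pings
                         and src.duplicate_posts == 0)
        if src.qualified:
            self.index.append((url, post_text))
        return src.qualified
```

The net is still wide (any URL may ping) but a publisher's posts only reach the index once its own history of events satisfies the criteria, and a source that starts republishing already-seen text stops qualifying.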
spam web spam splog splogs adsense technorati wired
( Sep 04 2006, 11:10:15 PM PDT ) Permalink View blog reactions
Saturday September 02, 2006
I spent way too much time last night giving Google some free labor. The Google Image Labeler is kinda fun, in a peculiar way. In 90 second stretches that AJAX-ishly link you to someone else out there in the ether, you are shown images and a text box to enter tags ("labels" is apparently Google's preferred term, whatever). Each time you get a match with your anonymous partner, you get 100 points. The points are like the ones on Whose Line Is It Anyway: they don't matter. And yet it was strangely fun. The most I ever got in any one 90 second session was 300 points. Network latency was the biggest constraint; sometimes Google's image loading was slow. Also, the images are way too small on my Powerbook ... this is the kinda thing you want a Cinema Display for (the holidays are coming, now you know what to get me).
So what if Technorati did this? Suppose you and some anonymous cohort could be simultaneously shown a blog post and tag it. Most blogging platforms these days support categories. But there are a lot of blog posts out there that might benefit from further categorization. Authors are already tagging their posts and blog readers can already tag their favorite blogs but enabling an ESP game with blog posts sounds like an intriguing way to refine categorization of blogs and posts.
tagging esp game google image labeler mechanical turk
( Sep 02 2006, 12:31:26 PM PDT ) Permalink View blog reactions
Wednesday August 30, 2006
Last week I was in Albuquerque for some family time and relaxation. It was truly wonderful to see the desert in full bloom; the monsoonal flow of weather coming up from the Gulf of Mexico this time of year has brushed the whole landscape with lovely shades of green. The weather was mild, the raspberries luscious and abundant and, though the trout weren't biting, the rivers roared beautifully; it was really great. No, I didn't accept payola from the New Mexico visitors bureau, really, my gushing is legit.
Anyway, I also took the opportunity to do some geeky ogling at the Eclipse Aviation facility in Albuquerque. I'm not normally an airplane nerd but last Friday, I was. What interested me about this company is that they are producing a truly disruptive technology. Commercial aviation and metropolitan airports are high ceremony affairs; security lines, taking off your shoes and taking out your laptop, finding the right carousel to get your luggage... and praying that it shows up there intact. The Eclipse jets will commoditize high altitude cruising in a pressurized cabin at speeds that aren't too far behind the big boys (and twice the speed of propeller planes) and do so at a price point on par with the cost of many single family homes in the San Francisco Bay Area. What will that mean for you? What would it mean to you if 2 to 3 hour rides up and down the west coast of the United States are cheap and abundant? If a two day drive or commercial jetliner and airport rigamarole can be replaced with fast, low ceremony travel, then the world gets a lot smaller again. That would mean a lot to me! Welcome to reality, the smaller world where commoditization is good.
Eclipse is a newcomer, a startup (I know a bit about disruptive startups) in the aviation industry. As you'd expect, they're doing things differently. Which isn't a surprise given founder Vern Raburn's pedigree. Raburn is an early Microsoft and Lotus guy, has been a pilot since he was a teenager and has a passion for innovation (along with some cash to throw behind it). Eclipse's friction stir welding process for joining the aluminum shell results in a light but strong hull (without relying on composites); most planes are pieced together with rivets. Eclipse has extensive IT infrastructure that provides flight plans, collects metrics on the planes while they're in flight, detects component failures and is poised to assist from a state-of-the-art operations center. The avionics are displayed on redundant touch screens and the controls are vastly simplified over what you find in traditional aircraft. Here are some specs on the Eclipse 500:
So, do the math and this works out a lot cheaper than flying most piston engine planes (costs per hour may be lower but you're in the air twice as long with those). OK, I admit I don't have one and a half mill to drop for one of these babies (however, I have aspirations to be a "qualified buyer") but still, the potential to bring this kind of travel within easy reach is at hand. Even if you don't buy one of them, using one like you'd use a cab seems like a huge improvement over current modes of air travel. On your next trip, as you endure the TSA confiscating that toothpaste you forgot was in your carry-on luggage, imagine jet travel that operates more like a car service, like a cab. DayJet is going to provide exactly that using fleets of Eclipse jets. On the factory floor, I saw a few DayJet-logo'd planes getting prepped for delivery. Apparently a gaggle of "air taxi" services similar to DayJet are in the works; they'll also be powered by fleets of Eclipse 500s. We'll embark on the era of very light jets (VLJ) when the first customers start taking delivery of their aircraft within the next month. This may not be Kitty Hawk but I do think this will rank high in the list of significant aviation events.
aviation jets civil aviation eclipse aviation disruptive technologies
( Aug 30 2006, 09:09:06 PM PDT ) Permalink View blog reactions
Tuesday August 29, 2006
In this corner: Doc is going to attack kleptotorial splogs by employing cleaner living through better licensing (a creative commons flavor). And in this corner: Elliott Back says he is a victim. He has been slammed by Scoble (and Scoble was gracious enough to apologize). I have no sympathy for Elliott Back. Sure, he's just the gun maker, not the shooter. But weapon makers producing wares without safeties get sued for negligence. Basically, any tool that programmatically harvests and posts other people's feeds should at least have the common decency to not ping. If you re-inject something into the update stream that you've appropriated from someone else, you're scamming the update stream. This isn't about quoting or citing, this is about fraudulent pings, "I've updated my blog (never mind the fact it's with OPP)" -- keep your feed harvesting to yourself, please.
( Aug 29 2006, 09:51:57 AM PDT ) Permalink View blog reactions
Monday August 28, 2006
The MySQL query cache has rarely been of much use to me since it's pretty much just an optimization for read-heavy data. Furthermore, if you have a pool of query hosts (e.g. you're using MySQL replication to provide a pool of slaves to select from), each with its own query cache in a local silo, there's no "network effect" of benefitting from a shared cache. MySQL's heap tables are a neat trick for keeping tabular data in RAM but they don't work well for large data sets and suffer from the same siloization as the query cache. The standard solution for this case is to use memcached as an object cache. The elevator pitch for memcached: it's a thin distributed hash table in local RAM stores accessible by a very lightweight network protocol and bereft of the featuritis that might make it slow; response times for reads and writes to memcached data stores typically clock in at single digits of milliseconds.
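For readers who haven't used memcached as an object cache, the usual read path looks roughly like the Python sketch below. To keep it self-contained, FakeCache is an in-process stand-in invented for this example and slow_db_load is a hypothetical database read; a real deployment would swap in a memcached client pointed at the shared pool.

```python
class FakeCache:
    """In-process stand-in for a shared memcached pool (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def make_cached_lookup(cache, load_from_db):
    """Wrap a slow loader with a check-the-cache-first read path."""
    def lookup(key):
        value = cache.get(key)
        if value is None:            # cache miss: fall through to the database
            value = load_from_db(key)
            cache.set(key, value)    # populate so later readers get a hit
        return value
    return lookup


db_calls = []


def slow_db_load(key):
    """Hypothetical database read; records each call so misses are visible."""
    db_calls.append(key)
    return "row for " + key


lookup = make_cached_lookup(FakeCache(), slow_db_load)
first = lookup("mine")     # miss, goes to the database
second = lookup("mine")    # hit, served from the cache
```

Because the cache lives outside any one query host, a value loaded by one host is a hit for all of them, which is exactly the network effect the per-host query cache can't give you.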
RDBMS-based caches are often a glorified hash table; a primary key'd column and value column. Using an RDBMS as a cache works but it's kinda overkill; you're not using the "R" in RDBMS. Anyway, transacting with a disk based storage engine that's concerned with ACID bookkeeping isn't an efficient cache. MySQL has the peculiar property of supporting pluggable storage backends. MyISAM, InnoDB and HEAP backends are the most commonly used ones. Today, Brian Aker (of Slashdot and MySQL AB fame) announced his first cut release of his memcache_engine backend.
Here's Brian's example usage:
mysql> INSTALL PLUGIN memcache SONAME 'libmemcache_engine.so';
mysql> create table foo1 (k varchar(128) NOT NULL, val blob, primary key(k)) ENGINE=memcache CONNECTION='localhost:6666';
mysql> insert into foo1 VALUES ("mine", "This is my dog");
Query OK, 1 row affected (0.01 sec)
mysql> select * from foo1 WHERE k="mine";
+------+----------------+
| k    | val            |
+------+----------------+
| mine | This is my dog |
+------+----------------+
1 row in set (0.01 sec)
mysql> delete from foo1 WHERE k="mine";
Query OK, 1 row affected (0.00 sec)
mysql> select * from foo1 WHERE k="mine";
Empty set (0.01 sec)
Brian's release is labelled a pre-alpha, some limitations apply, your mileage may vary, prices do not include taxes, customs or agriculture inspection fees.
Sunday August 27, 2006
When I wrote about OSCON last month, I mentioned Perrin Harkins's session on Low Maintenance Perl, which was a nice review of the do's and don'ts of programming with Perl, but I really didn't dig into the substance of his session. Citing Andy Hunt (from Practices of an Agile Developer):
When developing code you should always choose readability over convenience. Code will be read many, many more times than it is written. (see book site)
Perrin enumerated a lot of the basic rules of engagement for coding Perl that doesn't suck. Some of the do's and don'ts highlights:
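package Foo;

# Constructor: bless a hash reference into whatever class was requested.
sub new {
    my $class = shift;
    my $data  = shift || {};
    return bless $data, $class;
}

package main;

my $foo = Foo->new;
print ref $foo, "\n";

# Re-blessing swaps the object's class at runtime.
bless $foo, 'Bar';
print ref $foo, "\n";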
For the non-Perl readers, create an instance of Foo ($foo), then change it to an instance of Bar, printing out the class names as you go. The output is:
Foo
Bar
Anyone caught doing this will certainly come back as a two headed cyclops in the next life.
I've been trying to increase my Python craftiness lately. I first used Python about 10 years ago (1996) at GameSpot, where we used it for our homebrewed ad rotation system. I fiddled with Python some more at Salon as part of the maintenance of our Ultraseek search system. But basically, Python has always looked weird to me and I've avoided doing anything substantial with it. Well, my interest in it is renewed because there is a substantial amount of legacy code that I'm presently eyeballing and, anyway, I'm very intrigued by JVM scripting languages such as Jython (and JRuby). I'm looking for a best-of-both-worlds environment: things-are-what-you-expect static typing and compile-time checking on the one hand and rapid development on the other. I was really astonished to learn that chameleon class assignment like Perl's is supported by Python. Python is strongly typed in that you have to explicitly cast and coerce to change types (very unlike Perl's squishy contextual operators, which do a lot of implicit magic). But Python is also dynamically typed; an object's type is a runtime assignment. This is gross:
class Foo:
    def print_type(self):
        print(self.__class__)

class Bar:
    def print_type(self):
        print(self.__class__)

if __name__ == "__main__":
    foo = Foo()
    foo.print_type()
    foo.__class__ = Bar    # swap the instance's class at runtime
    foo.print_type()
In English, create an instance of Foo (foo), then change it to an instance of Bar, printing out the class names as you go. The output is:
__main__.Foo
__main__.Bar
(Python prefixes the class name with the current namespace, __main__.) Anyone caught doing this will certainly come back as a reptilian jackalope in the next life.
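It's worth seeing that the reassignment isn't just cosmetic: method dispatch and isinstance checks follow the new class too. Here's a small sketch of that, written for Python 3; the speak method is made up for the illustration.

```python
class Foo:
    def speak(self):
        return "I am a Foo"


class Bar:
    def speak(self):
        return "I am a Bar"


foo = Foo()
before = (type(foo).__name__, foo.speak())

foo.__class__ = Bar  # rebind the instance's class at runtime

after = (type(foo).__name__, foo.speak())

print(before)                                       # ('Foo', 'I am a Foo')
print(after)                                        # ('Bar', 'I am a Bar')
print(isinstance(foo, Foo), isinstance(foo, Bar))   # False True
```

The same object answers to a different set of methods after one assignment, which is precisely the kind of surprise that makes month-old code hard to read.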
Of course, Java doesn't tolerate any of these shenanigans. Compile-time complaints of "what, are you crazy?!" would surely come hither from javac. There's no setClass(Class):void method in java.lang.Object, thank goodness, even though there is getClass():Class. One of the key characteristics of a language's usefulness for agile development has to be its minimization of astonishing results, quirky idioms and here-have-some-more-rope-to-hang-yourself behaviors. If you can't read your own code from last month without puzzling over it, how the hell are you going to refactor it quickly and easily next month? Will your collaborators have an easier time with it? Perl has rightly acquired the reputation of a "write once, puzzle forevermore" language. I haven't dug into whether Ruby permits runtime object type changing (that would be really disappointing). I'll dig into that next; clearly the Rails developers' emphasis on convention and configuration over code is aimed at reducing the surprises that coders can cook up. But that doesn't necessarily carry back to Ruby itself.
python perl ruby java code agile programming jython jruby oscon oscon06
( Aug 27 2006, 08:51:11 AM PDT )
Wednesday August 02, 2006 ...just once when a passenger wearing too much perfume or cologne boards the metro, it would prompt the driver (who would be Samuel L. Jackson) to stand up, turn to the passengers and demand, "Get those mother effin' stinks off this mother effin' train!"
Perhaps for once I'd get my money's worth from Muni.
san francisco muni samuel l jackson stinks on a train
( Aug 02 2006, 09:46:20 AM PDT )
Sam Ruby's Teenagers on the go slide deck is an interesting prognosis on the future impact of the protocols, formats and form factors in our midst on publishing, sharing and participating on the web.
( Aug 02 2006, 07:13:37 AM PDT )
Sunday July 30, 2006 Since the universally understood (at least among the intelligentsia) descriptor user generated content continues to nag at people (Tim raised it again during his OSCON session) and the alternatives have been difficult to pin down (was Tim suggesting people contributed experiences?), it's my caffeinated Sunday morning aspiration to consider the alternatives.
Having a label is important; we're making a distinction between published artifacts that are developed by editors and/or paid staff and the stuff created by Normals who are contributing the artifacts of their creative process to the web. Yes, the term user is definitely sterile, generated too mechanical and content seems so... vacuous. Does "participant created artifacts" work as a descriptor for all of the photos we're uploading, blog posts we're posting and so forth?
( Jul 30 2006, 08:44:40 AM PDT )
Saturday July 29, 2006 Had a great time at OSCON! Besides the previously noted keynotes and sessions, my faves were Perrin Harkins' Low-Maintenance Perl (a good discussion of best practices in Perl as the simple practices; Perl as the sole domain of wizards is so old school), Moazam Raja's Troubleshooting the JVM and the Applications That Run Within It (a good survey of the built-in runtime diagnostics available for Java) and Tim Bray's The Atom Publishing Protocol as Universal Web Glue (a good example of using vi and curl for bare metal wire protocol demos as well as how slow JRuby's start-up time is!). Damian Conway's Friday keynote was suitably humorous! If there was anything that I wish I coulda rearranged, it was the time slots when there was more than one session I wanted to be in. But the time slots with nothing interesting going on were good opportunities for hallway conversations, which are often the most important activities at these events, so I won't complain vigorously.
Enjoyed hanging out Friday afternoon for "OSCON decompression" at Urban Grind ("Coffee should be black as night, hot as hell, and strong as love.") with James, David, Josh, David and Scott. Heh, I got PostGIS running on my powerbook, which gave me something to play with on the flight home!
Portland is a really nice town: the Disneyland-like light rail system (complete with automaton announcements in English and Español), the neighborhood ambiance, the surrounding greenery... I dig it. For next year's OSCON trip, I'll be bringing the family along!
( Jul 29 2006, 11:20:55 AM PDT )
Thursday July 27, 2006 At Greg Stein's talk, A Google Service for the Open Source Community, he outlined how Google's following up on its Summer of Code project with a new contribution to open source. No, it's not a dating service or personal trainer service for geeks (that'll be Google's 2007 contribution). And no, it's not source code search (does Krugle have that covered?). This is project hosting: Google Code Project Hosting.
Yes, there is nothing new about project hosting; there have long been things like SourceForge, Tigris, java.net and so forth. Things like SourceForge are doing a great job, but there are strengths that Google has that could be brought to bear on the project hosting space. Some of the unique features of Google's project hosting that Greg cited are
Creating a new hosted project is simple enough: you fill out a form with the project name, a summary description, a full description, select a license (Apache, Artistic + GPL, GPL v2, LGPL, BSD, MIT and Mozilla are choices ... dual licensing is not permitted) and apply some labels (tags). If you don't have one yet, a Subversion password is created for you (it's *not* your Gmail password). Your project will have a tabbed interface for the main page, issues, browsing the source and an administrative page. Project creators and administrators must use a Gmail account. If not using Gmail, bug reporters must have some Google account (Picasa, Groups, etc). The "Issues" screen provides a tabular view of bugs; the columns are Ajax-enabled for parameterization. The neat thing is that instead of using a big form with tons of check boxes and selectors, the issue tracking uses query expressions to refine issue search results. The status field for a bug can be free text; while a static vocabulary is defined and selectable in an Ajax drop-down, the vocabulary is unconstrained. Status isn't the only metadata that's open-ended: instead of having "release version", "milestone", "component", etc., the system uses labels. The issue list column repertoire is adaptable so that you can select labels you've defined as listing criteria. All of the open-endedness may be an invitation to pandemonium, but the focus is on having the user interface make it easy for the user to do the right thing.
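To get a feel for how free-text status plus open-ended labels can be searched with query expressions instead of a wall of form fields, here is a rough sketch. To be clear, this is not Google's query syntax or implementation, which the talk didn't spell out: the "status:" and "label:" term format, the Issue class and the sample issues are all assumptions invented for the illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    id: int
    summary: str
    status: str                          # free text, not a fixed enum
    labels: set = field(default_factory=set)


def matches(issue, query):
    """True if the issue satisfies every space-separated term in the query.

    Hypothetical term syntax: "status:X", "label:Y", or a bare word that is
    matched as a substring of the summary. Comparisons ignore case.
    """
    for term in query.split():
        key, _, value = term.partition(":")
        if key == "status":
            if issue.status.lower() != value.lower():
                return False
        elif key == "label":
            if value.lower() not in {label.lower() for label in issue.labels}:
                return False
        elif term.lower() not in issue.summary.lower():
            return False
    return True


def search(issues, query):
    return [issue for issue in issues if matches(issue, query)]


issues = [
    Issue(1, "Crash on startup", "New", {"Milestone-1.0", "Component-UI"}),
    Issue(2, "Typo in docs", "Fixed", {"Component-Docs"}),
    Issue(3, "Crash when saving", "Started", {"Milestone-1.0"}),
]

print([i.id for i in search(issues, "label:Milestone-1.0 crash")])  # [1, 3]
print([i.id for i in search(issues, "status:new")])                 # [1]
```

Because "milestone" and "component" are just labels here, adding a new kind of metadata needs no schema change; the cost is that nothing stops two people from spelling the same label differently.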
Some of the administratively defined aspects of a project include the issue creation template (defines the prompts that issue creators will see), project links, project discussion groups (using Google Groups), project blogs and activity notification email addresses. The system will support issue tracking feeds. Most of this metadata will be visible on the project summary page that newcomers to the project will see.
There's currently no "tarball download" service and integration with other Google services is in the works. For the time being, any downloads made available must be done within the limit of the quotas on your subversion repository (100 MB). Plans for importing and exporting, creating APIs and so forth are underway (the issue tracking seems like a natural fit for Atom and Atom Publishing Protocol).
Congrats to Greg and the Google Code team on a great launch!
( Jul 27 2006, 03:46:06 PM PDT )
Wednesday July 26, 2006 I missed the first keynotes (I just arrived in time for Tim O'Reilly's "what technologies are hot according to these slices on the data" bit that he does) but enjoyed the talk by Greenplum's Scott Yara, School of Rock. He highlighted the parallels of open source development and rock and roll. I'll paraphrase his points.
Open source, like rock and roll, has flourished simply because people enjoyed it. Like rock and roll, money has jumped into open source and an industry has swelled around it. Like rock and roll, open source threatens the establishment but also mutually coopts and becomes the establishment. Yara showed a funny "twins separated at birth?" photo pairing of Rick Rubin and Richard Stallman! What will sustain open source's integrity (like rock and roll's) are the intangibles, the real emotions and inspirations that drive innovation. The popularity game isn't a measure of quality... just because it's widely downloaded doesn't mean it's good, just as Britney Spears' and N'Sync's sales success aren't validations of "good" music. So, beware of the vogue of open source; people are starting to believe that open source is better, but don't let that undermine what's important. For those who are building their business on open source, go for the $$$ but keep your integrity. At that point Yara ran a little excerpt of Metallica goofing on a radio promo production (from Some Kind of Monster?); the ironies of choosing them as illustrations of how money changes everything, given how they coopted and have become the music establishment, were high humor for me. Nonetheless, Metallica, like a lot of successful open source software projects, have succeeded by being a little dangerous, by being genuine and not bothering with the constraints of the legacy establishment.
Anil Dash gave a talk about Trying to Suck Less: Making Web 2.0 Mean Something, basically outlining that beyond the technology stack (i.e. LAMP), there are higher level tools that developers can employ to suck less (yep, I confess, at Technorati when we can't quite kick the butt that we aspire to, we focus on sucking less). Citing the technologies that have grown out of SixApart's software plumbing, he highlighted that all successful Web 2.0 companies are using load balancing, messaging, caching, filesystems and other scalability and performance platform components. In SixApart's case, perlbal, memcached, mogilefs and djabberd are the core technologies that they build on ... and, so the pitch goes, so should you if you want to suck less.
Those are the high points of the morning (so far).
( Jul 26 2006, 10:00:31 AM PDT )
Tuesday July 25, 2006 In case you hadn't heard, we've had a lot of things cooking at Technorati. Besides the engaging new look, the new features and the complete overhaul of URL search and link counts, we've been making great strides in our blog spam mitigation (you wouldn't believe the stuff we catch ... and the sheer quantity of it!), our internal caching and messaging infrastructure and our data center network. Of course, there's still much to do but we've been heads down on it; if you haven't checked us out lately I think you'll find that our efforts to improve the front end, the back end and all of the cogs and pulleys in between have been moving forward.
I'm really proud of the team I work with at Technorati! If you'd like to join the team, we have a lot of innovation ahead. Grab me this week at OSCON and tell me about how you'd like to materialize the real time web! I'll also be moderating a Microformats BOF; this will be a good opportunity to talk about the implementations for producing and consuming microformats. See ya in Portland!
( Jul 25 2006, 10:37:36 PM PDT )
Saturday May 06, 2006 Blog publishing services typically propagate updates about new posts from blogs (ergo, new blogs too) by pinging or publishing a changes.xml file. But what none of the services provide is an "un-ping" -- blog indexing services such as Technorati don't know when a blog has been deleted from a service. I noticed this today when I found http://blogtrarian.blogspot.com/ participating in a link farm infesting Blogger's service. This can happen because Google's Blogger recycles URLs; when a blog is removed from the system, the URL is freed for reuse.
That particular URL is one that dates back to 2004; it was dormant for several months but just came to life recently with spam. The historic posts (until August 2005) look like normal blogging fare but the recent posts are clearly just splog content. We'll have to work on "un-pinging" so it's easier to distinguish dormant blogs from dead ones.
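Absent an un-ping, one cheap signal an indexer could look at is the posting history itself: a long silence followed by a sudden revival is at least worth a second look. The sketch below is only an illustration of that idea, not anything Technorati runs; the find_revival name, the 180-day threshold and the sample dates are all invented.

```python
from datetime import date, timedelta


def find_revival(post_dates, min_gap=timedelta(days=180)):
    """Return (last_post_before_gap, first_post_after_gap) for the most
    recent silence of at least min_gap, or None if there is no such gap."""
    ordered = sorted(post_dates)
    # Walk consecutive pairs of posts, newest pair first.
    pairs = zip(reversed(ordered[:-1]), reversed(ordered[1:]))
    for earlier, later in pairs:
        if later - earlier >= min_gap:
            return earlier, later
    return None


# Made-up history: regular posts, months of silence, then new activity.
history = [
    date(2004, 11, 2),
    date(2005, 3, 15),
    date(2005, 8, 10),
    date(2006, 4, 28),
    date(2006, 5, 5),
]

print(find_revival(history))
# (datetime.date(2005, 8, 10), datetime.date(2006, 4, 28))
```

A gap like this doesn't prove the URL was recycled (people do come back from long breaks), so it's better treated as a flag for closer inspection of the post-gap content than as a verdict.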
spam splog web spam google blogger ping
( May 06 2006, 03:13:14 PM PDT )