When I compared notes with Kevin Burton, it turned out that we had each independently found the same A-lister (who shall remain nameless here) who had fallen victim to the WordPress vulnerability on a secondary blog. I think we each independently passed along a "heads-up"; I know I was in touch with this blogger a few times in the last two weeks about it. The blog has since been taken down (the URL redirects to a different blog and that redirect target is not vulnerable). This phenomenon is hitting blogs up and down the blogosphere's power curve -- it's neither the A-listers nor the Z-listers who are targeted. Any old vulnerable WordPress installation will do. And as can be seen in the metrics I've posted recently, the number of potential targets is vast.
Bokardo had fallen into the link-spam hole in Technorati's system because of spam defacement (I've since corrected the flagging, we're indexing Bokardo again). Ironically, the same day that Bokardo posted about being zapped in the Google index, the Google Webmaster Central Blog posted "My site's been hacked - now what?", which details the process of getting out of their purgatory. Unlike the aforementioned A-lister's silence on the matter, Bokardo author Joshua Porter posted about it, to which I say, "Yay, brother!" His case clearly illustrated the basic point: if you haven't upgraded your vulnerable WordPress installation, you're operating an insecure wiki -- any jackass with the exploit can re-write your pages (and worse). And they will.
Shift gears. I've been participating in online community on The WeLL for almost 14 years (yea, I'm paleolithic but I'm young at heart). One of the central ethical underpinnings on the WeLL is YOYOW: You Own Your Own Words. Other people can't quote/repost your words outside of the system without your permission and you need to be responsible for the things you say. In that spirit, I suggest that quality open source projects should adopt a collective You Own Your Own Code ethic. If you release code for other people to do great things with, mazel tov! But take pride in your products by keeping that usage fulfilling and secure. Where are the WordPress folks in getting the word out about the hack pandemic? Why isn't there a Big Red Banner on wordpress.org alerting people to the hazards of not upgrading? Waxing on about all of the groovy features in v2.5 is fine but really, they should be shouting: URGENT! YOUR INSTALLATION WILL BE HACKED UNLESS YOU UPGRADE TO ONE OF THESE FIXED RELEASES OR APPLY A PATCH. It's not like they don't know; both Kevin and I have talked to WordPress developers and posted very publicly about what's going on.
Perhaps if Bokardo or the aforementioned A-lister migrated to Movable Type or some other platform and trumpeted about it, WordPress-land would hear the message. Instead of urging people to upgrade, maybe we should be urging them to migrate. ( Apr 09 2008, 10:37:03 AM PDT ) Permalink
I've been conversing with Kevin Burton about the WordPress pandemic. We're in agreement that the WordPress community's response to this security issue has been excessively lax. Most of the feedback I've received about yesterday's crawler changes has been supportive; folks generally want more hygienic social media. Kevin is also implementing a change to block spam-infected blogs from Spinn3r's crawls. We're both going to be keeping tabs on this. I'll be developing metrics on the blogs that Technorati is not indexing when they appear symptomatic so that the efficacy (or not) of yesterday's changes is measured. In the meantime, here's an updated trailing 90 days of WordPress updates:
|Version||Count (in thousands)||Change|
Some of the feedback that I've heard from bloggers that haven't upgraded is that the upgrade is a big PITA. Some have asked me for referrals for WordPress consultants to help them get their theme and plugin data rolled forward to a newer version. If anybody has suggestions about where to find reputable consultants knowledgeable about WordPress, please blog about it. If you link to this post and you're not using a vulnerable version of WordPress, I'll even find it on Technorati( Apr 08 2008, 11:53:20 PM PDT ) Permalink
The blogosphere has had its share of maladies before. Comment spam, trackback spam, splogs and link trading schemes are the colds and flus that we've come to know and groan about. But lately, a cancer has afflicted the ecosystem that has led us at Technorati to take some drastic measures. Thousands of WordPress installations out in the wilds of the web are vulnerable to security compromises; they are being actively exploited and we're not going to index them until they're fixed.
We know about them at Technorati because part of what we do is count links. Compromised blogs have been coming to our attention because they have unusually high numbers of outbound links to spam destinations. The blog authors are usually unaware that they've been p0wned because the links are hidden with style attributes to obscure their visibility. Some bloggers only find out when they've been dropped by Google. This WordPress user wrote:
My 2.2 installation was being hacked into and spam hidden links dumped into index.php. I didn't notice until google decided to ban me (they have now reincluded my site).
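For bloggers who want to check their own pages, the hidden-link symptom can be approximated with a crude heuristic. What follows is a minimal sketch, not what Technorati's crawler does: the function name `count_hidden_links` and the two inline-style patterns it looks for (`display:none` and `visibility:hidden`) are assumptions for illustration, and real spam injections vary widely.

```shell
#!/bin/sh
# Hypothetical helper: count lines in an HTML file that contain both a link
# and an inline style that hides content. The patterns are illustrative
# assumptions, not an exhaustive signature of the exploit.
count_hidden_links() {
    grep -i '<a[[:space:]]' "$1" \
        | grep -i -E -c 'style="[^"]*(display:[[:space:]]*none|visibility:[[:space:]]*hidden)'
}
```

Save a copy of your front page (for instance with `curl -s`) and run `count_hidden_links index.html` on it. A nonzero count is a reason to look closer at the markup, not proof of a compromise; plenty of legitimate themes hide menus the same way.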
To their credit, the WordPress developers have been fixing the issues. They released v2.3.3 in February and patches for older releases to thwart this exploit. More recently, they released v2.5, which in addition to having the flawed XML-RPC code fixed, boasts a number of new features. But from what I can tell, despite brisk uptake many blogs remain obliviously vulnerable and the occurrence of compromised blogs seems to only be accelerating. As of today, here is the count of blogs running WordPress installs that have pinged Technorati in the last 90 days:
|Version||Count (in thousands)|
So at Technorati today, I posted that we deployed an update to the crawlers to abort the crawl if the blog appears to have symptoms of being compromised. We'll probably rescind this measure when the number of vulnerable installations in the distribution above looks a little better (some of the false positives I've found are patched but still have unusual metrics associated with the crawl, so they look fishy). However for the time being, these are just creating a lot of noise and instability in our systems and enough is enough. If you're running an old WordPress installation and you're not getting indexed, stop what you're doing and upgrade. Just Do It. The docs on the WordPress site seem to cover what you need to know and the WordPress Forums should help fill in the gaps.
Digging through the lore, it looks like there has been a procession of security problems with WordPress installations:
Using Technorati membership information, I have personally contacted several hundred bloggers about this issue. These have included blogs with no authority as well as blogs belonging to A-listers. Many have been grateful for the heads up but none (that I have spotted) have posted about this issue. The blogs that are unclaimed are SOL; I don't have any way to reach them (without groping around their site to find a contact email, though I've done a little of that too). Kevin Burton has made a public plea, "Anyone Want to Help Fix these Compromised Wordpress Blogs?" One blog that did break the silence (Deep Jive Interests) did so in response to tweets about the issue that Kevin's been facing on TailRank.
But is outreach to bloggers going to be enough to stop the spread of this cancer? Probably not. I think the best way to get the word out is to spread the word, tell bloggers you know to post about it. For their part, what I'd really like to see from the WordPress folks (and all blog CMS developers) are
Building a team of rock stars is cheaper than building a team of lower-salaried, less experienced programmers. It's also harder. The notion that there is more economy in the enthusiasm of project contributors and having "more hands on deck", even if they're cheaper hands, is naive. Martin Fowler writes:
If the cost premium for a more productive developer is less than the higher productivity of that developer, then it's cheaper to hire the more expensive developer.
You might assume that there's a positive scaling effect with a larger team. Fowler continues:
The trouble is that that assumption assumes productivity scales linearly with team size, which again observation indicates isn't the case. Software development depends very much on communication between team members. The biggest issue on software teams is making sure everyone understands what everyone else is doing. As a result productivity scales a good bit less than linearly with team size. As usual we have no clear measure, but I'm inclined to guess at it being closer to the square root.
Keep reading the Cheaper Talent Hypothesis.
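To put rough numbers on Fowler's guess, here is a quick back-of-the-envelope sketch. It assumes team output scales exactly with the square root of head count, which Fowler offers only as an inclination; the function name `team_output` and the team sizes are made up for illustration.

```shell
#!/bin/sh
# Hypothetical illustration of square-root scaling: relative output of a
# team of N people, where one developer working alone produces 1 unit.
team_output() {
    awk -v n="$1" 'BEGIN { printf "%.1f\n", sqrt(n) }'
}

for n in 1 4 9 16; do
    echo "team of $n -> $(team_output "$n")x one developer"
done
```

Under that assumption, a team of four produces about what a single developer with twice the productivity would, which is the arithmetic behind the hypothesis.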
Trouble is, finding the highly capable and seasoned talent can be a long search. Weeding out the fakers is time consuming, finding the right fit for those who are for real takes longer. And so the search goes on. Technorati is searching; if you're the real deal, call us.( Feb 09 2008, 08:48:53 AM PST ) Permalink
I've worked on a number of different web service and enterprise software products before but never gave one its external name until today. Our release of the Technorati Percolator is the culmination of months of work to harness the vast flow of raw data coming through Technorati to distill a palatable data volume and it's named for the internal moniker I'd been using for it during its development (after all, names with "buzz" and "meme" in them just wouldn't do). While you're looking around at the things we've cooked up in the percolator, make sure you also check out rising links of the day on Blogger Central and today in photos. Today we released them and I mentioned a bit about what goes into them on the Technorati blog. What I didn't elaborate on is what this release means to me on a personal level.
I originally came to Technorati in 2004 after a conversation with Dave fired up my creative sparks about the blogosphere. He had all of these rich conceptualizations about the technology changes in our midst, the social significance of decentralized events, the basic human drives that motivate them, the power of the long tail and the peculiar phenomenon that when you work in the service of others you reap the rewards manifold. I knew I had to work with him to build the ultimate air-traffic-control radar, real time search and meta-CMS systems. The 2004 political season provided an opportunity to work on those problems; the zeitgeist applications that we built to work with CNN's election coverage were thrilling accomplishments.
Since then, Technorati has undergone tremendous growth (regularly chronicled in Dave's "State of the Blogosphere" posts) on the foundation of a search vertical that had no precedent: the real time search of distributed micropublished sources. A number of technology changes were necessitated to scale us up; those changes have been likened to rebuilding your jet aircraft's engines at 40,000 feet. A lot has happened since 2004 (the growing pains have been regularly chronicled by the blogosphere) but until now, few of our outward facing accomplishments have excited me as much as the percolator.
There are a lot of great sites out there using votes, comments, ratings and other explicit actions that are taken as representative of social gestures. There are also a lot of great sites that use implicit social gestures such as links to identify significant publishes; these are much closer to Technorati's heart. However, our aspirations are to look further along the long tail than most of these other sites can. Bloggers have said they want to see more than "all of the usual suspects". In an October 2007 post, ParisLemon said he wanted
a 'backpage' of sorts where some of us "B-listers" who are on ... everyday under the headlines, could have a chance to have some of our other tech stories showcased
Every day the percolator is surfacing thousands of things that the blogosphere is talking about: blog posts, news stories and other stuff. It's true, the "A-list" percolates more posts and they bubble up higher; this is basic social software physics and classic power law stuff. But we have put a stake in the ground; we're going to serve bloggers across the power curve spectrum who are producing quality posts and acquiring attention from other bloggers as well as identify where the other attention magnets are by enabling an application that highlights them. When you walk into a crowded party and there are a myriad of conversations going on, you want to find the conversations that are pertinent to your interests and who the thought leaders are in those conversations. For me, today's release marks a new beginning of Technorati playing the role of connector and catalyzer. I hope you enjoy it! ( Dec 04 2007, 11:45:40 PM PST ) Permalink
Every family and every community has them. Addicts. Lives twisted by chemical dependency and the accompanying mental illnesses. Maybe I'll never fully understand how lives can wind down into oblivion in that way. Given the many opportunities to, I consider myself lucky to have never succumbed to such an existence.
From his sister, here's a short contribution to understanding the life, decline and death of one of my teenage cohorts Sherwood Brewer. For now and evermore, I imagine he's partying with Skitchie: boot-a-doot-doot!( Oct 29 2007, 08:59:13 AM PDT ) Permalink
I needed to clear a cache entry from a memcached cluster of 5 instances. Since I didn't know which one the client had put it in, I concocted a command line cache entry purger. netcat AKA nc(1) is my friend.
Let's say the cache key is "shard:7517" and the memcached instances are running on hosts ghcache01, ghcache02, ghcache03, ghcache04 and ghcache05 on port 11111. The incantation to spray them all with a delete command is
    $ for i in 1 2 3 4 5
    > do echo $i && echo -e "delete shard:7517\r\nquit\r\n" | nc -i1 ghcache0$i 11111
    > done

and the output looks like

    1
    NOT_FOUND
    2
    DELETED
    3
    NOT_FOUND
    4
    NOT_FOUND
    5
    NOT_FOUND

which indicates that the memcached instance on ghcache02 had the key and deleted it (note the memcached protocol response: DELETED), the rest didn't have it and returned NOT_FOUND.
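The same incantation can be wrapped in a small function for reuse. This is only a sketch under the same assumptions as the example above (memcached listening on port 11111, `nc` available); the function names `memcached_delete_payload` and `purge_key` are hypothetical, and `printf` stands in for `echo -e` because it handles the `\r\n` escapes more portably across shells.

```shell
#!/bin/sh
# Build the memcached text-protocol request: a delete for key $1 followed
# by quit, each terminated with CRLF.
memcached_delete_payload() {
    printf 'delete %s\r\nquit\r\n' "$1"
}

# Hypothetical wrapper: send the delete for key $1 to every host listed
# after it. The port mirrors the example above.
purge_key() {
    key=$1
    shift
    for host in "$@"; do
        echo "$host"
        memcached_delete_payload "$key" | nc -i1 "$host" 11111
    done
}
```

With that in place the purge becomes `purge_key shard:7517 ghcache01 ghcache02 ghcache03 ghcache04 ghcache05`, and each host name is followed by that instance's DELETED or NOT_FOUND response.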
For more information on the memcached protocol, see the docs under source control.( Oct 26 2007, 12:29:37 PM PDT ) Permalink
I've been hearing about folks using the LiteSpeed web server instead of Apache for its supposed performance gains and ease of use. OK, so maybe if you're not familiar with the subtleties and madness of Apache, it can seem complicated. But the performance issues are often red herrings. Granted, it's been a few years since I've done any web server benchmarking but from my previous experience with these things, the details really matter for the outcomes and in the real world, the outcomes themselves matter very little.
The benchmark results published on the LiteSpeed Technologies web site raised a flag for me right away: their comparison to Apache 2.0 was with the prefork MPM instead of the worker MPM. Is it any wonder that the results are pretty close to those for Apache 1.3? Either they had no idea what they were doing when they performed this benchmark or they knew exactly what they were doing and were burying the superior scalability of the worker MPM. Pitting a threaded or event driven process model against a forked one is just stupid. However, the evidence leans more towards willful sloppiness or fraud than ignorance. For instance, they claim to have raised the concurrency on Apache above 10k connections ... but they link to an httpd.conf that has MaxClients set to 150. RTFM: MaxClients caps the number of requests Apache serves simultaneously, so that can't happen.
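That kind of mismatch is easy to check for yourself. The sketch below pulls the MaxClients value out of an httpd.conf and compares it with a claimed concurrency figure. The function names `max_clients` and `can_serve` are hypothetical, and the script ignores `<IfModule>` sections, so it simply reports the first uncommented MaxClients directive it finds.

```shell
#!/bin/sh
# Print the value of the first uncommented MaxClients directive in file $1.
# Directive names are matched case-insensitively, as Apache treats them.
max_clients() {
    awk 'tolower($1) == "maxclients" { print $2; exit }' "$1"
}

# Hypothetical check: succeed only if the configured limit can cover the
# claimed number of simultaneous connections ($2).
can_serve() {
    limit=$(max_clients "$1")
    [ -n "$limit" ] && [ "$limit" -ge "$2" ]
}
```

For example, `can_serve httpd.conf 10000 || echo "MaxClients is too low for 10k concurrent connections"` would flag a configuration with MaxClients set to 150.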
Why don't these things matter in the real world? In benchmark world, there aren't varying client latencies (slow WAN links, etc), varying database response times (for instance MySQL's response times are very spiky) or the vagaries of load balancers ebbing and flowing the load, and logging configurations aren't set up for log data management. In the real world, application design and these various externalities are the culprits in application performance, not CPU bottlenecks in the web server runtime. The PHP interpreter itself is likely not your bottleneck either. If you're writing crap-assed code that performs unnecessary loops or superfluous database calls, it's going to run like crap no matter what web server is driving it (I've had to pick through a lot of error-ridden PHP code in my day). With Apache's support for sendfile() static file serving and all of the flexibility you get from mod_proxy, mod_rewrite and the rest of the toolkit, I don't understand the appeal of products like LiteSpeed's. ( Oct 14 2007, 11:43:08 PM PDT ) Permalink
There are blogs that don't take comments (like this one: I don't have time to moderate spam). There are mainstream media sites that are adopting reader comments. There are blogs being published by independent companies with editorial staff. There are big media organizations publishing columns and event streams as blogs. So I'm finding myself asking some basic questions about blogging of late: Is it an indication of maturation or mutation of the blogosphere that there's quibbling about what's a blog and what isn't? Is main stream media's co-opting of blogospheric mores a harbinger of a Thermidor to some un-televised revolution? Has the little town become so much of a metropolis that twitter, facebook and other social media are the destinations of urban flight?
The basic existential questions of the blogosphere and where its boundaries reside have been open to consideration (and re-consideration) for quite some time. Not a day goes by on the Technorati support forums without a splogger showing up to complain that their spam isn't getting indexed (Note: I'm not saying everyone who has indexing problems is a spammer, I'm saying spammers come rolling in to complain about it). A few weeks ago, Scoble melodramatically lamented that the TechMeme leaderboard heralds the death of blogging:
I was just looking at the TechMeme Top 100 List and noticed that it has very few bloggers on it. I can only see about 12 real blogs on that list. Blogging being defined as 'single voice of a person.' Most of the things on the list are now done by teams of journalists - that isn't blogging anymore in my book.
It's true, a lot of the successful blogs have a prolific editorial staff. But death? Really? Why is blogging as an individual practice more or less than blogging as part of a collaborative enterprise? The weblogsincs, gawkers and huffington posts of the world are manifestations of blogging as a format but, from what I can tell, they are no less or more blogs than any others. New blogs continue to be created every second, and some of them will eventually develop thriving audiences.
The line between micropublishing and macropublishing is blurring. Reuters recently announced that they're taking comments on stories and Alley Insider's revelation that the New York Times is posting reader comments got a lot of play. In his post about Technorati rankings, Doug Karr doesn't feel that CNN Political Ticker should be considered a blog. So I'm asking myself, when is a blog not a blog?
Sometimes blogs (the narrower Scoble definition kind) provide the primary source for the facts of our times. Other times, it's main stream media that is bringing forth those facts. As the emergence of blogs that operate like main stream media continues and main stream media adopts blogging as a technology and practice, perhaps this is the ultimate outcome of a leveled publishing playing field: changes will flow along many vectors, cross bred practices are inevitable and Darwinistic rules will prevail such that a lot of things that you'd previously not have considered blogs are morphing into them.( Oct 13 2007, 11:22:11 PM PDT ) Permalink
Listen up kids, crime doesn't pay.
Trying to follow a link to Linus Torvalds' railing against subversion, the irony of getting this error heightens the humor:
Microsoft OLE DB Provider for ODBC Drivers error '80004005'
[Microsoft][ODBC SQL Server Driver][SQL Server]Transaction (Process ID 134) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
/efytimes/lefthome.asp, line 193
Sure, database problems happen regardless of the enabling technology; this is not unique to Microsoft. However, I seem to run into completely fubarred application degradation like this (essentially a BSOD on the web) far more often with ASP and .Net based sites than those enabled by other technologies. Of course, architecting a site to require a database transaction to serve a content page (without any user data transaction) is a firing offense any place I'll ever work. ( Aug 19 2007, 09:46:22 AM PDT ) Permalink
Last night I was among an invited group that Powerset brought in to witness how their natural language search sausage is made. It was actually kinduva cold cut platter: not exactly a meal but an interesting variety was offered for consumption.
When I was a kid, I thought that by 2007 we'd all have flying cars, rocket packs and computers would be all-seeing/all-knowing accoutrements on our wrists. I think all of us who ever watched Scotty verbally ask the Enterprise questions and get responsive answers in English sentences have had hunger pangs for satisfying natural language search. Powerset is trying to advance human-computer interfaces a little closer to that satisfaction, leap frogging previous efforts, by licensing Xerox PARC's technology and hiring a buncha heavy hitters to make it real.
Powerset COO Steve Newcomb introduced some of the sluggers in their line-up, walked attendees through the thinking behind their PR and release strategy and provided a peek into their search capabilities.
Among the impressive powersetters are people who have been-there/done-that with scaled-up search such as x-Yahoo!'s Chad Walters and Tim Converse (read Tim's post the other day about term proximity and linguistics, great stuff), as well as experts in natural language search with backgrounds at PARC and Ask Jeeves. As a company, they're not just-another-web2.0 rails app built by 2 guys and trying to get to the next level. Powerset is more of a bold bottled-lightning science experiment embracing ruby n' rails as a way to get it in front of people.
Powerset has signed up 10K people since announcing the availability of updates and previews on PowerLabs a few weeks ago. Newcomb characterized their labs preview effort as a way to use social software to guide product management decisions, "a mashup of Digg, Facebook and Google apps." I'm a big fan of transparency and community inclusion, it will be interesting to see how inclusive/closed this effort is.
OK, so after all of that, the "Where's the beef?" moment arrived. A side-by-side comparison interface was demonstrated with Powerset results on the left and Google results on the right. Explaining that the test index was scoped to Wikipedia, the goog results were similarly scoped down. The Powerset use case was demonstrated with a query like "What politicians were killed by disease?" On goog, the results are matching terms (and variants on their stems), "politicians", "killed" and "disease". Powerset matches semantically similar tokens and their grammatical relationships.
When asked about the computational horsepower required to index web documents with the sentence structure decomposition and semantics mappings, Newcomb hedged at first ("Barney's gonna kill me", referring to CEO Barney Pell). But alas, he convinced himself (or did a good job method-acting conviction) that it was safe to reveal that it takes them about a second to grammatically analyze and index a typical document. As he lamented his confession again, someone from the audience quipped the query, "Which CEO killed Steve Newcomb?" Yea, he didn't search their index for that.
On the subject of Google comparisons, Newcomb kinda squirmily described Powerset as reverent of ("not cocky about") what Google has accomplished but taking a different approach to web search. Doing side-by-side comparisons with Google as their demo does is pretty ballsy and it seems to get them in trouble; being positioned as a "Google killer" by their audience of search wonks and journalists when things are still very much at a proof-of-concept level seems rather premature. I think Powerset needs to reel that in lest they awaken a sleeping giant and fill him with a terrible resolve while they're still on the tarmac. If you've designed a new aircraft, you don't trumpet about revolutionizing aeronautics before the test pilots have taken off. Particularly if folks are proclaiming that Boeing is in trouble. When Powerset indexes a real web corpus, it will be interesting to see how successfully they can overlay web graph, clustering/disambiguation, time and other relevance components. I think that will provide a real moment-of-truth.
Powerset is making a big bet on natural language search as a transformative technology. They've got a lot of great people and a lot of great technology. All in all, the presentation felt a little dog-and-ponyish with the limited corpus but I'm looking forward to hearing more from them later this year when they release a major iteration. ( Jun 29 2007, 10:41:46 AM PDT ) Permalink
I'm reading with amusement and wonder the events that unfolded at the Yahoo! Hackday in London. Apparently the Alexandra Palace main hall (the BBC's venue for this) has a roof that opens up. And it did. This was precipitated by a lightning strike on the building as a storm blew over (precipitated, storm: no pun left shall be unpunned). Yes, audience member laptops are open, PA system all set up... and it's raining inside the hall. Not to worry, all Londoners are equipped with umbrellas at all times. That's a fact. "I thought a bomb went off", sez Chad of the lightning strike when he was on IM a few hours later. Is the roof there like Chase Field where the Diamondbacks play baseball in Phoenix? I dunno, I'm checking out pictures of "Ally Pally" to assess. Anyway, power and wifi are back and the show goes on.
Follow along with Hackday London Lightning on Technorati's hackdaylondon tag stream.
I was sick of various computer OS desktop metaphors 10-12 years ago. At the time, I thought virtual reality technologies were gonna take over (anybody else remember VRML?). I remember the Windows 95/98 releases, lauded by Microsoft as such great advancements, striking me as just laughable in their utter lack of imagination (even if they were big upgrades from the Windows 3.x mess). When that "innovation" made it to Windows XP, I realized that Microsoft was hopelessly lost as far as OS interface design.
Since then, I've seen a lot of technology changes that I view as the harbingers of the desktop metaphor's demise. Graphics card technology that was once only found on $15-50k SGI pizza box workstations is now cheap as pizza. Jeff Han's demonstration of high resolution multi-touch applications at eTech and TED last year was fantastic. At TED again this year, the photosynth demonstration got a big round of "oohs" and "aahs" from a rapt audience (you must see the detail zooming, also check out this photosynth demo reel).
So when are we gonna see these technologies in our everyday lives? Apparently, soon. It's funny how different Apple's and Microsoft's forays into this are. In a few weeks, Apple is coming out with a $500 phone (the multi-touch usage is demonstrated at 3:55 into this MacWorld TV report from last January). By the end of the year, we will reportedly see Microsoft's $10k coffee table appearing in hotel lobbies. Can't wait? Fishing in your pocket for an extra $10k? Into starcraft? There are some folks working on a multi-touch DIY kit (Microsoft: 0, Hackers: 1).
Putting on my futurist hat: Five years from now, Intel's 80-cores-on-a-fingernail chip, voice recognition audio inputs and multi-touch screens on commodity devices will make the desktop metaphor seem like a quaint joke. Kids born today will shake their heads in disbelief that desktops were productive tools. I've yet to explain a command line interface to my kids, who are grade school age; as familiar and comfortable as those interfaces are to me, the youngins look at me typing in a shell window with puzzlement. In their youthful eyes, I may as well be composing vulcan legal tracts (the reality is probably more frightful, it might really be perl). Computing interfaces will fade away into our intuition.
I just wish the iPhone was coming out in time for father's day (yes, honey, that's a hint). In the meantime, I'm still putting up with Apple and Microsoft's OS interfaces, wincing at the trash cans, recycle bins, folder icons, etc. It'll be good riddance.( Jun 09 2007, 10:23:46 AM PDT ) Permalink
There was a time not long ago when Findory offered a credible value proposition for participants and consumers of the blogosphere. The idea of a blog recommendation and reader personalization service is a good one. I guess things didn't work out as planned at Findory. Earlier this year, Greg Linden announced that Findory was riding into the sunset.
The old Findory blog (@ http://findory.blogspot.com/) has been dormant for some time (the last posts from Greg were in 2005), now it's been taken over by a splogger who has been grabbing abandoned blogspot URLs (this one has PageRank of 3) and posting link farm links and German keywords to them. Sad.
I'd recommend holding on to your blogspot URLs forever; even if you're not using 'em anymore it's better to maintain the museum piece than contribute to the web spam problem.( Jun 01 2007, 12:55:10 PM PDT ) Permalink