What's That Noise?! [Ian Kallen's Weblog]

20060115 Sunday January 15, 2006

Surfacing Microformats in Firefox

Calvin Yu has published his Tails Firefox Extension, which surfaces the microformats on a page. Looks good! The plugin shows the hCals and hCards in a page, and he's got a nice screenshot of the hCard renderings. Adding contacts (like Smartzilla) and events would be nice, but the plugin could just point at the gateways for contacts and events on Technorati. BTW, for Firefox you can one-click install Technorati search in the browser search box on the Technorati Tools page.

( Jan 15 2006, 04:29:28 PM PST ) Permalink View blog reactions


20060114 Saturday January 14, 2006

Distributed Conversations with Microformats

Last summer, Ryan King and Eran Globen blogged about citeVia and citeRel as a means of denoting conversation semantics between blog posts. A good summary and subsequent brainstorming is on the Microformats wiki. The blogosphere is currently rich with implicit distributed conversations. A little explicit microformat boost is, IMNSHO, exactly what's needed to put the nail in the coffin of a lot of the crufty old centralized group systems (like Yahoo!'s and Google's). The future of virtual community is here and it is in conversing with blog posts.

There's a lot of discussion of primary citations and secondary props ("Via") but there's not as much on reply-to semantics, except in a few of Eran's posts. Isn't reply-to central to a conversation? Citations are more bibliographic (like when you're linking for a definition, a quote or to identify a source). On the other hand, conversations are about exchanged replies. This is as old as the Internet. Email clients put In-Reply-To headers in messages when you reply to them. RFC 850 defined Reply-To and References headers for Usenet messages over twenty years ago. Reply-to has been the binding for conversations for years, so why stop now? That doesn't mean not to use cite and via; those are cool too, but they're orthogonal to conversing and more pertinent to definition, quotation and source identification. I'm not entirely sure how I'd like to use via since it's kinda like citing a cite -- maybe it's not necessary at all. If you think of a via as a degenerate quote, then use quote. For instance, I think this makes sense (but then, I had a few glasses of wine earlier... I might not feel the same way in the morning):

I might agree that <a href="http://theryanking.com/blog/archives/2006/01/08/blogging-while-adult/" rel="reply-to">negative sarcasm</a>
happens (and worse) wherever there is <a href="http://en.wikipedia.org/wiki/Anonymity" rel="cite">anonymity</a>; it is an inductively provable
aspect of human nature. Countless discussion boards have failed (and continue to) due to participant anonymity. However, it's also important to weigh
the benefits of anonymity: would citizens of censored and oppressed societies be able to engage in progressive debate without it?
Take a look at the Global Voices' <blockquote cite="http://joi.ito.com/archives/2005/05/23/second_draft_of_anonymous_blogging_guide.html">
<a href="http://cyber.law.harvard.edu/globalvoices/?p=179" rel="cite">Anonymous Blogging Guide</a></blockquote>.
Wine bottle is corked now. Does that make sense?
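
To make the distinction concrete, here is a rough sketch of how a tool reading such a post might separate the reply-to links from the plain citations. Everything here is an assumption for illustration: the class name RelLinks, the example.org URLs, and the shortcut of using a regular expression that expects href to appear before rel (a real consumer would want a proper HTML parser and would treat rel as a space-separated list).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RelLinks {
    // Assumes href comes before rel inside the tag, as in the sample markup.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s+[^>]*?href=\"([^\"]*)\"[^>]*?rel=\"([^\"]*)\"[^>]*>",
            Pattern.CASE_INSENSITIVE);

    /** Collect the href of every anchor whose rel attribute equals the given value. */
    public static List<String> hrefsWithRel(String html, String rel) {
        List<String> hrefs = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            if (m.group(2).equalsIgnoreCase(rel)) {
                hrefs.add(m.group(1));
            }
        }
        return hrefs;
    }

    public static void main(String[] args) {
        // Hypothetical post fragment with one reply and one citation.
        String post = "I might agree that <a href=\"http://example.org/a-post\" rel=\"reply-to\">this</a> "
                + "happens wherever there is <a href=\"http://example.org/a-definition\" rel=\"cite\">that</a>.";
        System.out.println("replies to: " + hrefsWithRel(post, "reply-to"));
        System.out.println("cites:      " + hrefsWithRel(post, "cite"));
    }
}
```

The point of the sketch is only that the two rel values carry different meanings: the reply-to links are the edges of the conversation, the cite links are bibliography.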

( Jan 14 2006, 01:44:44 AM PST ) Permalink View blog reactions


Better, Faster Technorati Blog Embed

Willie Dixon was built for comfort, Technorati embeds were built for speed!

Here's an inside tip: if you are a Technorati member and you claimed your blog a while ago, you can likely optimize how your Technorati embed is served and thus speed up how fast your page renders. Go to your account page for your first claimed blog (or go through them all one by one and click Configure Blog). Does the blog embed code match what's in your template? Load your blog page and View Source to compare. The old school embed code looked like this:

<script type="text/javascript" src="http://technorati.com/embed/[BLOG-CLAIM-ID].js"> </script>
What you'll find on your account page is this:
<script type="text/javascript" src="http://embed.technorati.com/embed/[BLOG-CLAIM-ID].js"> </script>
How is this an optimization? Why should you bother updating your blog template from the old to the new style? It's faster! We optimized serving the blog embeds with some additional infrastructure not too long ago. The old way works (built for comfort) but the new one works better (built for speed)!

( Jan 14 2006, 12:23:30 AM PST ) Permalink View blog reactions


20060113 Friday January 13, 2006

Technorati Is Hiring

Technorati is hiring engineers for the website. You should be expert with PHP (including OO constructs, PEAR libraries, templating and application frameworks -- what works and what doesn't), savvy with XHTML and CSS -- be ready with referenceable URLs to demonstrate, experienced with web 2.0 services (i.e. even if you don't blog you podcast or addictively use technorati, flickr, del.icio.us, digg, reddit, rollyo, squidoo, etc) as well as having programmed in at least one language other than PHP and Javascript. Lotsa bonus points for using microformats and Ruby on Rails!

This position is full time, requires US work eligibility and is on-site (San Francisco, 3rd and Brannan). So, is it you? Check out the job listing and send your resume!

( Jan 13 2006, 01:35:28 PM PST ) Permalink View blog reactions


20060111 Wednesday January 11, 2006

Google Earth on Mac OS X

I'd played around on a Windows box with Google Earth a bit last summer and was both enamored with the technology and saddened by the absence of Mac OS X support. Well, happy days are here again: a Mac version is out now!

The satellite definitely took new pictures of my neck of the woods, last time I'd checked you could see our car in the driveway of our house. Now there's a long shadow over it like they shot the picture very early in the morning, can't see the car but you can see the garbage cans (well, the resolution isn't that good, they look like little blips).

Thanks, G!

( Jan 11 2006, 08:48:32 PM PST ) Permalink View blog reactions


20060110 Tuesday January 10, 2006

Ramblings on the Tension between Simplicity and Extensibility

There is widespread frustration with standards that try to boil the ocean of software problems that are out there to solve. Tim Bray has some sound advice:

If you're going to be designing a new XML language, first of all, consider not doing it.
In his discussion of Minimalism vs. Completeness he quotes Gall's Law:
A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.
The tendency to inflate standards is similar to software development featuritis. I'm oft heard to utter the refrain, "Let's practice getting above the atmosphere before shooting for the moon." The scope of what is "complete" is likely to change 20% along the way towards getting there. The basic idea is to aim for sufficiency, not completeness; simplicity and extensibility are usually divergent. Part of the engineering art is to find as much of both as possible.

On the flip side, where completeness is an explicit upfront goal, there are internal tensions there as well. Either building for as many of the anticipated needs as possible or a profound commitment to refactoring has to be reckoned with. The danger of only implementing the simplest thing without a commitment to refactoring is that expediency tends to lead people, particularly if they haven't solved that type of problem before, to do the easy but counter-productive thing: taking short cuts, cutting and pasting and hard coding random magic doodads. As long as there is a commitment to refactoring, software atrophy can be combatted. Reducing duplication, separating concerns and coding to interfaces enables software to grow without declining in comprehensibility. Throw in a little test-driven development and you've got a lot of the standard shtick for agility.

Even though there's a project at work that I've been working on mostly solo, it's built for agility. The build system is relatively minimal thanks to maven. The core APIs and service interfaces (which favor simplicity: REST) are unit tested and the whole thing is monitored under CruiseControl to keep it all honest. This actually saved us the other day when a collaborator needed additional data included in the API's return values. He did the simplest thing (good) but I promptly got an email from CruiseControl that the build was broken. I reviewed his check-in and refactored it by moving the code that had been put in-line in the method into its own method. I wrote a test for the method that fetches the additional data, and then wrote one for the original method's responses to include the additional data. The original method then acquired a flag to indicate whether the responses should be dressed up with this additional data. Not all clients need it and it requires a round-trip to another data repository, so making it a parameter makes sense, since the applications that don't need it are performance sensitive. Afterwards, the code enjoyed additional benefits in that the caching had granularity that matched the distribution of the data sources. Getting the next mail from CruiseControl that it was happy with the build was very gratifying. I need to test-infect my colleagues so they learn to enjoy the same Pavlovian response.
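
The shape of that refactoring can be sketched roughly as follows. This is not the actual project code: the names FlaggedLookup, ExtraSource, lookup and fetchExtra are all hypothetical, and an in-memory map stands in for the primary data store. It only illustrates the two ideas from the story: the extra fetch lives in its own method so it can be tested alone, and a boolean parameter lets performance-sensitive callers skip the second round-trip.

```java
import java.util.HashMap;
import java.util.Map;

public class FlaggedLookup {
    /** Hypothetical stand-in for the second data repository. */
    public interface ExtraSource {
        String fetchExtra(String key);
    }

    private final Map<String, String> primary;
    private final ExtraSource extras;

    public FlaggedLookup(Map<String, String> primary, ExtraSource extras) {
        this.primary = primary;
        this.extras = extras;
    }

    /** Build a response; only touch the second repository when the caller asks for it. */
    public Map<String, String> lookup(String key, boolean includeExtra) {
        Map<String, String> response = new HashMap<>();
        response.put("value", primary.get(key));
        if (includeExtra) {
            response.put("extra", fetchExtra(key));
        }
        return response;
    }

    /** Extracted from the in-line code so it can be unit tested in isolation. */
    public String fetchExtra(String key) {
        return extras.fetchExtra(key);
    }

    public static void main(String[] args) {
        Map<String, String> primary = new HashMap<>();
        primary.put("spidaman", "What's That Noise?!");
        int[] roundTrips = {0}; // counts calls to the (fake) second repository
        FlaggedLookup service = new FlaggedLookup(primary, key -> {
            roundTrips[0]++;
            return "extra data for " + key;
        });
        System.out.println(service.lookup("spidaman", false));
        System.out.println(service.lookup("spidaman", true));
        System.out.println("round-trips to second repository: " + roundTrips[0]);
    }
}
```

A test for fetchExtra and a pair of tests for lookup (flag on, flag off) would then be the kind of checks a continuous build could run on every check-in.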

Anyway. I'm short on sleep and long on rambles this morning.

There are times when simple problems are mired in seemingly endless hand wringing and you have to stand up to shout JFDI. The Java software world, like RDF theorists and other parochial ivory tower clubs, seems to have a bad case of specificationitis. There are over 300 JSRs. Do we need all of those? On the other hand, great software is generally not created in a burst of a hackathon. There's no doubt that when a project has fallen into quicksand, getting all parties around a table and getting it out is an important way to clear the path. Rapid prototyping is often best accomplished in a focused push. I like prototyping to be used as a warm up exercise. If you can practice getting lift-off on a problem and you can attain high altitudes with some simple efforts, your likelihood of making it to the moon increases.

( Jan 10 2006, 07:59:45 AM PST ) Permalink View blog reactions


20060109 Monday January 09, 2006

Claim Your Blog and Put Technorati Pinging On Your Browser Bookmark Bar

A lot of people blog on platforms that don't ping for them. They could just use ecto, it'll help with the post formatting, tagging, media integration as well as pinging. One of the features for Technorati members is that the ping page will render a link to initiate a ping for each of the blogs you've claimed.

If your blog platform won't ping on your behalf, drag those links up to your bookmark bar and click them whenever you publish a new post. The world is changing all around us. When you post, you're part of that change. When you use Technorati, you can watch it change. Welcome to the Real Time Web!

( Jan 09 2006, 10:40:47 PM PST ) Permalink View blog reactions


OOM and HotSpot Bombs

Looks like I better hasten my effort to upgrade to Roller 2.x. This (v1.1) installation hit an OutOfMemoryError a little while ago and crashed the JVM in all of its hotspot glory. I'm suspicious of the caching implementation in Roller (IIRC, it's OSCache). For a non-clustered installation, plain-old-filesystem caches JFW. For distributed caches, JFW applies to memcached. We've been using the Java clients (and Perl and Python) for memcached productively for a long time now. Interestingly, someone was inspired to write a Java port of the memcached server. Crazy! And I think to myself, what a wonderful world.

( Jan 09 2006, 10:20:03 PM PST ) Permalink View blog reactions


More Kudos

We must be doing something right. More kudos, this time from Jason Calacanis.

( Jan 09 2006, 12:46:48 AM PST ) Permalink View blog reactions


20060108 Sunday January 08, 2006

Kudos For Technorati's Anti-Spam Effort

Props from Jeremy on our anti-blog spam efforts are certainly appreciated. I know we don't have a spam-free index, however the amount of spam we keep out of the index is truly astonishing. Our ping interface is deluged with a torrent of rubbish but we do our best to scrub the nasty stuff out of our update stream. The problem defies conventional mail spam or even blog comment spam analytic techniques as the structure of blog spam is very different. Deep examination of the content and structure across a pattern of web sites is often required to distinguish it as spam but in the end, the indicators are there. Most spammers' publishing behaviors are statistical outliers by nature; the numbers speak for themselves.

We have a lot to do, on this and on many fronts but we try to pay attention to the gripes as a measure of priorities. The kudos are nice, too!

( Jan 08 2006, 08:29:31 PM PST ) Permalink View blog reactions


Character Set Encoding Detection in Java

The levers and dials of character set encoding can be overwhelming; just looking at the matrix supported by J2SE 1.4.2 gives me vertigo. Java's encoding conversion support is simple enough, if a bit garrulous:

// The container decoded the parameter as ISO-8859-1, so recover the raw bytes
// with that same charset and decode them again as UTF-8.
String iso88591String = request.getParameter("q");
String utf8String = new String(iso88591String.getBytes("ISO-8859-1"), "UTF-8");
But what do you do if you don't know what encoding you're dealing with to begin with? It looks as though there are a couple of ways to do it: The docs for Java 1.5.0 look similar but I'm still using Java 1.4.2 (old habits die hard).
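
One possible approach, sketched here without claiming it is either of the options the post had in mind, is to try a strict decode and see whether it fails. The class name EncodingGuess and the ISO-8859-1 fallback are assumptions for illustration. The java.nio.charset classes have been in the JDK since 1.4, but StandardCharsets needs Java 7 or later (on older JDKs, Charset.forName("UTF-8") does the same job). Note that this only answers "is it valid UTF-8?"; it cannot tell single-byte encodings apart from one another.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingGuess {
    /** True if the bytes decode cleanly as UTF-8 with no malformed sequences. */
    public static boolean looksLikeUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Decode as UTF-8 when valid, otherwise fall back to ISO-8859-1 (an assumed default). */
    public static String decode(byte[] bytes) {
        return new String(bytes,
                looksLikeUtf8(bytes) ? StandardCharsets.UTF_8 : StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // e-acute encoded as UTF-8
        byte[] latin1 = {(byte) 0xE9};            // e-acute encoded as ISO-8859-1
        System.out.println("utf8 bytes valid UTF-8?   " + looksLikeUtf8(utf8));
        System.out.println("latin1 bytes valid UTF-8? " + looksLikeUtf8(latin1));
        System.out.println("same text after decode?   " + decode(utf8).equals(decode(latin1)));
    }
}
```

The heuristic works because most non-ASCII ISO-8859-1 text is not a valid UTF-8 byte sequence, so a clean strict decode is decent (though not conclusive) evidence of UTF-8.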

( Jan 08 2006, 04:42:53 PM PST ) Permalink View blog reactions


20060107 Saturday January 07, 2006

AJP13 for Ruby on Rails?

Let's call the CGI specification what it is: a burned out and anemic teenager. While it seems kinda cool that Apache 2.2 is going to get mod_proxy_fcgi, I've long wondered about using AJP13 to interface with web application runtimes other than servlet containers.

Brian McCallister did a kick butt cut-to-the-chase preso on Ruby on Rails at ApacheCon in San Diego. I can imagine why he's gung-ho to get FastCGI support up to date; it seems to be the way to run RoR. But since learning that AJP13 was going to be (and now is) built in to Apache 2.2's mod_proxy framework, I've been thinking how much nicer it'd be for other application frameworks to also be able to run outside the HTTP request handling process/thread.

We have some services that run under mod_perl that I've been taking second (and third) looks at. Wouldn't it be nice to deploy that application independent of the HTTP server runtime as one can with a Java webapp? Essentially, when it's boiled down to bare metal, perhaps that's all FastCGI is but it, it... it's CGI! Isn't it just setting/getting global environment variables? STDIN/STDOUT/STDERR? Isn't that so, well, 1994? Maybe I need to think about it some more but that was my take away last time I built anything with FastCGI (admittedly, in the 1990's).

I found what looks like AJP13 protocol support for Perl. Even though I don't read Japanese I'll infer from the context that he was/is interested in the same thing. Though whenever I see "use threads" in Perl, I fear the worst. Anyway, the likelihood of me finding myself with the time on my hands to implement AJP13 in Ruby is low; first, I still need to learn Ruby enough to get crafty.

( Jan 07 2006, 01:20:50 PM PST ) Permalink View blog reactions


MSN bows to China

As I expected after first reading the reports of Microsoft's policies last summer, MSN has (as reported by msnbc.com) censored a Chinese blog at Beijing's request.

IMO, it behooves the Chinese speaking blogosphere outside of China to vigorously discuss this. Beijing will have to adapt or retreat into isolation, they (and the world) can't afford the latter.

( Jan 07 2006, 08:49:20 AM PST ) Permalink View blog reactions


20060106 Friday January 06, 2006

Open Source Language Detection

No, not a typo. OSDL is something else. I'm interested in OSLD. I've used Language::Guess to detect languages in arbitrary text with Perl, it works pretty well. But how are folks solving the problem in Java?

It looks like Oracle has language detection as part of their "Globalization Development Kit" ... but what about open source? Sadly, the Nutch Language Identifier Plugin only supports European languages, no CJK. What are the other options?

( Jan 06 2006, 02:22:54 PM PST ) Permalink View blog reactions


20060105 Thursday January 05, 2006

Regexp'ing simple XML

I ran a test to prove to myself that for simple XML documents, the best way to parse them may be to skip capital P parsing altogether and just use a plain-old regular expression pattern match.

The XML format I wanted to test is the response from the Technorati /bloginfo API. I threw together a Perl based benchmark quickly enough and here are the results:

Benchmark: timing 10000 iterations of regexp, xpath...
    regexp:  0 wallclock secs ( 0.13 usr +  0.00 sys =  0.13 CPU) @ 76923.08/s (n=10000)
            (warning: too few iterations for a reliable count)
     xpath: 137 wallclock secs (136.17 usr +  0.04 sys = 136.21 CPU) @ 73.42/s (n=10000) 
... the regexp parse was three orders of magnitude faster than the XPath parse. I'm curious now what the comparison would be for Java's regexp support versus, say, Jaxen and JDOM (which is how I usually do XPath in Java). In my dabblings with timings, Java regexp's are very fast. Apparently, Tim Bray found this as well.

Here's the Perl code:

#!/usr/bin/perl

use XML::XPath;
use XML::XPath::XMLParser;
use XML::Parser;
use Benchmark qw(:all) ; 

my $X = new XML::Parser(ParseParamEnt => 0); # non-validating parsing, please

timethese(10000, {
    'xpath' => \&xpath,
    'regexp' => \&regexp
}); 

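sub xpath {
    # Full parse: build the node tree, then look up the author element with XPath.
    my $xml = getBlog();
    my $parser = XML::XPath::XMLParser->new(parser => $X);
    my $root_node = $parser->parse($xml);
    my $xp = XML::XPath->new(context => $root_node);
    my $nodeset = $xp->find('/tapi/document/result/weblog/author');
    die if ! defined($nodeset);
}

sub regexp {
    # No parse at all: capture everything between the author tags.
    # The /s modifier lets '.' match the newlines inside the element.
    my $xml = getBlog();
    my ($author) = $xml =~ m{<author>(.*)</author>}sm;
    die if ! defined($author);
}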

sub getBlog {
    return q{<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Technorati API version 1.0 /bloginfo" -->
<!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN" "http://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
<document>
<result>
  <url>http://www.arachna.com/roller/page/spidaman</url>
  <weblog>
    <name>What's That Noise?! [Ian Kallen's Weblog]</name>
    <url>http://www.arachna.com/roller/page/spidaman</url>
    <rssurl>http://www.arachna.com/roller/rss/spidaman</rssurl>
    <atomurl></atomurl>
    <inboundblogs>6</inboundblogs>
    <inboundlinks>8</inboundlinks>
    <lastupdate>2006-01-02 18:38:03</lastupdate>
    <lastupdate-unixtime>1136255883</lastupdate-unixtime>
    <created>2004-02-23 12:04:51</created>
    <created-unixtime>1077566691</created-unixtime>
    <rank>false</rank>
    <lat>0.0</lat>
    <lon>0.0</lon>
    <lang>26110</lang>
    <author>
      <username>spidaman</username>
      <firstname>Ian</firstname>
      <lastname>Kallen</lastname>
      <thumbnailpicture>http://static.technorati.com/progimages/photo.jpg?uid=11648</thumbnailpicture>
    </author>
  </weblog>
  <inboundblogs>6</inboundblogs>
  <inboundlinks>8</inboundlinks>
</result>
</document>
</tapi>
};
}
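
For the Java side of the comparison wondered about above, the regexp half might look something like this sketch. It is not a benchmark and makes no timing claims; the class name AuthorRegexp is made up for illustration, and it uses a reluctant quantifier (.*?) where the Perl pattern is greedy, which gives the same result when there is only one author element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AuthorRegexp {
    // DOTALL lets '.' span the newlines inside the element, like Perl's /s modifier.
    // Compiled once and reused, since Pattern objects are immutable and thread-safe.
    private static final Pattern AUTHOR =
            Pattern.compile("<author>(.*?)</author>", Pattern.DOTALL);

    /** Return the raw contents of the first author element, or null if there is none. */
    public static String author(String xml) {
        Matcher m = AUTHOR.matcher(xml);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Trimmed-down version of the /bloginfo response shown above.
        String xml = "<weblog>\n"
                + "  <author>\n"
                + "    <username>spidaman</username>\n"
                + "    <firstname>Ian</firstname>\n"
                + "    <lastname>Kallen</lastname>\n"
                + "  </author>\n"
                + "</weblog>\n";
        System.out.println(author(xml));
    }
}
```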

For some of the messaging infrastructure at Technorati where the messages are real simple name/value constructs, we've been passing on using XML at all. Using a designated-character-delimited format string (say, tabs) that can be rapidly transformed into a java.util.Map (or a Perl hash, a Python dictionary, yadda yadda yea) and passing messages that way buys a lot of cheap mileage. We like cheap mileage.
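
As a rough illustration of the idea (the actual wire format isn't described here, so the name=value layout, the class name TabMessage and the sample field names are all assumptions), turning a tab-delimited message into a java.util.Map and back takes only a few lines:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TabMessage {
    /** Parse "name=value<TAB>name=value..." into a map, keeping field order. */
    public static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String field : line.split("\t")) {
            int eq = field.indexOf('=');
            if (eq < 0) {
                continue; // skip anything that isn't a name=value pair
            }
            // Split on the first '=' only, so values may themselves contain '='.
            fields.put(field.substring(0, eq), field.substring(eq + 1));
        }
        return fields;
    }

    /** Serialize a map back into the same tab-delimited form. */
    public static String format(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('\t');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String message = "url=http://www.arachna.com/roller/page/spidaman\tinboundlinks=8";
        Map<String, String> fields = parse(message);
        System.out.println(fields.get("inboundlinks"));
        System.out.println(format(fields).equals(message));
    }
}
```

The obvious trade-off in a format like this is that values can't contain the delimiter without some escaping convention, which is part of why it suits only the really simple messages.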

( Jan 05 2006, 11:26:28 AM PST ) Permalink View blog reactions


20060104 Wednesday January 04, 2006

Technorati Cosmos Links in Roller

Now that I'm messing around with a roller implementation from within the last 7 months (migrated from Roller 0.98 to 1.1), I'm going to work on closing the gap to 2.0. Migrating all of my apps from an old (3.x) version of MySQL to 4.1.x wasn't too bad. But it appears that somewhere along the way to Roller 2.0, somewhere in the MySQL upgrade cycle perhaps, the post <-> category mappings got mangled and that was resulting in NPE's when the system tries to fetch the categories.

In the meantime, I implemented embedding cosmos links in my posts by patching WEB-INF/classes/weblog.vm (from the 1.1.2 release):

479,486c479
< #end
< 
< #macro( showCosmosLink $entry )
<     <a href="http://technorati.com/search/$absBaseURL/page/$userName/#formatDate($plainFormat $entry.PubTime )"><img
<         src="http://static.technorati.com/pix/icn-talkbubble.gif"
<         border="0"
<         title="Links to this Post" /></a>
< #end
---
> #end
In the velocity template, I just added:
#foreach( $entry in $entries )
    <a name="$utilities.encode($entry.anchor)" id="$utilities.encode($entry.anchor)"></a>
    <b>$entry.title</b> #showEntryText($entry)
    <span class="dateStamp">(#showTimestamp($entry.pubTime))</span>
    #showEntryPermalink( $entry )
    #showCosmosLink( $entry )
    #showCommentsPageLink( $entry )
    <br/>
    <br/>
#end 
I think the POJO's and macros are different in 2.0 but I'll post a cosmos link update when I get there.

( Jan 04 2006, 07:29:26 AM PST ) Permalink View blog reactions


20060101 Sunday January 01, 2006

No Vacancy

This blog had a nice long vacation but it is now occupied, again. No, I wasn't in Borneo. I wasn't kidnapped by aliens (you never can be sure though, can you?). Nor was I in the hospital. I just found myself wanting to fix my platform but always too busy to do it. So I just didn't blog at all (except for on my super secret alter-ego blogs). While my efforts at going from 0.98 to 2.0.x of Roller never seemed to work out, I did get it to a 1.1 release (hey, take a little progress if you can't get it all). Most of all, I ditched my old template and stylesheet, they were pretty long in the tooth... (I think) this seems a lot cleaner.

A lot has happened with Technorati, the blogosphere, my deep dives into various technologies and other stuff. And there's more to come. And it's a new year. And speaking of which, it's that time again.

So here are my resolutions:

I'm going to get off my butt and get my cardiovascular system working. I'm going to overcome this rotator-cuff injury I've been hoping would just get better by itself (but never has). Ten years ago, I was physically fit easily, never got fat, injuries just healed themselves and I had no lack of physical agility and stamina. It didn't seem to matter that I didn't really try to take care of myself. Well, what a difference a decade makes and it matters now.
No, I don't need a new calling plan. I need to maintain my personal relationships a bit better. Between work and being with the kiddos and my better half, most of my other relationships have suffered.
I'm going to hit it out of the park with Technorati and live happily ever after. Or something like that. Last year, much of the effort at Technorati was focused on scaling models that can keep up with the blogosphere. Maybe we're not out of the woods yet but we're in much better shape now than we were a year ago (or even over the duration of my blogging lapse). In 2006, it's showtime. See that fence about 339 feet away from the plate? Watch the ball go over the fence.

OK, so maybe it's all very self centered. Yea sure, somewhere along the way I'll be working to make the world a better place, too. But first things first.

Happy 2006!

( Jan 01 2006, 10:33:29 PM PST ) Permalink View blog reactions


20050701 Friday July 01, 2005

The Emerging Online Tiger, China's Next Revolution or Balkanization?

The micropublishing revolution of the last several years and the wiring (and wirelessing) of China are ripe to converge. Or collide. Or combust.

The numbers cited in this BBC article about the Chinese online population are really staggering.

Of course, how Beijing's appetite for control will adapt remains a fascinating question. There's no shortage of folks willing to probe the boundaries, contrast Microsoft's willingness to play along. But perhaps the most interesting development ahead is a balkanization of the internet. As the U.S. Department of Commerce asserts continued control of ICANN and China asserts more control on its domestic web sites, it doesn't seem that far fetched.

( Jul 01 2005, 10:33:33 AM PDT ) Permalink View blog reactions


20050623 Thursday June 23, 2005

Technorati Japan launch

Hot on the heels of Technorati Live8, we released another web site!

Presenting the newly updated Technorati Japan!

( Jun 23 2005, 12:42:40 AM PDT ) Permalink View blog reactions


20050622 Wednesday June 22, 2005

live8

Web site launches galore this week... the release of the Technorati redesign has been joined by Technorati Live8.

In an effort to raise awareness for African debt relief, Bob Geldof and all the usual suspects are putting on a load of concerts. And Technorati will bring you the blogosphere's coverage.

( Jun 22 2005, 05:31:28 PM PDT ) Permalink View blog reactions


20050530 Monday May 30, 2005

Annotation Shmannotation

Among the most interesting things that the blogosphere has demonstrated in the last few years is its capacity as a medium for distributed conversation and meme propagation. Implicit and spontaneous communities coalesce and atrophy and the web has become the transport for peer-to-peer publishing.

A post showed up recently on Ideant, Facilitating the social annotation and commentary of web pages that drew me in but then turned me off. It's a review of working or proposed systems that use anchor/name tags, rdf, autolink-ish page transformations and browser plugins for annotation systems. There's a lot of great stuff there about eliminating the distinction between authors and respondents, filtering, open infrastructure, and so on (read it)... but I can't figure out the emphasis on annotation.

The post goes badly astray with this requirement for distributed textual discourse:

Hypertextual granularity. Discourse participants are able to hypertextually annotate every fragment of an online text, instead of having to refer to online texts as wholes which cannot be annotated.
Every fragment? If I want to identify a particular sentence or two as part of a conversation, I'd be more inclined to simply cite and respond:
  <blockquote cite="http://ideant.typepad.com/ideant/2005/05/facilitating_th.html#challenges">
    Discourse participants are able to hypertextually annotate every fragment of an online text
  </blockquote>
  Well, that level of granularity is an edge case requirement
In fact, the ability to address every fragment of text is not a requirement for dispersed discourse. That all of these systems reviewed to support annotation are so intrusive on the author is indicative of how problematic this requirement is.

HTML's intrinsic support for linking, anchoring and citing provides a sufficient medium for binding together dispersed discourse. Browser plugins? Your blog is your platform for citation. Parallel universes (rdf) or structural modifications to make everything "citable" beyond the author's original intention smell like gratuitous complexity. Let the web be the web.

( May 30 2005, 10:30:55 PM PDT ) Permalink View blog reactions


One Chance Only for Mozilla Mail to Thunderbird Migration?

A family member had mistakenly hit "Cancel" when firing up Thunderbird for the first time when prompted to import from Mozilla/Netscape 7. Astonishingly, the Thunderbird developers don't make that option available from that point forward. You can import from Eudora, Outlook (yea, this is a Windows box) or Navigator 4 but there's no option to import from Mozilla. Must be graduates from the School of Masochistic User Interfaces.

Assuming the user (or you, if this is your problem) hasn't started using the Thunderbird installation, so the profile can be safely discarded, here's the workaround:

Voila

More details and scenario options are available at mozillaZine

( May 30 2005, 01:14:41 PM PDT ) Permalink View blog reactions


20050529 Sunday May 29, 2005

Technorati Japan

The lift of weightlessness and the catharsis of a product release is one of the great rewards of ushering a project to fruition.

So it is with this pleasure that I bring your attention to the beta release of Technorati Japan. This is a true eat-your-own-dogfood story; the localizable code base behind the website is built with all searches as clients of the Technorati API, woof. Coinciding with this release is Joi's inaugural post to the Technorati Japan Blog. To toast the efforts of my colleagues at Technorati and the Tokyo team @ technorati.jp, I raise my virtual sake glass!

And if you read Japanese, we hope for your feedback and that you enjoy the site!

( May 29 2005, 11:45:27 AM PDT ) Permalink View blog reactions


Thwarting Spam With GMail and Procmail

I've been self hosting my mail for almost 10 years now and I'm not about to quit. But the growing ineffectiveness of SpamAssassin has made me consider it. While SpamAssassin was catching a lot of spam, at least as much was still getting through. It'd really gotten a lot worse lately. I probably could have done more with it (and I may still dig deeper into how to configure SpamAssassin to work better for me) but I was intrigued by the idea of using a web mail host as a pass-through service to do it for me.

I've used GMail since last summer but really haven't had a whole lot of need for it... it's a nice place to subscribe to mailing lists from. When I'd read Using Gmail as a Spam Filter a while back it intrigued me but the idiosyncrasies of procmail and qmail made it seem like more of a project than I'd wanted to undertake (yea, yea... one of these days I'll migrate to postfix but I have a lot of legacy ezmlm stuff running, I need to figure out how to migrate that to mailman or something).

Well since I had a ton of GMail invites sitting around, I invited myself to create another account (one that no spammers will know the name of, I hope... we'll call it gmail.username for now). I followed the GMail side of the instructions at the site above, e-z nuf. And then I got to the stuff on my server. This is what I ended up doing in my procmailrc to get procmail to forward a message and accept it again once GMail took its turn on it:

:0 
* ! ^X-Forwarded-For: gmail.username@gmail.com my.username@my.domain.com
| /usr/bin/formail -R Delivered-To X-Delivered-To | \
/usr/sbin/sendmail -oi gmail.username@gmail.com
I probably could've used qmail-inject instead of sendmail but whatever, this works. So what's up with the pipe to formail -R Delivered-To X-Delivered-To?
Well, without it qmail got very grouchy. Well, grouchy in that inimitable qmail'ish way:
Hi. This is the qmail-send program at my.domain.com.
I'm afraid I wasn't able to deliver your message to the following addresses.
This is a permanent error; I've given up. Sorry it didn't work out.

<my.username@my.domain.com>:
This message is looping: it already has my Delivered-To line. (#5.4.6)
OK, so qmail's loop detection worked a little too well for me; I worked around it by munging the Delivered-To line.

My vindication came in the hours that followed as dozens of pieces of junk mail ended up caught by GMail's spam detection and the mail that I wanted got through to me on my longstanding but spam-threatened email address.

Warning: if you want to email me something without Google knowing about it (i.e. say you have a business proposition that is a "google killer"), ask me for some alternate methods.

( May 29 2005, 01:02:26 AM PDT ) Permalink View blog reactions


20050519 Thursday May 19, 2005

Blinkin' Blog

I haven't been blogging here frequently as of late. I've been really busy with work; too busy to blog, ironically. So I tend to post things here that've been on my mind for a while. I don't have any rules about it per se, it's just been my modus operandi in recent daze.

As an experiment, I've been blogging my brief whims into ecto (a stolen moment on BART may be my best opportunity to blog). The ecto posts have been going to another blog on blogger, I think the blogger API implementation on this old version of roller is busted, I gave up using ecto with it. I don't know if I'll maintain a separation of ideas that've had a gestation period from passing fancies, but for now that's how it is.

At least the markup and CSS on the other blog are a lot tidier than the one here. I'll have to upgrade this roller implementation soon.

( May 19 2005, 10:29:43 PM PDT ) Permalink View blog reactions

