What's That Noise?! [Ian Kallen's Weblog]

Main | Next month (Jan 2005) »

20041231 Friday December 31, 2004

Pictorial Fallout Seems like there are no events without photostreams anymore.

Here are some from last night's fiesta and after party that followed Scobleizer's Technorati visit.

( Dec 31 2004, 12:49:48 PM PST ) Permalink


Hey 2004, don't let the door hit you on the butt on your way out! I have a lot of things to be thankful for this past year. However, 2004 brought on a sufficient number of painful kicks-in-the-teeth, I say good riddance!

The bad stuff has been damn bad:

But there was good stuff too:

Wishing for a good 2005!

( Dec 31 2004, 10:56:36 AM PST ) Permalink


20041230 Thursday December 30, 2004

New Years Resolutions I have a number of personal, professional, creative and spiritual objectives, but I won't chunk them into specific intervals or milestones. At least not here and now. So for next year, I'll keep it simple: I resolve to help make the world a better place than it was last year.

That's my new years resolution.

( Dec 30 2004, 02:13:30 PM PST ) Permalink


20041225 Saturday December 25, 2004

Hasta La Vista, mod_perl - Hola mod_parrot! While I've become more fond of running my application outside of the webserver (i.e. using mod_jk2 to wire Tomcat to Apache is my preferred modus operandi these days), mod_perl will always have a place in my heart; I'm actually writing a mod_perl handler for a simple web application right now.

Well, I'm painfully aware of how mod_perl is getting long in the tooth. So I was pleased to read of mod_parrot. While it's probably going to be a while before I get a chance to mess with Perl 6 and Parrot (or even Python and Parrot), I'm hopeful that mod_parrot will provide high productivity with application with modernized programming facilities in the future.

Oh, by the way, Happy Festivas!

( Dec 25 2004, 05:39:09 PM PST ) Permalink


20041224 Friday December 24, 2004

Learning to say kaddish Recent reading: Living a Year of Kaddish

One of the things rattling around my mind these days is grief. I recently listened to my dad, aunts and uncles eulogize my recently departed grandfather at his burial service in New Jersey. It's given me plenty to think about as far as what I knew of him on both a first and second hand basis. Growing up on a coast opposite of his, my knowledge of him has been the product of the fleeting visits and the lore passed on by my parents. But I'll always be fond of the interest he took in my goings about, the twinkle in his eye that sparked when he engaged in conversation with me and some of his funny little habits like cutting an article out of the newspaper for some anticipated future reference that would never take place.

The traditional grieving process has a number of rituals and practices that are vaguely familiar but only by hearing or reading of them. I've not before been proximate to these traditions but my grandfather's passing has produced an interest in them. So I picked up Living a Year of Kaddish by Ari Goldman to learn a little more about it. The book consists of a succession of short thought recordings (even blog-like, as it doesn't read like a diary) of the year that followed the death of Goldman's father. The traditional purpose of kaddish, a daily prayer for the deceased (preferably three times a day), is to help the loved one get closer to and eventually arrive at gan eden (paradise). Saying kaddish for eleven months and then on the death anniversary (yahrtzeit) is an obligation of the children but is also a prayer for all who grieve. At least, that's my understanding of it and my knowledge is nominal with these things. But I have to say that Goldman's take on it, that the purpose of kaddish is more inward looking, resonates with me more.

To me, kaddish is more for the living than for the dead. I believe that in my daily recitation of the prayer, I was coming to terms with who my father was and who I am. If I missed a day of kaddish, I suffered, not my father.
When I die, I want my children to say kaddish for me, but for themselves, too.
Indeed, I've been thinking a lot about who my grandfather was, who his eldest son, my father is and who I am. And what will my children know of my father and myself in the years ahead. There is much to consider. I previously didn't know the kaddish prayer but I'm taking the time to learn it now.

( Dec 24 2004, 03:31:40 PM PST ) Permalink


20041221 Tuesday December 21, 2004

Rigorous Operations, Agile Development When it comes to software development, I generally eschew rigorous planning. I prefer to take a defined set of use cases, ascertain that they are the important ones and develop tests and code to fullfill them -- no design documents, no architecture documents and no UML diagrams unless they're helping me think through the data model and call stacks (i.e. sequence diagrams).

The recent colo move that I worked on all last weekend with my cohorts at Technorati was a good demonstration where rigor pays off: operations. Physically moving an entire network requires shared knowledge of complex network systems, detailed resource allocation and troubleshooting contingencies amidst a maze of dependencies requires rigorous documentation, planning and coordination. My hats off to Matt, David, Aaron, Bill and Adam for rising to the occasion and kicking butt. It's been an honor and a pleasure, gentlemen.

I'm in New Mexico for a spell, hoping to take a side trip up to Ski Santa Fe to get a lil bit o' frozen bliss.

( Dec 21 2004, 04:35:58 PM PST ) Permalink


20041218 Saturday December 18, 2004

Rollin' rollin' rollin', keep them servers rollin' Technorati's entire infrastructure was brought down this morning in order to move it.

Here's the flickr tag to follow it.

No ETA on service restoration but I'll post updates where possible.

( Dec 18 2004, 03:01:57 PM PST ) Permalink


PHP's mbstring is a 2 bit thief Normally, PHP's strlen function reports back the number of bytes in a string. When dealing with western characters, that will equal to the number characters as well. However, strange things can happen when PHP's mbstring extension for multi-byte character support is enabled to alias that function.

If you have mbstring.func_overload configured to alias mb_strlen for strlen (i.e. when the 2 bit is flipped), then strlen starts counting characters, not bytes. If you need to count the number of bytes, it's not obvious how you're supposed to do it.

This is how I did it:
In places where I really needed to know the number of bytes, I used a homebrewed function byte_count instead strlen. Here's the function definition for byte_count.

     function byte_count($val) {  
         $len = (function_exists('mb_strlen')) ? 
             mb_strlen($val, 'latin1') :
             strlen($val);
         return $len;
     }

Perl is hokey about it too. The length is supposed to count the number of characters but if you want to force it to count bytes, you need to use the bytes pragma. From the manpage:

           $x = chr(400);
           print "Length is ", length $x, "\n";     # "Length is 1"
           printf "Contents are %vd\n", $x;         # "Contents are 400"
           {
               use bytes;
               print "Length is ", length $x, "\n"; # "Length is 2"
               printf "Contents are %vd\n", $x;     # "Contents are 198.144"
           }

Java is not without it's pickiness but it as least it has byte and char as distinct primitives.

( Dec 18 2004, 12:50:23 AM PST ) Permalink


20041215 Wednesday December 15, 2004

Tool to defeat a DOS against an Apache server farm Throttling the number of requests that come in to a single web server instance isn't rocket science but it requires keeping track of a state shared amongst processes/threads. But how would you efficiently throttle requests that come to a set of servers?

Apparently, there's some Apache goodness available for this now. At least I think it sounds good! Ian Holsman has written mod_ip_count for Apache 2.0. It uses the APR portability layer and memcached for shared state (actually apr_memcache from Paul Querna). This would enable a whole server farm to keep track of request rates from and throttle specific IP addresses.

( Dec 15 2004, 04:24:13 PM PST ) Permalink


20041214 Tuesday December 14, 2004

Greener Pastures

This weekend Technorati's network and server infrastructure is going to move. In one big fell swoop. Well, hopefully nothing will fall.

The home page sez: "Movin' on up" cause Technorati is substituting the Jefferson's theme song for the old ops/facilities anthem, the Talking Heads' "Burning Down The House"

( Dec 14 2004, 12:47:43 AM PST ) Permalink


20041212 Sunday December 12, 2004

l10n development practices I've used Java's handy-dandy ResourceBundles to do some proof-of-concept localizations. But my past proofs were limited; they only used other European languages that utilized ISO-8859-1 characters. I don't recall having to do anything special with the property files in those cases.

Working on a recent Japanese localization project was an eye opening experience. It turns out the java.util.Properties expects ISO-8859-1 characters. I guess that's the downside of having a super-simple file format. I got the localized display boostrapped by using native2ascii to get the UTF-8 localization text rendered as escaped unicode. On a one-off basis, that's easy enough. But collaborative development always begs the tools question, how do folks typically manage this?

What about input encoding? If there's an HTML form on a page and the input has multibyte characters in the query string (or POST data), are characters escaped to ISO-8859-1? My recollection was that HTTP headers must be ISO-8859-1.... but looking at the docs for PHP's mbstring and the encoding_translation parameter, it looks like server-side handling of the request needs to account for other character set encodings. Do browsers honor charset specification as a form attribute, like

<form action=... method=... accept-charset="UTF-8">
(looks like Struts supports this) or is it presumed that the browser always escapes unicode? Or perhaps they simply URL encode the characters so it's a non-issue? On the server side the must the request handling do this
request.setCharacterEncoding("UTF-8");
String raw = request.getParameter("foo");
String clean = new String(raw.getBytes("ISO-8859-1"), "UTF-8");
or is it all supposed to transparently just work (obviating String cleansing) if request.setCharacterEncoding("UTF-8") is used? ...for all of the hand-waving in the docs for ResourceBundle, etc establishing a clear practice for input String handling in a webapp remains murky.

As far as sending responses, is it safe to always just send UTF-8 and include "charset=UTF-8" in the Content-type header? Is it standard practice to presume that the client will send a request header Accept-Charset (which indicates what an acceptable response is)? If they send it and UTF-8 isn't on the list, must the server go through a big String re-writing exercise to encode response to the browser's preference or is UTF-8 presumed to be implicitly acceptable at all times?

So many questions... I'm still digging for anwers.

( Dec 12 2004, 11:51:01 PM PST ) Permalink


20041207 Tuesday December 07, 2004

Runtime inseration of struts tiles in Velocity I've been impressed with TilesTool, it comes in the Velocity Tools package. It runs Velocity views through the struts MVC machine for processing reusable "subviews". However, there's no support for runtime insertion of components!

You can do this in tiles-defs.xml

   <definition name=".dog" extends=".animal.layout">
     <put name="body"    value=".dog.display" /> 
     <put name="head"    value=".dog.head" /> 
   </definition>
   <definition name=".cosmos.head" extends=".head">
     <put name="titleKey" value="dog.title" />
   </definition>
   <definition name=".dog.display" 
     controllerUrl="/dog.do"
     path="/tile/dog.vm"
     />
and so forth. Declaritive tile composition works just fine. But what about programmatic composition at runtime?

With JSTL and struts, I can do this:

<c:forEach var="bit" items="${kibble}">
  <tiles:insert page="/tile/bark.jsp">
    <tiles:put name="bit" beanName="bit" />
  </tiles:insert>
</c:forEach>
I would imagine that the Velocity equivalent would look like this:
<ol>
#foreach ($bit in $kibble)
 $tiles.put("/tile/bark.vm", { "bit" : $bit })
#end
</ol>
but alas, it's not implemented by TilesTool. I can work around this by moving "bark.vm" to its own velocimacro but that it fugly as hell. I would prefer parameterized components.

( Dec 07 2004, 06:53:07 AM PST ) Permalink


20041206 Monday December 06, 2004

Servlet container forward from inside Velocity I ported some JSP UI code to Velocity, it's been fun learning the Velocity paradigm (being able to cleanly process template components outside the container rocks). One of the things in the JSP UI handled forwarding requests that bypassed the struts controller back through struts.

In JSP with struts tags, it looks like this (assume web.xml has "struts-logic" mapped):

<%@ taglib uri="struts-logic" prefix="logic" %>
<logic:redirect forward="home"/>
But what about Velocity? Well, it turns out that the VelocityViewServlet stuffs the basic servlet container things into the Velocity context, much like JSTL does in JSPville. Ergo, the $request object itself can be invoked like this:
$request.getRequestDispatcher("/home.do").forward($request,$response)
Seems kinda grotty to not be able to use struts symbolic name, but so far that's where my read of the Velocity docs has taken me. As I unpeel the onion, I may be inspired to subclass the VelocityViewServlet as a StrutsViewServlet... it seems like however you're invoking the rendering, you should be able to access, if present, other runtime services such as struts, spring, etc. ( Dec 06 2004, 10:05:35 AM PST ) Permalink


20041205 Sunday December 05, 2004

memcached in a service oriented functionality ecosystem Among my efforts over recent months have been those focused on decoupling. Technorati has a very high update rate as it taps the ping streams, fetches update contents, analyzes links and keyword indexes the substance of posts in the blogsphere. Such a system doesn't work well when components are closely coupled; the availability of the whole system is subject to the whim of the system's weakest links. Often, weaknesses are combinatorial; the weakness of the whole is greater than the weakness of the parts. That's what I'm focused on undoing. Fixing weaknesses in the components is important but decoupling them first is more so.

When folks say "service oriented architecture" it still cannotes monolithicism to me. An architecture implies a level of structure definition that sounds rigid; can you re-pour that foundation to adapt redrawn plans? Software development agility and loose coupling should reinforce each other. I prefer to think of architectures and ecosystems. A service oriented functionality ecosystem supplies application functionality as a suite of services. Supporting requirements (as opposed to the core business requirements) such security, logging, persistence, redundancy and caching are each handled independently; they in turn may be provisioned as services that higher level services rely on. This is part of the evolution under way at Technorati; some of the changes are evident in Dave's recent posts but some are just revisions that we're quietly rolling out.

Queues and distributed memory caches are natural elements of a such an environment. In the December issue of Linux Journal, Technorati's use of open source building blocks such as memcached is discussed by Doc Searls.

This is the game:
A memcached server (or a set of servers) can be accessed over the network to store things in a table kept in RAM. When storing things, you can specify a maximum age for the cache entry -- if you go back to fetch it and the elapsed time since it was stored exceeds that age, it gets treated as a cache miss.

Storing things in memcached with the timeout parameter and invalidating cache entries works as long as you have consistent mechanism for calculating the key. If internally you're managing "stories" and each one has an "id" attribute that is unique (a primary key), that's a good candidate to store them with. So for instance putting memcache inside a content management system (CMS) "content service" seems natural. In babytalk code:

  public Story fetchStory(int storyId) {
      Story story = memc.get(storyId);
      if (story == null) // perhaps more rigorous validation of the fetched object
          return story;
      story = StoryDB.findById(storyId);
      memc.put(storyId, story, AGE);
      return(story);
  }

If it's difficult to determine whether something is new or an update because it doesn't have an id and uniqueness is determined by some combination of attributes, then the lookup cycle can be helped by caching with composite keys. It gets a little more complicated:
  public Story fetchStory(Map atts) {
      // encapulate whatever attributes uniquely identify a thing
      CacheKey key = new CacheKey(attrs); 
      Story story = memc.get(key);
      if (story == null) 
          return story;
      story = StoryDB.findByAttrs(attrs);
      memc.put(key, story, AGE);
      return(story);
  }

We're in the process of evolving Technorati's infrastructure to one that is loosely coupled, redundant and robust. Our use of memcached is one of the enabling technologies of that evolution.

( Dec 05 2004, 09:22:23 AM PST ) Permalink