What's That Noise?! [Ian Kallen's Weblog]


Thursday November 27, 2008

Topic Clustering Visualized in Library Search

Public service announcement: your low-tech, dowdy public libraries have gone slick and high-tech. The old days of long searches through card catalogs and filling out forms in triplicate are gone. Since moving to the east bay several years ago, I've been impressed with the Contra Costa County Library's online catalog, which searches all of the branches in the county and supports online reservations and inter-branch transfers. One of my favorite features is the visual topic clustering.

When searching for "django", a hub-and-spoke diagram is displayed with related nodes such as "reinhardt" and "guitar" as well as possible misspellings. The search results are pretty good too: the first result is a Gypsy jazz guitar (Django Reinhardt's signature style) instructional video by the main guy from the Hot Club of San Francisco (Paul Mehling can often be found gigging here in the east bay at the Left Bank in Pleasant Hill, good stuff). Overall, the selection of books, CDs and videos matching "django" was what I expected.

As fond as I am of Gypsy jazz, I'm also interested in the web application framework written in the Python programming language. Changing my query to "python django" brings up a different visual cluster that keeps some of the same cluster terms ("reinhardt" and "guitar") but adds some new ones: "monty", "boa" and "computer". The search results were exactly what I wanted: The Definitive Guide to Django: Web Development Done Right by Adrian Holovaty and Jacob Kaplan-Moss, and Sams Teach Yourself Django in 24 Hours by Brad Dayley. I'm planning on using Django (the Python web app framework) for a project (not work related) and, while the online docs are pretty good, having a book (or two) to refer to is definitely welcome.
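For the curious, here is a toy sketch of how related terms like these could fall out of simple keyword co-occurrence. I don't know how AquaBrowser actually builds its clusters, so treat this as an illustration only: the mini-catalog and the `related_terms` helper are made up for the example and are not the library's data or API.

```python
from collections import Counter

# Hypothetical mini-catalog: each record is a set of title keywords.
# These records are invented for illustration, not real library data.
CATALOG = [
    {"django", "reinhardt", "guitar"},
    {"django", "reinhardt", "guitar"},
    {"django", "python", "computer"},
    {"python", "monty"},
    {"python", "boa"},
]


def related_terms(query, catalog=CATALOG, top=5):
    """Rank terms by how often they co-occur with any word in the query."""
    words = set(query.lower().split())
    counts = Counter()
    for record in catalog:
        if words & record:  # the record matches at least one query word
            counts.update(record - words)  # count the other terms in it
    # Sort by descending count, then alphabetically so ties are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top]]


print(related_terms("django"))
# ['guitar', 'reinhardt', 'computer', 'python']
print(related_terms("python django"))
# ['guitar', 'reinhardt', 'boa', 'computer', 'monty']
```

Adding "python" to the query widens the set of matching records, which is why the old spokes survive while new ones show up. A real system would presumably weight terms and handle spelling variants rather than count raw matches.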

All said, I'm a fan of the AquaBrowser search and clustering technology that the CCC library is using. It's had me wondering how well it would perform against the more volatile data set flowing through Technorati.


(Nov 27 2008, 11:43:13 AM PST)