Alex Wright - The Web That Wasn’t
27 December 2007
Finally got around to watching Alex Wright’s Google Tech Talk entitled The Web That Wasn’t. Alex is the author of GLUT: Mastering Information Through the Ages, a book I don’t own yet but will be getting soon. The talk is based on the book and gives a tour of philosophical and direct precursors to the web. Fascinating stuff. He discusses Paul Otlet, Vannevar Bush, Eugene Garfield, Ted Nelson, and others. The talk is one hour long, but worth it.
Some of the lessons from looking at the history of early notions of networked systems:
- Top down and bottom up organization of information can work in concert with each other
- Two-way linking provides more information than one-way. (Of course, to this point I’d say that the web wouldn’t have taken off if two-way linking were mandatory.)
- Showing pathways and usage patterns is important information about information.
- Users can be authors and contributors
- The nature of interaction is more from the “oral” tradition
We can see some of these things on the web today, but looking at alternative systems (theoretical or real) still provides inspiration. It also reminds us that the “new” ideas and concepts–even things like Web 2.0–aren’t necessarily new. Overall, he puts things in a broad perspective.
One point he makes quickly in the Q&A session: things like controlled vocabularies may have a place in bounded domains. The example he gives is MeSH. He mentions that there may be a way to automate this, but the point is that we can learn from all the work done on developing controlled vocabularies to date. This mirrors a point I made in my presentation at the Euro IA Summit in Barcelona and in an article for the ASIST Bulletin of the same title: Navigating the Long Tail.
Facetag
13 November 2007
In retrospect, I can’t figure out for the life of me why I didn’t mention Facetag during my talk at the Web 2.0 Expo in Berlin. I had plenty of time, and it’s something I cover in Designing Web Navigation. That and maybe some other more forward-looking ideas would have rounded out an otherwise (perhaps too?) practical talk.
Facetag is a working prototype of an application that mixes normal tagging with the power of facets. My friends from Italy developed it: Andrea Resmini, Emanuele Quintarelli, and Luca Rosati. With a little bit of additional effort, tags can be aligned with facets while tagging a web resource. Later, these facets allow you to filter the resources in different ways. It’s pretty straightforward, but very powerful at the same time.
The interesting thing for me is that I proposed a similar idea at the first German IA Conference in Frankfurt in 2005, referring to del.icio.us, which at the time had a single flat list of tags. Not that I want to downplay Andrea, Emanuele, and Luca’s achievement (I’m far too incapable of actually getting such a project going), but it does show the potential universal appeal of Facetag. I’ve heard of others who had similar ideas. The Lazy Web at work!
One thing that I called for back then was a set of facets based on intrinsic metadata: primarily date saved and domain name, but also things like domain extension. Facetag doesn’t have this (yet, or at least it didn’t when I asked them about it in Berlin in 2006). The thought is that you could potentially get a lot of mileage out of intrinsic metadata because users wouldn’t have to do anything extra while entering tags. So, if you bookmark lots of things from, say, Boxes and Arrows, you could then zoom in on just links from www.boxesandarrows.com, and then potentially pick a certain date range or filter by another tag.
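To make the idea concrete, here is a minimal sketch in Python of filtering bookmarks by intrinsic facets. It is not how Facetag or del.icio.us works; the Bookmark class, the filter_bookmarks function, and the sample URLs and dates are all made up for illustration. The point is only that domain, domain extension, and date saved can be derived without the user entering anything beyond the usual tags.

```python
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlparse


@dataclass
class Bookmark:
    """Hypothetical bookmark record: only url, saved and tags are stored."""
    url: str
    saved: date
    tags: frozenset


def domain(bookmark):
    """Intrinsic facet: the host name, derived from the URL itself."""
    return urlparse(bookmark.url).hostname


def extension(bookmark):
    """Intrinsic facet: the last label of the host name, e.g. 'com'."""
    return domain(bookmark).rsplit(".", 1)[-1]


def filter_bookmarks(bookmarks, host=None, start=None, end=None, tag=None):
    """Narrow a list of bookmarks by any combination of facets."""
    result = []
    for b in bookmarks:
        if host is not None and domain(b) != host:
            continue
        if start is not None and b.saved < start:
            continue
        if end is not None and b.saved > end:
            continue
        if tag is not None and tag not in b.tags:
            continue
        result.append(b)
    return result


# Made-up sample data (paths and dates are invented).
bookmarks = [
    Bookmark("http://www.boxesandarrows.com/a", date(2007, 3, 1),
             frozenset({"ia", "tagging"})),
    Bookmark("http://www.boxesandarrows.com/b", date(2007, 9, 15),
             frozenset({"navigation"})),
    Bookmark("http://www.dlib.org/c", date(2007, 8, 13),
             frozenset({"ia", "search"})),
]

# Zoom in on one site, then narrow by date range and by another tag.
from_ba = filter_bookmarks(bookmarks, host="www.boxesandarrows.com")
recent_ia = filter_bookmarks(
    bookmarks,
    host="www.boxesandarrows.com",
    start=date(2007, 1, 1),
    end=date(2007, 6, 30),
    tag="ia",
)
```

Here from_ba holds the two Boxes and Arrows links, and recent_ia narrows that down to the one saved in the first half of 2007 and tagged "ia".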
Ciao!
Librarians, IA, and the Long Tail of Information Spaces
3 November 2007
Not sure if anyone has ever made this connection before, but I’m going to give it a try. Let me know if you’ve heard this already. Here goes:
If we consider all published information in the world, we can assume it follows a long tail curve. Most sources are read by only a small percentage of people. Librarianship is really about organizing information in the head of the long tail curve. Sure, there are special libraries, like science libraries and music libraries, but even those are concerned with organizing the hits.
After the advent of the web, IA arose out of the need to organize information in the long tail. At some point the long tail of information spaces got so fat that someone realized we need special, dedicated people to take care of our information problems. IA is about finding custom solutions in a niche market for a particular business or client.
My point is that attacks on things like the Dewey Decimal System by people like Clay Shirky and David Weinberger are irrelevant to IA. IAs aren’t concerned about organizing all of human knowledge. We tend to work in niche markets. And it’s in niche markets that things like taxonomy and controlled vocabularies make most sense because they are bounded domains. Even Mr Shirky admits that himself in his polemic article on ontology:
“Ontological classification works well in some places, of course. You need a card catalog if you are managing a physical library. You need a hierarchy to manage a file system. So what you want to know, when thinking about how to organize anything, is whether that kind of classification is a good strategy.”
Of course.
On the other side of the coin, things like tagging might be better when organizing the hits. There you’ll get a critical mass of tags to make them worthwhile. But tagging in niche markets might have holes. You might not even get all of your content tagged if the user population is too small. And users in a niche market tend to share a common terminology and a common structure for the information space, so a controlled vocabulary could actually help them find, use, and make sense of information.
OK, the above is really a half-baked idea and has lots of problems. But blogs let anybody say anything they want anytime, so there you have it.
Library Porn
8 September 2007
Here is an impressive collection of photos from amazing libraries around the world. I’ve only been to a few of them, sadly. Gotta make a point of getting to more of them (particularly the ones in Germany).
BTW, I can recommend visiting libraries while travelling. Most have free internet connections, for starters. But you also get to see great buildings, and many have interesting exhibits and even museums.
What’s the online equivalent? Will anything we create now in the digital world still be around in 50 years? 25? 5? We were pondering this recently while standing in front of the 2,000-year-old amphitheater in Nimes: the thing isn’t just still standing, it’s in use. We just don’t build things for longevity anymore, or think long term as a whole.
Maybe people will be showing pictures of places on Second Life to future generations and saying, “Wow, they really knew how to build things back then.”
Search and Browse Article in D-LIB
13 August 2007
There is a nice article in the most recent issue of D-Lib Magazine called Enhancing Search and Browse Using Automated Clustering of Subject Metadata. The authors looked at ways to integrate automatic classification with traditional categories. “Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.”
I like the practicality of this study. Lots of screens are shown, helping you grasp the issues discussed. They also talk about user testing and integrating that feedback into the designs.
Is Relevance Relevant?
24 June 2007
For decades, information science has developed and examined the notion of relevance in information retrieval (IR). By and large, the approach to measuring relevance has been rather technical. Recall and precision have been the two main measures:
- Recall looks at whether all of the documents relevant to a given query are returned.
- Precision measures whether only the relevant documents are returned.
To measure relevance, you first need to create a key. This is a list of the documents in a given database that match a given query. But this key is itself artificial and doesn’t take into account any of the significant contextual factors people employ when determining relevance in real-life situations. It’s made up ahead of time by a group of people who themselves don’t have a real information need in a real IR situation.
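For readers who haven’t worked with these measures, here is a small sketch in Python of how recall and precision are usually computed against such a key. The function names and the document IDs are made up for illustration; the formulas are the standard set-based ones, where both measures share the same numerator (relevant documents that were actually returned) and differ only in what they divide by.

```python
def recall(key, retrieved):
    """Share of the relevant documents (the key) that were returned."""
    if not key:
        return 0.0
    return len(key & retrieved) / len(key)


def precision(key, retrieved):
    """Share of the returned documents that are relevant (in the key)."""
    if not retrieved:
        return 0.0
    return len(key & retrieved) / len(retrieved)


# Made-up example: the key lists four relevant documents for a query,
# and the system returns three documents, two of which are in the key.
key = {"d1", "d2", "d3", "d4"}
retrieved = {"d1", "d2", "d5"}

r = recall(key, retrieved)     # 2 of 4 relevant documents found
p = precision(key, retrieved)  # 2 of 3 returned documents relevant
```

Note that nothing in either function knows anything about the person doing the searching. Everything depends on the key.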
Tefko Saracevic points to a broader model of relevance in his article Relevance Reconsidered [1]. This includes the notion of technical relevance, but takes a more holistic look at relevance, accounting for information interaction in IR situations. In addition to technical relevance, he adds other types to the mix:
- “Topical or subject relevance: relation between the subject or topic expressed in a query, and topic or subject covered by retrieved texts, or more broadly, by texts in the systems file, or even in existence. It is assumed that both queries and texts can be identified as being about a topic or subject. Aboutness is the criterion by which topicality is inferred.
- Cognitive relevance or pertinence: relation between the state of knowledge and cognitive information need of a user, and texts retrieved, or in the file of a system, or even in existence. Cognitive correspondence, informativeness, novelty, information quality, and the like are criteria by which cognitive relevance is inferred.
- Situational relevance or utility: relation between the situation, task, or problem at hand, and texts retrieved by a systems or in the file of a system, or even in existence. Usefulness in decision making, appropriateness of information in resolution of a problem, reduction of uncertainty, and the like are criteria by which situational relevance is inferred.
- Motivational or affective relevance: relation between the intents, goals, and motivations of a user, and texts retrieved by a system or in the file of a system, or even in existence. Satisfaction, success, accomplishment, and the like are criteria for inferring motivational relevance.”
A recent study in JASIST (July 2007) also shows that relevance is very situational and contextual [2]. The researchers looked at how people picked documents from random-ordered results lists from different search engines (Google, MSN Search, and Yahoo!).
“The findings show that the similarities between the users’ choices and the rankings of the search engines are low. We examined the effects of the presentation order of the results, and of the thinking styles of the participants. Presentation order influences the rankings, but overall the results indicate that there is no ‘average user,’ and even if the users have the same basic knowledge of a topic, they evaluate information in their own context, which is influenced by cognitive, affective, and physical factors.”
Cognitive, affective, and physical factors? Yikes. Recall and precision don’t look at any of these, yet these were found to be significant. So what does the traditional notion of relevance in IR really measure with recall and precision?
I believe there is a much broader context that needs to be considered–one that accounts for the entire information experience. Not sure what this is, but context and situation seem to trump recall and precision in real-world IR. Perhaps relevance isn’t even relevant any more in the online, digital world anyway. Perhaps we need an entirely new model for understanding how and when people select documents in IR situations.
[1] Tefko Saracevic (1996). Relevance reconsidered. Information science: Integration in perspectives. Proceedings of the Second Conference on Conceptions of Library and Information Science. Copenhagen (Denmark), 201-218.
[2] Judit Bar-Ilan, Kevin Keenoy, Eti Yaari, & Mark Levene (July 2007). User rankings of search engine results. JASIST (58, 9) 1254-1266.
The Vision of Librarians
13 June 2007
OK, here’s my last gripe about Everything is Miscellaneous, a fantastic book by David Weinberger. I realize that this might be a nit, but I’d like to point it out anyway: Weinberger contends that in the past physical formats of information limited the vision of librarians and information professionals.
Yes and no.
Many paper-bound information specialists and librarians had plenty of vision. Take Ranganathan. He was able to see organization completely independent of the media that represent it, well before the electronic computer. By pointing out his genius many times in the book, Weinberger contradicts himself. Or look at the work of Paul Otlet. Here’s what Wikipedia has to say about him: “His vision of a great network of knowledge was centered on documents and included the notions of hyperlinks, search engines, remote access, and social networks—although these notions were described by different names.” Then there’s Eugene Garfield, who created a reverse citation index in the early 60s–well before library automation.
The point is that the vision was apparently there in many instances. Sure, there were limitations in implementation, but there are in the digital world too.
I believe it was the GOALS of librarians that limited their foresight. Namely, library systems were created by librarians and primarily for librarians. They are traditionally very content-centered and not user-centered. For instance, what library patron really cares about the dimensions of a book or CD when searching for information? Yet this information is meticulously recorded by librarians as a matter of course. The bottom line is that libraries simply are not user-friendly systems.
Perhaps this is a subtle and not-so-clear distinction, but one that still exists in my opinion: the vision was there, but the goals were off. Maybe this is what Weinberger was expressing, or maybe it’s really the same thing. In any event, there were visionaries in information organization before the digital world took over, as Weinberger himself points out.
Weinberger on the Card Catalog
2 June 2007
Again, let me start off by saying that Everything is Miscellaneous is a really great book, particularly for an old librarian/IA type like me. Fascinating stuff.
But Weinberger’s comparisons and criticisms of the card catalog in libraries seem odd. There’s hardly a library in the US that still uses one. Even the smallest public libraries probably converted to an OPAC years ago–many in the mid 80s. Why even bring them up?
Even if you want to keep the argument in the offline world of libraries, Weinberger still makes it seem like the card catalog is the only access point to books. It’s not. There are many, many bibliographies and reference resources that slice and split works by any number of facets. There are also many different indexes to articles with many, many access points. Heck, with the Science Citation Index you can even see who else has cited that important scientific article you found. Weinberger over-simplifies a very complex system of citations and links between resources that exists in physical libraries.
I agree with Weinberger that the third order of organization the web affords is different, but not because other means of accessing books (just to stick with that example) don’t exist. That vision was already there in the paper world.
There are indexes that provide access to Bach cantatas by the first line of text, for instance. Same for poetry. And then there are the countless literature guides in just about any discipline and sub-discipline.
So what the web really changes is:
a.) Who is doing the organizing. Now it’s everyone instead of information professionals
b.) The time it takes to create new lists of access points to books, to then find those lists, and to use them effectively.
The Time of Information in the third order, then, is the real thing to focus on. It’s not about more information or more ways to organize information or even more people doing the organizing. The information experience people have in the third order world of the web is one that changes the relationship and proportions of time in information seeking, organizing, and use.
Weinberger on Dewey Decimal System (DDC)
28 May 2007
I’m just about in the middle of Everything is Miscellaneous by David Weinberger. I’ve enjoyed his other books, and this one perhaps tops them all. Really good read. If you have anything to do with the development, conception, or organization of web sites and web content, get it.
I find myself, however, agreeing with him and disagreeing with him at the same time: “Yes, that’s right, but…” For instance, while I’m sure Mr Weinberger knows exactly what the DDC is, I feel he misrepresents it at times. Not that I’m a fan of the DDC, nor am I defending it in any way. I’ve never really used it. But it seems he’s attacking some of the wrong aspects of the DDC and in the wrong way.
Of course, seen abstractly–as a second order organization system, to use Weinberger’s term–there are many problems with the DDC. Yes, the geographic splits are very Western-centric. Yes, Christianity gets many divisions while all other religions are lumped together. Those are certainly weaknesses of the system that shouldn’t be glossed over and will hopefully be corrected.
But the DDC is really about the first order organization of books–how they sit on the shelf. So if you compare its second order arrangement to other second or third order systems, you lose a lot. The DDC is a classification scheme, not a cataloging system. Missing from Everything is Miscellaneous, then, is a discussion of the user experience you have while in the stacks of a DDC library. Namely, the books are arranged by subject. If you find one book on Muslims, others around it are likely to be about Muslims too.
And if you think people don’t look left and right when retrieving a book from a shelf, you’re wrong. They do. It’s an important type of information discovery in physical libraries. Let’s say you go to the stacks for a biography of J.S. Bach. You may then see biographies of C.P.E. Bach and J.C. Bach, whom you perhaps didn’t know much about, or didn’t even know existed. That’s an interesting connection you may not have seen online or in a card catalogue. Or, you may find other novels by Herman Melville near Moby Dick that also interest you. It’s almost like a menu of links for “related products.” Yes, it’s only one dimensional and limited by physics (a book can only be in one place), but it’s a heck of a lot better than no order of books at all.
Also, on page 58 he compares DDC to topics of books on Amazon. This is just wrong. The DDC is a classification scheme, not a list of topics for cataloging books. Comparing the Library of Congress Subject Headings (LCSH) to topics on Amazon would have been better, for instance. Of course, you’d find problems with LCSH, but at least the comparison would be accurate.
In other words, a subject heading catalogue is about the second order organization of books (what they are about), whereas DDC is about the first order organization of books on the shelf. I missed this in the book, and felt Weinberger’s argument gives readers the wrong impression. He seems to make DDC something it’s not, and sets it up as a paper tiger at times. I agree with many of Weinberger’s conclusions, but how he gets there is problematic, in my opinion.
Note: Rather than a single review of Everything is Miscellaneous, I hope to post more thoughts on individual topics in the future.