Is Relevance Relevant?
24 June 2007
For decades, information science has developed and examined the notion of relevance in information retrieval (IR). By and large, the approach to measuring relevance has been rather technical. Recall and precision have been the two main measures:
- Recall looks at whether all of the documents relevant to a given query are returned.
- Precision measures whether only the relevant documents are returned.
To measure relevance, you first need to create a key: a list of the documents in a given database that match a given query. But this key is itself artificial and doesn’t take into account any of the significant contextual factors people employ when determining relevance in real-life situations. It’s made up ahead of time by a group of people who themselves don’t have a real information need in a real IR situation.
Tefko Saracevic points to a broader model of relevance in his article Relevance Reconsidered [1]. This includes the notion of technical relevance, but takes a more holistic look at relevance, accounting for information interaction in IR situations. In addition to technical relevance, he adds other types to the mix:
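To make the two measures concrete, here is a minimal sketch of how recall and precision are typically computed against such a key. The document IDs and the function name are made up for illustration; they don't come from any particular IR system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: documents the system returned
    relevant:  the pre-built key of documents judged relevant
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned

    # Precision: what share of the returned documents are relevant?
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: what share of the relevant documents were returned?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical key and result list for one query
key = {"doc1", "doc2", "doc3", "doc4"}
results = ["doc1", "doc2", "doc9"]

precision, recall = precision_recall(results, key)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Note that nothing in this calculation knows who is asking or why. Everything depends on the key.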
- “Topical or subject relevance: relation between the subject or topic expressed in a query, and topic or subject covered by retrieved texts, or more broadly, by texts in the systems file, or even in existence. It is assumed that both queries and texts can be identified as being about a topic or subject. Aboutness is the criterion by which topicality is inferred.
- Cognitive relevance or pertinence: relation between the state of knowledge and cognitive information need of a user, and texts retrieved, or in the file of a system, or even in existence. Cognitive correspondence, informativeness, novelty, information quality, and the like are criteria by which cognitive relevance is inferred.
- Situational relevance or utility: relation between the situation, task, or problem at hand, and texts retrieved by a systems or in the file of a system, or even in existence. Usefulness in decision making, appropriateness of information in resolution of a problem, reduction of uncertainty, and the like are criteria by which situational relevance is inferred.
- Motivational or affective relevance: relation between the intents, goals, and motivations of a user, and texts retrieved by a system or in the file of a system, or even in existence. Satisfaction, success, accomplishment, and the like are criteria for inferring motivational relevance.”
A recent study in JASIST (July 2007) also shows that relevance is very situational and contextual [2]. The researchers looked at how people picked documents from randomly ordered results lists from different search engines (Google, MSN Search, and Yahoo!).
“The findings show that the similarities between the users’ choices and the rankings of the search engines are low. We examined the effects of the presentation order of the results, and of the thinking styles of the participants. Presentation order influences the rankings, but overall the results indicate that there is no ‘average user,’ and even if the users have the same basic knowledge of a topic, they evaluate information in their own context, which is influenced by cognitive, affective, and physical factors.”
Cognitive, affective, and physical factors? Yikes. Recall and precision don’t look at any of these, yet these were found to be significant. So what does the traditional notion of relevance in IR really measure with recall and precision?
I believe there is a much broader context that needs to be considered–one that accounts for the entire information experience. I’m not sure what this is, but context and situation seem to trump recall and precision in real-world IR. Perhaps relevance isn’t even relevant any more in the online, digital world anyway. Perhaps we need an entirely new model for understanding how and when people select documents in IR situations.
[1] Tefko Saracevic (1996). Relevance reconsidered. Information science: Integration in perspectives. Proceedings of the Second Conference on Conceptions of Library and Information Science. Copenhagen (Denmark), 201-218.
[2] Judit Bar-Ilan, Kevin Keenoy, Eti Yaari, & Mark Levene (July 2007). User rankings of search engine results. JASIST (58, 9) 1254-1266.
Silobreaker Beta Launch
23 June 2007
Silobreaker is a current awareness service that launched at the beginning of 2006. It’s designed for the “light information professional,” as Silobreaker puts it. (I’m assuming this description doesn’t refer to the weight of the person, but how much information work they do). The product is rich with various features for visualizing, extracting, and clustering search results to expose relationships in content and give as much context as possible.
They’ve recently redone the interface. Check out the beta launch of Silobreaker.
Not surprisingly, the interface is very link rich: you can click on just about anything at any time. There are also quite a few mouse-over features that reveal a quick view of information in layers and such. I like this overall approach and feel it’s appropriate for the target group. But frankly, I prefer the original version of Silobreaker. The information design of the beta product doesn’t seem to help with visually scanning information on the screen, and it appears more cluttered somehow (although the amount of information is about the same).
Overall, Silobreaker lives up to its claim that it provides numerous ways to slice and dice content. For a relatively new service, it has many strengths and an impressive range of features and functionalities. The underlying concept moves away from searching in favour of browsing; however, the product is complex and presents potential interaction problems such as small texts and targets to click. Nonetheless, Silobreaker’s unique approach is likely to appeal to many users who conduct news research and require current awareness content on a regular basis.
Spock People Search
23 June 2007
OK, I got an invitation directly from Spock to use their service. Andrea also invited me shortly thereafter. (Thanks anyway, Andrea.)
The entity resolution does appear to work quite well. I searched for common names, like John Smith, and although you get back a ton of results, they all seem to resolve. The easy-to-use advanced search (it’s barely an “advanced” search) helps with things like location and age.
One apparent primary source of information is networking sites, like LinkedIn. Neat idea. There’s also user-entered and generated input that feeds into the entities. But right now it seems to work best for well-known people, particularly in displaying photos. Most of the time you get “no image” placeholders shown.
Here’s my page on Spock. Doesn’t look like there is anything to be found for Jim Kalbach, so I’m not sure how well name variations are handled. One cool (and scary) feature: you can have Spock go through your Gmail account and add people in your contacts as favorites in Spock.
The interface design is simple, with lots of text links, in the style of Google I’d say. Looks to be a good service, but it seems limited to me right now.
Review: Everything is Miscellaneous
16 June 2007
After pointing out a few contentious points in Everything is Miscellaneous in previous posts (see: June 13, 2007, June 2, 2007, and May 28, 2007), I wanted to review some of the book’s strengths. And there are many. This is perhaps one of the most interesting books about information and its order that I’ve read. Though I disagree with Weinberger on many points, the book got me thinking, and I found it quite engaging overall.
Order in the Court
A central concept Weinberger proposes is that of three orders of order:
- First order - This is the organization of physical objects: “We put silverware into drawers, books on shelves, photos into albums.”
- Second order - This refers to creating a surrogate record that is derived from the item to be organized. This record itself has a physical manifestation. The classic example used throughout the book is the card catalogue.
- Third order - Here, there is no limitation for the type and amount of metadata that links to an item. Instead, an object can be classified, tagged, and organized by any number of means–essentially without limit. What’s more, documents themselves become metadata. So this order is really more like disorder, and it is where the book gets its title.
I’m not sure the division between the second and third orders is entirely clear, but it rings true for the most part. It’s probably more of a continuum than true buckets of order.
Interestingly enough, Weinberger–a philosopher himself–doesn’t refer to Karl Popper’s theory of reality. In the Popperian cosmology there are three worlds:
World 1: the world of physical objects
World 2: the world of mental objects and events
World 3: the world of the products of the human mind
I’m seeing these map roughly to Weinberger’s orders like this:
World 1 = first order
World 2 = third order
World 3 = second order
These mappings aren’t 1:1, but the causation is different with Popper’s worlds. Perhaps the third order of order as Weinberger proposes it isn’t the next step forward, but a step back to something that more closely resembles human thought, knowledge, and understanding. OK, I’m probably getting in over my head, so I’ll just leave it at that and let you decide or comment further.
Lumping and Splitting
Another recurring concept is that of lumping and splitting. This refers to either grouping or dividing a topic in order to manage, use, or understand it better. “Nesting is a fundamental technique of human understanding. It may even be the fundamental technique, at least in its most primitive form: lumping and splitting” (p. 68). For example, dividing patterns of order into three orders (see above) helps us talk about and understand those concepts better.
But lumping and splitting inherently bring bias to the table. In the third order, however, this bias is removed–or at least lessened. Rather than one person or one group of people deciding how to lump and split information, we all do it. And we do it to fit our needs–without suffering from someone else’s biases. In the end, Weinberger argues that a big pile of metadata-rich information is better than top-down control of it. You then let users and machines sort it as needed from the bottom up.
Small Pieces Loosely Joined
The phrase Web 2.0 has a certain buzzability these days. Sometimes you’ll hear people define Web 2.0 as the use of technologies like AJAX, or worse, the use of 3-D buttons with a reflection. Even talk about communities and user participation sometimes misses the deeper meaning of Web 2.0. It’s the miscellanization of information that enables Web 2.0 activity–along with the connectivity only the Web can offer, of course.
At its core, then, Everything is Miscellaneous is really about Web 2.0, or at least about the underpinnings thereof. It’s about the theory and consequences of the atomization and re-connecting of information in the digital world.
Even broader, Everything is Miscellaneous is, in part, a philosophy of information, covering a wide range of classification-related topics from a historical perspective. The author reviews the origins of taxonomy and alphabetical ordering, and even Aristotle’s notion of hierarchies and understanding. But at the same time the book is thoroughly steeped in the modern, digital world of information.
Quotes
Here are some of my favorite quotes I highlighted while reading it:
page 82: “Reality is multifaceted. There are lots of ways to slice it. How we choose to slice it up depends on why we’re slicing it up.”
page 88: “The basic fact that order often hides more than it reveals has sometimes itself been hidden within the art and science of organizing our world.”
page 105: “The power of the miscellaneous comes directly from the fact that in the third order, everything is connected and therefore everything is metadata.”
page 168: “So Peter Morville may have it backwards: Tags may become more useful, meaningful, relevant, and clearer the more there are.”
page 189: “There is no dorm room, divorce, or political scandal as messy as the World Wide Web. There’s an excellent reason for this: Sir Tim Berners-Lee, the inventor of the World Wide Web, in his wisdom made sure that the Web is a permission-free zone. Anyone can post anything she wants, and anyone can link to anything else, all without altering a central registry, without having to get approval, and without anyone saying exactly where to shelve the new material. So, the Web has grown without plan, which is exactly why it has grown like crazy.”
Interesting side note: Amazon suggests purchasing Everything is Miscellaneous with my book, Designing Web Navigation. This is an interesting contrast thematically: One is about controlling and ordering information from the top down, the other about messiness as a virtue. The thing that joins these two books, however, is the potential audience. So it’s actually a good example of why making a big messy pile and then using algorithms to find new and interesting connections just might work.
Everything is Miscellaneous is well researched. But unfortunately the book uses end notes (does anyone really skip back to them while in the middle of a chapter?). And the text lacks numbered references to the points in the notes, so it is extra hard to follow them. It’s impressive, though, that Weinberger has talked with many people first hand and actually gone on location to investigate topics, and it’s welcome that he shares this with us.
The author takes on some deep topics in a fairly accessible style. Everything is Miscellaneous is well written, but not light reading. But at just over 250 pages, you really have no excuse for not picking it up. Throughout, the discussions are thought-provoking and, at times, simply mesmerizing. I highly recommend this to anyone in the information business or doing web design.
Informavores in America
15 June 2007
The Pew Internet and American Life Project recently released a study called A Typology of Information and Communication Technology Users. The introduction opens with a rap about Web 2.0, but the study has a stronger focus on use of information gadgets and appliances.
Some interesting nuggets of wisdom from the report: 27% of all respondents said they feel overloaded, and 67% of all respondents said they like having so much information available.
8% of Americans are deep users of the participatory Web and mobile applications.
And then there are key categories of users they came up with–the typology. I’m not sure I get all the distinctions here. Seems like too much overlap to be really useful.
- Omnivores: 8% of American adults constitute the most active participants in the information society, consuming information goods and services at a high rate and using them as a platform for participation and self-expression.
- The Connectors: 7% of the adult population surround themselves with technology and use it to connect with people and digital content. They get a lot out of their mobile devices and participate actively in online life.
- Lackluster Veterans: 8% of American adults make up a group who are not at all passionate about their abundance of modern ICTs. Few like the intrusiveness their gadgets add to their lives and not many see ICTs adding to their personal productivity.
- Productivity Enhancers: 9% of American adults happily get a lot of things done with information technology, both at home and at work.
- Mobile Centrics: 10% of the general population are strongly attached to their cell phones and take advantage of a range of mobile applications.
- Connected but Hassled: 9% of American adults fit into this group. They have invested in a lot of technology, but the connectivity is a hassle for them.
- Inexperienced Experimenters: 8% of adults have less ICT on hand than others. They feel competent in dealing with technology, and might do more with it if they had more.
- Light but Satisfied: 15% of adults have the basics of information technology, use it infrequently and it does not register as an important part of their lives.
- Indifferents: 11% of adults have a fair amount of technology on hand, but it does not play a central role in their daily lives.
- Off the Net: 15% of the population, mainly older Americans, is off the modern information network.
The Vision of Librarians
13 June 2007
OK, here’s my last gripe about Everything is Miscellaneous, a fantastic book by David Weinberger. I realize that this might be a nit, but I’d like to point it out anyway: Weinberger contends that in the past physical formats of information limited the vision of librarians and information professionals.
Yes and no.
Many paper-bound information specialists and librarians had plenty of vision. Take Ranganathan. He was able to see organization completely independent of the media that represents it, well before the electronic computer. By pointing out his genius many times in the book, Weinberger contradicts himself. Or look at the work of Paul Otlet. Here’s what Wikipedia has to say about him: “His vision of a great network of knowledge was centered on documents and included the notions of hyperlinks, search engines, remote access, and social networks—although these notions were described by different names.” Then there’s Eugene Garfield, who created a reverse citation index in the early 60s–well before library automation.
The point is that the vision was apparently there in many instances. Sure, there were limitations in implementation, but there are in the digital world too.
I believe it was the GOALS of librarians that limited their foresight. Namely, library systems were created by librarians and primarily for librarians. They are traditionally very content-centered and not user-centered. For instance, what library patron really cares about the dimensions of a book or CD when searching for information? Yet this information is meticulously recorded by librarians as a rule of thumb. The bottom line is that libraries simply are not user-friendly systems.
Perhaps this is a subtle and not-so-clear distinction, but one that still exists in my opinion: the vision was there, but the goals were off. Maybe this is what Weinberger was expressing, or maybe it’s really the same thing. In any event, there were visionaries in information organization before the digital world took over, as Weinberger himself points out.
The Time of Information
11 June 2007
Here’s something I’ve been thinking about for a long time and hope to work up into a presentation or story:
- With the advent of digital information available online, people pointed to how much more information there is than before. At first it was about the volume of information.
- But then others pointed out that it’s not the volume, it’s the access to information that changed. The information was previously available, we just couldn’t get to it.
- But really, you could get it if you had enough time. So my thought is that it’s not the amount of information or increased access to it, but the time it takes to find, use, understand, and experience information that has really changed.
This is an important aspect of Information Foraging Theory described by Peter Pirolli and Stuart Card: “We have argued that in an information-rich world, the real design problem to be solved is not so much how to collect more information, but rather, how to optimize the user’s time.” Foraging for information in the digital world is a trade-off between the perceived value of information and the time it takes to interact with and experience it.
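As a rough illustration of that trade-off, here is a tiny sketch that ranks information sources by perceived value per unit of time. This is my own simplification, not Pirolli and Card's actual model; the sources, value scores, and time costs are invented for the example.

```python
def rate_of_gain(perceived_value, seconds_to_use):
    """Perceived value gained per second spent (a deliberately crude measure)."""
    if seconds_to_use <= 0:
        raise ValueError("time cost must be positive")
    return perceived_value / seconds_to_use


# Hypothetical sources: (perceived value on an arbitrary scale, seconds needed)
sources = {
    "search result snippet": (2.0, 10.0),
    "full journal article": (9.0, 900.0),
    "colleague's summary": (6.0, 120.0),
}

# Pick the source with the highest value per second
best = max(sources, key=lambda name: rate_of_gain(*sources[name]))
print(best)
```

In this made-up example the article carries the most value in absolute terms, yet it loses to the snippet once time is counted.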
Relevance, then, is also time dependent. Relevance guru Tefko Saracevic hints at this with the notion of Situational Relevance in a paper titled Relevance Reconsidered.
Perhaps the Time of Information needs more attention. Or is this so obvious that it doesn’t even need to be mentioned?
Uday Gajendar on Richness
6 June 2007
Uday has a really good thought piece over at Boxes and Arrows entitled What Does Rich Mean? Good question…and good answers. Read the article. This de-buzzes the buzzword “rich” for sure.
“And therein lays the great burden and hope of designing for rich experiences. As arbiters of human attention, designers must ensure there is not an overload of superfluous, gratuitous richness that distracts users or makes a product difficult to use. Recognizing that every digital product is a rhetorical moment amplified by expressiveness can enable designers to tap into the promise of rich experience: intelligently crafted, well-intentioned acts of communication that are emotionally satisfying and sensibly organized to meet user goals, thus becoming something memorable and valuable. Ultimately, that is what richness is about—connecting to those core human qualities that define our goals, values, and attitudes for living.”
Go Uday.
See more good stuff over at his blog.
Live Long and Prosper - Spock
4 June 2007
Spock is a new people-finding service available free on the Web. It is currently an invitation-only beta service, which means you must receive an invitation from Spock or a friend to sign up for the service. Apparently, their entity resolution technology is killer.
Anyone get a login yet? I’ve requested one but am still waiting.
Live Ink
4 June 2007
Scientists at Walker Reading Technologies in Minnesota have an interesting new technique for improving online reading and comprehension. Basically, the human brain doesn’t deal with block text well. Instead, our eyes view text as if they’re peering through a straw. We only focus on a small area at once, and the lines above and below can cause noise and distraction while reading.
Here’s the detailed study of the technique:
http://www.readingonline.org/articles/art_index.asp?HREF=/articles/r_walker/
Of course, we’re so used to reading block text that this might seem counter-intuitive. How could thousands of years of writing and printing be wrong? Well, physiologically, block text is not the most conducive format for humans to read.
Here’s an article about it with an example of the technique:
Live Ink offers better way to read text online
(Mark Coker, VentureBeat, May 10, 2007)
Be sure to check out the image of before and after formatting.
But do we really want all of our online texts looking like a haiku? For one, this would make pages many times longer. And printing would take reams of paper. So, solving one problem may cause others.
The interesting over-arching lesson from this, however, is that HOW text is presented affects how we read, understand, and interact with information. Information design is crucial to the user experience on many levels.
Maybe there’ll be a Firefox plugin to switch this kind of formatting on and off from your browser?
Weinberger on the Card Catalog
2 June 2007
Again, let me start off by saying that Everything is Miscellaneous is a really great book, particularly for an old librarian/IA type like me. Fascinating stuff.
But Weinberger’s comparisons and criticisms of the card catalog in libraries seem odd. There’s hardly a library in the US that still uses one. Even the smallest public libraries probably converted to an OPAC years ago–many in the mid-80s. Why even bring it up?
Even if you want to keep the argument in the offline world of libraries, Weinberger still makes it seem like the card catalog is the only access point to books. It’s not. There are many, many bibliographies and reference resources that slice and split works by any number of facets. There are also many different indexes to articles with many, many access points. Heck, you can even see who else has cited that important scientific article you found with the Science Citation Index. Weinberger over-simplifies a very complex system of citations and linking of resources that exists in physical libraries.
I agree with Weinberger that the third order of organization the web affords is different, but not because other means of accessing books (just to stick with that example) don’t exist. That vision was already there in the paper world.
There are indexes that provide access to Bach cantatas by the first line of text, for instance. Same for poetry. And then there are the countless literature guides in just about any discipline and sub-discipline.
So what the web really changes is:
a.) Who is doing the organizing. Now it’s everyone instead of information professionals.
b.) The time it takes to create new lists of access points to books, to then find those lists, and to use them effectively.
The Time of Information in the third order, then, is the real thing to focus on. It’s not about more information or more ways to organize information or even more people doing the organizing. The information experience people have in the third order world of the web is one that changes the relationship and proportions of time in information seeking, organizing, and use.
Navigating Microsoft SharePoint
2 June 2007
We’ve had SharePoint at work for over a year now. I’ve heard nothing but complaints from colleagues about how to use it. Sure, it might solve technical problems and allow for some flexibility, but the usability of the system stinks.
I’ve had an unusual thing happen while using it: it seems the more I use it, the worse I get at it. I feel I’ve actually un-learned how to use it. Is there such a thing as a negative learning curve? If so, Microsoft has figured out how to do it.
I’m surprised there isn’t more discussion about how bad it is, particularly the navigation. I’m constantly searching for the right link to click, and am often led to click the wrong thing. Curiously, most of the things you see about SharePoint on the web are about how to implement it, how to customize the CSS, and so forth.
One problem is that it tries to be like Office applications, but it’s web based. Navigating desktop apps and navigating websites aren’t the same thing. So there seems to be a collision of approaches in SharePoint. Maybe it’s just me, but SharePoint is embarrassingly bad.
