Quotation Mining at Google

Folks on the IAI mailing list pointed to an interesting talk by Bill Schilit of Google Research called “Navigating the network of knowledge: Mining quotations from massive-scale digital libraries of books.” Check out the video, or see a couple of the papers on this topic listed on Bill’s publication page.

Overall, they are mining the millions of scanned books in Google Books for quotations, or popularly cited passages. There’s a lot of complexity to their approach, but basically there are two key points:

  • First, they let authors decide which passages from other sources are quote-worthy. So if a passage in one source is quoted by a hundred other works, that passage probably contains an important or interesting idea.
  • Second, by looking at the context before and after a quoted passage in the citing sources, they can extract labels for that quote. In other words, they also let the authors citing another source use their own terms to describe the quote.
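
To make those two points concrete, here is a toy sketch in Python. It is not Google's method: it assumes quoted passages appear verbatim between straight double quotation marks, which real scanned books rarely guarantee, and the corpus, the stopword list and the names `mine_quotes` and `tokens` are all made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: citing work id -> text (invented for this sketch).
WORKS = {
    "essay_a": 'On brevity, the poet wrote "brevity is the soul of wit" to mock long speeches.',
    "essay_b": 'The line "brevity is the soul of wit" is a famous remark on brevity and style.',
    "essay_c": 'Writers praise "the pen is mightier than the sword" when defending the press.',
}

# A tiny illustrative stopword list, not a real linguistic resource.
STOPWORDS = {"a", "an", "and", "is", "of", "on", "the", "to", "when"}


def tokens(text):
    """Lowercase word tokens, punctuation dropped."""
    return re.findall(r"[a-z']+", text.lower())


def mine_quotes(works, window=6):
    """Return (citing, labels).

    citing: quoted passage -> set of works that quote it
    labels: quoted passage -> Counter of nearby non-stopword terms
    """
    citing = defaultdict(set)
    labels = defaultdict(Counter)
    for work_id, text in works.items():
        # Simplifying assumption: a quote is whatever sits between double quotes.
        for match in re.finditer(r'"([^"]+)"', text):
            quote = " ".join(tokens(match.group(1)))
            citing[quote].add(work_id)
            # Context = up to `window` words before and after the quote.
            before = tokens(text[:match.start()])[-window:]
            after = tokens(text[match.end():])[:window]
            labels[quote].update(w for w in before + after if w not in STOPWORDS)
    return citing, labels


citing, labels = mine_quotes(WORKS)

# Point one: rank passages by how many distinct works quote them.
ranked = sorted(citing, key=lambda q: len(citing[q]), reverse=True)

# Point two: the most frequent context term serves as a label for the quote.
top_quote = ranked[0]
top_label = labels[top_quote].most_common(1)[0][0]
```

With this toy data the top-ranked passage is the one quoted by two essays, and the term that both citing authors use near it becomes its label. Matching passages across millions of noisy scanned books is the hard part, and this sketch skips it entirely.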

The result for you and me is twofold:

  • First, when looking at a book on Google Books, you can see and navigate passages in that book that have been quoted elsewhere. This serves as a quasi summary of the book in some ways, as Bill points out in his talk.
  • Second, you can then navigate to the other sources that reference a particular quote in a book you’re looking at. You might know this as forward chaining or reverse citation, which isn’t a new concept.
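
Both kinds of navigation come down to one lookup structure. Here is a minimal sketch in Python, assuming the quotations have already been extracted as (citing work, source book, passage) records. The record format, the sample data and the function names are assumptions for illustration, not anything Google Books exposes.

```python
from collections import defaultdict

# Hypothetical records: (citing work, quoted source book, quoted passage).
QUOTATIONS = [
    ("essay_a", "hamlet", "brevity is the soul of wit"),
    ("essay_b", "hamlet", "brevity is the soul of wit"),
    ("essay_c", "hamlet", "to thine own self be true"),
    ("essay_c", "richelieu", "the pen is mightier than the sword"),
]


def build_index(records):
    """Map source book -> passage -> set of works that quote that passage."""
    index = defaultdict(lambda: defaultdict(set))
    for citing_work, source, passage in records:
        index[source][passage].add(citing_work)
    return index


def quoted_passages(index, source):
    """Passages of `source` quoted elsewhere, most-quoted first (ties alphabetical).

    This is the quasi summary view of a book.
    """
    passages = index.get(source, {})
    return sorted(passages, key=lambda p: (-len(passages[p]), p))


def citing_works(index, source, passage):
    """Works that quote `passage` from `source`: the reverse-citation lookup."""
    return sorted(index.get(source, {}).get(passage, set()))


index = build_index(QUOTATIONS)
summary = quoted_passages(index, "hamlet")
who_quotes = citing_works(index, "hamlet", "brevity is the soul of wit")
```

The nested dictionary keeps both views cheap: listing a book's quoted passages reads one level of the index, and following a passage to its citing works reads the next.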

So, it’s all quite in line with Google’s approach to search in general: harness the wisdom of crowds to create meaningful links based on frequency or popularity.

Sure, there is a lot of heavy computing going on here with tons of complex algorithms. But Google is once again parasitizing human judgement. That is, their algorithms (for Google Search, Google News, or Google Books quotations) don’t determine what’s important directly and mechanically; rather, they look to see what others have already determined to be important and they aggregate that into relevant linking. I don’t mean to minimize their innovations and hard effort by pointing that out, but do keep in mind that there is still human judgement at the heart of the equation: it’s not all technology. Actually, it’s this human element in the Google equation that makes their approach to search so useful and so successful, I believe.

About Jim Kalbach

Head of Customer Success at MURAL
