Finally got around to watching Alex Wright’s Google Tech Talk entitled The Web That Wasn’t. Alex is the author of GLUT: Mastering Information Through the Ages, a book I don’t own yet but will be getting soon. The talk is based on the book and gives a tour of philosophical and direct precursors to the web. Fascinating stuff. He discusses Paul Otlet, Vannevar Bush, Eugene Garfield, Ted Nelson, and others. The talk is one hour long, but worth it.

Some of the lessons from looking at the history of early notions of networked systems:

  • Top down and bottom up organization of information can work in concert with each other
  • Two-way linking provides more information than one-way. (Of course, to this point I’d say that the web wouldn’t have taken off if two-way linking was mandatory.)
  • Showing pathways and usage patterns is important information about information.
  • Users can be authors and contributors
  • The nature of interaction is more from the “oral” tradition
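
On the two-way linking point: the web only stores links in one direction, but the reverse direction can be derived after the fact by inverting the link graph. Here is a minimal Python sketch of that idea. The three-page graph and the `build_backlinks` helper are hypothetical, just for illustration, and not anything shown in the talk.

```python
from collections import defaultdict

def build_backlinks(outlinks):
    """Invert a one-way link graph.

    outlinks maps each page to the pages it links to.
    Returns a dict mapping each linked-to page to the sorted
    list of pages that point at it.
    """
    backlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            backlinks[target].add(source)
    return {page: sorted(sources) for page, sources in backlinks.items()}

# Hypothetical example graph: "a" links to "b" and "c", "b" links to "c".
outlinks = {"a": ["b", "c"], "b": ["c"], "c": []}
backlinks = build_backlinks(outlinks)
```

The catch is that this only works for whoever can see the whole graph, which is why backlinks on the web tend to come from crawlers rather than from the pages themselves.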

We can see some of these things on the web today, but looking at alternative systems (theoretical or real) still provides inspiration. It also reminds us that the “new” ideas and concepts–even things like Web 2.0–aren’t necessarily new. Overall, he puts things in a broad perspective.

One point he makes quickly in the Q&A session: things like controlled vocabularies may have a place in bounded domains. The example he gives is MeSH. He mentions maybe there is a way to automate this, but the point is that we can learn from all the work done on developing controlled vocabularies to date. This mirrors a point I made in my presentation at the Euro IA Summit in Barcelona and in an article for the ASIST Bulletin of the same title: Navigating the Long Tail.

GoLexa Search Engine

22 December 2007

Just came across GoLexa. The interesting thing about this is the search results. They provide quite a bit of context, including links to bookmarking sites, page data, page previews, etc. And there are also plenty of other tools, like direct links to analyze keywords and refine your search.

This brings up the point of the Navigation Layer that I made in my presentation at the Euro IA conference in Barcelona. Navigating the long tail of online information isn’t necessarily about having content or even just finding it. It’s about making sense of it and understanding it. In order to do that, you have to provide structure to both the tools and the content, which is what GoLexa does. There is a lot of hand-crafted IA work on the search results page for GoLexa, even though the content is all dynamically populated.

Check it out–it’s quite interesting.

Looks like libraries are putting Designing Web Navigation under the LC Classification of TK5105.888. This is roughly:

  • Technology
    • Electrical engineering. Electronics. Nuclear engineering
      • Telecommunication, including telegraphy, telephone, radio, radar, television.

The full call number in a given library might be something like TK5105.888 .K35 2007.

LC Subject Headings from several libraries include:

  • Electronic texts
  • Web site development
  • Web sites–Design.
  • World Wide Web
  • User interfaces (Computer systems)
  • Internet searching

Amazon has these subjects:

  • Books > Computers & Internet > Microsoft > Web Browsers
  • Books > Computers & Internet > Home Computing > Internet > Web Browsers
  • Books > Computers & Internet > Graphic Design > Website Architecture & Usability

O’Reilly has it under this category on their site:

  • Web > Web Design

Libri.de has the subjects:

  • User Interfaces
  • Internet - Browsers
  • Internet / Programmierung
  • Internet - Web Site Design

Compare these to the tags on LibraryThing:

  • $ Köp (1)
  • Collib (1)
  • computers (1)
  • currently reading (1)
  • design (1)
  • faceted browse (1)
  • information (1)
  • iacanberra (1)
  • information (1)
  • information architecture (3)
  • information seeking (1)
  • labels (1)
  • layout (1)
  • library2 (1)
  • navigation (3)
  • non-fiction (2)
  • organization (1)
  • rias (1)
  • rich web applications (1)
  • search (1)
  • tagging (1)
  • to catalogue (1)
  • to read (1)
  • usability (2)
  • user research (1)
  • ux (2)
  • visual design (1)
  • web (2)
  • web design

What does this tell us? Not 100% sure. Some of the controlled subject headings are off, like “Electronic texts” from LCSH and “Web browsers” from Amazon. So it’s hard to make a case that those are better access points.

The tags seem better to me, but perhaps too numerous. (Of course, I tagged the heck out of Designing Web Navigation on LibraryThing, so I’m contradicting myself.) And except for a few personal tags, I actually find they are more descriptive of the book. There is information on tagging and faceted browse interfaces in the book, and that’s hard to show in most library subject headings.

So from this sample, the tags win out for me.

The AltSearchEngines blog recently issued a list of the top 10 alternative search engines for 2007. These highlight lesser-known search engines that rate well from an innovation, retrieval, or popularity standpoint. All of these are trying to distinguish themselves in different ways, and it’s quite exciting to see their inventive ideas. Here’s the list:

  1. Quintura – This puts results in a tag cloud alongside a list of results.
  2. Answers.com - Aggregates results from well-known sources. I used this a lot while writing Designing Web Navigation.
  3. Exalead - Supports regular Boolean query formats.
  4. Omgili – Searches user-generated content such as forums and discussion groups to “find out what people are saying about everything and anything.”
  5. KoolTorch – Visualizes results (but I found the rollovers with blurbs of the results problematic)
  6. GoshMe – Still in beta. Instead of searching sites, GoshMe finds the most relevant search engines to find results. It’s a search engine about search engines.
  7. Aftervote – Combines results from Google, Yahoo! and Live Search and indicates ranking from those sites. You can also sort by any one of those engines’ rankings, as well as by Digg votes. You can then rank results yourself. I found this approach quite interesting.
  8. KartOO - One of the first to visualize results
  9. Dialogus – A Russian Answers.com-like search engine (in English or Russian). Not sure about how well this one works, but they seem to be really trying. I quite like the waiting message after submitting a search: you really get a sense that something is happening on the back-end.
  10. Onkosh – Optimized for searching Arabic language content.
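
To make the Aftervote idea a bit more concrete, here is a rough Python sketch of how ranked lists from several engines could be merged while keeping track of each engine's original ranking. I don't know how Aftervote actually does it. The reciprocal-rank scoring, the function names, and the sample URLs are all assumptions for illustration.

```python
from collections import defaultdict

def merge_rankings(rankings):
    """Merge ranked result lists from several engines.

    rankings maps an engine name to its list of URLs, best first.
    Each URL scores 1/rank per engine (an assumed scoring rule);
    the per-engine ranks are kept so they can be displayed.
    """
    scores = defaultdict(float)
    positions = defaultdict(dict)
    for engine, urls in rankings.items():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
            positions[url][engine] = rank
    # Highest combined score first; ties broken alphabetically.
    merged = sorted(scores, key=lambda url: (-scores[url], url))
    return [(url, positions[url]) for url in merged]

def sort_by_engine(merged, engine):
    """Re-sort merged results by one engine's ranking; unranked URLs go last."""
    return sorted(merged, key=lambda item: item[1].get(engine, float("inf")))

# Hypothetical result lists, not real search data.
rankings = {
    "google": ["a.com", "b.com", "c.com"],
    "yahoo": ["b.com", "a.com", "d.com"],
    "live": ["a.com", "d.com", "b.com"],
}
merged = merge_rankings(rankings)
by_yahoo = sort_by_engine(merged, "yahoo")
```

Each entry in `merged` carries a small dict of per-engine ranks, which is the kind of data you would need to show "ranked #1 on Google, #2 on Yahoo!" next to a result.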

Some trends I noticed:

  • Word wheels - Answers.com is an example of this I often use to demonstrate a word wheel. These seem to be becoming more and more popular, but many have usability problems. There are two kinds: those that show terms in the search engine’s index, like on Answers.com, or those that display recently typed in strings from the browser. Some (e.g., CiteSeer) grab things you’ve typed from a variety of input fields and go far back in time.
  • Displaying results as text list - Well, this isn’t new, but when you’re doing things like visualizing results you don’t need a plain list of results anymore, right? That doesn’t seem to be the case in every situation. For instance, Grokker (not in the list) used to only show their visualization. Now they offer the text list as the default. Maybe information visualizations complement plain old results lists and won’t replace them?
  • Defaulting to a country based on your location - Lots of sites put me into their German version of the site automatically, even if I go to the dotcom address. This is generally annoying to me. Sometimes you can get to the dotcom site, but most now have a link at the bottom. Still, if I put in a dotcom address, please don’t switch me automatically. I know–they need the eyeballs for advertising revenue in a fixed geographical region. This also applies to the Best Bet hits at the top of results: I see things in German even if I search from the dotcom site.
  • Visual cues to foreshadow sites - Many search engines are now including thumbnails of homepages in the results list. Or, Quintura includes the site’s logo, for instance.
  • Search refinement options - Most of the sites above start with a Google-like experience: a simple input field and a Go button. Then, in the results environment, people can refine and manipulate items in a number of ways. Making suggestions is very popular, particularly spelling suggestions. But there are also more and more search refinement suggestions using things like pseudo relevance feedback techniques or similar. Overall, the experience is: put a few words in and get to the results as quickly as possible; then refine them later.
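
For anyone curious what pseudo relevance feedback looks like in practice, here is a toy Python sketch. It assumes the top few results are relevant, counts the words in them, and offers the most frequent ones as refinement suggestions. The stopword list, the function names, and the sample documents are all made up for illustration, and real engines surely use more sophisticated term weighting.

```python
import re
from collections import Counter

# Tiny assumed stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def suggest_terms(query, ranked_docs, top_k=3, n_terms=2):
    """Suggest refinement terms from the top_k ranked documents."""
    query_terms = set(tokenize(query))
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(
            term for term in tokenize(doc)
            if term not in query_terms and term not in STOPWORDS
        )
    # Most frequent first; ties broken alphabetically for stable output.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ordered[:n_terms]]

# Hypothetical ranked results for the query "jaguar".
docs = [
    "Jaguar car dealer and Jaguar car prices",
    "Jaguar car reviews",
    "Jaguar cat habitat in the rainforest",
    "Python snake",
]
suggestions = suggest_terms("jaguar", docs)
```

The "pseudo" part is that nobody actually told the engine which results were relevant; it just trusts its own top hits.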

Photosynth

13 December 2007

The best talk I saw at the Web 2.0 conference in Berlin this year was from Blaise Aguera y Arcas, Software Architect at Microsoft Live Labs. He showcased the latest updates of Photosynth, a new technology from Microsoft Live Labs that stitches photos together from any number of sources to create (the illusion of) a 3-D model of a building or landmark. If you’ve not seen this yet, do so. Here’s a brief video of Blaise showcasing Photosynth at the TED conference.

Basically, the software recognizes unique points on photos of a stationary geo-location and is able to align them with other photos. If you get enough photos in a collection, you effectively have a 3-D version of the original location. Take Notre Dame in Paris: you can point Photosynth at a collection of photos on Flickr, for instance, and Photosynth compiles a 3-D rendering of the building. Sure, there are some ugly seams, but it’s a pretty amazing result nonetheless. With the ubiquity of digital cameras these days, we could potentially have every place on earth represented in 3-D on the web in the future.

The interesting thing would be to apply this principle to tagging. If you have a rich, complex folksonomy, would you be able to pick out unique descriptive points, and then be able to “sew” the terms together to get a clearer semantic picture of the objects being described? I suppose that’s what things like Twine are trying to do, in a sense.

Check out Blaise’s TED talk.

I wrote a brief summary of my talk at the Euro IA 2007 conference on “Navigating the Long Tail” in the Dec 2007-Jan 2008 issue of the ASIST Bulletin with the same title. See the full article online here. A PDF version is also available.

It doesn’t include all of the points made in the talk, but it gets at the basic gist. Here are a couple of quotes:

“If this new online, long-tail economy is to work, people have to be able to navigate to the markets that interest them and filter the information quickly and efficiently. This is really the value of information architecture (IA). IA not only helps people find the information they need, but it also helps them make sense of it by providing context.”

“The point is that designing navigation for the long tail calls for any and all types of sources of metadata and all types of structure to provide context. It’s not about one or the other, but about what’s right for the situation. In some situations, a traditional taxonomy may be the best thing; in others, tagging works great. A mix is needed, and those practicing IA will have to be experts in them all.”

I spotted this article on NPR about BPR3: Bloggers for Peer-Reviewed Research Reporting.

Basically, if you blog about peer-reviewed research, you can then add the approved BPR3 icon to that posting. Here’s the brief description from the BPR3 homepage:

“Bloggers for Peer-Reviewed Research Reporting strives to identify serious academic blog posts about peer-reviewed research by offering an icon and an aggregation site where others can look to find the best academic blogging on the Net.”

See more about the BPR3 guidelines here.

Ultimately, they want to offer an aggregation service that will filter blog postings to just show peer-reviewed entries.

To me, this points to how peer-reviewed information and top-down edited content can complement and co-exist alongside of user-generated content on the web via blogs and wikis and such. One type doesn’t have to replace the other, does it?

New Book on Text Mining

1 December 2007

Just came across a new book on text mining: Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence, by William H. Inmon and Anthony Nesavich. I previewed it on Safari and downloaded a few chapters.

The book is not technical in the sense of showing programmers how to code, but it does focus on database architectures and the like. And when they talk about structured vs unstructured, they are really referring to database structures, not necessarily information architectures.

There is a chapter on visualization, but this is disappointing: it’s more about the process of creating visualizations than about whether the visualizations will be meaningful to any human being. In fact, one of the examples used is a bar graph, where the bars themselves are blocks and they are stacked in a three-dimensional arrangement: two no-no’s.

One key point they make—a point I made in my presentation at the Euro IA Summit this year in Barcelona—is that for unstructured data to be useful, it often makes sense to bring it into a structured environment. This makes possible analysis and understanding that would otherwise not be possible.

The penultimate chapter is a brief case study on creating a corporate taxonomy. The company in question created one to help them tie together disparate IT systems and to allow analytics to take place at all. Taxonomies still have a place in the unstructured world.

The writing style is dry and not very engaging. And the summaries for each chapter (which I hoped would give me a better overview of the content) are very thin. So, I’m not sure I’d recommend you run out and buy the book, but since I have a Safari account it was certainly worthwhile to go over the content quickly. I plan to read a few key chapters in full later.