Alex Wright – The Web That Wasn’t
27 December 2007
Finally got around to watching Alex Wright’s Google Tech Talk entitled The Web That Wasn’t. Alex is the author of GLUT: Mastering Information Through the Ages, a book I don’t own yet but will be getting soon. The talk is based on the book and gives a tour of philosophical and direct precursors to the web. Fascinating stuff. He discusses Paul Otlet, Vannevar Bush, Eugene Garfield, Ted Nelson, and others. The talk is one hour long, but worth it.
Some of the lessons from looking at the history of early notions of networked systems:
- Top down and bottom up organization of information can work in concert with each other
- Two-way linking provides more information than one-way. (Of course, to this point I’d say that the web wouldn’t have taken off if two-way linking were mandatory.)
- Showing pathways and usage patterns is important information about information.
- Users can be authors and contributors
- The nature of interaction is more from the “oral” tradition
We can see some of these things on the web today, but looking at alternative systems (theoretical or real) still provides inspiration. It also reminds us that the “new” ideas and concepts–even things like Web 2.0–aren’t necessarily new. Overall, he puts things in a broad perspective.
One point he makes quickly in the Q&A session: things like controlled vocabularies may have a place in bounded domains. The example he gives is MeSH. He mentions that there may be a way to automate this, but the point is that we can learn from all the work done on developing controlled vocabularies to date. This mirrors a point I made in my presentation at the Euro IA Summit in Barcelona and in an article for the ASIST Bulletin of the same title: Navigating the Long Tail.
Best Alternative Search Engines of 2007
15 December 2007
The AltSearchEngines blog recently issued a list of the top 10 alternative search engines for 2007. These highlight lesser-known search engines that rate well from an innovation, retrieval, or popularity standpoint. All of these are trying to distinguish themselves in different ways, and it’s quite exciting to see their inventive ideas. Here’s the list:
- Quintura – This puts results in a tag cloud alongside a list of results.
- Answers.com – Aggregates results from well-known sources. I used this a lot while writing Designing Web Navigation.
- Exalead – Supports regular Boolean query formats.
- Omgili – Searches user-generated content such as forums and discussion groups to “find out what people are saying about everything and anything.”
- KoolTorch – Visualizes results (but I found the rollovers with blurbs of the results problematic)
- GoshMe – Still in beta. Instead of searching sites, GoshMe finds the most relevant search engines to find results. It’s a search engine about search engines.
- Aftervote – Combines results from Google, Yahoo! and Live Search and indicates ranking from those sites. You can also sort by any one of those engines’ rankings, as well as by Digg votes. You can then rank results yourself. I found this approach quite interesting.
- KartOO – One of the first to visualize results
- Dialogus – A Russian Answers.com-like search engine (in English or Russian). Not sure how well this one works, but they seem to be really trying. I quite like the waiting message after submitting a search: you really get a sense that something is happening on the back-end.
- Onkosh – Optimized for searching Arabic language content.
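To make the Aftervote idea a bit more concrete, here is a minimal sketch of how rankings from several engines could be merged into one list while still showing where each engine placed a result. I don't know how Aftervote actually does it; this uses a simple Borda count as a stand-in, and the function name `borda_merge` and the example URLs are made up for illustration.

```python
from collections import defaultdict

def borda_merge(rankings):
    """Merge several ranked result lists into one using a Borda count.

    rankings: dict mapping an engine name to its ordered list of URLs.
    Returns a list of (url, score, per_engine_rank) tuples, best first.
    NOTE: Borda counting is an assumption here, not Aftervote's known method.
    """
    scores = defaultdict(int)
    positions = defaultdict(dict)
    for engine, urls in rankings.items():
        n = len(urls)
        for rank, url in enumerate(urls, start=1):
            scores[url] += n - rank + 1    # the top result earns the most points
            positions[url][engine] = rank  # remember where each engine ranked it
    # Sort by score (descending), then by URL so ties come out in a fixed order.
    merged = sorted(scores, key=lambda u: (-scores[u], u))
    return [(u, scores[u], positions[u]) for u in merged]

# Made-up result lists, purely for illustration.
example = {
    "google": ["a.example", "b.example", "c.example"],
    "yahoo": ["b.example", "a.example", "d.example"],
    "live": ["b.example", "c.example", "a.example"],
}

merged = borda_merge(example)
for url, score, ranks in merged:
    print(url, score, ranks)
```

Keeping the per-engine positions next to the merged score is what would let an interface both show "ranking from those sites" and re-sort by any single engine.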
Some trends I noticed:
- Word wheels - Answers.com is the example I often use to demonstrate a word wheel. These seem to be becoming more and more popular, but many have usability problems. There are two kinds: those that show terms in the search engine’s index, like on Answers.com, or those that display recently typed-in strings from the browser. Some (e.g., CiteSeer) grab things you’ve typed from a variety of input fields and go far back in time.
- Displaying results as a text list – Well, this isn’t new, but when you’re doing things like visualizing results you don’t need a plain list of results anymore, right? That doesn’t seem to be the case in every situation. For instance, Grokker (not in the list) used to only show their visualization. Now they offer the text list as the default. Maybe information visualizations complement plain old results lists and won’t replace them?
- Defaulting to a country based on your location - Lots of sites put me into their German version of the site automatically, even if I go to the dotcom address. This is generally annoying to me. Sometimes you can get to the dotcom site, but most now have a link at the bottom. Still, if I put in a dotcom address, please don’t switch me automatically. I know–they need the eyeballs for advertising revenue in a fixed geographical region. This also applies to the Best Bet hits at the top of results: I see things in German even if I search from the dotcom site.
- Visual cues to foreshadow sites – Many search engines are now including thumbnails of homepages in the results list. Or, Quintura includes the site’s logo, for instance.
- Search refinement options – Most of the sites above start with a Google-like experience: a simple input field and a Go button. Then, in the results environment, people can refine and manipulate items in a number of ways. Making suggestions is very popular, particularly spelling suggestions. But there are also more and more search refinement suggestions using things like pseudo relevance feedback techniques or similar. Overall, the experience is: put a few words in and get to the results as quickly as possible; then refine them later.
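For anyone unfamiliar with pseudo relevance feedback, the idea is roughly: run the query, assume the top few results are relevant, and mine them for terms to suggest as refinements. Below is a toy sketch of that idea. The scoring is plain term counting, and the documents, stopword list, and function names are all invented for illustration; real engines will be far more sophisticated than this.

```python
import re
from collections import Counter

# Tiny, hypothetical stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}

def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def search(query, docs):
    """Return indexes of matching docs, ranked by number of query-term hits."""
    q = set(tokenize(query))
    scored = []
    for i, doc in enumerate(docs):
        score = sum(1 for t in tokenize(doc) if t in q)
        if score:
            scored.append((score, i))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored]

def suggest_refinements(query, docs, top_docs=2, top_terms=3):
    """Treat the top results as relevant and suggest their most common new terms."""
    q = set(tokenize(query))
    counts = Counter()
    for i in search(query, docs)[:top_docs]:
        counts.update(t for t in tokenize(docs[i]) if t not in q)
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    return ranked[:top_terms]

# Invented mini-collection, purely for illustration.
DOCS = [
    "jaguar car dealer with jaguar car prices",
    "jaguar car review and car prices",
    "jaguar habitat in the rainforest",
    "used car prices",
]

suggestions = suggest_refinements("jaguar", DOCS)
print(suggestions)
```

Note the weakness this exposes: because the top results happen to be about cars, the suggestions skew that way too, and someone looking for the animal gets little help. That is the usual risk of assuming the first few hits are relevant.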
Photosynth
13 December 2007
The best talk I saw at the Web 2.0 conference in Berlin this year was from Blaise Aguera y Arcas, Software Architect at Microsoft Live Labs. He showcased the latest updates of Photosynth, a new technology from Microsoft Labs that stitches photos together from any number of sources to create (the illusion of) a 3-D model of a building or landmark. If you’ve not seen this yet, do so. Here’s a brief video of Blaise showcasing Photosynth at the TED conference.
Basically, the software recognizes unique points on photos of a stationary geo-location and is able to align them with other photos. If you get enough photos in a collection, you effectively have a 3-D version of the original location. Take Notre Dame in Paris: you can point Photosynth at a collection of photos on Flickr, for instance, and Photosynth compiles a 3-D rendering of the building. Sure, there are some ugly seams, but it’s a pretty amazing result nonetheless. With the ubiquity of digital cameras these days, we could potentially have every place on earth represented in 3-D on the web in the future.
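To give a rough feel for what "recognizing unique points and aligning them" might involve, here is a toy sketch of matching points between two photos. I have no inside knowledge of how Photosynth does this. The sketch assumes each point has already been boiled down to a short list of numbers (a "descriptor"), and it uses a common nearest-neighbour heuristic with a ratio test to throw out ambiguous matches. The descriptors and the `match_features` name are made up.

```python
import math

def match_features(desc_a, desc_b, ratio=0.75):
    """Pair each descriptor in photo A with its nearest neighbour in photo B.

    A pair is kept only when the nearest neighbour is clearly closer than
    the second nearest (the ratio test), so ambiguous points are dropped.
    Returns a list of (index_in_a, index_in_b) tuples.
    """
    matches = []
    for i, a in enumerate(desc_a):
        # Distance from this point to every point in the other photo.
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) < 2:
            continue  # the ratio test needs at least two candidates
        (best, j), (second, _) = dists[0], dists[1]
        if best < ratio * second:
            matches.append((i, j))
    return matches

# Invented 3-number descriptors; real ones are much longer vectors.
photo_a = [(0.0, 0.0, 1.0), (5.0, 5.0, 5.0), (9.0, 1.0, 0.0)]
photo_b = [(9.1, 1.0, 0.1), (0.1, 0.0, 1.0), (4.0, 4.0, 4.0), (6.0, 6.0, 6.0)]

pairs = match_features(photo_a, photo_b)
print(pairs)
```

The middle point of `photo_a` sits exactly between two candidates in `photo_b`, so it is discarded rather than guessed at. Only distinctive points end up tying the photos together, which is the "unique points" idea in miniature.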
The interesting thing would be to apply this principle to tagging. If you have a rich, complex folksonomy, would you be able to pick out unique descriptive points, and then be able to “sew” the terms together to get a clearer semantic picture of the objects being described? I suppose that’s what things like Twine are trying to do, in a sense.
Check out Blaise’s TED talk.
New Book on Text Mining
1 December 2007
Just came across a new book on text mining: Tapping into Unstructured Data: Integrating Unstructured Data and Textual Analytics into Business Intelligence, by William H. Inmon and Anthony Nesavich. I previewed it on Safari and downloaded a few chapters.
The book is not technical in the sense of showing programmers how to code, but it does focus on database architectures and the like. And when they talk about structured vs unstructured, they are really referring to database structures, not necessarily information architectures.
There is a chapter on visualization, but this is disappointing: it’s more about the process of creating visualizations than about whether the visualizations will be meaningful to any human being. In fact, one of the examples used is a bar graph, where the bars themselves are blocks and they are stacked in a three-dimensional arrangement: two no-no’s.
One key point they make—a point I made in my presentation at the Euro IA Summit this year in Barcelona—is that for unstructured data to be useful, it often makes sense to bring it into a structured environment. This makes possible analysis and understanding that would otherwise not be possible.
The penultimate chapter is a brief case study on creating a corporate taxonomy. The company in question created one to help them tie together disparate IT systems and to allow analytics to take place at all. Taxonomies still have a place in the unstructured world.
The writing style is dry and not very engaging. And the summaries for each chapter (which I hoped would give me a better overview of the content) are very thin. So, I’m not sure I’d recommend you run out and buy the book, but since I have a Safari account it was certainly worthwhile to go over the content quickly. I plan to read a few key chapters in full later.