There’s a post over at Search Engine Land by Kim Krause Berg which highlights recent research in search and user behaviour. It’s a nice short summary, very accessible, and it quotes some pretty authoritative sources such as Jim Jansen.
I wanted to post about this because this is actually what my PhD is about. I have spent 6-7 years researching this area. My journey has been:
Automated translation ==> Information retrieval ==> Cross-language information retrieval ==> Natural language understanding & generation ==> HCI ==> Conversational systems for assisted search and information acquisition.
“As a site owner, how do you structure your information architecture for easy search? As a marketer, how do you know what words to optimize for and when you dig up the top used phrases, do you make a separate page for each one? Wouldn’t that make for a gigantic web site that will confuse everybody? Welcome to information overload.”
I can assure you that this is correct. It’s the motivation for my work, and there has been quite a lot of research on information overload since at least the early 1990s. My own research supports this and I have the evidence to back it up. I can’t reveal my source data as I am bound by an NDA, but my evaluations were carried out on a very large commercial site. I also analysed a lot (seriously, it was overwhelming!) of customer service logs, and what people searched for on the site turned out to be vastly different from what the customers actually wanted. Regardless of what is being said, people aren’t that great at formulating queries, and this is largely due to the awkward method of querying. Natural language search is clearly the way to go and I am absolutely sure of that because I’ve seen it first hand.
David Robins is correct of course: users change their search focus often, and I’ve observed this too. In fact, “shift in focus” occurs often even in a closed domain, interestingly enough.
Amanda Spink says that she found that 22% of queries were reformulated. In my area we call this “repair”. I’ve seen it happen even more often than that. Amanda has been doing research on this since forever! She started out working on a load of Excite queries, so that should give you some idea of how long.
My system is called “KIA” (Knowledge Interaction Agent). Because it is a dialogue system, I was able to guide the user through the information-seeking activity. This happens automatically of course. The system figures out where they are and what sort of thing they’re after and then helps guide them through the process. The one thing that is a bit different from standard IR systems is that my system tries to give an answer to the question rather than a set of documents. It can also provide a link to the right part of the site. It might be a form for example, or a whitepaper, or simply further product information.
User personas are not something that I found terribly useful. My system relies heavily on machine learning and it learns through experience. It is able to know where the user is going because it has seen these types of querying behaviour before. It can also adapt to the “repairs”, and if it’s a totally new kind of query path, it’ll add it to its experience.
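To make the “learning from experience” idea a bit more concrete, here is a minimal sketch in Python. It is not KIA’s actual code. The class name `PathMemory`, its methods and the example query steps are all made up for illustration, and simple transition counting stands in for whatever machine learning a real system would use.

```python
from collections import Counter, defaultdict


class PathMemory:
    """Toy 'experience' store: counts which step followed which in past sessions.

    Hypothetical illustration only; the names and the counting approach are
    assumptions, not the method used in the system described in this post.
    """

    def __init__(self):
        # step -> Counter of the steps that were observed directly after it
        self.transitions = defaultdict(Counter)

    def learn(self, path):
        # Record every consecutive pair of steps from one user session.
        for current, following in zip(path, path[1:]):
            self.transitions[current][following] += 1

    def predict_next(self, step):
        # Most frequently seen follow-up, or None if this step is new.
        seen = self.transitions.get(step)
        if not seen:
            return None
        return seen.most_common(1)[0][0]


memory = PathMemory()
memory.learn(["surfboard", "board size", "beginner boards"])
memory.learn(["surfboard", "board size", "price"])
memory.learn(["surfboard", "board size", "beginner boards"])

print(memory.predict_next("board size"))  # the follow-up seen most often
print(memory.predict_next("wetsuit"))     # None: never seen before
```

A path the memory has never seen simply returns nothing the first time round. Once `learn` has been called on it, it becomes part of the stored experience like any other path.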
My system, being different in architecture to a standard search engine, does not store a huge amount of data in an index. It stores knowledge in an ontology, and this translates as its “experience of the world”. The key thing here has been getting it to “understand” human language. This is possible, but rather than the controversial “understand” I like to use “identify”. I used concepts from cognitive linguistics rather than standard NLP and grammars. This means that a lot of language ambiguity is not resolved but avoided. If there’s a traffic jam, take the back streets.
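Here is a very rough sketch of what “identify rather than understand” could look like against a tiny ontology. Everything in it is an assumption made for illustration: the `ONTOLOGY` contents, the cue words and the function names are invented, and plain word matching stands in for the cognitive linguistics approach, which is not detailed in this post.

```python
import re

# Hypothetical mini-ontology: each concept has cue words and related concepts.
ONTOLOGY = {
    "surfboard": {
        "cues": {"surfboard", "board", "shortboard", "longboard"},
        "related": ["ability", "body_size"],
    },
    "ability": {
        "cues": {"beginner", "novice", "intermediate", "advanced"},
        "related": [],
    },
    "body_size": {
        "cues": {"tall", "height", "weigh", "weight", "heavy"},
        "related": [],
    },
}


def identify(utterance):
    """Return the concepts whose cue words appear in the utterance.

    No parsing is attempted: the sentence structure is ignored entirely,
    which is one crude way of avoiding ambiguity instead of resolving it.
    """
    words = set(re.findall(r"[a-z0-9]+", utterance.lower()))
    return sorted(c for c, entry in ONTOLOGY.items() if words & entry["cues"])


def follow_ups(concepts):
    """List related concepts that have not been identified yet."""
    known = set(concepts)
    pending = []
    for concept in concepts:
        for related in ONTOLOGY[concept]["related"]:
            if related not in known and related not in pending:
                pending.append(related)
    return pending


found = identify("I want to know which surfboard is right for me")
print(found)              # concepts spotted in the question
print(follow_ups(found))  # what the system would still need to ask about
```

The point of the sketch is only that nothing gets parsed. The system spots what it recognises and uses the ontology to decide what is still missing.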
My work has led me into philosophical debate as well. The Turing test is an evaluation method which involves a human chatting with a machine. If the human thinks s/he is talking to another human, the system has passed. Turing maintained that this meant that the machine was “intelligent”, but this has been debated ever since. I am working on different evaluation methods because I believe that the Turing test, while valid some time ago, is outdated (gasp!). Beliefs back in the day are not those of today, and advances in this kind of technology have moved beyond that because of the function we now expect these machines to fulfill.
“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim”, said Edsger Dijkstra, and these days I agree.
Kim also says that search marketing is more and more to do with understanding user behaviour, which I totally agree with. I have observed this first hand over a few years as well.
The Future:
The future is not made up of a list of results on a web page. The future of search is question-answering in natural language with links to documents of interest. Summarisation research is highly important for this because you wouldn’t be looking at a list. You’d be looking at a little paragraph summarising information drawn from knowledge and online resources, reworded and put together nicely so that you can get on with your day and not trawl through search engine results. The semantic web is super important because it allows machines to connect with those online resources.
Here’s an example:
You - I want to know which surfboard is right for me (SE query would be something like “which surfboard” or “surfboard” or whatever)
Machine - It depends on how tall and heavy you are and also on your level of ability. If you are a novice you will need a larger board until you develop good technique. If you are an intermediate to advanced surfer you can opt for a shortboard and there are lots of different types. You can read more about which board would suit you here: http://xyz.com (there is a chart).
<beneath: More resources listed>
You - I’m 5ft5 and weigh 130lbs and I’m a beginner
Machine - blah blah blah etc…
This isn’t possible right now due to scale issues, but it is a reality for the future. These systems do work in closed domains, and various techniques other than my own are at work.
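For a closed domain, the surfboard exchange above can be sketched as a simple slot-filling loop. This is a toy written for this example, not how KIA works. The slot names, the regular expressions and the `respond` function are all assumptions, and the advice text just echoes the example conversation.

```python
import re

# Hypothetical slots the machine needs before it can answer.
REQUIRED_SLOTS = ("ability", "height", "weight")


def extract_slots(utterance):
    """Pull whatever slot values can be identified from one user turn."""
    text = utterance.lower()
    slots = {}
    ability = re.search(r"\b(beginner|novice|intermediate|advanced)\b", text)
    if ability:
        slots["ability"] = ability.group(1)
    height = re.search(r"\b(\d)\s*ft\s*(\d{1,2})?", text)
    if height:
        # Stored as (feet, inches); inches default to 0 when not given.
        slots["height"] = (int(height.group(1)), int(height.group(2) or 0))
    weight = re.search(r"\b(\d{2,3})\s*lbs?\b", text)
    if weight:
        slots["weight"] = int(weight.group(1))
    return slots


def respond(state, utterance):
    """Update the conversation state, then either ask for more or answer."""
    state.update(extract_slots(utterance))
    missing = [slot for slot in REQUIRED_SLOTS if slot not in state]
    if missing:
        return "It depends on your " + ", ".join(missing) + "."
    if state["ability"] in ("beginner", "novice"):
        board = "a larger board"
    else:
        board = "a shortboard"
    return f"Try {board}. More here: http://xyz.com"


state = {}
print(respond(state, "I want to know which surfboard is right for me"))
print(respond(state, "I'm 5ft5 and weigh 130lbs and I'm a beginner"))
```

The state dictionary is what lets the second turn be read in the context of the first. That is the part a one-shot search box has no way of doing.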
SEO, online marketing:
I can’t say it enough: get into the semantic web, write excellent informative content and always look to the future. Just because it’s not going to earn you a buck now doesn’t mean that it isn’t important. Or can it be profitable….hmmmm….
I wrote a paper on user interaction with conversational systems for HCI International (Springer) which you might find interesting if you’re into this stuff. There’s more information about my research on the “About me” page. I’ll be adding more information about my research over the next few months including a ppt explaining what I’ve done and how. Of course I’ll also let you know what failed and where the difficulties were!