Long-tail is rubbish!

 

http://edgewatertech.wordpress.com

 

I got to the Hitwise report through a link from Dave to a post on Search Engine Land written by Matt McGee (a convoluted journey, I know).  I liked the post because I have tons of papers and things to read each day, and the concise writeup pleased me greatly.  The report (which isn’t that long, actually) says that queries of 5+ words increased 10% between January 2008 and January 2009.  That sounds like quite a lot, doesn’t it?

 

Matt McGee says that basically this means that as users become more sophisticated they use more words.  In my experience of evaluating user search and behaviour in a scientific environment, this means that users are having to “repair” the query more often.  This means that they need to reformulate more often to get what they want.  This is considered bad.  The fewer words in a query, the better.  The more there are, the lower the precision of the engine.

Jeff, who commented on Matt’s post, said “Search engines have gotten better at ranking relevant long-tail results” and also said they convert better.  I don’t know if they convert better because I haven’t seen sufficient data and analysis to deduce that, but I’d welcome seeing any if it’s available.  I find it interesting also because some SEOs are saying long-tail converts and is very good.

The science says otherwise.

Here are a couple of evaluations that disagree.  So far I have found none that say a search engine performs well with long-tail queries:

“Understanding the Relationship between Searchers’ Queries and Information Goals” by Downey, Dumais, Liebling and Horvitz (CIKM 08)

They found that a user’s query is far more specific or general than their underlying information goal.  This means that the search engine has more chance of success when the relative frequency of the query matches that of the need.  Common queries require far less repair than specific and rare ones.  They performed extensive data analysis and found that search engines perform poorly on long-tail queries (and URLs, as measured by SERP clicks and requeries).

Their results show that “the probability of a requery drops (from almost 50% to less than 20%) as query frequency increases from the tail to 100+ occurrences”.  They also state that the differences in distribution of actions following a query observed seem to be related to the query frequency rather than the query length.  This means that search engines do far better on common queries.  This is due to the matching techniques that they use.  The best result happens when there is an alignment between the “frequency of goals and expression of those goals”.  
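If you want to try this kind of measurement on your own logs, here is a minimal sketch of the idea.  It is not the authors’ method, just my rough reading of it: count how often each query occurs, put each query into a frequency bucket, and work out how often it was followed by a requery.  The log format (pairs of query and a requeried flag), the function names and the bucket cut-offs are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def frequency_bucket(count):
    """Map a query's occurrence count to a coarse bucket (hypothetical cut-offs)."""
    if count == 1:
        return "tail"
    if count < 100:
        return "mid"
    return "head"


def requery_rate_by_bucket(events):
    """Return the share of queries followed by a requery, per frequency bucket.

    events: iterable of (query, requeried) pairs, where requeried is True if
    the user issued another query straight after this one (assumed log format).
    """
    events = list(events)
    counts = Counter(query for query, _ in events)
    totals = defaultdict(int)
    requeries = defaultdict(int)
    for query, requeried in events:
        bucket = frequency_bucket(counts[query])
        totals[bucket] += 1
        if requeried:
            requeries[bucket] += 1
    return {bucket: requeries[bucket] / totals[bucket] for bucket in totals}
```

Calling requery_rate_by_bucket on a real log would show whether your own tail queries get repaired more often than your head queries.  The buckets are keyed on frequency rather than word count on purpose, since the paper relates the effect to query frequency rather than query length.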

Here we see a clear contradiction with the statement that search engines are doing fine with long-tail queries, because we see that the matching techniques used are not tailored to long-tail queries.  I get excellent results when I search for the title of a particular paper, which may be 12 terms long for example, but this is very, very specific: there can only be one result, so the matching techniques are easily able to cope.  If I type in a few keywords, I don’t get my paper.

For that particular paper to get a lot of traffic, many people must be searching specifically for it.  How does this work in online businesses?  Apparently long-tail converts.  It would be interesting to get a substantial amount of this data and analyse it to see whether it is indeed true.  Maybe if you’re getting a return from these you should just leave it all alone if it’s not broken, but perhaps there are a lot of other variables to take into account:

Query and task complexity and frequency, how it impacts your users, query and goal rarity, seeing where you’re losing your long tail searchers…not just conversion alone for example.

“Understanding the Relationship of Information Need Specificity to Search Query Length” by Phan, Bailey and Wilkinson (SIGIR 07)

They found that the longer the query, the more specific it was.  In this sense we can talk of repair, but how many here type in a long query first off?  “We found an average cross-over point of specificity from broad to narrow of 3 words in the query”.  “Broad” and “narrow” queries are used to define “quality of current knowledge and knowledge state specificity”.  They found that there was a “statistically highly significant relationship (99% confidence level) between narrow/broad specificity and query length”.  The intersection of broad and narrow terms was observed to be at 3.  So they conclude that “as query length increases, the corresponding information need is more likely to be perceived to be narrow.”  They basically found that there is a correlation between query length and the degree of specificity of a query.  There are, however, such things as short but specific queries, so they want to look at those next.

So, like me searching for a particular paper, people look for similarly specific things.  I don’t know if most of the population looks for that particular paper but I think not many.  How generic can long-tail be?

“A Study of Query Length” by Arampatzis and Kamps (SIGIR 08)

An interesting paper describing an analysis of query length, fitting power-law and Poisson distributions.  The main thing to take away is: “The relative steepness of the power-law indicates that users do not need many words to formulate information needs or that the diminishing value of adding words appears soon.”
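To get a feel for what “steepness” means here, this is a rough sketch of estimating a power-law exponent from counts of queries per query length.  It uses a plain least-squares line on log-log values, which is a simple stand-in and a fairly crude estimator, not the fitting procedure from the paper.  The function name and the input format (a dict mapping word count to number of queries) are my own assumptions.

```python
import math


def fit_power_law(length_counts):
    """Fit count ~ c * length ** (-alpha) by least squares on log-log values.

    length_counts: dict mapping query length in words to number of queries
    (assumed format). Returns (alpha, c). Needs at least two distinct lengths.
    """
    points = [
        (math.log(length), math.log(count))
        for length, count in length_counts.items()
        if length > 0 and count > 0
    ]
    k = len(points)
    mean_x = sum(x for x, _ in points) / k
    mean_y = sum(y for _, y in points) / k
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    # The slope of the log-log line is -alpha; the intercept is log(c).
    return -slope, math.exp(mean_y - slope * mean_x)
```

A larger alpha means the counts fall away faster as queries get longer, which is the sense in which a steep curve suggests that users rarely need many words.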

This is an old paper from 1999 but you should read it because it describes good evaluation methods and the results haven’t changed much in 10 years:

“Patterns of Search: Analyzing and Modeling Web Query Refinement” by Lau and Horvitz

The goal of their research was to use Bayesian networks to infer the probability of a user’s next action.  Again we see that specialized queries contain more words than others, and queries become longer as query refinement occurs.  They break down the data into different categories like education and so on, and they found that the overall average query length was 2.30 words; the longest queries were in the education category, with a mean of over 3 words per query.

From the Hitwise data we can see that queries of 5+ words have become more common.  But their data also shows that the most prevalent queries are those of 1, 2 and 3 words.  Those are still, by a significant margin, the most used query lengths.
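This is easy to check on your own query logs.  The sketch below computes the share of queries at each word count (grouping everything at 5+ words together) and the relative change between two shares.  It assumes a plain list of query strings and splits on whitespace; the function names and the cap of 5 are my own choices for illustration, not anything taken from Hitwise.

```python
from collections import Counter


def length_shares(queries, cap=5):
    """Share of queries per word count; lengths of cap or more are grouped under cap."""
    lengths = Counter(min(len(q.split()), cap) for q in queries if q.strip())
    total = sum(lengths.values())
    if total == 0:
        return {}
    return {n: lengths[n] / total for n in sorted(lengths)}


def relative_change(old_share, new_share):
    """Relative change between two shares, e.g. year on year."""
    return (new_share - old_share) / old_share
```

Keep in mind that a relative rise is not the same as a rise in share.  With made-up numbers, a bucket that goes from 10% of queries to 11% has grown by 10% in relative terms, but it is still only one percentage point bigger, and the short queries still dominate.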

Are the search engines cleverer?  Well, the longer your queries the better for me: I’ve spent 5 years trying to prove that people are ready for natural language querying, so please continue!  I work with conversational systems and the point of them is to retrieve information and present it to you in natural language.  The thing they share with a search engine is the IR part (which is quite vast!).  It’s harder in natural language systems because they have more words to deal with, plus grammar, anaphora, etc.  A standard search engine uses matching based on fewer dimensions.  For both, if the user needs to keep reiterating, it’s not performing well.

For some SEOs the take is that you go for long-tail when you can’t rank for the competitive short queries.  They also maintain that long-tail does not convert high enough to be worth it.  I’m reading very conflicting things about this.

As a community of SEO practitioners, we should be doing our own experiments.  This doesn’t mean in isolation, each looking at our own data, but rather pooling together anonymous logs and other data so we can verify these things.  Obviously on a practical level this isn’t so easy I know, and we have to make money and not shuffle about.  In my experience though analysing the data well pays off in the long term.  As an SEO professional I have never had the opportunity to do this on that scale.  I’m certainly ready to though.


17 Comments

  1. 1

    wow! that’s a great post. if there is someone in the seo community who should be able to grasp this concept and make sense of it, i would put my trust in your judgement.

    the question about if longtail convert, an example
    “sydney plumbers” 314,000 results
    “24 hour sydney plumbers” 90,000 results
    “24 hour sydney plumbers who can service potts point” 2,310 results

    The interesting thing is that google still displays adwords which would I assume shows that there is a reasonable level of searches performed which is similar. Google adwords rarely runs on low search volume terms/categories.

    The other interesting thing is the first result actually seems to show much more meta description in the results, which seems to show google is trying to offer assistance through more detail…

    Sydney Plumber | Blocked Drains Plumbers Repairs | Highlander Plumbing
    Click now for all plumbing repairs & 24 hour emergency plumbers. … Potts Point Plumbers … Finding a Sydney plumber you can trust is a challenge at the best of times, especially when you are up to your … solid reliability while delivering the best plumbing repairs, products and installation services in Sydney. …

    The answer is that the final longtail query is someone who might be standing in a foot of water. It won’t happen every day, but I would think that there is someone ready for a conversion…

  2. 2

    As we had talked before I view long tail differently than many people do.. My gut business instinct and many years of watching things tells me that actively targeting long tail search terms is rarely, if ever, cost effective and a valid way to spend marketing dollars when it comes to physical products.. Yes, it tends to convert more, but the farther you go down the tail the more the returns are diminished..

    It makes for a nice add-on to a more mainstream marketing plan, but to make it the primary campaign simply doesn’t have the ROI in far too many cases.. Besides, a lot of people that trot long tail do it because they can’t rank for short tail results.. :)

  3. 3

    Interesting stuff, and the crux of search marketing in my view, well worth more examination.

    My instinct says that the more words in a query, the more likely it is the person knows what they want. The more they know what they want, the less competition in search results, as you say – this doesn’t necessarily mean the engine is worse, it may just mean there aren’t as many results/choices and the person searches for something slightly different, hence the more re-searches.

    All of which matches the data you quote above. I think. If I haven’t misunderstood. So, the science isn’t exactly disagreeing in that it just means it’s easier to find something you want if it’s widely available.

    The more specific the query, the less there are making that query and so not enough volume to satisfy hungry businesses – that’s also true :-)

    I think long tail does convert, if the content exactly matches a long query, relevance pays.

  4. 4

    I’m with David and yes the whole longtail thing can get out of hand. What people search for, is what people search for. What’s fascinating to note is that I do specific searches myself. I’ll even type in a whole phrase, sometimes 10-20 words in length, and surprisingly I often come up with what I’m looking for. It’s only when I do not find what I’m looking for from a longtail search that I go back to a more general search and hunt through the pages. But then again, I generally make queries about subjects that are well documented. In other words Google is only as good as the content that is provided to it. So for a search to have value, it must be relevant not just to the one that is doing a very descriptive search, but it must also have relevance to Google itself. As we all know, Google wants to remain number one, which means its job is to provide results that please.

  5. 5

    Oh! Another thing. It really is all about linking, take for example the fact that all the domain addresses that have been created are already in use, it is to my amazement that the juggling of information that is in cyberspace doesn’t just completely vanish. But that is not the protocol. The “Cloud” for example is a unique way of dealing with the vanishing issue as cyber real estate becomes harder to come by. Essentially I envisage a time in the not too distant future when cyber storage by the terabyte will become available, which means we will literally be able to pull gigabytes of data from the web as fast as your connection will allow it. Methinks though that this will happen as we get close to 128 bit processing power.

  6. 6

    Of course the user’s info need is different than the query, because queries as understood today are expressions made up out of words, not out of meanings. If I search for “soccer”, it’s too narrow because “football” is a synonym. If I search for “soccer OR football” my query’s now too broad, because “football” can also mean Australian or American rules football (and probably lots of other things having to do with hackey-sacks, too). But maybe either one will be OK if all I want is the World Cup bracket and schedule, given how community ranking works. This is a problem with language, understanding user’s intents and guessing typical requests, not the length of the query.

    Is there a correlation between click-through on search results and how vague the information need is? And if vague intentions are correlated with short queries, that’ll explain part of the results reported in these papers.

  7. admin #
    7

    Hi Guys,

    first, thank you for your insightful and valuable comments. Dave, thanks for the example, and yes, there’s always going to be someone looking for extremely specific things. Feydakin, I agree that it’s an additional feature of an overall marketing plan, but not at the center of it. Yes that’s right! I do the same thing as you especially with papers Lawrence, I have 10+ query words because it’s the title of the paper, and I get less specific if I can’t find it, although if I get less specific usually I don’t find it anyway. It’s somewhere there, but I drop out and just ask someone to send me a copy! Bob, that’s exactly what I think too, it’s about user intent, which is hard to figure out because of the language issues. The length of the query isn’t relevant. Marketeers should probably be looking at that kind of thing rather than long-tail, because clearly it’s not amazing. I think it depends on the nature of the query, and on how many start with long-tail, like me and my research papers. Then if I have to go more vague, it’s also repair but the reverse!

  8. 8

    One reason for the increased word length of queries in the US could also be the Google Suggest feature.
    Just by typing “New York Hotel” on the Google page I get some suggestions with up to 6 words (for example “New York Hotel near Central Park”). Since Google decided to roll out Google Suggest for the US users and they always do a lot of user testing, I assume that the click rate for the suggestions is good enough. Therefore more users could click on the suggestion (which often has more words) than start the search with their starting phrase.
    After clicking a suggestion it is also easier to add some words and do another search with even more words.

  9. CJ #
    9

    Good Idea RudolfR,

    the results and experiments in the papers did not include Google Suggest though, but it’s definitely a good observation. This is more about re-formulation rather than suggestion, so people typing something in and then typing another thing in and so on, until they decide to stop. It would be interesting to see some research using Google Suggest. I’ll see if I can dig anything up.

  10. 10

    I would tell you, based on my experience of working with enterprise level clients that have huge brands and that are relevant to thousands of possible keyword phrase permutations, that long tail SEO strategies that ensure pages target the maximum number of possible permutations are extremely important. Many of these clients get the majority of their search traffic from the long tail despite being ranked very highly for a number of competitive keyword phrases. The long tail opportunity may be limited for web sites that are targeting a smaller piece of the pie. We could debate that. But certainly for web sites that have a large potential keyword universe, the long tail is ignored at one’s own peril.

  11. CJ #
    11

    I don’t dispute that. I think it’s interesting to see how what people like you are seeing, is not what the scientific evidence says. The point I raise is that there’s a problem here somewhere. I’m not saying anyone is wrong.

  12. 12

    I referenced this post in a post I did for my company’s blog titled Do Long Tail Keywords Convert Better? Like I say in the post I’m more familiar with paid search rather than SEO so I used keyword data from one of my paid search clients and applied it to the Hitwise data. I’d be interested to see what anyone here has to say!

  13. CJ #
    13

    Thanks Nate – great to see some data, and PPC data is still very valid here imho. That’s good analysis and Love your monkeys. From my own research on ngrams (groups of words) in various scenarios, post 4 the engine doesn’t do as well generally. I haven’t done research like the guys listed in my post though. From an seo perspective I have to agree with you and your paid search perspective, which is really interesting don’t you think? Keep it coming !

  14. 14

    Could you recommend any specific resources, books, or other blogs on this specific marketing topic?

  15. CJ #
    15

    This is not really a topic for a book. There are a myriad of blogs that have covered this, some of which I mention here.

  16. 16

    I follow your blog for quite a long time and should tell that your articles always prove to be of a high value and quality for readers.

  17. CJ #
    17

    Thank you!


2 Trackbacks/Pingbacks

  1. Google branding algorithm fuss | Science for SEO 28 02 09
  2. Search engines and long queries | Science for SEO 06 03 09

© 2009-2013 Science for SEO All Rights Reserved -- Copyright notice by Blog Copyright
