Link building is a fundamental part of SEO work, but the analysis methods for it could be further expanded. To me, links as a whole are like a molecule, in that everything is linked together and has an effect – “the simplest structural unit of an element or compound” (WordNet), or “a unit of two or more atoms held together by covalent bonds” (Wikipedia). There is also, of course, the DC Comics reference, where Molecule is a superhero who works with The Atom, another superhero. Each link can generally be thought of as an atom, and the collection of links as the molecule.
There’s a lot of information to be mined from a link analysis on a single site, let alone on an entire search engine index. Here I’ll focus on the different metrics that can be used and the information that can be retrieved. This additional information gives more insight into what is going on within a website and around its inbound links. It might also give you a good way to predict your users’ needs.
As always, this is not a comprehensive list, and the methods I’ve described aren’t necessarily exactly right for your own needs; the idea is that you tweak them and take from them what you think is useful. If you want to try them as I’ve described them, you might also find some useful things. No method is perfect (which is partly why researchers still have jobs), and statistics can lie. Remember: “rubbish in, rubbish out”.
The web vs your site:
Link analysis algorithms such as PageRank and HITS use eigenvector calculations to identify authoritative pages based on hyperlink structures. The very basic explanation is that a hyperlink on page A pointing to page B stands for a recommendation of page B by the author of page A.
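To make the eigenvector idea a bit more concrete, here is a toy PageRank sketch using power iteration. The three-page graph and the 0.85 damping factor are purely illustrative; this is nothing like how a search engine computes it at scale, but it shows the “A links to B means A recommends B” mechanic:

```python
# Toy PageRank via power iteration. A link from `page` to `target`
# is treated as a recommendation, so `target` accumulates rank.
def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new[target] += share
            else:
                # Dangling page: spread its rank evenly across all pages
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank

# C is linked to by both A and B, so it ends up with the highest rank.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
```

Because C receives links from two pages while B receives only one, C comes out more “authoritative” here, which is the recommendation intuition in miniature.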
Within a website the link structure is very different, because links are often organised in a sort of hierarchy. The wider website sphere (the site plus its inbound links) is also different: it includes some of the PageRank variables, but some of these links are navigational and others informational. Users come to a site through these different inbound links for a variety of reasons, which may or may not lead to different behaviour on the site. There are also “implicit recommendation links”, which are links that are not obviously recommendations.
The idea of a “random walk” is actually quite different on a website than on the web at large. The best very basic definition of the random walk algorithm that I have heard is “what people do when they’re bored”. On a website the intent of the user should be more closely defined, so you’ll find that when they go to a specific product, for example, they’re not there because they’re bored.
An easy first data analysis:
- Filter out the access entries for embedded objects such as images and scripts.
- Use IP addresses to identify users (and assume all consecutive entries from that IP are from the same user)
- Get rid of all IP entries that exceed a respectable and expected threshold (eliminate bots)
- Eliminate consecutive repetitions in a session
- Process your site and extract all your links
- Check your user logs and run a machine learning algorithm* on approximately 4/5 of the data (this allows you to see what the general user patterns are)
- Test on the remaining 1/5 to see if the patterns are accurate
- The usage patterns learnt can then be applied to your future logs to see what behaviour is the most common
- Group them according to their type
- It also helps you discover implicit links and navigational ones
- Include temporal patterns too, adding to the dimensions of the above data
- For inbound link analysis follow the above method and add each link to the user path through the site
- Entries that end in conversion might be either implicit or navigational. Finding out helps you assess how users come to you and for what reasons.
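The first few cleanup steps above might be sketched roughly like this, assuming the log entries have already been parsed into dicts with “ip” and “path” keys (the field names, asset suffixes, and bot threshold are all illustrative and would need tuning for a real log):

```python
from collections import defaultdict

ASSET_SUFFIXES = (".png", ".jpg", ".gif", ".css", ".js")
BOT_THRESHOLD = 100  # requests per IP; tune to what's "respectable" for your site

def clean_sessions(entries):
    # 1. Filter out embedded objects such as images and scripts.
    pages = [e for e in entries if not e["path"].endswith(ASSET_SUFFIXES)]
    # 2. Group entries by IP as a rough user identifier.
    sessions = defaultdict(list)
    for e in pages:
        sessions[e["ip"]].append(e["path"])
    # 3. Drop IPs that exceed the threshold (likely bots).
    sessions = {ip: s for ip, s in sessions.items() if len(s) <= BOT_THRESHOLD}
    # 4. Eliminate consecutive repetitions within each session.
    return {ip: [p for i, p in enumerate(s) if i == 0 or p != s[i - 1]]
            for ip, s in sessions.items()}

def split_train_test(sessions):
    # 5. Roughly 4/5 of sessions to learn patterns from, 1/5 to test on.
    ips = sorted(sessions)
    cut = len(ips) * 4 // 5
    return ({ip: sessions[ip] for ip in ips[:cut]},
            {ip: sessions[ip] for ip in ips[cut:]})
```

The choice of machine learning algorithm for the pattern-mining step itself is left open, as in the list above; this only gets the data into a usable shape.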
How authoritative are your inbound links?
- Produce a log of all your inbound links that also includes their topic
- Topic tags can be extracted directly from the blog, and also using simple named entity extraction and noun clustering (the most common ones will indicate the topic)
- Find out which topics most commonly provide you with links
- Search for these keywords (the ones the tags give you) and see what kinds of sites appear
- The SERPs will give you a basic gauge of which sites are popular
- Do you have links from these? If you do, count them as authoritative
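As a very crude stand-in for the named entity extraction and noun clustering step, you could start with simple word-frequency counting over a linking page’s text. Real NE extraction is far more involved, and the stopword list here is just illustrative:

```python
from collections import Counter
import re

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}

def topic_tags(text, top_n=3):
    # Lowercase, keep alphabetic tokens, drop stopwords and very short words;
    # the most common survivors stand in for the page's topic tags.
    words = re.findall(r"[A-Za-z]+", text.lower())
    candidates = [w for w in words if w not in STOPWORDS and len(w) > 2]
    return [w for w, _ in Counter(candidates).most_common(top_n)]
```

Running this over each page that links to you gives a rough topic log you can then aggregate to see which topics most commonly provide you with links.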
You can expand this to include the same analysis for the sites that link in to you for more information still.
Obviously the amount of traffic you get as a result of having particular links pointing to you is a strong measure of the authority of the link in relation to your site alone. Be sure to look at why resources that are more authoritative in relation to the web don’t provide as much traffic, for example.
Power law distribution:
“A power law is a special kind of mathematical relationship between two quantities. If one quantity is the frequency of an event, the relationship is a power-law distribution, and the frequencies decrease very slowly as the size of the event increases.” (wikipedia)
- The Pareto principle (80-20 rule): 80% of the effects come from 20% of the causes.
You can apply this to links, and also to your tags and named entities/nouns, to see if anything useful pops out.
- Zipf’s law: given your tag or noun/named entity corpus, the frequency of any item is inversely proportional to its rank in the frequency table. This means the most frequent one will be found twice as often as the second most frequent, which in turn occurs twice as often as the fourth most frequent, and so on.
It’s usually plotted on a log-log graph using x for rank and y for frequency.
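Given a tag corpus, the rank-frequency pairs for that log-log plot might be computed along these lines (a sketch for eyeballing the distribution, not a statistical test of the fit):

```python
from collections import Counter
import math

def rank_frequency(tags):
    # Rank 1 = most frequent tag; return (rank, frequency) pairs.
    counts = Counter(tags)
    return [(rank, freq) for rank, (tag, freq)
            in enumerate(counts.most_common(), start=1)]

def log_log_points(pairs):
    # Points for the usual plot: x = log(rank), y = log(frequency).
    # Under Zipf's law these fall on a roughly straight line.
    return [(math.log(r), math.log(f)) for r, f in pairs]
```

If frequency times rank stays roughly constant down the list, your tags are behaving Zipf-like.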
Much more complex methods do exist to identify characteristics of web pages (in- and out-degree PageRank, for example), but that’s a whole new world of head scratching.
Why should you care?
Having the best possible information about your site and what effect it has on the virtual world around it can be extremely valuable. It can tell you where your best niche is, where best to target your link building efforts, and what your users are really looking for. Their behaviour on your site, and what the sites they are referred from have in common, can help you a great deal.