This is the second guest post to appear here on SFS, and I’m excited to present a piece by Samir Balwani. You will know him from Mashable also. Don’t forget to follow him on Twitter and join in the conversation.
Here we go…
There’s been a lot of recent news about a series of new applications, everything from Wolfram Alpha to Google Squared. The latest buzz is a prominent signal that our quest to index and understand the web is far from complete.

But the arrival of social media has changed what we require of that data. With the social web, we need a number of new tools to track and collect the information.
One application that I think could be very important is a tool for understanding and segmenting the many conversations that occur around a single theme.
Need For Tool
The first step to understanding the need for the tool is recognizing how the web has matured.
The early web consisted of simple pages and grew relatively slowly. Over time, web pages became more sophisticated; content grew and more pages were introduced.
Now, everyone can easily create a website or a page online. The web is growing quickly, and information is updated constantly. It’s become difficult to keep up with the almost real-time data.
We’ve come to a point where there is a need to recognize, track, and report multiple conversations. But with that need come multiple obstacles to overcome.
Obstacles
The first problem is simply the speed of indexing. Since data needs to be tracked in real time, the information needs to be analyzed almost instantaneously.
Because of this, signals that usually take time to appear can no longer be depended on. Consider inbound links: imagine ranking data without that information. Differentiating between spam and unique content becomes very difficult.
Secondly, recognizing exactly what a topic is and identifying its long-tail conversations would be hard. The tool must be intelligent enough to understand context, sentiment, and synonyms. It must also recognize separate buckets and place articles accordingly.
Finally, the tool must understand the conversation and decide where each mention should be placed. On a basic level it should create and separate the conversations, for example “Michelin tires” and “Michelin environmental impact”.
A more sophisticated tool would also be able to determine where a mention such as “Michelin tires environmental impact” would be placed, based on the content in the article.
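To make the bucketing idea concrete, here is a minimal sketch in Python. It is only an illustration under loud assumptions: the bucket names, the seed terms, and the `assign_bucket` function are all hypothetical, and simple word counting stands in for the context, sentiment, and synonym handling a real tool would need.

```python
import re
from collections import Counter

# Hypothetical buckets: each conversation is described by a few seed terms.
# A real tool would have to learn these rather than hard-code them.
BUCKETS = {
    "tires": {"tire", "tires", "tread", "grip", "mileage"},
    "environmental impact": {
        "environmental", "emissions", "recycling", "sustainability", "carbon",
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def assign_bucket(article_text, buckets=BUCKETS):
    """Return the bucket whose seed terms occur most often in the article.

    Returns None when no seed term appears at all.
    """
    counts = Counter(tokenize(article_text))
    scores = {
        name: sum(counts[term] for term in terms)
        for name, terms in buckets.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


article = (
    "Tires and their environmental impact: the company reports lower "
    "carbon emissions and new recycling programs for old tires."
)
print(assign_bucket(article))
```

The mention names both topics, but the body of the article talks mostly about emissions and recycling, so this sketch files it under the environmental conversation. That is the "based on the content in the article" decision in its crudest form.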

Potential Ideas
In the end, the tool should be able to track all data around a specific search query, in real time. After assigning a trust value to it, the data should be separated out into multiple buckets.
As the tool matures, it should be able to assign importance and sentiment metrics to each individual bucket.
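As a rough sketch of how trust, buckets, and per-bucket metrics might fit together, consider the Python below. Everything in it is an assumption made for illustration: the `Mention` fields, the 0-to-1 trust scale, the -1-to-1 sentiment scale, the `min_trust` cutoff, and the choice to treat summed trust as "importance".

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Mention:
    bucket: str       # conversation the mention was assigned to
    trust: float      # hypothetical trust value, 0.0 (spam) to 1.0 (trusted)
    sentiment: float  # hypothetical sentiment, -1.0 (negative) to 1.0 (positive)


def summarize(mentions, min_trust=0.5):
    """Drop low-trust mentions, then compute metrics for each bucket.

    Importance is the summed trust of the surviving mentions; sentiment
    is their trust-weighted average.
    """
    grouped = defaultdict(list)
    for mention in mentions:
        if mention.trust >= min_trust:
            grouped[mention.bucket].append(mention)

    summary = {}
    for bucket, items in grouped.items():
        total_trust = sum(m.trust for m in items)
        weighted = sum(m.trust * m.sentiment for m in items)
        summary[bucket] = {
            "importance": total_trust,
            "sentiment": weighted / total_trust if total_trust else 0.0,
        }
    return summary


mentions = [
    Mention("tires", 0.5, 1.0),
    Mention("tires", 0.5, -1.0),
    Mention("tires", 0.25, 1.0),  # below the trust cutoff, ignored
    Mention("environmental impact", 1.0, -0.5),
]
print(summarize(mentions))
```

Weighting sentiment by trust is one design choice among many; it simply keeps a flood of low-quality mentions from swinging a bucket's score.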
The amount of online data continues to grow, creating more and more problems for indexing and understanding the information. We must be able to track lines of communication as they happen and recognize the difference between topics.
The web has grown, but the way we search has not changed fundamentally in many years. Now may be the time we see a paradigm shift towards indexing and understanding the social web.



I’m excited by this blog post. So much so that I’d think about building such a tool (or becoming a member of a team that builds one). Of course, for a head start, more requirements are needed. Care to elaborate?
I’d love to elaborate, but I haven’t thought that deeply about it! Let’s connect via email and talk more in-depth.
My e-mail address is emre . sevinc at google’s mail. Hope to hear from you.
Too many links too fast can be a recipe for no progress. The more natural the linking pattern and the higher quality the links (PageRank of the linking page) the better. What do you think?