Alrighty, this post addresses the confusion I see about “semantics” and “the semantic web”. It’s not a complete misunderstanding, but rather an assumption that the semantic web = semantics. This isn’t quite right. I want to illustrate the difference by first giving a really brief intro to semantics, though long enough for people to appreciate its complexity. Then I’m going to look at the semantic web and how it differs. In brief, the semantic web does use semantics, but not in the way linguistics does. It’s more of a high-level notion, if you like: adding meaning to the web.
What are “semantics”?
This refers to meaning in language (or code or anything else). Semantics is a branch of linguistics. It draws on syntax and pragmatics as well as contextual information to work out the meaning of a text, or even an audio stream if you want to use that. It’s not just about finding similarities or context between words to establish meaning, but also between phrases, sentences and whole texts. The different relations it studies include homonymy, synonymy, antonymy, polysemy, paronymy, hypernymy, hyponymy, meronymy, holonymy and metonymy, along with exocentricity / endocentricity and linguistic compounds.
It’s not as simple as just looking at the sense of words. It is a rather large field of study and includes truth conditions, argument structure, thematic roles and discourse analysis, and all of these are linked to syntactic analysis as well.
Thematic roles (Theta roles)
These serve as an example to show you what else comes into semantic analysis other than what you can find in a dictionary or thesaurus. There has to be further analysis such as this to determine context in any precise way (or as precise as possible). These are the different roles that we identify:
Agent: the entity that performs the action
Theme: the entity undergoing an action or a movement
Source: the starting point for the movement
Goal: the end point for a movement
Location: the place where an action occurs
Instrument: the object with which an action is performed
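To make the roles above a bit more concrete, here is a small Python sketch of what a hand-made annotation might look like. The sentence is my own made-up example, and the `Role` enum and `describe` helper are hypothetical names for illustration only; a real system would have to work these roles out from the syntax, which is the hard part.

```python
from enum import Enum


class Role(Enum):
    """The thematic roles listed above."""
    AGENT = "agent"
    THEME = "theme"
    SOURCE = "source"
    GOAL = "goal"
    LOCATION = "location"
    INSTRUMENT = "instrument"


# Hand-annotated, made-up sentence (an assumption for illustration):
# "Mary sent the parcel from London to Paris."
annotation = {
    Role.AGENT: "Mary",         # performs the action
    Role.THEME: "the parcel",   # undergoes the movement
    Role.SOURCE: "London",      # starting point of the movement
    Role.GOAL: "Paris",         # end point of the movement
}


def describe(annotation):
    """Return one 'role: phrase' string per annotated role."""
    return [f"{role.value}: {phrase}" for role, phrase in annotation.items()]


for line in describe(annotation):
    print(line)
```

Notice that not every role has to show up in every sentence: this one has no instrument and no location.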
Different types of semantic analysis:
We also look at what’s going on in a sentence by looking at predicates and arguments (this is a blog post so I won’t go into any detail):
For example: John ate the cake
Predicate: ate
Arguments: John, the cake
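If it helps to see it as data, the predicate and its arguments can be written down as a tiny structure. This is only a sketch: the `Predication` class is a hypothetical name of mine, and I am assuming the predicate is stored in its base form ("eat").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    """A predicate together with the arguments it takes."""
    predicate: str
    arguments: tuple

    def arity(self):
        # Number of arguments the predicate takes in this sentence.
        return len(self.arguments)

    def __str__(self):
        # Logic-style notation: predicate(arg1, arg2, ...)
        return f"{self.predicate}({', '.join(self.arguments)})"


# "John ate the cake": one predicate, two arguments.
ate = Predication("eat", ("John", "cake"))
print(ate)
print(ate.arity())
```

The order of the arguments matters here: swapping them would describe the cake eating John.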
There is a well-known sentence which illustrates the ambiguity in human language and the difficulties we have defining meaning. It was written by Noam Chomsky in 1957 (he’s the man in the picture):
“Colorless green ideas sleep furiously”
The grammar is totally right, but the semantics make no sense at all. It was originally used as an example of how probabilistic grammars can’t work properly, but it’s also useful for putting other ideas to the test. In this case, thinking about this type of sentence will help you work out how and why semantics is hard.
What is the semantic web?
It’s a common framework allowing information to be shared and reused. Information is stored in machine readable formats.
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” – Tim Berners-Lee
The semantic web adds a layer of meaning to the existing web. The semantic web is not web 3.0; it is an extension of web 3.0. RDF, OWL and so on add information to resources, which allows machines to “make sense” of them. These technologies do not use the complex semantics described above.
The semantic web is not about “semantics” as such, if you see what I mean. If you use RDF and so forth you will appreciate the distinction.
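Here is a rough Python sketch of that distinction. RDF describes resources as subject / predicate / object statements, and I am modelling those as plain tuples rather than using a real RDF library. The `example.org` addresses and the `match` helper are made up for illustration; the two property names come from the Dublin Core vocabulary.

```python
# Dublin Core element namespace (a real, widely used vocabulary).
DC = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace for the resources being described.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = [
    (EX + "post/1", DC + "title", "Semantics vs the semantic web"),
    (EX + "post/1", DC + "creator", EX + "people/alice"),
    (EX + "post/2", DC + "creator", EX + "people/alice"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Which resources were created by alice?" is just identifier matching.
by_alice = [s for s, _, _ in match(triples, p=DC + "creator", o=EX + "people/alice")]
print(by_alice)
```

The machine “makes sense” of the data only because everyone agrees on what the `creator` identifier stands for. There are no thematic roles, truth conditions or discourse analysis anywhere in sight.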
There’s a really nice article at Alt search engines, but a couple of things drew my attention and sort of inspired me to write this post. Marissa Mayer said a successful search engine didn’t need “semantics”. This does not refer to semantic web stuff in particular. Her saying this is far more interesting than that, don’t you think?
Google does not have a copy of every page on the web and never did; no search engine has. There’s a whole lot of research on “Where to end the crawl”. It’s not practical or efficient to collect every page on the web.
“They’ve taken the worst part – the syntax – and thrown away the best – the decentralized vocabularies of terms. It’s like using microformats without the one thing they do well: the simplicity. This is why I believe Google missed the point. They made the mistake of treating RDFa as an alternative to microformats, which completely ignores its true strength as a structured data format.”
I understand the fear that everyone has to manually mark up every single page and so on. With the Google way, as they say, you have to. This doesn’t go for the RDFa stuff everyone else has been doing. In fact there’s even an RDF plugin you can use for WordPress that does it all for you. There are lots of tools out there; you can look at the list I provided as a start. I think that Google will end up picking up on all sem web formats eventually because there is already a lot of that out there.
Anyway, it’s a cool post, read it.
This post is a bit long and a bit “here and there” but I hope it conveys what I hoped to. More semantic web posts will follow; there’s a lot more to it than RDFa, and after lots of investigation into this I chose to use another semantic web idea instead.