“Semantic web” != “semantics”

[Image: Noam Chomsky]

This post addresses a confusion I keep seeing about “semantics” and “the semantic web”. It’s not a complete misunderstanding, but rather an assumption that the semantic web = semantics, which isn’t quite right. To illustrate the difference, I’ll first give a very brief intro to semantics, just long enough for you to appreciate its complexity, and then look at the semantic web and how it differs. In brief, the semantic web does use semantics, but not in the way linguistics does. It’s a higher-level notion: adding meaning to the web.

What are “semantics”?

Semantics is the branch of linguistics that studies meaning in language (or in code, or anything else). Working out the meaning of a text, or even an audio stream, draws on syntax and pragmatics as well as contextual information. It’s not just about finding similarities or context between words, but also between phrases, sentences and whole texts. The relations and phenomena it deals with include homonymy, synonymy, antonymy, polysemy, paronymy, hypernymy, hyponymy, meronymy, metonymy, holonymy, exocentricity / endocentricity and linguistic compounds.

It’s not as simple as just looking at the sense of words. Semantics is a rather large field of study that includes truth conditions, argument structure, thematic roles and discourse analysis, all of which are also tied in to syntactic analysis.

Thematic roles (Theta roles)

Thematic roles are an example of what goes into semantic analysis beyond what you can find in a dictionary or thesaurus. Further analysis like this is needed to determine context in any precise way (or as precisely as possible). These are some of the roles we identify:

Agent:  the entity that performs the action

Theme: the entity undergoing an action or a movement

Source: the starting point for the movement

Goal: the end point for a movement

Location: the place where an action occurs

Instrument: the object with which an action is performed
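
To make the roles above a bit more concrete, here is a minimal sketch of how a role-annotated sentence could be represented in Python. The sentence, the class names and the annotation are my own illustration and were labelled by hand; nothing here is produced by a real parser.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    AGENT = "agent"            # performs the action
    THEME = "theme"            # undergoes the action or movement
    SOURCE = "source"          # starting point of a movement
    GOAL = "goal"              # end point of a movement
    LOCATION = "location"      # place where the action occurs
    INSTRUMENT = "instrument"  # object the action is performed with


@dataclass(frozen=True)
class RoleAssignment:
    phrase: str
    role: Role


# Hand-annotated example (an assumption for illustration):
# "John sent the parcel from London to Paris."
annotation = [
    RoleAssignment("John", Role.AGENT),
    RoleAssignment("the parcel", Role.THEME),
    RoleAssignment("London", Role.SOURCE),
    RoleAssignment("Paris", Role.GOAL),
]


def phrases_with_role(assignments, role):
    """Return every phrase that was labelled with the given role."""
    return [a.phrase for a in assignments if a.role is role]


print(phrases_with_role(annotation, Role.SOURCE))
```

The hard part, of course, is not storing the labels but deciding them automatically, which is exactly where the syntactic and contextual analysis comes in.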

Different types of semantic analysis:

Argument structure:

We also look at what’s going on in a sentence by looking at predicates and arguments (this is a blog post so I won’t go into any detail):

For example: John ate the cake

Predicate: ate

Arguments: John, the cake
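
As a rough sketch, the predicate and its arguments can be written down as a small data structure. The `Proposition` type and the `EXPECTED_ARITY` mini-lexicon are hypothetical and heavily simplified (real verbs such as “ate” can take different numbers of arguments), so treat this as an illustration rather than a linguistic resource.

```python
from typing import NamedTuple, Tuple


class Proposition(NamedTuple):
    predicate: str
    arguments: Tuple[str, ...]


# Hypothetical mini-lexicon: how many arguments each predicate expects.
# Simplified on purpose; "ate" is treated as strictly transitive here.
EXPECTED_ARITY = {"ate": 2, "slept": 1, "gave": 3}


def is_saturated(prop: Proposition) -> bool:
    """True when the predicate has exactly the number of arguments it expects."""
    return EXPECTED_ARITY.get(prop.predicate) == len(prop.arguments)


john_ate_cake = Proposition("ate", ("John", "the cake"))
print(is_saturated(john_ate_cake))
```

Under these assumptions, “John ate the cake” has both argument slots of “ate” filled, while something like `Proposition("gave", ("John",))` would be flagged as incomplete.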

Weirdness:

There is a well-known sentence that illustrates how grammar and meaning can come apart, and the difficulties we have defining meaning in human language. It was written by Noam Chomsky in 1957 (he’s the man in the picture):

“Colorless green ideas sleep furiously”

The grammar is perfectly fine, but the semantics make no sense at all. Chomsky originally used it to argue that probabilistic models of grammar fall short, since a sentence nobody has ever seen before can still be grammatical, but it’s also useful for testing other ideas. Here, thinking about this type of sentence will help you work out how and why semantics is hard.
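
Here is a toy sketch of that split between grammar and meaning. The lexicon, the single sentence pattern and the one “animate subject” restriction are all assumptions I made up for the example; a real system would need far more than this, which is rather the point.

```python
# Toy lexicon: part of speech plus a few hand-picked semantic features (assumed).
LEXICON = {
    "colorless": {"pos": "ADJ"},
    "green": {"pos": "ADJ"},
    "tired": {"pos": "ADJ"},
    "ideas": {"pos": "NOUN", "animate": False},
    "dogs": {"pos": "NOUN", "animate": True},
    "sleep": {"pos": "VERB", "needs_animate_subject": True},
    "furiously": {"pos": "ADV"},
}


def is_grammatical(words):
    """Accept only the pattern: any number of ADJ, then NOUN VERB, then an optional ADV."""
    tags = [LEXICON[w]["pos"] for w in words]
    i = 0
    while i < len(tags) and tags[i] == "ADJ":
        i += 1
    return tags[i:] in (["NOUN", "VERB"], ["NOUN", "VERB", "ADV"])


def is_sensible(words):
    """Check one selectional restriction: some verbs need an animate subject."""
    noun = next(w for w in words if LEXICON[w]["pos"] == "NOUN")
    verb = next(w for w in words if LEXICON[w]["pos"] == "VERB")
    if LEXICON[verb].get("needs_animate_subject"):
        return LEXICON[noun]["animate"]
    return True


chomsky = "colorless green ideas sleep furiously".split()
print(is_grammatical(chomsky), is_sensible(chomsky))
```

With this toy setup the sentence passes the grammar check and fails the meaning check, whereas the scrambled version “furiously sleep ideas green colorless” already fails the grammar check. Note that the sketch does not even notice that something cannot be both colorless and green.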

What is the semantic web?

It’s a common framework allowing information to be shared and reused.  Information is stored in machine readable formats.

“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” – Tim Berners-Lee

The semantic web adds a layer of meaning to the existing web. The semantic web is not web 3.0; it is an extension of web 3.0. RDF, OWL and so on attach information to resources that allows machines to “make sense” of them. These technologies do not use the complex semantics described above.

So the semantic web is not about “semantics” as such, in the linguistic sense. If you use RDF and the like, you will appreciate the distinction.
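
To give a feel for the kind of “meaning” RDF adds, here is a minimal sketch that models statements as plain subject / predicate / object triples in Python. It deliberately uses no RDF library. The Dublin Core namespace is real, but the `example.org` resource URIs, the sample statements and the `match` helper are made up for illustration.

```python
# Dublin Core element namespace (real); everything under example.org is made up.
DC = "http://purl.org/dc/elements/1.1/"
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple.
triples = {
    (EX + "posts/semantic-web", DC + "title", "Semantic web != semantics"),
    (EX + "posts/semantic-web", DC + "creator", EX + "people/author"),
    (EX + "posts/semantic-web", DC + "subject", "semantic web"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


for s, p, o in match(triples, predicate=DC + "title"):
    print(o)
```

Notice that nothing here analyses language at all: the machine “knows” the title only because someone stated it explicitly using an agreed vocabulary.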

Some clarification:

There’s a really nice article at Alt search engines, but a couple of things in it drew my attention and partly inspired me to write this post. Marissa Mayer said a successful search engine didn’t need “semantics”. This does not refer to semantic web technology in particular. Her saying this is far more interesting than that, don’t you think?

Google does not have a copy of every page on the web and never did; no search engine does. There’s a whole lot of research on “where to end the crawl”, because it’s not practical or efficient to collect every page on the web.

Google did introduce Rich Snippets, but honestly, it wasn’t groundbreaking, and it was done in a weird way as well. Ian Davis called it a “Damp Squib”:

“They’ve taken the worst part – the syntax – and thrown away the best – the decentralized vocabularies of terms. It’s like using microformats without the one thing they do well: the simplicity. This is why I believe Google missed the point. They made the mistake of treating RDFa as an alternative to microformats, which completely ignores its true strength as a structured data format.”

I understand the fear that everyone will have to manually mark up every single page. With the Google way, as they say, you do have to. That doesn’t apply to the RDFa work everyone else has been doing. In fact, there’s even an RDF plugin for WordPress that does it all for you. There are lots of tools out there; the list I provided is a good place to start. I think Google will end up picking up on all semantic web formats eventually, because there is already a lot of that markup out there.

Anyway, it’s a cool post, read it.

This post is a bit long and a bit “here and there”, but I hope it conveys what I wanted it to. More semantic web posts will follow: there’s a lot more to it than RDFa, and after lots of investigation I chose to use another semantic web idea instead.

© 2009-2013 Science for SEO All Rights Reserved
