Semantic Web Using R

6/8/2019

Semantic Web

The Semantic Web is a Web 3.0 technology - a way of linking data between systems or entities so that data available anywhere on the web can carry rich, self-describing relationships.
In essence, it marks a shift in thinking from publishing data in human-readable HTML documents to publishing it in machine-readable documents. That means machines can do a little more of the thinking work for us.

How Does It Differ From The Web As It Is Today?

Today, much of the data we get from the web is delivered to us in the form of web pages - HTML documents that are linked to each other through hyperlinks. Both humans and machines can read these documents, but beyond searching for keywords in a page, machines have difficulty extracting any meaning from them.

Liberating Web Databases From Their Old Chains

The web contains lots of information, but typically the raw data itself isn’t available - rather only HTML documents constructed from data, if a web site is generated from a database at all.
So the semantic web seeks to change the landscape of the internet with regard to this problem in a number of ways:
- Opening up the web of data to artificial intelligence processes (getting the web to do a bit of thinking for us).
- Encouraging companies, organisations and individuals to publish their data freely, in an open standard format.
- Encouraging businesses to use data already available on the web (data give/take).

Liberating Web Databases From Their Old Chains

In essence, the semantic web takes all the information published in HTML documents in different places and lets us describe data models that allow it all to be treated - and researched - as if it were one database.
Compared with today's tools and software, this could make automated research across the data published on the internet far more practical.

Semantic Data Modeling

Comparing The Popular Data Models

Table in (http://www.linkeddatatools.com/semantic-modeling)

Why Include Semantics In Data? Knowledge Integration

One of the primary benefits of adding semantic meaning to your data is that it can be bridged across domains of knowledge automatically.
In our example, two websites are started independently of each other. One site hosts information on current and historic Oscar-winning films; the other hosts a large database of biographies of Hollywood actors and actresses.
Both contain complementary information in their website databases. We will first cover how information sharing between these sites could happen without the use of semantics. Then we will describe how the same information can be shared between the two sites - and potentially beyond - with the use of semantics.

Why Include Semantics In Data? Knowledge Integration

Our two sites, one fronting an MS SQL database of all Oscar winning films, and another one fronting a MySQL database of Hollywood actors, reside at http://www.oscarwinners.fake and http://www.actorbiographies2go.fake respectively. The two sites were started independently, and do not collaborate.
The Oscar Winners site lists, as its name suggests, all of the Oscar-winning films ever produced, along with the actors and actresses who starred in them. However, it holds no actor information other than name and date of birth.
The Actor Biographies site lists many current and former Hollywood actors, each with a complete biography and a list of the movies they starred in. However, it does not contain any film plots or screenshots of the films.

Why Include Semantics In Data? Knowledge Integration

Let’s look at how these two sites might collaborate under their current, more traditional data model:

Obviously, the users of http://www.oscarwinners.fake would benefit from being able to click on the name of a starring actor and find out more about them - this information is stored in the MySQL database at http://www.actorbiographies2go.fake.
Likewise, the users of http://www.actorbiographies2go.fake would benefit from being able to click on the names of films that the actors starred in and find more information. This is stored in the MS SQL database at http://www.oscarwinners.fake.

Why Include Semantics In Data? Knowledge Integration

Data cannot be shared between the two sites simply by joining tables in their databases. First, the databases were designed independently, so the primary keys that refer to individual actors or films will not be synchronized; they would have to be mapped. Second, the sites use different database server systems which are not cross-compatible.
To collaborate using their current databases, the site owners would have to agree on a common data format that both can understand, built on a unique ID scheme for films and actors of their own invention. They could do this, for example, by creating a secure XML endpoint on each website from which the other site can request information on demand. This way, their shared information is always up to date.
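The key-mapping burden described above can be sketched in a few lines. This is a minimal illustration in plain Python, not taken from either site: the row contents, the ID values and the `biography_for` helper are all hypothetical.

```python
# Hypothetical rows from the two independently designed databases.
# The Oscar site uses integer keys; the biography site uses string keys.
oscar_actors = {101: {"name": "Jane Example", "born": "1970-01-01"}}
bio_actors = {"a-7": {"name": "Jane Example", "biography": "Placeholder biography text."}}

# The hand-maintained mapping that both site owners must agree on
# and keep up to date (an assumption made for illustration).
oscar_to_bio_id = {101: "a-7"}


def biography_for(oscar_actor_id):
    """Look up a biography on the other site via the agreed key mapping."""
    bio_id = oscar_to_bio_id.get(oscar_actor_id)
    if bio_id is None:
        return None  # no mapping yet: a human has to add one
    return bio_actors[bio_id]["biography"]
```

Every new actor, and every additional partner site, needs further entries in a mapping like `oscar_to_bio_id`, which is where the ongoing manual effort comes from.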

Why Include Semantics In Data? Knowledge Integration

This traditional approach has several drawbacks:

This sort of information interchange across incompatible, independently designed data systems takes time, money and human contextual interpretation of the different datasets.
It is also restricted to the data domains of only these two websites; any further additions to their knowledge from elsewhere will demand similar effort.
It requires humans to understand the meaning of the data and to agree on common formats in order to integrate the two databases appropriately.

Sharing With The Semantic Web Model (The solution)

Vocabulary - A collection of terms given a well-defined meaning that is consistent across contexts.
Ontology - Allows you to define contextual relationships behind a defined vocabulary. It is the cornerstone of defining a knowledge domain.

Figure from (http://www.linkeddatatools.com/semantic-modeling)

Semantic sharing can be achieved by the two sites adopting the same base ontology, or a common vocabulary, for expressing the meaning behind the data they expose, and publishing that data on a queryable endpoint so that the two sites can communicate with each other across the web.

Sharing With The Semantic Web Model (The solution)

With this standard vocabulary in place:

- The two sites can now query each other using the same terms.
- The Oscar Winning Movies site can now query the actor names on the Actor Biographies site on demand and gain more detail about a specific actor or actress who has starred in a movie.
- The Actor Biographies site can now query the film plots on the Oscar Winning Movies site on demand and gain more detail about the films an actor has starred in.
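Why a shared vocabulary removes the mapping step can be sketched as follows. This is a rough illustration in plain Python rather than a real RDF library; the vocabulary URIs under `vocab.example`, the resource paths on the two `.fake` sites and the placeholder texts are all assumptions made for the example.

```python
# Hypothetical shared vocabulary terms (URIs invented for illustration).
STARRED_IN = "http://vocab.example/starredIn"
BIOGRAPHY = "http://vocab.example/biography"
PLOT = "http://vocab.example/plot"

# Hypothetical identifiers that both sites use for the same actor and film.
ACTOR = "http://www.actorbiographies2go.fake/actor/jane-example"
FILM = "http://www.oscarwinners.fake/film/example-film"

# Each site publishes (subject, predicate, object) statements.
oscar_triples = {
    (ACTOR, STARRED_IN, FILM),
    (FILM, PLOT, "Placeholder plot summary."),
}
bio_triples = {
    (ACTOR, STARRED_IN, FILM),
    (ACTOR, BIOGRAPHY, "Placeholder biography text."),
}

# Because identifiers and terms are shared, merging is a plain set union.
merged = oscar_triples | bio_triples


def objects(triples, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def plots_for_actor(triples, actor):
    """Follow starredIn links from an actor to the plots of their films."""
    films = objects(triples, actor, STARRED_IN)
    return [plot for film in films for plot in objects(triples, film, PLOT)]
```

Calling `plots_for_actor(merged, ACTOR)` crosses from one site's data into the other's without any key-mapping table, which is the point of agreeing on terms up front.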

Sharing With The Semantic Web Model (The solution)

With the contextual relationships defined in a formal web ontology, further related information about the actors or films may be found via the linked standard terminology, without the user even imagining that the information existed. Examples include film locations, other news events that happened on a day of filming or on the actor's birthdate, or films made by the same director.
This happens without the need for transformation, mapping, or contracts being set up between the two sites. It all happens through semantics.

Metadata Initiatives

Cross-domain knowledge sharing applies not just to websites, but also to the knowledge bases built within organisations. Semantic web technologies need not be restricted to applications or information published on the web.
Although there may be a little more groundwork required when first setting up a semantic database, the benefits for ease of cross-domain integration from across the globe and the time saved and ideas gained from doing so are, potentially, highly significant.
The good news is that we often won’t have to go through the effort of defining and sharing our own ontology for our particular domain of knowledge. There are many popular, standard ontologies already distributed on the web which we can adopt and, if necessary, extend ourselves.

Metadata Initiatives

Standard vocabularies, or formal ontologies representing terms within a domain of knowledge, are already available freely from various organisations dedicated to creating standard vocabularies for a range of subjects - for example media terms, or biomedical terms, or scientific terms. Below are some examples:
- Dublin Core Metadata Initiative (DCMI) - Creates ontologies for a range of subjects, particularly focusing on common, everyday terms and terms important in media.
- Friend Of A Friend (FOAF) - Focuses on developing a standard vocabulary/ontology for social networking purposes.
- OpenCyc - An ontology of everyday, common sense terms.

Linked Data Objects (Graph Data)

The semantic web can seem unfamiliar and daunting territory at first. If you’re eager to understand what the semantic web is and how it works, you must first understand how it stores data. We start from the ground up by outlining the graph database - the data storage model used by the semantic web.

Graph Database

Image from (http://www.linkeddatatools.com/introducing-rdf)

Graph Database (Demo)

Bengie is a dog. Bonnie is a cat. Bengie and Bonnie are friends.

Image from (http://www.linkeddatatools.com/introducing-rdf)

The RDF Statement (Triple) (Demo)

In terms of the simple graph above:

- Subject: the animals
- Predicate (property): name, friend, animal type
- Object: Bengie, Bonnie, Dog/Cat
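To make the subject/predicate/object breakdown concrete, the Bengie and Bonnie graph can be written as a list of triples. This is a minimal sketch in plain Python; the node labels `animal1` and `animal2`, the predicate spellings and the `describe` helper are assumptions chosen for illustration, not identifiers from the original graph.

```python
from collections import namedtuple

# One RDF-style statement: subject, predicate, object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# "animal1" and "animal2" are hypothetical node labels for the two animals.
graph = [
    Triple("animal1", "name", "Bengie"),
    Triple("animal1", "animalType", "Dog"),
    Triple("animal1", "friend", "animal2"),
    Triple("animal2", "name", "Bonnie"),
    Triple("animal2", "animalType", "Cat"),
    Triple("animal2", "friend", "animal1"),
]


def describe(graph, subject):
    """Collect every predicate/object pair stated about one subject."""
    return {t.predicate: t.object for t in graph if t.subject == subject}
```

Note that the object of a `friend` statement is itself a subject elsewhere in the list; that is what turns a flat list of statements into a graph.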

Graph Database - RDF Format (Demo)

Demo will be added later. The W3C RDF Validator (https://www.w3.org/RDF/Validator/) can be used as well.
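As a stand-in illustration, the Bengie and Bonnie statements can be written out as N-Triples, one of the standard RDF serializations, with one statement per line. The sketch below is plain Python and makes several assumptions: the `http://example.org/animals/` namespace, the property names, and the simplified escaping in `literal` are all chosen for the example. It produces N-Triples rather than RDF/XML.

```python
BASE = "http://example.org/animals/"  # hypothetical namespace


def iri(local_name):
    """Wrap a local name in the assumed namespace as an N-Triples IRI."""
    return "<" + BASE + local_name + ">"


def literal(text):
    """Quote a plain string literal, escaping backslash, quote and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


statements = [
    (iri("animal1"), iri("name"), literal("Bengie")),
    (iri("animal1"), iri("animalType"), literal("Dog")),
    (iri("animal1"), iri("friend"), iri("animal2")),
    (iri("animal2"), iri("name"), literal("Bonnie")),
    (iri("animal2"), iri("animalType"), literal("Cat")),
    (iri("animal2"), iri("friend"), iri("animal1")),
]


def to_ntriples(statements):
    """Serialize each statement as 'subject predicate object .' on its own line."""
    return "".join(f"{s} {p} {o} .\n" for s, p, o in statements)


print(to_ntriples(statements), end="")
```

A real project would normally leave serialization to an RDF library; the hand-rolled version is only meant to show how little there is to a triple on the wire.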

Graph Database - R Format (Demo)

Demo will be added later

Graph Database (Workshop)

Margot is a journalist, a 32-year-old woman, married to Arthur, a man with whom she has two children: Marie, who is a woman, and Simon, who is a man.

Graph Database - RDF Format (Workshop)

During the meetup

Graph Database - RDF Format (Workshop)

Hints will be given in an R file.

RDFS and OWL

RDF data can be enriched with semantic metadata using two schema languages: RDFS (RDF Schema) and OWL (Web Ontology Language).
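One thing such metadata makes possible is inference. As a rough sketch, the RDFS term `rdfs:subClassOf` lets a program conclude that a member of a class is also a member of its superclasses. The code below is plain Python, not an RDFS reasoner; the `ex:` class names and the `infer_types` helper are hypothetical, and only this one rule is applied.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical schema and data: the ex: names are invented for illustration.
triples = {
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:Bengie", RDF_TYPE, "ex:Dog"),
}


def infer_types(triples):
    """Repeatedly apply: (x type C) and (C subClassOf D) implies (x type D)."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        subclass_links = [(s, o) for s, p, o in inferred if p == SUBCLASS_OF]
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sub, sup in subclass_links:
                new_triple = (s, RDF_TYPE, sup)
                if o == sub and new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred
```

Here the data only states that Bengie is a dog; the statements that Bengie is a mammal and an animal are derived from the schema.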

Building A SPARQL Query

Build an example dataset
Two exercises (select all records and select one record, in R and RDF)
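The idea behind the two exercises can be sketched without a SPARQL engine: a query is a triple pattern in which some positions are fixed and others are variables. The following is a minimal illustration in plain Python, under the assumption of a small Bengie and Bonnie dataset with hypothetical node labels; the `select` helper is invented for the example and only mimics a single triple pattern.

```python
# Hypothetical example dataset as (subject, predicate, object) tuples.
triples = [
    ("animal1", "name", "Bengie"),
    ("animal1", "animalType", "Dog"),
    ("animal1", "friend", "animal2"),
    ("animal2", "name", "Bonnie"),
    ("animal2", "animalType", "Cat"),
    ("animal2", "friend", "animal1"),
]


def select(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None plays the role of a variable."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Roughly analogous to: SELECT * WHERE { ?s ?p ?o }
all_rows = select(triples)

# Roughly analogous to: SELECT ?s WHERE { ?s :name "Bengie" }
bengie_rows = select(triples, predicate="name", obj="Bengie")
```

Real SPARQL queries combine several such patterns and join them on shared variables, which this sketch does not attempt.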

What About CREATE, INSERT, UPDATE?

Discussion