Wikidata introduction

1.1 Why was Wikidata created?

1.2 What is Wikidata?

1.3 Use of Wikidata by Google

1.4 The Wikidata Community

1.5 The Wikidata project Covid-19

1.6 The Wikidata architecture, an introduction to the “semantic web”

1.7 The Wikidata data model

1.8 SPARQL
An overview of R libraries to query Wikidata

Wikidata introduction


Why was Wikidata created?

To connect, under one unique identifier, all Wikipedia pages about the same concept written in different languages (example: Pneumonia)
To make multilingual updates of structured information on Wikipedia pages much easier

What is Wikidata?

It is a giant graph of knowledge
It is completely free, even for commercial usage (CC0)
Anybody can contribute
It can be read and edited by both humans and machines
Covers all domains of knowledge
Extensive item history, talk pages, projects, users
Integration with the semantic web
High performance query engine (SPARQL)
Stable: long-term support is not dictated by funding cycles
Actively developed
Already has a large number of active users, editors and contributors
Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wiktionary, Wikisource, and others.
Wikidata in brief

Use of Wikidata by Google

Google to Wikipedia

Wikidata to Google

The Wikidata community

The Wikidata project Covid-19

The Wikidata architecture, an introduction to the “semantic web”

Wikidata has a very advanced architecture: it implements the so-called “linked open data” (LOD) architecture.
LOD is the implementation of a formal knowledge graph, where “formal” means that it can be processed by a computer.
You will only use the full potential of Wikidata once you understand the underlying architecture and concepts.

The Wikidata data model

The triples

The two basic components of Wikidata are items and properties.
An item is a thing (a concept, object or topic that exists in the real world), such as “Rush”.
Each item has statements associated with it, for example “Rush is an instance of: rock band”. In that statement, “instance of” is the property (P31), a kind of relation or trait that items can hold, and “rock band” is the value, which is itself an item.
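To make the idea of a triple concrete, here is a minimal sketch in base R that stores the statement about “Rush” as one row of a data frame, with one column per part of the triple. The column names and the helper `describe_statement` are hypothetical and chosen only for illustration; Wikidata itself stores identifiers (QIDs and PIDs) rather than labels.

```r
# One statement = one triple: item (subject), property (predicate), value (object).
# Labels are used here for readability; "instance of" is property P31 in Wikidata.
statement <- data.frame(
  item     = "Rush",
  property = "instance of",
  value    = "rock band",
  stringsAsFactors = FALSE
)

# Hypothetical helper: render a triple as a readable arrow chain.
describe_statement <- function(s) {
  sprintf("%s -> %s -> %s", s$item, s$property, s$value)
}

described <- describe_statement(statement)
print(described)
```

A knowledge graph is then simply many such rows, where the value of one statement can be the item of another.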

SPARQL

Coordinates of the birth places of people named Antoine
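A query of this kind could look like the sketch below, written as an R string and turned into a request URL for the public Wikidata Query Service. This is an illustration under assumptions, not necessarily the query used for the result above: it matches the given name by its English label instead of a fixed QID, and the variable names and the `LIMIT` are arbitrary. The properties used are P735 (given name), P19 (place of birth) and P625 (coordinate location).

```r
# SPARQL query: people whose given name is labelled "Antoine",
# with the coordinates of their place of birth.
antoine_query <- '
SELECT ?person ?personLabel ?place ?coord WHERE {
  ?person wdt:P735 ?name .            # given name
  ?name rdfs:label "Antoine"@en .
  ?person wdt:P19 ?place .            # place of birth
  ?place wdt:P625 ?coord .            # coordinate location
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100'

# Build the request URL; no network call is made here.
endpoint <- "https://query.wikidata.org/sparql?format=json&query="
antoine_url <- paste0(endpoint, utils::URLencode(antoine_query, reserved = TRUE))

# To actually run it (needs network access), one option is the
# WikidataQueryServiceR package:
# result.qr <- WikidataQueryServiceR::query_wikidata(antoine_query)
```

Matching on the label keeps the sketch free of hard-coded item identifiers, at the cost of a slower query than one that uses the QID of the given name directly.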

An overview of R libraries to query Wikidata

(taken from a blog post by Envel Le Hir)

This code tutorial is taken from the code of OpenVirus (see the discussion here). Thomas Shafee is the author of the code used.

Let us use some naming conventions (suffixes) for Wikidata-specific objects:

.qr = Query result(s)
.qid = Wikidata QID number(s)
.qs = Wikidata item(s) summary
.q = Wikidata item(s) in full
.p = Properties of Wikidata item(s)
.wh = Wiki page in HTML
.wx = Wiki page in XML

Let us define some helper functions to test which kind of Wikidata value a character string represents:

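# TRUE if x looks like an item identifier, e.g. "Q42"
is.qid  <- function(x){grepl("^[Qq][0-9]+$",x)}
# TRUE if x looks like a property identifier, e.g. "P31"
is.pid  <- function(x){grepl("^[Pp][0-9]+$",x)}
# TRUE if x contains an ISO-style timestamp, e.g. "2020-03-11T00:00:00"
is.date <- function(x){grepl("[0-9]{1,4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}",x)}
# TRUE if x is a non-empty string wrapped in double quotes
is.quot <- function(x){grepl("^\".+\"$",x)}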
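As a quick check of these helpers, the sketch below applies them to a few example strings. The helpers are repeated so that the block runs on its own; the function `classify` and the example values are hypothetical and only serve the illustration.

```r
is.qid  <- function(x){grepl("^[Qq][0-9]+$",x)}
is.pid  <- function(x){grepl("^[Pp][0-9]+$",x)}
is.date <- function(x){grepl("[0-9]{1,4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}",x)}
is.quot <- function(x){grepl("^\".+\"$",x)}

# Hypothetical helper: name the first pattern that a single string matches.
classify <- function(x) {
  if (is.qid(x)) "qid"
  else if (is.pid(x)) "pid"
  else if (is.date(x)) "date"
  else if (is.quot(x)) "quoted string"
  else "other"
}

examples <- c("Q84263196", "P509", "2020-03-11T00:00:00Z", "\"Pneumonia\"", "Pneumonia")
kinds <- vapply(examples, classify, character(1), USE.NAMES = FALSE)
print(data.frame(value = examples, kind = kinds))
```

Note that `is.date` is not anchored, so it also accepts longer strings that merely contain a timestamp.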

WikidataR

The package WikidataR is an API client library for Wikidata.
Sources can be found on GitHub.
Authors: Oliver Keyes, Serena Signorelli & Christian Graul.
Last commit: December 2017 :-(

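# Return x unchanged if it is already a QID; otherwise search Wikidata and take the id of the first hit
as_qid <- function(x){if(!all(is.qid(x))){WikidataR::find_item(x)[[1]]$id}else{x}}
# Same for properties: return a PID, searching by label when needed
as_pid <- function(x){if(!all(is.pid(x))){WikidataR::find_property(x)[[1]]$id}else{x}}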

Writing to Wikidata with R

Data frame/tibble to QuickStatements

See this GitHub issue.
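One possible way to go from a data frame to QuickStatements input is sketched below in base R. It assumes the simple version 1 text format, in which each command is one line of tab-separated item, property and value. The function `df_to_quickstatements` and the column names are hypothetical, and the rows are illustrative only: they target the Wikidata Sandbox item (Q4115189) and should not be read as real statements.

```r
# Hypothetical helper: one tab-separated "item<TAB>property<TAB>value"
# command per row, commands separated by newlines.
df_to_quickstatements <- function(df) {
  stopifnot(all(c("item", "property", "value") %in% names(df)))
  commands <- paste(df$item, df$property, df$value, sep = "\t")
  paste(commands, collapse = "\n")
}

# Illustrative rows only, using the Wikidata Sandbox item.
statements <- data.frame(
  item     = c("Q4115189", "Q4115189"),
  property = c("P31", "P509"),
  value    = c("Q5", "Q84263196"),
  stringsAsFactors = FALSE
)

qs_text <- df_to_quickstatements(statements)
cat(qs_text, "\n")
```

The resulting text can be pasted into the QuickStatements tool for review before anything is written to Wikidata.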

Ideas of queries

List of all persons in Wikidata who died of Covid-19

Relevant identifiers: P509 (cause of death) and Q84263196 (Covid-19)
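A minimal sketch of how this idea could be turned into a SPARQL query string from R follows. The function `build_person_query`, its arguments and the default `LIMIT` are hypothetical; the query restricts results to humans (P31 = Q5) and filters on the property and value given above.

```r
# Hypothetical helper: build a query for humans having a given
# property/value pair, e.g. cause of death (P509) = Covid-19 (Q84263196).
build_person_query <- function(pid, qid, limit = 50) {
  stopifnot(grepl("^P[0-9]+$", pid), grepl("^Q[0-9]+$", qid))
  sprintf('SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5 ;
          wdt:%s wd:%s .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT %d', pid, qid, as.integer(limit))
}

covid_query <- build_person_query("P509", "Q84263196")
cat(covid_query, "\n")
```

The string can then be sent to the Wikidata Query Service with any SPARQL client; the identifier check at the top guards against passing labels instead of PIDs and QIDs.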

WikipediR: A MediaWiki API client library

Many websites run on versions of MediaWiki, most prominently Wikipedia and its sister sites. WikipediR is an API client library that allows you to conveniently make requests for content and associated metadata against MediaWiki instances.

Retrieving content

“content” can mean a lot of different things - but mostly, we mean the text of an article, either its current version or any previous versions. Current versions can be retrieved using page_content, which provides both HTML and wikitext as possible output formats. Older, individual revisions can be retrieved with revision_content. These functions also return a range of possible metadata about the revisions or articles in question.

Diffs between revisions can be generated using revision_diff, while individual elements of a page’s content - particularly links - can be extracted using page_links, page_backlinks, and page_external_links. And if the interest is in changes to content, rather than content itself, recent_changes can be used to grab a slice of a project’s Special:RecentChanges feed.
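As an illustration of content retrieval, the sketch below wraps page_content in a small function that fetches the HTML of an English Wikipedia article. The wrapper `get_page_html` is hypothetical, and the way the HTML is pulled out of the response (`parse`, then `text`, then `*`) is an assumption about the layout of the MediaWiki API response, so it may need adjusting.

```r
# Hypothetical wrapper around WikipediR::page_content.
get_page_html <- function(title, language = "en") {
  if (!requireNamespace("WikipediR", quietly = TRUE)) {
    stop("Package 'WikipediR' is required for this sketch.")
  }
  response <- WikipediR::page_content(
    language  = language,
    project   = "wikipedia",
    page_name = title
  )
  # Assumed response layout: the rendered HTML sits under parse$text$`*`.
  response$parse$text[["*"]]
}

# Example call (needs network access and the WikipediR package):
# pneumonia.wh <- get_page_html("Pneumonia")
```

The result name follows the `.wh` suffix convention (wiki page in HTML) introduced earlier.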

Retrieving metadata

Page-related information can be accessed using page_info, while categories that a page possesses can be retrieved with categories_in_page - the inverse of this operation (what pages are in a particular category?) uses pages_in_category.

User-related info can be accessed with user_information, while user_contributions allows access to recent contributions by a particular user: this can be conveniently linked up with

Wikidata inte(R)action
