21. Linked Data – an introduction

21.01 Introduction

We’ve been looking at distributing our data between machines. Originally, that seemed to be about efficiency, and speed, and sharing the load of the large data and large processing between multiple systems. But there are other benefits we’ve seen.

In the last topic, we started to look at the idea of sharing meaning and sharing data and expertise from different sources being combined in a larger, more distributed database.

We’re going to be pursuing that a little bit further in this topic. To introduce it, I’d like to look at a proposal that was submitted within a large organization in 1989 – “Information Management: a proposal”.

It was a response to the problem of information being lost within a large organisation, information that the organisation relies on for its internal communication. The organisation involves several thousand people, many of them very creative, all working towards common goals.

A problem is high turnover of people. Information is constantly being lost. The knowledge that is embodied in those people is not being passed on because the turnover is too fast and because the information is often implicit.

Requirements include: remote access; heterogeneity; non-centralisation; access to existing data; private links; live links; data analysis.

It was observed that the formal structure of the organisation reflects neither the structure of communication nor the structure of the information that’s needed. Something else is needed. It involves links – a web of notes with links like references. That will be far more useful than a fixed hierarchical system.

A lot of the database systems that would have been used up to this point would have been fixed hierarchical systems. Some network structure, some graph, some web, was what Tim Berners-Lee proposed. This was Tim Berners-Lee’s proposal to CERN, the particle accelerator laboratory, in 1989, where he suggested something that might be called a mesh. That eventually became known as the World Wide Web.

21.02 Linked open data

Open – cost free; lack of barriers – it should be findable; restriction-free use.

Use of the data should be barrier-free, which means it has to be accessible in various different ways, and use of it should be restriction-free, which means that I should be allowed to reuse it according to a reasonable set of terms.

The concept of FAIR data has been adopted by a certain number of large international governmental organisations.

FAIR is findable, accessible, interoperable, and reusable.

Even if the data is open, can it be located? Can we get hold of it? Can we use it? Can we share it? Can we reuse it? You’ll see this quite often used as an extra set of criteria on top of the base idea of open data.

What do we mean by linked data? The links will usually be URLs. In linked data, there’s an expectation that this linking happens at the level of a URL. The great thing about URLs is that they are guaranteed to be unique in the world, in a way that we would struggle to achieve with database keys.

The responsibility for maintenance is clear because the domain name is registered with an individual or an organisation, and that is where the responsibility lies for ensuring that that URL is maintained or otherwise.
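As a rough illustration of that last point, the host part of a URL can be pulled out with Python’s standard library. This is only a sketch: the function name `responsible_host` and the example.org / example.com URLs are made up for illustration, and the host is not always the same thing as the registered domain (a subdomain may be delegated to someone else).

```python
from urllib.parse import urlsplit


def responsible_host(url: str) -> str:
    """Return the host named in a URL (hypothetical helper for illustration)."""
    host = urlsplit(url).hostname
    if host is None:
        # A bare path such as "people/alice" names no host at all.
        raise ValueError(f"no host in URL: {url!r}")
    return host


# Two made-up identifiers with the same path but different hosts.
# They are different identifiers, maintained by different parties.
alice_at_org = "http://example.org/people/alice"
alice_at_com = "http://example.com/people/alice"

for url in (alice_at_org, alice_at_com):
    print(url, "is maintained by whoever controls", responsible_host(url))
```

The same local name (`/people/alice`) can be reused on different hosts without clashing, which is what a plain database key cannot offer once data from several sources is combined.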

Web links:

• Links are one-way

– No permission needed
– No central registry of links

• URL

– Guaranteed unique
– Responsibility of maintenance
– Unlimited number of URLs
– Unique ID independent of server
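The properties listed above can be sketched with two small, independently published datasets. This is a minimal illustration, not a real linked data format: the (subject, predicate, object) tuples, the a.example / b.example URLs, and the helper names `outgoing` and `incoming` are all assumptions made for the example.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Data published by site A. The last statement links to a URL owned by site B.
# Site A needed no permission from site B to make that link.
site_a: Set[Triple] = {
    ("http://a.example/people/alice", "http://a.example/terms/name", "Alice"),
    ("http://a.example/people/alice", "http://a.example/terms/worksAt",
     "http://b.example/orgs/lab"),
}

# Data published by site B. It knows nothing about site A.
site_b: Set[Triple] = {
    ("http://b.example/orgs/lab", "http://b.example/terms/label", "The Lab"),
}


def outgoing(data: Set[Triple], subject: str) -> List[Tuple[str, str]]:
    """Statements made about a subject: (predicate, object) pairs."""
    return sorted((p, o) for s, p, o in data if s == subject)


def incoming(data: Set[Triple], target: str) -> List[Tuple[str, str]]:
    """Statements that point at a URL: (subject, predicate) pairs."""
    return sorted((s, p) for s, p, o in data if o == target)


lab = "http://b.example/orgs/lab"

# Links are one-way: site B's own data has no record of who links to it.
print(incoming(site_b, lab))

# Because the URLs are globally unique, combining sources is a plain set union.
merged = site_a | site_b
print(incoming(merged, lab))
```

With no central registry of links, the incoming link to the lab only becomes visible once site A’s data has been gathered alongside site B’s, which is why the second lookup finds it and the first does not.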

Question: Which of these might be considered as factors in whether data is Open? (Select ALL that apply)

(a) Do licensing terms permit the reuse of the data without a charge?
(b) Is the author of the data alive? Were they alive within the last 70 years?
(c) Can the data be found without requiring access to a private network?
(d) Can the data be downloaded without a requirement to pay money?

Answer: (a), (c), (d). (b) is not relevant.

Question: What is the link in Linked Data? (select ONE correct answer)

(a) Anchor elements (<a>) from HTML
(b) Loading external data into your database
(c) Using URIs as identifiers for everything

Answer: (c). HTTP and URIs are the central technologies of both the Web and Linked Data.

Wednesday 2 March 2022, 441 views

