1.01 Key Concepts
1.02 Where does data come from?
Data can come from:
When it comes to new data, we can take different approaches:
External sources of data are interesting because they reduce the cost of data entry and quality checks. When data is purchased from a supplier, it often comes pre-cleaned and in a format that’s easy to consume. We may also have the opportunity to acquire data produced by experts in a given field.
Conversely, when we acquire data from an external source, we have little or no control over the quality of the data and its structure. The data may also be incomplete or ambiguous from our point of view; i.e. the level of detail to which a particular piece of information is encoded may be different from what we need. Finally, there may be concerns about the trustworthiness of the data.
Where can you find usable data?
Whilst many organisations and individuals make large amounts of data openly available, it can be hard to find. The Open Data Institute, founded by Sir Tim Berners-Lee and Sir Nigel Shadbolt, is dedicated to getting large-scale open publication of useful data started.
1.02 What does your data look like?
When modelling real-life data, we must consider what sort of information is necessary for the application. For example, the data required for a book may be:
Type: Book
Weight: 557g
Height: 172mm
Colour: Red and Green
Title: Gardener's Calendar
Authors: Thomas Mawe, John Abercrombie
Date: 1803
Edition: 17th
Questions arise about which form of a field to store, for example the title. For finding the book on a shelf, “Gardener’s Calendar” is enough; for comparing it against other similar titles, a longer form may be required.
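The book record above could be modelled roughly as follows. This is only a sketch: the class name `Book`, the field names, the unit suffixes and the `display_title` helper are assumptions made for illustration. The long form of the title is not given in these notes, so `full_title` is left empty rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Book:
    """One possible record layout for the book described above (hypothetical)."""
    title: str                     # short form, enough to find the book on a shelf
    authors: Tuple[str, ...]
    year: int                      # the "Date" field, stored as a year
    edition: str
    weight_g: int                  # weight in grams
    height_mm: int                 # height in millimetres
    colours: Tuple[str, ...]
    full_title: Optional[str] = None  # long form, only if the application needs it

    def display_title(self, prefer_full: bool = False) -> str:
        """Return the long title when requested and available, else the short one."""
        if prefer_full and self.full_title:
            return self.full_title
        return self.title


book = Book(
    title="Gardener's Calendar",
    authors=("Thomas Mawe", "John Abercrombie"),
    year=1803,
    edition="17th",
    weight_g=557,
    height_mm=172,
    colours=("Red", "Green"),
)
print(book.display_title())
```

Storing the units in the field names (`weight_g`, `height_mm`) and keeping the short and long titles as separate fields lets each application pick the level of detail it needs without re-parsing free text.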
1.03 Licenses, sharing and ethics
In academic and government circles, it’s common to make data as openly available as possible. That, however, doesn’t apply to all parts of government or the commercial world.
There are legal restrictions regarding the use of data which need to be considered.
The Linked Open Data Cloud project produces a graph of the openly available datasets published in the Linked Data format.
Considering the size of the graph, which contains only a subset of all openly available data, the question to ask is: why is so much data being shared for free if information is so valuable?
To put this into perspective, a furniture catalogue from any given furniture company will contain many details about every item: price, sizes, materials, photos. In principle, the furniture could be copied using information gathered from catalogues and manuals. However, the furniture company needs its products to be easy to find if it wants to sell them.
The same argument applies to many other industries: music, electronics, streaming services, etc.
To summarise, some of the reasons to share open data:
Conversely, here are some reasons not to share open data:
Licensing questions:
Q. What is copyleft?
A. Copyleft is a licensing approach which requires that any work derived from the thing being licensed, and any redistribution of it, use the same licence. In other words, the author permits the work to be freely reproduced and modified, on condition that the same freedoms are passed on with every copy and derivative.
Q. What is CC0?
A. CC0 is the Creative Commons public domain dedication: the rights holder waives copyright and related rights in the work, to the extent the law allows, so that it can be reused with minimal restrictions.