2. What shape is your data?
2.01 Introduction
All data has some form of structure, and that structure affects how we store, move and query it. Structure can be considered at several levels:
- Programming Languages – Data types (float, int, double, etc.) impose a certain structure on the data.
- Data Models – Relations between different data. Think databases.
- Data Serialisation – Data formats used for transmission using e.g. a network connection.
- Exchange Protocols – Some form of standardization for information exchange using e.g. Unix Sockets, Named Pipes, shared memory or similar methods.
- User Interfaces – Data in user interfaces is structured in a way that’s comfortable for humans to consume.
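To make two of these levels concrete, here is a minimal sketch of the same record seen first as typed fields in a programming language and then as a serialised string. The `Reading` class, its field names and the choice of JSON are assumptions made for illustration; they are not part of the course material.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Reading:
    """One record; the field types are the programming-language level of structure."""
    sensor: str
    value: float
    count: int


def serialise(reading: Reading) -> str:
    """Turn the in-memory object into text that could be sent over a network."""
    return json.dumps(asdict(reading), sort_keys=True)


def deserialise(text: str) -> Reading:
    """Rebuild the typed object from its serialised form."""
    return Reading(**json.loads(text))


# Hypothetical example record.
reading = Reading(sensor="thermometer", value=21.5, count=3)
wire = serialise(reading)
print(wire)  # {"count": 3, "sensor": "thermometer", "value": 21.5}
```

The information is the same in both forms; only the structure imposed on it changes.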
Some of the shapes of data we will deal with are:
- Tables
- Trees
- Graphs
- Media (raw data)
- Documents and objects
2.02 Tables
A table is a grid of cells arranged in rows and columns. In our case, every row represents a thing, and each column represents a type of information about that thing.
Tables are easy to understand and structure. They’re also very direct in how they communicate information. Tables are very important to Relational Databases.
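As a rough sketch of the row and column idea, a table can be held in memory as a tuple of column names plus a list of rows. The book data and the helper names `column` and `select` are made up for this example.

```python
# Column names: each one is a type of information about a thing.
columns = ("title", "year", "pages")

# Rows: each tuple is one thing (here, a made-up book).
rows = [
    ("Book A", 1999, 320),
    ("Book B", 2005, 210),
    ("Book C", 2005, 480),
]


def column(name):
    """Return every value stored in the named column, in row order."""
    index = columns.index(name)
    return [row[index] for row in rows]


def select(name, wanted):
    """Return the rows whose value in the named column equals `wanted`."""
    index = columns.index(name)
    return [row for row in rows if row[index] == wanted]


print(column("title"))       # ['Book A', 'Book B', 'Book C']
print(select("year", 2005))  # [('Book B', 2005, 210), ('Book C', 2005, 480)]
```

Reading down a column or filtering rows by a column value is direct, which is a large part of why tables are easy to work with.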
However, they’re not so good at communicating or structuring data that branches or has hierarchy. A better suited representation for such data would be Trees.
2.03 Trees
A tree in Computer Science is a data structure based on the metaphor of a real tree.
It is worth noting that HTML and XML are examples of tree structures.
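To see the tree inside an XML document, the sketch below parses a small, invented document with Python's standard `xml.etree.ElementTree` module and prints one indented line per element. The document contents and the `outline` helper are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# A made-up XML document: shelves nest inside a library, books inside shelves.
document = """
<library>
  <shelf name="fiction">
    <book>Book A</book>
    <book>Book B</book>
  </shelf>
  <shelf name="science">
    <book>Book C</book>
  </shelf>
</library>
"""

root = ET.fromstring(document)


def outline(element, depth=0):
    """Return one indented line per element, parents before their children."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(root)))
```

The indentation of the output mirrors the nesting of the tags, which is the hierarchy that a flat table struggles to express.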
Every tree has a root node, and every other node in the tree is connected to the root by exactly one path.
- The root of the tree is node a
- Nodes e, g, i, k, l, m, n, o, p, r, s, and u are leaf nodes
- Node f is a parent of l, m, and n
- Nodes l, m, and n are children of node f
- Nodes a and b are ancestors of nodes l, m, and n
- Nodes i, j, and k are siblings
- Nodes b, c, d, h, and others are internal nodes
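The vocabulary in this list can be sketched in code by storing, for each node, its parent. Only a fragment of the tree is encoded here: placing f directly under b is an assumption, and every other node is left out because its position is not stated. The function names are illustrative.

```python
# child -> parent. The root has no entry because it has no parent.
# Assumption: f sits directly under b; the rest of the tree is omitted.
parent = {"b": "a", "f": "b", "l": "f", "m": "f", "n": "f"}


def all_nodes():
    """Every node mentioned either as a child or as a parent."""
    return set(parent) | set(parent.values())


def root():
    """The single node that has no parent."""
    (node,) = all_nodes() - set(parent)
    return node


def children(node):
    """Nodes whose parent is `node`, in alphabetical order."""
    return sorted(c for c, p in parent.items() if p == node)


def ancestors(node):
    """Walk up the parent links until the root is reached."""
    path = []
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def siblings(node):
    """Other nodes that share this node's parent."""
    if node not in parent:
        return []
    return [c for c in children(parent[node]) if c != node]


def leaves():
    """Nodes with no children."""
    return sorted(n for n in all_nodes() if not children(n))


print(root())          # a
print(children("f"))   # ['l', 'm', 'n']
print(ancestors("l"))  # ['f', 'b', 'a']
print(siblings("m"))   # ['l', 'n']
print(leaves())        # ['l', 'm', 'n']
```

Because each node stores exactly one parent, the path from any node to the root is unique, which is what makes the structure a tree.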
Here’s a way of showing data in the form of a tree:
2.04 Other data structures
There are data structures other than tables and trees:
Graphs
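A graph is what we get when we drop the tree's requirement that every node except the root has exactly one parent: a node may be linked from any number of other nodes, and links may form cycles. This is more like the World Wide Web: heterogeneous, non-hierarchical, but still structured data.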
The vertices in a graph could be web pages and the edges could be links between them, or perhaps each node is a file with the edges being a filesystem path.
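The web page example can be sketched as an adjacency list, where each page maps to the pages it links to. The page names and the helpers `linked_from` and `reachable` are invented for this example.

```python
from collections import deque

# page -> pages it links to (a hypothetical four-page site).
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post", "about"],
    "post": ["home"],
}


def linked_from(page):
    """Pages that link to `page`; in a tree there would be at most one."""
    return sorted(src for src, targets in links.items() if page in targets)


def reachable(start):
    """All pages that can be reached by following links from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:  # guard against going round a cycle forever
                seen.add(target)
                queue.append(target)
    return seen


print(linked_from("about"))        # ['blog', 'home']
print(sorted(reachable("post")))   # ['about', 'blog', 'home', 'post']
```

Here "about" is linked from two pages and "home" links back to itself through "about", neither of which a tree would allow. The `seen` set is what keeps the traversal from looping.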
Blobs
Blobs are raw data representations without a perceivable structure. For example, a raw sound file.
Features
Features are pieces of information derived from blobs, for example the sample rate from a raw audio file.
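As a small sketch of deriving features from a blob, the code below builds a short WAV file in memory with Python's standard `wave` module and then reads a few values back out of it. Note the assumption: a WAV file carries a header that records the sample rate, whereas a truly headerless audio stream would need that value supplied from elsewhere. The function names and the 8000 Hz figure are illustrative.

```python
import io
import wave


def make_blob(sample_rate=8000, n_frames=4000):
    """Build a small WAV file in memory: silence, mono, 16-bit samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)  # bytes per sample
        writer.setframerate(sample_rate)
        writer.writeframes(b"\x00\x00" * n_frames)
    return buffer.getvalue()


def extract_features(blob):
    """Read descriptive values from the WAV header of the blob."""
    with wave.open(io.BytesIO(blob), "rb") as reader:
        rate = reader.getframerate()
        frames = reader.getnframes()
    return {"sample_rate": rate, "frames": frames, "duration_s": frames / rate}


blob = make_blob()
print(type(blob).__name__, len(blob) > 0)  # bytes True
print(extract_features(blob))
# {'sample_rate': 8000, 'frames': 4000, 'duration_s': 0.5}
```

The blob itself is just a sequence of bytes; the features are the small, structured facts about it that can be stored in a table and queried.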
Tuesday 19 October 2021