the red penguin

2. What shape is your data?

2.01 Introduction

Data always has some structure, and that structure affects how we store, transmit, and present it. Structure can be considered at several levels:

  • Programming Languages – Data types (float, int, double, etc.) impose a certain structure on the data.
  • Data Models – Relations between different data. Think databases.
  • Data Serialisation – Data formats used for transmission using e.g. a network connection.
  • Exchange Protocols – Some form of standardization for information exchange using e.g. Unix Sockets, Named Pipes, shared memory or similar methods.
  • User Interfaces – Data in user interfaces is structured in a way that’s comfortable for humans to consume.
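
To make two of these levels concrete, here is a minimal sketch of one record seen first as typed fields in a programming language and then as a serialised string ready for transmission. The `Reading` class and its field names are hypothetical, chosen only for illustration, and JSON is just one of many possible serialisation formats.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Reading:
    # Hypothetical record; the field names are illustrative only.
    sensor: str
    value: float
    count: int


# Programming-language level: every field carries a data type.
reading = Reading(sensor="t1", value=21.5, count=3)

# Serialisation level: the same data as a JSON string for transmission.
payload = json.dumps(asdict(reading))

# Deserialising rebuilds an equal record from the text form.
restored = Reading(**json.loads(payload))
print(payload)
```

The typed object and the JSON text carry the same information; only the shape differs depending on where the data is being used.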

Some of the shapes of data we will deal with are:

  • Tables
  • Trees
  • Graphs
  • Media (raw data)
  • Documents and objects

2.02 Tables

A table has cells with a number of rows and columns. In our case, every row represents a thing. Each column represents a type of information about that thing.
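
As a rough illustration of rows and columns, the sketch below reads a small table from CSV text using Python's standard `csv` module. The book data and column names are made up for the example.

```python
import csv
import io

# Hypothetical table: each row is one thing (a book),
# each column is one type of information about it.
raw = """title,author,year
Dune,Frank Herbert,1965
Neuromancer,William Gibson,1984
"""

rows = list(csv.DictReader(io.StringIO(raw)))

# A row gives every attribute of one thing.
first_row = rows[0]

# A column gives one attribute across all things.
years = [int(row["year"]) for row in rows]

print(first_row["title"], years)
```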

Tables are easy to understand and structure. They’re also very direct in how they communicate information. Tables are very important to Relational Databases.

However, they’re not so good at communicating or structuring data that branches or has hierarchy. A better suited representation for such data would be Trees.

2.03 Trees

A tree in Computer Science is a data structure based on the metaphor of a real tree.

It is worth noting that HTML and XML are examples of tree structures.
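
One way to see the tree inside an XML document is to parse it and walk the elements from the root down. This is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the document and the `depth_first` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, used only for illustration.
doc = (
    "<library>"
    "<book><title>Dune</title></book>"
    "<book><title>Emma</title></book>"
    "</library>"
)

root = ET.fromstring(doc)


def depth_first(node, depth=0):
    """Yield (depth, tag) pairs, visiting each element before its children."""
    yield depth, node.tag
    for child in node:
        yield from depth_first(child, depth + 1)


outline = list(depth_first(root))

# Indentation shows how deep each element sits below the root.
for depth, tag in outline:
    print("  " * depth + tag)
```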

Every tree has a root node, and every node in the tree has a path to the root.

  • The root of the tree is node a
  • Nodes e, g, i, k, l, m, n, o, p, r, s, and u are leaf nodes
  • Node f is a parent of l, m, and n
  • Nodes l, m, and n are children of node f
  • Nodes a and b are ancestors of nodes l, m, and n
  • Nodes i, j, and k are siblings
  • Nodes b, c, d, h, and others are internal nodes
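
The terms in the list above can be sketched in code by storing, for each node, a link to its parent. The fragment below is hypothetical: it reuses some node names from the list, but it assumes that b is the direct parent of both e and f, which the list does not state. Note that f, as the parent of l, also counts as one of its ancestors.

```python
# Hypothetical fragment of a tree, stored as child -> parent.
# Assumption: b is the direct parent of e and f. The root has no parent.
parent = {"a": None, "b": "a", "e": "b", "f": "b", "l": "f", "m": "f", "n": "f"}


def children(node):
    """Nodes whose parent is the given node."""
    return sorted(c for c, p in parent.items() if p == node)


def ancestors(node):
    """Walk up the parent links until the root is reached."""
    result = []
    current = parent[node]
    while current is not None:
        result.append(current)
        current = parent[current]
    return result


def siblings(node):
    """Other nodes that share the same parent."""
    p = parent[node]
    if p is None:
        return []
    return [c for c in children(p) if c != node]


root = next(n for n, p in parent.items() if p is None)
leaves = sorted(n for n in parent if not children(n))
internal = sorted(n for n in parent if children(n) and n != root)

print(root, leaves, internal)
```

Storing only parent links is enough to answer every question in the list, because each node in a tree has exactly one path to the root.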

Here’s a way of showing data in the form of a tree:

2.04 Other data structures

There are data structures other than tables and trees:


A graph is a tree where we remove the requirement for every node (other than the root) to have exactly one parent. This makes it more like the World Wide Web: heterogeneous, non-hierarchical, but still structured data.

The vertices in a graph could be web pages and the edges could be links between them, or perhaps each node is a file, with the edges being filesystem paths.
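
A common way to hold a graph like this in code is an adjacency list: a mapping from each vertex to the vertices it links to. The sketch below uses made-up page names and a breadth-first search to find which pages can be reached from a starting page.

```python
from collections import deque

# Hypothetical web pages (vertices) and the links between them (directed edges).
links = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["about", "post"],
    "post": ["home"],
}


def reachable(start):
    """Breadth-first search: pages reachable by following links from start."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # links can loop back, so track visits
                seen.add(target)
                queue.append(target)
    return order


# Pages that link to "about": more than one "parent", which a tree would forbid.
linked_from = sorted(page for page, targets in links.items() if "about" in targets)

print(reachable("home"), linked_from)
```

The `seen` set matters here because, unlike a tree, a graph can contain cycles; without it the search would never finish.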


Blobs are raw data representations without a perceivable structure. For example, a raw sound file.


Features are pieces of information derived from blobs, for example the sample rate from a raw audio file.
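
As a small sketch of deriving features from a blob, the code below treats the bytes of an audio file as an opaque blob and then extracts a few features from it with Python's standard `wave` module. It assumes the audio is stored as WAV, whose header records the sample rate; the one second of silence is generated in memory purely so the example is self-contained.

```python
import io
import wave

# Build a tiny WAV file in memory (assumed format; values are illustrative).
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(1)                   # mono
    writer.setsampwidth(2)                   # 16-bit samples
    writer.setframerate(8000)                # 8 kHz sample rate
    writer.writeframes(b"\x00\x00" * 8000)   # one second of silence

# The blob: raw bytes with no structure visible to the reader.
blob = buffer.getvalue()

# The features: pieces of information derived from the blob.
with wave.open(io.BytesIO(blob), "rb") as reader:
    features = {
        "sample_rate": reader.getframerate(),
        "channels": reader.getnchannels(),
        "duration_s": reader.getnframes() / reader.getframerate(),
    }

print(len(blob), features)
```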

Tuesday 19 October 2021