the red penguin

19. Semantic databases

19.01 What does a table actually tell us?

If we want to share a relational database with someone, we can (for example) give them CSV files. We may lose information (column names, data types, etc.) when creating those CSV files.

If we do a full data dump, MySQL saves a .sql file which carries all the information you need to reconstruct the database: table structures, foreign key relations, etc.
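To make the CSV point concrete, here is a minimal sketch in Python. The movie rows and column names are illustrative assumptions, not taken from a real database. It writes two rows to an in-memory CSV and reads them back, showing that the integer year comes back as a plain string.

```python
import csv
import io

# Illustrative rows; in the database, year would be an integer column.
rows = [
    {"title": "Alien", "year": 1979},
    {"title": "Heat", "year": 1995},
]

# Write the rows to an in-memory CSV "file".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["title", "year"])
writer.writeheader()
writer.writerows(rows)

# Read the CSV back: every value is now a plain string.
buffer.seek(0)
restored = list(csv.DictReader(buffer))

print(type(rows[0]["year"]).__name__)      # int
print(type(restored[0]["year"]).__name__)  # str
```

The header row survives here only because we chose to write it. Nothing in the CSV itself records that year was an integer or that any column was a key.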

If we want to get more meaning out of our data, we need a better format that lets us encode more meaning.

For example, if we’re dealing with a Movies database where each movie has 3 actors, we may have columns Actor1, Actor2, and Actor3.

However, we can encode more meaning here. We know actors are entered into the system by name, and we know that each name refers to a person who is an actor.

Perhaps the table also has the Year column encoding the year of release. We can assume year to be a positive integer within a certain range.

We can also assume that it’s a Calendar Year, i.e. the year number is a valid number within some Calendar structure.

In other words, we have layers of meaning:

  • Data Type
    • String
    • Integer
    • Float
  • Data Domain
    • Place
    • Person
    • Date
  • Data Semantics
    • Person acted in film
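The three layers above could be sketched in Python roughly as follows. The class names, the year bounds (1800 to 2100) and the "acted_in" label are all assumptions chosen for illustration, not part of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str


@dataclass(frozen=True)
class Film:
    title: str
    year: int


def has_integer_type(value):
    """Data type layer: the value is an integer (bool is excluded)."""
    return isinstance(value, int) and not isinstance(value, bool)


def is_calendar_year(value, earliest=1800, latest=2100):
    """Data domain layer: an integer inside an assumed range of years."""
    return has_integer_type(value) and earliest <= value <= latest


film = Film(title="Alien", year=1979)
actor = Person(name="Sigourney Weaver")

# Data semantics layer: a statement relating a person to a film.
acted_in = (actor, "acted_in", film)

print(has_integer_type(film.year))  # True
print(is_calendar_year(film.year))  # True
print(is_calendar_year(-5))         # False: an integer, but not a plausible year
```

Each layer rules out something the layer below would accept: -5 is a perfectly good integer but not a release year, and a valid year on its own says nothing about who acted in what.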

If we have a database encoding semantics, there is the potential for the system itself to make use of logic and implement automated reasoning for data retrieval.
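As a rough illustration of what automated reasoning might look like, the sketch below stores "acted in" statements as (subject, predicate, object) triples and derives a new relation from them. The predicate names and the co-star rule are hypothetical; real semantic databases use much richer rule languages.

```python
from itertools import combinations

# Stated facts, as (subject, predicate, object) triples.
triples = {
    ("Sigourney Weaver", "acted_in", "Alien"),
    ("John Hurt", "acted_in", "Alien"),
    ("Sigourney Weaver", "acted_in", "Aliens"),
    ("Michael Biehn", "acted_in", "Aliens"),
}


def infer_co_stars(facts):
    """Rule: two people who acted in the same film co-starred with each other."""
    cast_by_film = {}
    for subject, predicate, obj in facts:
        if predicate == "acted_in":
            cast_by_film.setdefault(obj, set()).add(subject)

    inferred = set()
    for cast in cast_by_film.values():
        for a, b in combinations(sorted(cast), 2):
            inferred.add((a, "co_starred_with", b))
            inferred.add((b, "co_starred_with", a))
    return inferred


co_stars = infer_co_stars(triples)
print(("John Hurt", "co_starred_with", "Sigourney Weaver") in co_stars)  # True
print(("John Hurt", "co_starred_with", "Michael Biehn") in co_stars)     # False
```

Nobody entered the co-star relation by hand. The system derived it because it knows what "acted_in" means.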

19.02 Shared meaning in the real world

Common Semantics:

  • Shared documents – we can share documents that don’t carry semantics and communicate the semantics separately. Ideally, we’d want the document itself to carry its semantics, but that’s not essential
  • Formal Specifications – these say how documents are made and give structure to them
  • Human-readable Definitions – regardless of how much machine-readable semantics is encoded, we still need to specify semantics for humans. Reasons for this include: agreement for getting data into the system; agreement about how to interpret the data; unifying the meaning of data processing.

19.03 XML: Documents with semantics

What is XML?

The essence of XML is in its name: Extensible Markup Language. It defines rules to encode documents that are both human-readable and machine-readable.


XML is extensible. It lets you define your own tags, the order in which they occur, and how they should be processed or displayed. Another way to think about extensibility is to consider that XML allows all of us to extend our notion of what a document is: it can be a file that lives on a file server, or it can be a transient piece of data that flows between two computer systems (as in the case of Web Services).


The most recognizable feature of XML is its tags, or elements (to be more accurate). In fact, the elements you’ll create in XML will be very similar to the elements you’ve already been creating in your HTML documents. However, XML allows you to define your own set of tags.


XML is a language that’s very similar to HTML. It’s much more flexible than HTML because it allows you to create your own custom tags. However, it’s important to realize that XML is not just a language. XML is a meta-language: a language that allows us to create or define other languages. For example, with XML we can create other languages, such as RSS, MathML (a mathematical markup language), and even tools like XSLT.

Every opening tag must have a matching closing tag, and elements must be properly nested: any element opened inside another must be closed before its parent is closed.

So an element can contain another element, but two elements can’t overlap one another. You can’t have multiple hierarchies.

Provided the document follows these rules, we say the XML is well-formed: it follows the basic syntax of XML.

Here’s an example of XML which is not well-formed:

  <category label="fiction">
    <subcategory label="scienceFiction" />
    <subcategory label="detective" />
  <category label="nonFiction">
    <subcategory label="dictionary" />

This isn’t well-formed because it is missing closing tags for both category elements, and since it is not well-formed, it is also not valid.
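A parser can check well-formedness for us. The sketch below uses Python’s standard xml.etree.ElementTree module, which raises ParseError on input that is not well-formed. The repaired version is one possible fix: it adds the missing closing tags and wraps both categories in an assumed categories root element, since an XML document needs exactly one root.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The example from the text: neither category element is closed.
broken = """
<category label="fiction">
  <subcategory label="scienceFiction" />
  <subcategory label="detective" />
<category label="nonFiction">
  <subcategory label="dictionary" />
"""

# One possible repair; the <categories> wrapper is an assumption.
repaired = """
<categories>
  <category label="fiction">
    <subcategory label="scienceFiction" />
    <subcategory label="detective" />
  </category>
  <category label="nonFiction">
    <subcategory label="dictionary" />
  </category>
</categories>
"""

print(is_well_formed(broken))            # False
print(is_well_formed(repaired))          # True
print(is_well_formed("<a><b></a></b>"))  # False: the elements overlap
```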

This is much like a CSV or JSON file, which can be correct and readable or broken and unreadable.

But what XML adds here is the idea of validation: the ability to point to a rule set and ask not just “is this syntactically legal XML?” but “does it follow the rules laid down by whoever defined this set of meanings?”
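The difference between well-formed and valid can be sketched with a toy rule set. Real XML validation uses schema languages such as DTD, XML Schema or RELAX NG, which Python’s standard library does not implement; the RULES dictionary and validate function below are hypothetical stand-ins that only check allowed child elements and required attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule set: allowed children and required attributes per element.
RULES = {
    "categories": {"children": {"category"}, "required_attrs": set()},
    "category": {"children": {"subcategory"}, "required_attrs": {"label"}},
    "subcategory": {"children": set(), "required_attrs": {"label"}},
}


def validate(element, rules):
    """Return a list of rule violations found in element and its descendants."""
    rule = rules.get(element.tag)
    if rule is None:
        return [f"unknown element <{element.tag}>"]

    errors = []
    for attr in sorted(rule["required_attrs"] - set(element.attrib)):
        errors.append(f"<{element.tag}> is missing attribute '{attr}'")
    for child in element:
        if child.tag not in rule["children"]:
            errors.append(f"<{child.tag}> is not allowed inside <{element.tag}>")
        else:
            errors.extend(validate(child, rules))
    return errors


valid_doc = ET.fromstring(
    '<categories><category label="fiction">'
    '<subcategory label="detective" /></category></categories>'
)
# Well-formed (it parses), but it breaks two of the rules above.
invalid_doc = ET.fromstring(
    "<categories><category><chapter /></category></categories>"
)

print(validate(valid_doc, RULES))  # []
for problem in validate(invalid_doc, RULES):
    print(problem)
```

Both documents parse without error, so both are well-formed. Only the first one follows the rule set, so only the first one is valid.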

To do that, we need to share not just the data itself, not just the documents but also the rule set. We do that simply by referring to the rule set.

For example, a TEI (Text Encoding Initiative) document has a root element named TEI. This is the parent node for text encoding, usually of historical sources, and its xmlns attribute can point to a namespace that identifies the rule set for TEI’s encoding.

That rule set would be on the web and always accessible, which means that when your tool wants to validate the document, it can go to the site and find the rules that were used to encode it.

Those rules are machine-readable, but they’re also human-readable, which means the shared meaning is accessible and referenced from within the file itself.
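Here is a small sketch of how a tool might read that reference from the file. The fragment is a deliberately minimal illustration and not a complete TEI document (a real one also needs a header, among other things). The split_tag helper is hypothetical; it relies on the way ElementTree writes namespaced tags as {namespace}name.

```python
import xml.etree.ElementTree as ET

# Minimal illustrative fragment using the TEI namespace.
tei_text = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <p>A short transcription.</p>
    </body>
  </text>
</TEI>
"""


def split_tag(tag):
    """Split an ElementTree tag of the form '{namespace}name' into its parts."""
    if tag.startswith("{"):
        namespace, _, local_name = tag[1:].partition("}")
        return namespace, local_name
    return "", tag


root = ET.fromstring(tei_text)
namespace, local_name = split_tag(root.tag)
print(namespace)   # http://www.tei-c.org/ns/1.0
print(local_name)  # TEI

# Child elements inherit the default namespace, so lookups must use it too.
paragraph = root.find("tei:text/tei:body/tei:p", {"tei": namespace})
print(paragraph.text)  # A short transcription.
```

ElementTree only reads the namespace here. It does not fetch or apply any rules, so actually validating the document would need a separate schema-aware tool.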

Thursday 13 January 2022
