20. ORM – work through example

We previously created a new database using PostgreSQL. We are going to start a new project which recreates that database using the Django ORM.

In this example we’re going to create a virtual environment called env, a folder called bioweb, a project called bioweb and an app called genedata.

Firstly we’re going to run through the setup phase which we did in part 9.

So in the Win10 CMD prompt ($ means enter at the command prompt):

  • Navigate to desktop
  • $ mkdir bioweb (skip this if the folder already exists)
  • $ cd bioweb
  • $ python -m venv env
  • $ env\Scripts\activate
  • (env) $ pip install django
  • (env) $ django-admin startproject bioweb
  • * if we do a dir now we see two directories, bioweb and env
  • (env) $ cd bioweb
  • (env) $ python manage.py startapp genedata

Note we haven’t started the server (by using python manage.py runserver) yet.

We can now open up VSC and navigate to the bioweb folder. We can see a .vscode folder, a bioweb folder (created when we started the project with startproject), a genedata folder and manage.py.

We need to navigate to genedata and open models.py.

In the Django ORM we are looking to map each database table to a single Python class.

We are now going to create the five main tables and the link table that we created in our previous database (see part 14). In that part we created tables in PostgreSQL using these commands:

CREATE DATABASE gene_test;
CREATE TABLE genes (pk SERIAL PRIMARY KEY, gene_id VARCHAR(256) NOT NULL, entity VARCHAR(256), source VARCHAR(256), start INT, stop INT, sequencing_pk INT, ec_pk INT, FOREIGN KEY (sequencing_pk) REFERENCES sequencing(pk), FOREIGN KEY (ec_pk) REFERENCES ec(pk));
CREATE TABLE ec(pk SERIAL PRIMARY KEY, EC_name VARCHAR(256)); 
CREATE TABLE sequencing(pk SERIAL PRIMARY KEY, sequencing_factory VARCHAR(256), factory_location VARCHAR(256));
CREATE TABLE products (genes_pk INT, type VARCHAR(256), product VARCHAR(256), FOREIGN KEY (genes_pk) REFERENCES genes(pk));
CREATE TABLE attributes (pk SERIAL PRIMARY KEY, key VARCHAR(256), value VARCHAR(256));
CREATE TABLE gene_attribute_link(genes_pk INT, attributes_pk INT, FOREIGN KEY (genes_pk) REFERENCES genes(pk), FOREIGN KEY (attributes_pk) REFERENCES attributes(pk));

(We did make some changes to these tables during the exercise so the fields below may not match exactly)

We start by declaring the class name and showing that it inherits from models.Model:

class Gene(models.Model):

Now we can start to define the fields in this table:

class Gene(models.Model):
    gene_id = models.CharField(max_length=256, null=False, blank=False)
    entity = models.CharField(max_length=256, null=False, blank=False)
    start = models.IntegerField(null=False, blank=True)
    stop = models.IntegerField(null=False, blank=True)
    sense = models.CharField(max_length=1)
    start_codon = models.CharField(max_length=1, default="M")

By convention we use singular names for these classes (that’s why Gene and not genes). An individual class is mapped to a single database table.

We can see that models.CharField equates to VARCHAR, models.IntegerField equates to INT, null=False equates to NOT NULL, etc.
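
To make that correspondence concrete, here is a toy sketch in plain Python. It is not how Django generates SQL internally; the FIELD_TO_SQL table and the column_sql helper are hypothetical names made up for this illustration, and the exact type names Django emits depend on the database backend.

```python
# Toy illustration only: a hand-written lookup, not Django's real SQL generator.
FIELD_TO_SQL = {
    "CharField": "VARCHAR({max_length})",
    "IntegerField": "INT",
}


def column_sql(name, field_type, max_length=None, null=False):
    """Build one column definition from a Django-style field description."""
    sql_type = FIELD_TO_SQL[field_type].format(max_length=max_length)
    # null=False on the model corresponds to NOT NULL on the column.
    constraint = "" if null else " NOT NULL"
    return f"{name} {sql_type}{constraint}"


print(column_sql("gene_id", "CharField", max_length=256))
print(column_sql("start", "IntegerField"))
print(column_sql("stop", "IntegerField", null=True))
```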

By default, Django adds the primary key field for you, in a field (column) called id. You don’t have to declare that yourself.

This class is roughly the equivalent of the following SQL code (by default Django actually names the table after the app and the model, here genedata_gene):

CREATE TABLE Gene (
    gene_id VARCHAR(256) NOT NULL,
    entity VARCHAR(256) NOT NULL,
    start INT NOT NULL,
    stop INT NOT NULL,
    sense CHAR,
    start_codon CHAR DEFAULT 'M'
);

Now we knew in our previous data model that genes are related to both an EC table and our sequencing table by many to one relationships. So we can go ahead and add those to our genes table definition as foreign keys. And that’s quite an easy thing to declare.

    sequencing = models.ForeignKey(Sequencing, on_delete=models.DO_NOTHING)
    ec = models.ForeignKey(EC, on_delete=models.DO_NOTHING)

By convention in Django, we often give the foreign key the same name as the target table, in lower case.

A foreign key requires an on_delete argument. This tells Django what to do with the rows that hold the foreign key (here, the Gene rows) when the record they point at is deleted.

So if we delete a Sequencing record, should we also delete the genes that point at it? And should we do the same when we delete an EC record?

In this case we’ve told Django to do nothing, so Gene rows are left untouched when a Sequencing or EC record is deleted. Note that the database itself may still refuse the delete if it would leave a gene pointing at a record that no longer exists. Whichever option we choose, deleting a Gene never deletes the Sequencing or EC record it points at.

If we want Django to also delete all the records that refer to the deleted record, we can use on_delete=models.CASCADE
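
The difference between the two options can be seen at the SQL level. The following is a minimal sketch using Python’s built-in sqlite3 module rather than Django (Django emulates CASCADE itself, and with DO_NOTHING it leaves the decision to the database). The table layout, the function name and the sample values are all hypothetical. In both cases it is the referenced sequencing row that gets deleted, and the question is what happens to the gene row that points at it.

```python
import sqlite3


def delete_sequencing_row(on_delete_rule):
    """Delete a sequencing row that a gene points at and report what happened."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
    conn.execute(
        "CREATE TABLE sequencing (id INTEGER PRIMARY KEY, sequencing_factory TEXT)"
    )
    conn.execute(
        "CREATE TABLE gene (id INTEGER PRIMARY KEY, gene_id TEXT, "
        "sequencing_id INTEGER REFERENCES sequencing(id) ON DELETE "
        + on_delete_rule + ")"
    )
    conn.execute("INSERT INTO sequencing VALUES (1, 'Factory A')")
    conn.execute("INSERT INTO gene VALUES (1, 'GeneA', 1)")
    try:
        conn.execute("DELETE FROM sequencing WHERE id = 1")
        outcome = "deleted"
    except sqlite3.IntegrityError:
        # The database refused to leave the gene pointing at a missing row.
        outcome = "refused"
    genes_left = conn.execute("SELECT COUNT(*) FROM gene").fetchone()[0]
    conn.close()
    return outcome, genes_left


print(delete_sequencing_row("CASCADE"))    # the gene row goes too
print(delete_sequencing_row("NO ACTION"))  # the delete is refused, the gene stays
```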

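Now we can go ahead and add the tables at the other end of these foreign key relationships. Because Gene refers to EC and Sequencing by class name, these two classes have to appear above Gene in models.py.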

class EC(models.Model):
    ec_name = models.CharField(max_length=256, null=False, blank=False)

class Sequencing(models.Model):
    sequencing_factory = models.CharField(max_length=256, null=False, blank=False)
    factory_location = models.CharField(max_length=256, null=False, blank=False)

This completes the tables that have the many to one relationship. We can now add our remaining tables.

class Product(models.Model):
    type = models.CharField(max_length=256, null=False, blank=False)
    product = models.CharField(max_length=256, null=False, blank=False)
    gene = models.ForeignKey(Gene, on_delete=models.CASCADE)

class Attribute(models.Model):
    key = models.CharField(max_length=256, null=False, blank=False)
    value = models.CharField(max_length=256, null=False, blank=False)
    gene = models.ManyToManyField(Gene, through="GeneAttributeLink")

In the attributes table we had a key value pair – strings of various lengths.
Attributes are related to genes by a many to many relationship, which goes through a link table. We declared this above as follows:

    gene = models.ManyToManyField(Gene, through="GeneAttributeLink")

We then need to create the link table too:

class GeneAttributeLink(models.Model):
    gene = models.ForeignKey(Gene, on_delete=models.DO_NOTHING)
    attribute = models.ForeignKey(Attribute, on_delete=models.DO_NOTHING)

We’ve now constructed the five tables and link table using the Django ORM.
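
To show what the link table is for, here is a minimal sketch using Python’s built-in sqlite3 module rather than Django. The table and column names only loosely follow the models above, and the sample rows and the attributes_for helper are hypothetical. Each row in the link table pairs one gene with one attribute, so a gene can have many attributes and an attribute can belong to many genes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (id INTEGER PRIMARY KEY, gene_id TEXT NOT NULL);
CREATE TABLE attribute (id INTEGER PRIMARY KEY, key TEXT NOT NULL, value TEXT NOT NULL);
CREATE TABLE gene_attribute_link (
    id INTEGER PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(id),
    attribute_id INTEGER NOT NULL REFERENCES attribute(id)
);
-- Hypothetical sample data
INSERT INTO gene VALUES (1, 'GeneA'), (2, 'GeneB');
INSERT INTO attribute VALUES (1, 'colour', 'red'), (2, 'size', 'large');
INSERT INTO gene_attribute_link (gene_id, attribute_id) VALUES (1, 1), (1, 2), (2, 1);
""")


def attributes_for(gene_name):
    """Return the (key, value) pairs linked to the named gene."""
    return conn.execute(
        "SELECT a.key, a.value FROM attribute a "
        "JOIN gene_attribute_link l ON l.attribute_id = a.id "
        "JOIN gene g ON g.id = l.gene_id "
        "WHERE g.gene_id = ? ORDER BY a.id",
        (gene_name,),
    ).fetchall()


print(attributes_for("GeneA"))
print(attributes_for("GeneB"))
```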

Model classes can also host methods as they are just regular Python classes. We’ve used a number of built-in methods but we can add our own if we need them.

There is a typical example here. If you look in many people’s applications you’ll see that most tables override the built-in string method.

When we retrieve a row from the database and then use that data in a string context, we’re interested in some easy human-readable value to represent that row in the database.

Returning the primary key value, which is just an integer, is often not very informative. So often a unique string column is quite useful.

If we go back to our Gene class, we can override this and have it return gene id like this:

    def __str__(self):
        return self.gene_id

So whenever we use a row of the data later in string context it will return the string value in the gene id column.

We can add this to all of the other classes here too. Note that:

  • We’re returning a variable that can help with identifying the record
  • We can also return a pair of values, as in Attribute, where returning just one wouldn’t be helpful
  • We didn’t do this for every class right now, just the ones where we will need a meaningful return
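
Because __str__ is ordinary Python, its effect can be tried out without a database. The sketch below uses two plain classes as stand-ins for model rows; GeneRow, AttributeRow and the sample values are hypothetical and are not Django models.

```python
class GeneRow:
    """Hypothetical stand-in for a Gene row; not a Django model."""

    def __init__(self, pk, gene_id):
        self.pk = pk
        self.gene_id = gene_id

    def __str__(self):
        # More informative than the integer primary key
        return self.gene_id


class AttributeRow:
    """Hypothetical stand-in for an Attribute row; not a Django model."""

    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __str__(self):
        # Neither half is helpful alone, so return the pair
        return self.key + " : " + self.value


gene = GeneRow(pk=1, gene_id="GeneA")
attribute = AttributeRow(key="colour", value="red")

# Any string context (print, f-strings, str()) calls __str__
gene_label = f"Selected gene: {gene}"
attribute_label = str(attribute)
print(gene_label)
print(attribute_label)
```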

Here’s the full code:

from django.db import models

# EC and Sequencing are defined first because Gene refers to them by class name.
class EC(models.Model):
    ec_name = models.CharField(max_length=256, null=False, blank=False)

    def __str__(self):
        return self.ec_name

class Sequencing(models.Model):
    sequencing_factory = models.CharField(max_length=256, null=False, blank=False)
    factory_location = models.CharField(max_length=256, null=False, blank=False)

    def __str__(self):
        return self.sequencing_factory

class Gene(models.Model):
    gene_id = models.CharField(max_length=256, null=False, blank=False)
    entity = models.CharField(max_length=256, null=False, blank=False)
    start = models.IntegerField(null=False, blank=True)
    stop = models.IntegerField(null=False, blank=True)
    sense = models.CharField(max_length=1)
    start_codon = models.CharField(max_length=1, default="M")
    sequencing = models.ForeignKey(Sequencing, on_delete=models.DO_NOTHING)
    ec = models.ForeignKey(EC, on_delete=models.DO_NOTHING)

    def __str__(self):
        return self.gene_id

class Product(models.Model):
    type = models.CharField(max_length=256, null=False, blank=False)
    product = models.CharField(max_length=256, null=False, blank=False)
    gene = models.ForeignKey(Gene, on_delete=models.CASCADE)

class Attribute(models.Model):
    key = models.CharField(max_length=256, null=False, blank=False)
    value = models.CharField(max_length=256, null=False, blank=False)
    # The link table is named as a string because it is defined further down.
    gene = models.ManyToManyField(Gene, through="GeneAttributeLink")

    def __str__(self):
        return self.key + " : " + self.value

class GeneAttributeLink(models.Model):
    gene = models.ForeignKey(Gene, on_delete=models.DO_NOTHING)
    attribute = models.ForeignKey(Attribute, on_delete=models.DO_NOTHING)
Wednesday 3 November 2021, 31 views

