15. Queries and table joins in SQL
We previously saw that we can use simple queries to obtain data from the tables and database. In those instances, we were looking up relevant primary keys, independent tables such that we could add new rows to our genes tables. The power of a relational database is in building relational queries.
Using our previous example:
We can query individual tables like this:
example=# SELECT * FROM genes; pk | gene_id | entity | source | start | stop | sequencing_pk | ec_pk ----+---------+------------+--------+-------+------+---------------+------- 1 | Gene1 | Chromosome | ena | 190 | 255 | 1 | 1 2 | Gene2 | Chromosome | ena | 375 | 566 | 1 | 2 3 | Gene3 | Chromosome | ena | 780 | 980 | 1 | 1 4 | Gene4 | Plasmid | ena | 1001 | 1111 | 1 | 2 5 | Gene5 | Plasmid | ena | 786 | 888 | 1 | 2 (5 rows) example=# SELECT * FROM ec; pk | ec_name ----+----------------- 1 | oxidioreductase 2 | transferase (2 rows)
If the tables are linked, we can also query two tables together: example=# SELECT * FROM genes, ec; pk | gene_id | entity | source | start | stop | sequencing_pk | ec_pk | pk | ec_name ----+---------+------------+--------+-------+------+---------------+-------+----+----------------- 1 | Gene1 | Chromosome | ena | 190 | 255 | 1 | 1 | 1 | oxidioreductase 2 | Gene2 | Chromosome | ena | 375 | 566 | 1 | 2 | 1 | oxidioreductase 3 | Gene3 | Chromosome | ena | 780 | 980 | 1 | 1 | 1 | oxidioreductase 4 | Gene4 | Plasmid | ena | 1001 | 1111 | 1 | 2 | 1 | oxidioreductase 5 | Gene5 | Plasmid | ena | 786 | 888 | 1 | 2 | 1 | oxidioreductase 1 | Gene1 | Chromosome | ena | 190 | 255 | 1 | 1 | 2 | transferase 2 | Gene2 | Chromosome | ena | 375 | 566 | 1 | 2 | 2 | transferase 3 | Gene3 | Chromosome | ena | 780 | 980 | 1 | 1 | 2 | transferase 4 | Gene4 | Plasmid | ena | 1001 | 1111 | 1 | 2 | 2 | transferase 5 | Gene5 | Plasmid | ena | 786 | 888 | 1 | 2 | 2 | transferase (10 rows)
Here every row of data in the genes table, rows 1-5, has been paired with every row of data in the ec table, rows 1-2.
This isn’t particularly informative but just an illustration of how it all works.
We can restrict our queries so that it only allows rows where the primary and foreign keys are correctly matched. So let’s do this:
example=# SELECT * FROM genes, ec WHERE genes.ec_pk=ec.pk; pk | gene_id | entity | source | start | stop | sequencing_pk | ec_pk | pk | ec_name ----+---------+------------+--------+-------+------+---------------+-------+----+----------------- 3 | Gene3 | Chromosome | ena | 780 | 980 | 1 | 1 | 1 | oxidioreductase 1 | Gene1 | Chromosome | ena | 190 | 255 | 1 | 1 | 1 | oxidioreductase 5 | Gene5 | Plasmid | ena | 786 | 888 | 1 | 2 | 2 | transferase 4 | Gene4 | Plasmid | ena | 1001 | 1111 | 1 | 2 | 2 | transferase 2 | Gene2 | Chromosome | ena | 375 | 566 | 1 | 2 | 2 | transferase (5 rows)
This is returning all the rows in the 10 row table where ec_pk and pk match. We’re talking about the second pk, as that’s the one that belongs to ec (ec.pk).
We can refine the search further omitting the various primary key and foreign key values, as that’s the data we probably really want.
example=# SELECT gene_id, entity, source, start, stop, ec_name FROM genes, ec WHERE genes.ec_pk=ec.pk; gene_id | entity | source | start | stop | ec_name ---------+------------+--------+-------+------+----------------- Gene3 | Chromosome | ena | 780 | 980 | oxidioreductase Gene1 | Chromosome | ena | 190 | 255 | oxidioreductase Gene5 | Plasmid | ena | 786 | 888 | transferase Gene4 | Plasmid | ena | 1001 | 1111 | transferase Gene2 | Chromosome | ena | 375 | 566 | transferase (5 rows)
We can make a more complex query by searching from three tables:
example=# SELECT gene_id, entity, source, start, stop, ec_name, sequencing_factory, example-# factory_location FROM genes, ec, sequencing WHERE genes.ec_pk=ec.pk AND example-# genes.sequencing_pk=sequencing.pk; gene_id | entity | source | start | stop | ec_name | sequencing_factory | factory_location ---------+------------+--------+-------+------+-----------------+--------------------+------------------ Gene3 | Chromosome | ena | 780 | 980 | oxidioreductase | Sanger | UK Gene1 | Chromosome | ena | 190 | 255 | oxidioreductase | Sanger | UK Gene5 | Plasmid | ena | 786 | 888 | transferase | Sanger | UK Gene4 | Plasmid | ena | 1001 | 1111 | transferase | Sanger | UK Gene2 | Chromosome | ena | 375 | 566 | transferase | Sanger | UK (5 rows)
You’ll note that this linking together tables is essentially the reverse operation to the normalizing we previously thought through in an earlier video. Our attributes were linked in many ternary relationships through a link table. This requires slightly more careful handling. Let’s look at how are we going to do that.
example=# SELECT gene_id, key, value FROM genes, attributes, gene_attribute_link WHERE example-# genes.pk=gene_attribute_link.genes_pk AND example-# attributes.pk=gene_attribute_link.attributes_pk; gene_id | key | value ---------+-------------+--------------------------- Gene1 | ID | gene:b001 Gene1 | Name | thrL Gene1 | biotype | protein_coding Gene1 | description | thr operon leader peptide Gene2 | biotype | protein_coding Gene2 | Name | fucA Gene2 | description | Fructokinase A Gene3 | biotype | protein_coding Gene3 | Name | timB Gene3 | description | Triosphosphare Isomerase Gene4 | biotype | protein_coding Gene4 | Name | gluA Gene4 | description | Glucose isomerase (13 rows)
In these examples we are doing table joins – which is implicitly implied. We can explicitly mention which table joins we want to use. These are equivalent:
SELECT * FROM genes, ec WHERE genes.ec_pk=ec.pk; SELECT * FROM genes INNER JOIN ec ON genes.ec_pk=ec.pk;
We can think of joins as set operations:
We can use a LEFT JOIN which requests that we return every row of data from the first or left-hand table in the select statement, plus only those rows from the right hand table that satisfy our joining requirement.
example=# SELECT * FROM genes LEFT JOIN ec ON genes.ec_pk=ec.pk AND example-# ec_name='transferase'; pk | gene_id | entity | source | start | stop | sequencing_pk | ec_pk | pk | ec_name ----+---------+------------+--------+-------+------+---------------+-------+----+------------- 1 | Gene1 | Chromosome | ena | 190 | 255 | 1 | 1 | | 2 | Gene2 | Chromosome | ena | 375 | 566 | 1 | 2 | 2 | transferase 3 | Gene3 | Chromosome | ena | 780 | 980 | 1 | 1 | | 4 | Gene4 | Plasmid | ena | 1001 | 1111 | 1 | 2 | 2 | transferase 5 | Gene5 | Plasmid | ena | 786 | 888 | 1 | 2 | 2 | transferase (5 rows)
Similarly:
example=# SELECT * FROM genes RIGHT JOIN ec ON genes.ec_pk=ec.pk AND example-# ec_name='transferase'; pk | gene_id | entity | source | start | stop | sequencing_pk | ec_pk | pk | ec_name ----+---------+------------+--------+-------+------+---------------+-------+----+----------------- | | | | | | | | 1 | oxidioreductase 5 | Gene5 | Plasmid | ena | 786 | 888 | 1 | 2 | 2 | transferase 4 | Gene4 | Plasmid | ena | 1001 | 1111 | 1 | 2 | 2 | transferase 2 | Gene2 | Chromosome | ena | 375 | 566 | 1 | 2 | 2 | transferase (4 rows)
And a full join:
example=# SELECT * FROM genes FULL JOIN ec ON genes.ec_pk=ec.pk AND example-# ec_name='transferase'; pk | gene_id | entity | source | start | stop | sequencing_pk | ec_pk | pk | ec_name ----+---------+------------+--------+-------+------+---------------+-------+----+----------------- | | | | | | | | 1 | oxidioreductase 5 | Gene5 | Plasmid | ena | 786 | 888 | 1 | 2 | 2 | transferase 4 | Gene4 | Plasmid | ena | 1001 | 1111 | 1 | 2 | 2 | transferase 2 | Gene2 | Chromosome | ena | 375 | 566 | 1 | 2 | 2 | transferase 3 | Gene3 | Chromosome | ena | 780 | 980 | 1 | 1 | | 1 | Gene1 | Chromosome | ena | 190 | 255 | 1 | 1 | | (6 rows)
Wednesday 27 October 2021, 700 views
Next post: 16. SQL Query performance Previous post: 14. Inserts and queries in SQL
Advanced Web Development index
- 38. Writing API tests
- 37. Testing in Django
- 36. Class-based views in the Django REST framework
- 35. Building a RESTful web service in Django
- 34. Introduction to CRUD, REST and APIs
- 33. Refactoring with generic views in Django
- 32. Django validators
- 31. Django forms (2) – using the ModelForm class
- 30. Django forms (1)
- 29. JavaScript basics
- 28. Adding CSS to the template
- 27. Django templating
- 26. Deleting and updating records
- 25. Joins, filters and chaining commands
- 24. Using the ORM in views.py
- 23. Adding to the database by writing a script
- 22. Adding to the database with Django Admin
- 21. Migrations
- 20. ORM – work through example
- 19. An introduction to the Object-Relational Mapper
- 18. Altering the database
- 17. SQL functions and summaries
- 16. SQL Query performance
- 15. Queries and table joins in SQL
- 14. Inserts and queries in SQL
- 13. Good practice in relational database design
- 12. Limitations to database modelling
- 11. Building a database using SQL
- 10. Introduction to PostgreSQL
- 9. How to start writing a new application in Django
- 8. Building a lightweight project
- 7. Django URLs
- 6. Django templates
- 5. Django models
- 4. Django views
- 3. Creating a new hello app
- 2. Creating a new virtual environment
- 1. Setting up Django
Leave a Reply