14. Inserts and queries in SQL

Let’s start with the tables we created from the previous post:

CREATE DATABASE gene_test;
CREATE TABLE ec(pk SERIAL PRIMARY KEY, EC_name VARCHAR(256)); 
CREATE TABLE sequencing(pk SERIAL PRIMARY KEY, sequencing_factory VARCHAR(256), factory_location VARCHAR(256));
CREATE TABLE genes (pk SERIAL PRIMARY KEY, gene_id VARCHAR(256) NOT NULL, entity VARCHAR(256), source VARCHAR(256), start INT, stop INT, sequencing_pk INT, ec_pk INT, FOREIGN KEY (sequencing_pk) REFERENCES sequencing(pk), FOREIGN KEY (ec_pk) REFERENCES ec(pk));
CREATE TABLE products (genes_pk INT, type VARCHAR(256), product VARCHAR(256), FOREIGN KEY (genes_pk) REFERENCES genes(pk));
CREATE TABLE attributes (pk SERIAL PRIMARY KEY, key VARCHAR(256), value VARCHAR(256));
CREATE TABLE gene_attribute_link(genes_pk INT, attributes_pk INT, FOREIGN KEY (genes_pk) REFERENCES genes(pk), FOREIGN KEY (attributes_pk) REFERENCES attributes(pk));

We can enter data as follows:

INSERT INTO sequencing (sequencing_factory, factory_location) VALUES ('Sanger', 'UK');
INSERT INTO ec (ec_name) VALUES ('oxidioreductase');
INSERT INTO genes (gene_id, entity, source, start, stop, sequencing_pk, ec_pk) VALUES ('Gene1', 'Chromosome', 'ena', 190, 255, 1, 1);
INSERT INTO ec (ec_name) VALUES ('transferase');
INSERT INTO genes (gene_id, entity, source, start, stop, sequencing_pk, ec_pk) VALUES ('Gene2', 'Chromosome', 'ena', 375, 566, CURRVAL('sequencing_pk_seq'), CURRVAL('ec_pk_seq'));

CURRVAL(‘sequencing_pk_seq’) is a way of adding the next primary key ID if you can’t remember where you got up to previously.

INSERT INTO genes (gene_id, entity, source, start, stop, sequencing_pk, ec_pk) VALUES ('Gene3', 'Chromosome', 'ena', 780, 980, CURRVAL('sequencing_pk_seq'), 1), ('Gene4', 'Plasmid', 'ena', 1001, 1111, CURRVAL('sequencing_pk_seq'), CURRVAL('ec_pk_seq'));
INSERT INTO products (genes_pk, type, product) VALUES
(1, 'gene', 'mrna'), (1, 'cds', 'protein'),
(2, 'gene', 'mrna'), (2, 'cds', 'protein'),
(3, 'gene', 'mrna'), (3, 'cds', 'protein'),
(4, 'gene', 'mrna'), (4, 'cds', 'protein');
INSERT INTO attributes (key, value) VALUES
('ID', 'gene:b001'),
('Name', 'thrL'),
('biotype', 'protein_coding'),
('description', 'thr operon leader peptide'),
('Name', 'fucA'),
('description', 'Fructokinase A'),
('Name', 'timB'),
('description', 'Triosphosphare Isomerase'),
('Name', 'gluA'),
('description', 'Glucose isomerase');
INSERT INTO gene_attribute_link (genes_pk, attributes_pk) VALUES
(1, 1), (1, 2), (1, 3), (1, 4),
(2, 3), (2, 5), (2, 6),
(3, 3), (3, 7), (3, 8),
(4, 3), (4, 9), (4, 10);

Some of these are simple inserts, some a little bit more complicated.

Now we can start querying the data:

SELECT * FROM sequencing;

returns this:

 pk | sequencing_factory | factory_location
----+--------------------+------------------
  1 | Sanger             | UK
(1 row)

We can see this from the first insert we added.

We can choose specific fields by doing this:

SELECT sequencing_factory, factory_location FROM sequencing;

which returns:

 sequencing_factory | factory_location
--------------------+------------------
 Sanger             | UK
(1 row)

We can add conditions to our query with the WHERE keyword. If we want the pk and ec_name from ec where the name is transferase we can do this:

SELECT pk, ec_name FROM ec WHERE ec_name='transferase';

which returns

 pk |   ec_name
----+-------------
  2 | transferase
(1 row)

For a large table * might be too much. For example if we do this:

SELECT * FROM attributes;

we get this:

 pk |     key     |           value
----+-------------+---------------------------
  1 | ID          | gene:b001
  2 | Name        | thrL
  3 | biotype     | protein_coding
  4 | description | thr operon leader peptide
  5 | Name        | fucA
  6 | description | Fructokinase A
  7 | Name        | timB
  8 | description | Triosphosphare Isomerase
  9 | Name        | gluA
 10 | description | Glucose isomerase
(10 rows)

So we might want to limit it to 5 entries:

SELECT * FROM attributes LIMIT 5;

which will return this:

 pk |     key     |           value
----+-------------+---------------------------
  1 | ID          | gene:b001
  2 | Name        | thrL
  3 | biotype     | protein_coding
  4 | description | thr operon leader peptide
  5 | Name        | fucA
(5 rows)

We can have multiple conditions using AND:

SELECT pk, value from attributes WHERE key='biotype' AND value='protein_coding';

which returns:

 pk |     value
----+----------------
  3 | protein_coding
(1 row)

At the moment we have 4 records in genes. We can insert a new record and add RETURNING pk at the end which will return the pk for the new record inserted. So if we type:

INSERT INTO genes (gene_id, entity, source, start, stop, sequencing_pk, ec_pk) VALUES ('Gene5', 'Plasmid', 'ena', 786, 888, 1, 2) RETURNING pk;

we get this – a table with the new pk, and a confirmation of the insert:

 pk
----
  5
(1 row)


INSERT 0 1

We can also add a new link for the gene into the link table we discussed:

INSERT INTO gene_attribute_link (genes_pk, attributes_pk) VALUES (5, 3);
Wednesday 27 October 2021, 18 views


Leave a Reply

Your email address will not be published. Required fields are marked *