2024-08-19

Advantages of SQL databases

  1. Structured Data: Organizes complex datasets with relational schemas.
  2. Powerful Queries: Enables efficient data retrieval using SQL.
  3. Efficient Resource Usage: Reads only the data queried, minimizing RAM usage and optimizing performance.
  4. Scalable: Manages large, growing datasets effectively.
  5. Data Integrity: Maintains consistency with keys and constraints.
  6. Secure: Provides robust access control and security features.
  7. Interoperable: Seamlessly integrates with tools like R, Python, and SQL for direct querying and analysis.

Example Danish plant species

Structured Data

Full Diagram

Species table

ID species arterid gbifid taxonrang accepteret_dansk_navn den_danske_rodliste rodliste_2010 herkomst levested wcvp_name wcvp_species Source WarningsEng gbif_genus gbif_species gbif_rank
1 Abies alba 17123 2685484 Art Almindelig ædelgran NA NA Introduceret NA Abies alba Abies alba wcvp Abies Abies alba SPECIES
2 Abies balsamea 17124 2685383 Art Balsam-ædelgran NA NA NA NA Abies balsamea Abies balsamea wcvp Abies Abies balsamea SPECIES
3 Abies cephalonica 17125 2685326 Art NA NA NA NA NA Abies cephalonica Abies cephalonica wcvp Abies Abies cephalonica SPECIES
4 Abies concolor 17126 2685580 Art Langnålet ædelgran NA NA Introduceret NA Abies concolor Abies concolor wcvp Abies Abies concolor SPECIES
5 Abies grandis 17127 2685361 Art Kæmpegran NA NA Introduceret NA Abies grandis Abies grandis wcvp Abies Abies grandis SPECIES
6 Abies homolepis 17128 2685663 Art NA NA NA NA NA Abies homolepis Abies homolepis wcvp Abies Abies homolepis SPECIES
7 Abies koreana 17129 2685681 Art NA NA NA NA NA Abies koreana Abies koreana wcvp Abies Abies koreana SPECIES
8 Abies lasiocarpa 17130 2685313 Art Klippe-ædelgran NA NA NA NA Abies lasiocarpa Abies lasiocarpa wcvp Abies Abies lasiocarpa SPECIES
9 Abies lowiana 17131 2685589 Art Oregon-ædelgran NA NA Introduceret NA Abies concolor Abies concolor wcvp Abies NA GENUS
10 Abies nordmanniana 17132 2685552 Art Nordmannsgran NA NA Introduceret NA Abies nordmanniana Abies nordmanniana wcvp Abies Abies nordmanniana SPECIES
11 Abies nordmanniana subsp. equi-trojani 56213 2685553 Underart Tyrkisk ædelgran NA NA NA NA Abies nordmanniana subsp. equi-trojani Abies nordmanniana wcvp Abies Abies nordmanniana SUBSPECIES
12 Abies nordmanniana subsp. nordmanniana 56214 7071074 Underart Nordmannsgran (underart) NA NA NA NA Abies nordmanniana subsp. nordmanniana Abies nordmanniana wcvp Abies Abies nordmanniana SUBSPECIES
13 Abies pinsapo 17133 2685464 Art NA NA NA NA NA Abies pinsapo Abies pinsapo wcvp Abies Abies pinsapo SPECIES
14 Abies procera 17134 2685308 Art Sølvgran NA NA Introduceret NA Abies procera Abies procera wcvp Abies Abies procera SPECIES
15 Abies sibirica 17135 2685654 Art Sibirisk ædelgran NA NA NA NA Abies sibirica Abies sibirica wcvp Abies Abies sibirica SPECIES
16 Abies veitchii 17136 2685400 Art Veitch’s ædelgran NA NA NA NA Abies veitchii Abies veitchii wcvp Abies Abies veitchii SPECIES
17 Abietinella abietina 17140 5283402 Art Bakke-granmos EN NA NA NA Abietinella Abietinella abietina SPECIES
18 Abutilon theophrasti 17165 3152614 Art Kinajute NA NA Introduceret NA Abutilon theophrasti Abutilon theophrasti wcvp Abutilon Abutilon theophrasti SPECIES
19 Acacia saligna 17167 2978552 Art TÃ¥re-akacie NA NA Ingen forekomst i Danmark NA Acacia saligna Acacia saligna wcvp Acacia Acacia saligna SPECIES
20 Acaena novae-zelandiae 17168 5370755 Art Tornnød NA NA Introduceret NA Acaena novae-zelandiae Acaena novae-zelandiae wcvp Acaena Acaena novae-zelandiae SPECIES
21 Acanthosphaera zachariasii 17229 6135987 Art NA NA NA NA NA Naucleopsis wcvp [Partial] Acanthosphaera Acanthosphaera zachariasii SPECIES
22 Acanthus hungaricus 17233 8116963 Art Balkantidsel NA NA Introduceret NA Acanthus hungaricus Acanthus hungaricus wcvp Acanthus Acanthus hungaricus SPECIES
23 Acanthus mollis 17234 5415455 Art Blød akantus NA NA Introduceret NA Acanthus mollis Acanthus mollis wcvp Acanthus Acanthus mollis SPECIES
24 Acaulon muticum 17258 7941056 Art Siddende ægmos NT NA NA NA Aloinopsis wcvp [Partial] Acaulon Acaulon muticum SPECIES
25 Acer campestre 17268 3189863 Art Navr LC LC Naturlig forekomst NA Acer campestre Acer campestre wcvp Acer Acer campestre SPECIES
26 Acer campestre var. hebecarpum 57844 8055346 Varietet NA NA NA NA NA Acer campestre subsp. campestre Acer campestre wcvp Acer Acer campestre SUBSPECIES
27 Acer campestre var. leiocarpum 57845 7760544 Varietet NA NA NA NA NA Acer campestre subsp. leiocarpum Acer campestre wcvp Acer Acer campestre VARIETY
28 Acer cappadocicum 17269 7262908 Art Tyrkisk løn NA NA Introduceret NA Acer cappadocicum Acer cappadocicum wcvp Acer Acer cappadocicum SPECIES
29 Acer circinatum 17270 3189884 Art Vin-løn NA NA Introduceret NA Acer circinatum Acer circinatum wcvp Acer Acer circinatum SPECIES
30 Acer heldreichii 17271 7375236 Art Græsk løn NA NA Introduceret NA Acer heldreichii Acer heldreichii wcvp Acer Acer heldreichii SPECIES

All columns

Powerful Queries

Examples of querries

con <- dbConnect(RSQLite::SQLite(), "plant_species.db")
Test <- dbGetQuery(con, "
  SELECT ID, species, accepteret_dansk_navn, wcvp_species, gbif_species 
  FROM Species 
  WHERE rodliste_2010 = 'VU'
")
ID species accepteret_dansk_navn wcvp_species gbif_species
238 Althaea officinalis Lægestokrose Althaea officinalis Althaea officinalis
301 Anacamptis pyramidalis Horndrager Anacamptis pyramidalis Anacamptis pyramidalis
368 Anthericum liliago Ugrenet edderkopurt Anthericum liliago Anthericum liliago
379 Anthyllis vulneraria subsp. danica Dansk rundbælg Anthyllis vulneraria Anthyllis vulneraria
498 Asplenium scolopendrium Hjortetunge Asplenium scolopendrium Asplenium scolopendrium
694 Botrychium matricariifolium Kamillebladet månerude Botrychium matricariifolium Botrychium matricariifolium
695 Botrychium multifidum Stilk-månerude Botrychium multifidum Sceptridium multifidum
753 Bromus racemosus Eng-hejre Bromus racemosus Bromus racemosus
984 Carex colchica Skrænt-star Carex colchica Carex colchica
1690 Cuscuta epithymum var. epithymum Lyng-silke Cuscuta epithymum Cuscuta epithymum
1726 Cypripedium calceolus Fruesko Cypripedium calceolus Cypripedium calceolus
1746 Dactylorhiza incarnata subsp. lobelii Klit-gøgeurt Dactylorhiza incarnata Dactylorhiza incarnata
1780 Dactylorhiza sambucina Hylde-gøgeurt Dactylorhiza sambucina Dactylorhiza sambucina
1831 Dianthus armeria Kost-nellike Dianthus armeria Dianthus armeria
1940 Draba incana Hvidgrå draba Draba incana Draba incana
1954 Drosera anglica Langbladet soldug Drosera anglica Drosera anglica
2027 Eleocharis parvula Lav kogleaks Eleocharis parvula Eleocharis parvula
2140 Epipactis atrorubens Rød hullæbe Epipactis atrorubens Epipactis atrorubens
2147 Epipactis leptochila Storblomstret hullæbe Epipactis leptochila Epipactis leptochila
2334 Festuca polesica Baltisk svingel Festuca beckeri Festuca beckeri
2495 Gentianella campestris var. campestris Bredbægret ensian (varietet) Gentianella campestris Gentianella campestris
2914 Hypochaeris maculata Plettet kongepen Hypochaeris maculata Trommsdorffia maculata
2963 Jacobaea erucifolia Smalfliget brandbæger Jacobaea erucifolia Jacobaea erucifolia
2980 Juncus alpinoarticulatus subsp. rariflorus Stilk-siv Juncus alpinoarticulatus Juncus alpinoarticulatus
3151 Lathyrus sphaericus Enblomstret fladbælg Lathyrus sphaericus Lathyrus sphaericus
3498 Melampyrum cristatum Kantet kohvede Melampyrum cristatum Melampyrum cristatum
3499 Melampyrum nemorosum Blåtoppet kohvede Melampyrum nemorosum Melampyrum nemorosum
3681 Najas marina Stor najade Najas marina Najas marina
3715 Neotinea ustulata Bakkegøgeurt Neotinea ustulata Neotinea ustulata
4143 Pilosella cymosa var. cymosa Kvast-høgeurt (varietet) Pilosella cymosa NA
4339 Polygonum raii subsp. norvegicum Norsk pileurt Polygonum norvegicum Polygonum norvegicum
4425 Potamogeton coloratus Vejbred-vandaks Potamogeton coloratus Potamogeton coloratus
4427 Potamogeton compressus Bændel-vandaks Potamogeton compressus Potamogeton compressus
4434 Potamogeton friesii Brodbladet vandaks Potamogeton friesii Potamogeton friesii
4602 Pulmonaria angustifolia Himmelblå lungeurt Pulmonaria angustifolia Pulmonaria angustifolia
4606 Pulmonaria officinalis Hvidplettet lungeurt Pulmonaria officinalis Pulmonaria officinalis
4611 Pulsatilla vernalis Vår-kobjælde Pulsatilla vernalis Pulsatilla vernalis
4692 Ranunculus polyanthemos subsp. polyanthemos Mangeblomstret ranunkel (underart) Ranunculus polyanthemos Ranunculus polyanthemos
4829 Rosa inodora Lugtløs æble-rose Rosa inodora Rosa inodora
5103 Sagina subulata Syl-firling Sagina hawaiensis Sagina subulata
5230 Saxifraga hirculus Gul stenbræk Saxifraga hirculus Saxifraga hirculus
5236 Scabiosa canescens Vellugtende skabiose Scabiosa canescens Scabiosa canescens
5280 Scheuchzeria palustris Blomstersiv Scheuchzeria palustris Scheuchzeria palustris
5300 Schoenus nigricans Sort skæne Schoenus nigricans Schoenus nigricans
5346 Scutellaria hastifolia Spydbladet skjolddrager Scutellaria hastifolia Scutellaria hastifolia
5366 Selaginella selaginoides Dværgulvefod Selaginella selaginoides Selaginella selaginoides
5572 Spergula morisonii VÃ¥r-spergel Spergula morisonii Spergula morisonii
5723 Stellaria crassifolia Tykbladet fladstjerne Stellaria crassifolia Stellaria crassifolia
5746 Stuckenia filiformis Tråd-vandaks Stuckenia filiformis Stuckenia filiformis
6242 Tephroseris integrifolia Bakke-fnokurt Tephroseris integrifolia Tephroseris integrifolia
6254 Tetragonolobus maritimus Kantbælg Lotus maritimus Lotus maritimus
6363 Trichophorum alpinum Liden kæruld Trichophorum alpinum Trichophorum alpinum
6641 Veronica verna Vår-ærenpris Veronica verna Veronica verna
6673 Vicia orobus Lyng-vikke Vicia orobus Vicia orobus
6687 Vicia sylvatica var. condensata Lav skov-vikke Vicia sylvatica Vicia sylvatica
6712 Viola mirabilis Forskelligblomstret viol Viola mirabilis Viola mirabilis
6725 Viola stagnina Rank viol Viola stagnina Viola stagnina

Included tables

More complex querry

# Query to get the mean of w_thousandseedweight and number of observations for each species
query <- "
  SELECT 
    s.ID AS Species_ID,
    s.species AS Species_Name,
    COUNT(sw.SeedID) AS Number_of_Observations,
    AVG(sw.w_thousandseedweight) AS Mean_Thousand_Seed_Weight
  FROM Species s
  INNER JOIN MainSeed ms ON s.ID = ms.ID
  INNER JOIN SeedWeight sw ON ms.SeedID = sw.SeedID
  WHERE s.rodliste_2010 = 'VU'
  GROUP BY s.ID, s.species
"

# Execute the query
result <- dbGetQuery(con, query)

Complex table

  • GROUP BY s.ID, s.species: This line groups the results by Species_ID and species name. Aggregations like COUNT() and AVG() are performed within each group.
  • AVG(sw.w_thousandseedweight) AS Mean_Thousand_Seed_Weight: Computes the average value of w_thousandseedweight for each species group defined by Species_ID.
Species_ID Species_Name Number_of_Observations Mean_Thousand_Seed_Weight
238 Althaea officinalis 11 4.1891273
301 Anacamptis pyramidalis 1 0.0007263
368 Anthericum liliago 6 7.5505348
753 Bromus racemosus 2 4.0784000
984 Carex colchica 1 0.9236000
1726 Cypripedium calceolus 3 0.0028936
1831 Dianthus armeria 15 0.4138892
1940 Draba incana 3 0.1258667
1954 Drosera anglica 4 0.0302400
2147 Epipactis leptochila 5 0.0034179
2334 Festuca polesica 2 0.6428000
2914 Hypochaeris maculata 6 1.4538303
3151 Lathyrus sphaericus 29 17.6857135
3498 Melampyrum cristatum 4 5.8036000
3499 Melampyrum nemorosum 2 9.2498964
3681 Najas marina 1 2.6441489
4425 Potamogeton coloratus 3 0.4225333
4427 Potamogeton compressus 2 3.6039259
4434 Potamogeton friesii 1 3.0659341
4606 Pulmonaria officinalis 3 5.0802667
4829 Rosa inodora 2 16.0300000
5230 Saxifraga hirculus 8 0.1035187
5236 Scabiosa canescens 3 0.8814667
5280 Scheuchzeria palustris 2 2.8913712
5300 Schoenus nigricans 22 66.4037676
5346 Scutellaria hastifolia 1 0.1120000
5572 Spergula morisonii 1 0.1736000
5746 Stuckenia filiformis 3 1.4558273
6242 Tephroseris integrifolia 5 0.5738240
6363 Trichophorum alpinum 2 0.1270166
6641 Veronica verna 7 0.1222000
6673 Vicia orobus 4 24.2000000
6712 Viola mirabilis 3 2.5828000

Included tables

Efficient Resource Usage

Speed

RAM usage

Interoperable

With command line or Python

Scalable

Scalability Techniques

  • Partitioning: Divide large tables, such as species observations, into smaller partitions based on geographic regions or time periods.
  • Indexing: Create indexes on columns frequently used in queries, such as species names and observation dates, to speed up search and retrieval operations.
  • Sharding: Distribute data across multiple database servers to balance the load and manage the increasing dataset size effectively.

Partitioning and Indexing in SQLite

CREATE INDEX idx_observation_date ON Observations(observation_date);
-- Example of partitioning by geographic region (in a practical implementation, this might involve separate tables)
CREATE TABLE Observations_North_America AS
SELECT * FROM Observations WHERE region = 'North America';

Data Integrity:

Species and Observation Data

  • Data Integrity:
    • Primary Keys: Ensure each species and observation has a unique identifier.
    • Foreign Keys: Link observations to the correct species using foreign keys.
    • Constraints: Ensure valid data entries, such as valid species IDs and observation dates.

Setting Up Keys and Constraints in SQLite

CREATE TABLE Observations (
    observation_id INTEGER PRIMARY KEY,
    species_id INTEGER,
    observation_date DATE,
    location TEXT,
    FOREIGN KEY (species_id) REFERENCES Species(species_id)
);

-- Adding a constraint to ensure no future dates for observations
ALTER TABLE Observations
ADD CONSTRAINT check_date CHECK (observation_date <= CURRENT_DATE);

Secure

Access Control for Sensitive Data

  • Security Features:
    • Access Control: Define roles and permissions for different types of users (e.g., researchers, administrators).
    • Encryption: Encrypt sensitive data, such as location details, to prevent unauthorized access.
    • Audit Logging: Track access to sensitive data for compliance and security monitoring.

Basic Access Control and Encryption in Python with SQLite

import sqlite3
from cryptography.fernet import Fernet

# Create a key for encryption
key = Fernet.generate_key()
cipher_suite = Fernet(key)

# Connect to the database
conn = sqlite3.connect('biodiversity.db')
cursor = conn.cursor()

# Example of storing encrypted data
def encrypt_data(data):
    return cipher_suite.encrypt(data.encode()).decode()

def decrypt_data(encrypted_data):
    return cipher_suite.decrypt(encrypted_data.encode()).decode()

# Insert encrypted data into the database
encrypted_location = encrypt_data("Sensitive Location Data")
cursor.execute("INSERT INTO Observations (species_id, observation_date, location) VALUES (?, ?, ?)",
               (1, '2023-08-15', encrypted_location))
conn.commit()

# Retrieve and decrypt data
cursor.execute("SELECT location FROM Observations WHERE species_id = 1")
encrypted_location = cursor.fetchone()[0]
decrypted_location = decrypt_data(encrypted_location)
print(f"Decrypted Location: {decrypted_location}")

conn.close()

Possible datasets

More datasets

  • Seed traits (Derek): from kew germination, oil, protein, dispersal, seed weights, storage behaviour, morphology
  • Field work plots (Bjarke and Mattis): Table for each plot and for each presence
  • Novana plots (Rasmus): One table for plots and one for presences
  • Archive datasets (Claus): One with study Id and one with presences?
  • CSR dataset (Derek): Species and traits
  • Plant traits (Volunteers):

Joint desicions

  • Main species table taxonomy (Bjarke, Rasmus)
  • Type of SQL (eg. PostGIS alows for geographical operations such as)
-- Query to find places within a certain distance
SELECT name
FROM places
WHERE ST_Distance(geom, ST_MakePoint(-73.9654, 40.7829)) < 1000;
  • Minimum metadata (right now there is a minimal automatic metadata generation)

Codes and dataset

Questions