Introduction

A database is a structured set of data. Terminology is a little bit different when working with a database management system compared to working with data in R.

  • field: variable or quantity
  • record: collection of fields
  • table: collection of records with all the same fields
  • database: collection of tables

The relationship between R terminology and database terminology is explained below.

R terminology Database terminology
column field
row record
data frame table
types of columns table schema
collection of data frames database

SQL (structured query language) allows you to directly interact with a database and perform tasks such as pull data and make updates. R has two packages available that make it easy to run SQL queries: DBI, RSQLite.

Getting started

To get started, load packages DBI, RSQLite, and tidyverse. Install any packages by entering command install.packages("package") in your Console pane.

Package DBI provides a database interface definition for communication between R and relational database management systems. Package RSQLite embeds the ‘SQLite’ database engine in R and provides an interface compliant with the package DBI.

Open connection

The below chunk of code will turn data frame mtcars into a database named mtcarsdb. In the SQL queries that follow, you will see that the database is referenced by name mtcarsdb.

At the conclusion of this RMarkdown file there is a code chunk that closes the connection and frees resources. Run that chunk when you complete the document.

con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "mtcarsdb", mtcars)

# see tables
dbListTables(con)
[1] "mtcarsdb"

SQL to dplyr

For each SQL query with function dbGetQuery(), think about what data is being pulled from the database. Then use object mtcars in R along with functions in package dplyr to obtain the same result.

Task 1

dbGetQuery(con, statement = "SELECT mpg FROM mtcarsdb")



Task 2

dbGetQuery(con, paste("SELECT * FROM mtcarsdb", 
                      "WHERE mpg > 25")
           )



Task 3

dbGetQuery(con, paste("SELECT * FROM mtcarsdb", 
                      "WHERE mpg > 20", 
                      "ORDER BY cyl")
           )



Task 4

dbGetQuery(con, paste("SELECT mpg, wt, gear FROM mtcarsdb",
                      "WHERE mpg > 25 AND wt < 3", 
                      "ORDER BY mpg DESC")
           )



Task 5

dbGetQuery(con, paste("SELECT mpg, wt, gear FROM mtcarsdb",
                      "WHERE mpg > 25 OR wt < 3",
                      "LIMIT 6")
           )



SQL computations to dplyr

For each SQL query with function dbGetQuery(), think about what data is being pulled from the database. Then use object mtcars in R along with functions in package dplyr to obtain the same result. You will need to make use of summarise().

Task 6

dbGetQuery(con, statement = "SELECT AVG(mpg) FROM mtcarsdb")



Task 7

dbGetQuery(con, "SELECT COUNT(mpg) FROM mtcarsdb")



Task 8

dbGetQuery(con, paste("SELECT cyl, AVG(hp) FROM mtcarsdb",
                      "GROUP BY cyl",
                      "ORDER BY AVG(hp) DESC")
           )



Task 9

dbGetQuery(con, paste("SELECT cyl, MAX(hp) FROM mtcarsdb",
                      "GROUP BY cyl",
                      "ORDER BY AVG(hp) ASC")
           )



SQL queries

Use function dbGetQuery() to write SQL queries for tasks 10 - 14. Make use of function paste() to neatly format your SQL statement.

Task 10

Write a SQL query that returns 10 rows of the variables mpg, gear, and carb.



Task 11

Write a SQL query that returns the variables mpg, gear, and carb ordered in ascending order by carb.



Task 12

Write a SQL query that returns the variables mpg, cyl, and carb, where carb is greater than 2 and cyl is greater than 4.



Task 13

Write a SQL query that returns the sum of wt for each carb value.



Task 14

Write a SQL query that returns the mean mpg, mean hp, and mean wt grouped by cyl and then carb.



Close connection

dbDisconnect(con)