A database is a structured set of data. Terminology is a little bit different when working with a database management system compared to working with data in R.
The relationship between R terminology and database terminology is explained below.
| R terminology | Database terminology |
|---|---|
| column | field |
| row | record |
| data frame | table |
| types of columns | table schema |
| collection of data frames | database |
SQL (structured query language) allows you to directly interact with a database and perform tasks such as pull data and make updates. R has two packages available that make it easy to run SQL queries: DBI, RSQLite.
To get started, load packages DBI, RSQLite, and tidyverse. Install any packages by entering command install.packages("package") in your Console pane.
Package DBI provides a database interface definition for communication between R and relational database management systems. Package RSQLite embeds the ‘SQLite’ database engine in R and provides an interface compliant with the package DBI.
The below chunk of code will turn data frame mtcars into a database named mtcarsdb. In the SQL queries that follow, you will see that the database is referenced by name mtcarsdb.
At the conclusion of this RMarkdown file there is a code chunk that closes the connection and frees resources. Run that chunk when you complete the document.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "mtcarsdb", mtcars)
# see tables
dbListTables(con)
[1] "mtcarsdb"
For each SQL query with function dbGetQuery(), think about what data is being pulled from the database. Then use object mtcars in R along with functions in package dplyr to obtain the same result.
dbGetQuery(con, statement = "SELECT mpg FROM mtcarsdb")
dbGetQuery(con, paste("SELECT * FROM mtcarsdb",
"WHERE mpg > 25")
)
dbGetQuery(con, paste("SELECT * FROM mtcarsdb",
"WHERE mpg > 20",
"ORDER BY cyl")
)
dbGetQuery(con, paste("SELECT mpg, wt, gear FROM mtcarsdb",
"WHERE mpg > 25 AND wt < 3",
"ORDER BY mpg DESC")
)
dbGetQuery(con, paste("SELECT mpg, wt, gear FROM mtcarsdb",
"WHERE mpg > 25 OR wt < 3",
"LIMIT 6")
)
For each SQL query with function dbGetQuery(), think about what data is being pulled from the database. Then use object mtcars in R along with functions in package dplyr to obtain the same result. You will need to make use of summarise().
dbGetQuery(con, statement = "SELECT AVG(mpg) FROM mtcarsdb")
dbGetQuery(con, "SELECT COUNT(mpg) FROM mtcarsdb")
dbGetQuery(con, paste("SELECT cyl, AVG(hp) FROM mtcarsdb",
"GROUP BY cyl",
"ORDER BY AVG(hp) DESC")
)
dbGetQuery(con, paste("SELECT cyl, MAX(hp) FROM mtcarsdb",
"GROUP BY cyl",
"ORDER BY AVG(hp) ASC")
)
Use function dbGetQuery() to write SQL queries for tasks 10 - 14. Make use of function paste() to neatly format your SQL statement.
Write a SQL query that returns 10 rows of the variables mpg, gear, and carb.
Write a SQL query that returns the variables mpg, gear, and carb ordered in ascending order by carb.
Write a SQL query that returns the variables mpg, cyl, and carb, where carb is greater than 2 and cyl is greater than 4.
Write a SQL query that returns the sum of wt for each carb value.
Write a SQL query that returns the mean mpg, mean hp, and mean wt grouped by cyl and then carb.
dbDisconnect(con)