A database is a structured set of data. Terminology is a little bit different when working with a database management system compared to working with data in R.
The relationship between R terminology and database terminology is explained below.
R terminology | Database terminology |
---|---|
column | field |
row | record |
data frame | table |
types of columns | table schema |
collection of data frames | database |
SQL (structured query language) allows you to directly interact with a database and perform tasks such as pull data and make updates. R has two packages available that make it easy to run SQL queries: DBI
, RSQLite
.
To get started, load packages DBI
, RSQLite
, and tidyverse
. Install any packages by entering command install.packages("package")
in your Console pane.
Package DBI
provides a database interface definition for communication between R and relational database management systems. Package RSQLite
embeds the ‘SQLite’ database engine in R and provides an interface compliant with the package DBI
.
The below chunk of code will turn data frame mtcars
into a database named mtcarsdb
. In the SQL queries that follow, you will see that the database is referenced by name mtcarsdb
.
At the conclusion of this RMarkdown file there is a code chunk that closes the connection and frees resources. Run that chunk when you complete the document.
con <- dbConnect(SQLite(), ":memory:")
dbWriteTable(con, "mtcarsdb", mtcars)
# see tables
dbListTables(con)
[1] "mtcarsdb"
For each SQL query with function dbGetQuery()
, think about what data is being pulled from the database. Then use object mtcars
in R along with functions in package dplyr
to obtain the same result.
dbGetQuery(con, statement = "SELECT mpg FROM mtcarsdb")
dbGetQuery(con, paste("SELECT * FROM mtcarsdb",
"WHERE mpg > 25")
)
dbGetQuery(con, paste("SELECT * FROM mtcarsdb",
"WHERE mpg > 20",
"ORDER BY cyl")
)
dbGetQuery(con, paste("SELECT mpg, wt, gear FROM mtcarsdb",
"WHERE mpg > 25 AND wt < 3",
"ORDER BY mpg DESC")
)
dbGetQuery(con, paste("SELECT mpg, wt, gear FROM mtcarsdb",
"WHERE mpg > 25 OR wt < 3",
"LIMIT 6")
)
For each SQL query with function dbGetQuery()
, think about what data is being pulled from the database. Then use object mtcars
in R along with functions in package dplyr
to obtain the same result. You will need to make use of summarise()
.
dbGetQuery(con, statement = "SELECT AVG(mpg) FROM mtcarsdb")
dbGetQuery(con, "SELECT COUNT(mpg) FROM mtcarsdb")
dbGetQuery(con, paste("SELECT cyl, AVG(hp) FROM mtcarsdb",
"GROUP BY cyl",
"ORDER BY AVG(hp) DESC")
)
dbGetQuery(con, paste("SELECT cyl, MAX(hp) FROM mtcarsdb",
"GROUP BY cyl",
"ORDER BY AVG(hp) ASC")
)
Use function dbGetQuery()
to write SQL queries for tasks 10 - 14. Make use of function paste()
to neatly format your SQL statement.
Write a SQL query that returns 10 rows of the variables mpg
, gear
, and carb
.
Write a SQL query that returns the variables mpg
, gear
, and carb
ordered in ascending order by carb
.
Write a SQL query that returns the variables mpg
, cyl
, and carb
, where carb
is greater than 2 and cyl
is greater than 4.
Write a SQL query that returns the sum of wt
for each carb
value.
Write a SQL query that returns the mean mpg
, mean hp
, and mean wt
grouped by cyl
and then carb
.
dbDisconnect(con)