MySQL Database Queries and Management using R and the RMySQL Package



May 22, 2017
Prepared by Marcel Merchat



Summary

We examine the project review scores for a group of students that are recorded in a MySQL database. The database is queried and managed by the RMySQL R package that passes Structured Query Language (SQL) commands and queries to it and receive information from the database returns. The SQL commands are highlighted in red.



Table of Contents:

a) Connect to Database
b) Relational Database Structure
c) SQL Commands and Queries
d) Statistics



(a) Connect to Database

We connect to the MySQL database passing the MySQL user name and password. The function below returns an object or handle called “mysqlconnection.

mysqlconnection <- dbConnect(MySQL(), user = 'root', password = 'ialsql1851', dbname = 'studenttest',
                            host = 'localhost')



(b) Relational Database Structure



Tables

The relational database for the students has six tables named absences, classes, reviews, scores, students, and tests. The student review information is in a table called "reviews.”

dbListTables(mysqlconnection)
## [1] "absences" "classes"  "reviews"  "scores"   "students" "tests"



Student Records

We first examine the general table for student information.

print(dbGetQuery(mysqlconnection, "select student_id, first_name, last_name,
                 city, state, zip from students where student_id < 9;"))
##   student_id first_name last_name          city state   zip
## 1          1       Dale    Cooper        Yakima    WA 98901
## 2          2       Andy   Brennan Jacksonvville    NC 28540
## 3          3      Tommy      Hill        Tucson    AZ 85701
## 4          4       Lucy     Moran     Hollywood    CA 98007
## 5          5     Marcel   Merchat       Chicago    IL 60660
## 6          6      Marco   Merchat       Chicago    IL 60660
## 7          7       Marc   Merchat       Chicago    IL 60660
## 8          8     Monica   Merchat       Chicago    IL 60660



The Review Table

This table has four columns or fields of information. Notice the important data type property for each column. The student_id column is designated “PRI” which indicates that it is a primary key for the database.

print(dbGetQuery(mysqlconnection, "Describe reviews;"))
##           Field             Type Null Key Default Extra
## 1    student_id int(10) unsigned   NO PRI    <NA>      
## 2    reviewTime         datetime  YES        <NA>      
## 3 meetingLength          int(11)   NO        <NA>      
## 4         score    decimal(10,0)  YES        <NA>



© SQL Commands and Queries



Review Data

Four of the students have scheduled review appointments. Notice that the Student with ID Number 4 has a review score of 75.

print(dbGetQuery(mysqlconnection, "select * from reviews;"))
##   student_id          reviewTime meetingLength score
## 1          1 2017-05-23 09:10:00            10    88
## 2          2 2017-05-23 09:30:00            10    91
## 3          3 2017-05-23 10:30:00            20    91
## 4          4 2017-05-30 09:00:00            30    80
## 5          5 2017-05-30 09:45:00            30    NA
## 6          6 2017-05-30 10:30:00            30    NA
## 7          7 2017-05-30 11:15:00            30    NA
## 8          8 2017-05-31 09:00:00            20    NA



Change Grade for Student-4

The score of Student Number 4 has been changed from 75 to 80.

dbSendQuery(mysqlconnection, "Update reviews set score=80 where student_id = 4;")
## <MySQLResult:2,0,4>
print(dbGetQuery(mysqlconnection, "select * from reviews;"))
##   student_id          reviewTime meetingLength score
## 1          1 2017-05-23 09:10:00            10    88
## 2          2 2017-05-23 09:30:00            10    91
## 3          3 2017-05-23 10:30:00            20    91
## 4          4 2017-05-30 09:00:00            30    80
## 5          5 2017-05-30 09:45:00            30    NA
## 6          6 2017-05-30 10:30:00            30    NA
## 7          7 2017-05-30 11:15:00            30    NA
## 8          8 2017-05-31 09:00:00            20    NA



Summary Statistics

We exploit basic statistics capability of the MySQL database to compute the average score. We could have imported the scores into the R-program to accomplish this too. But it best to use the database as much as possible for analysis so we would not have to be more concerned about protecting student privacy.



(d) Statistics



Average Student Score

Review_Statistics <- dbGetQuery(mysqlconnection, "Select
                  Max(score) AS Maximum,                 
                  AVG(score) AS Average,
                  MIN(score) AS Minimum
                  from Reviews;")

print(Review_Statistics)
##   Maximum Average Minimum
## 1      91    87.5      80



Histogram of Student Scores

However, as needed we can use the R program to anaylse student data further as in the following histogram.

student_scores <- dbGetQuery(mysqlconnection, "select score from reviews;")
histogram_1

plot of chunk histogram

student_scores <- dbGetQuery(mysqlconnection, "select score from reviews;")

print(student_scores)

score 1 88 2 91 3 91 4 80 5 NA 6 NA 7 NA 8 NA

all_cons <- dbListConnections(MySQL())
for(con in all_cons) 
  dbDisconnect(con)



THE END