We examine the project review scores for a group of students that are recorded in a MySQL database. The database is queried and managed by the RMySQL R package that passes Structured Query Language (SQL) commands and queries to it and receive information from the database returns. The SQL commands are highlighted in red.
We connect to the MySQL database passing the MySQL user name and password. The function below returns an object or handle called “mysqlconnection.
mysqlconnection <- dbConnect(MySQL(), user = 'root', password = 'ialsql1851', dbname = 'studenttest',
host = 'localhost')
The relational database for the students has six tables named absences, classes, reviews, scores, students, and tests. The student review information is in a table called "reviews.”
dbListTables(mysqlconnection)
## [1] "absences" "classes" "reviews" "scores" "students" "tests"
We first examine the general table for student information.
print(dbGetQuery(mysqlconnection, "select student_id, first_name, last_name,
city, state, zip from students where student_id < 9;"))
## student_id first_name last_name city state zip
## 1 1 Dale Cooper Yakima WA 98901
## 2 2 Andy Brennan Jacksonvville NC 28540
## 3 3 Tommy Hill Tucson AZ 85701
## 4 4 Lucy Moran Hollywood CA 98007
## 5 5 Marcel Merchat Chicago IL 60660
## 6 6 Marco Merchat Chicago IL 60660
## 7 7 Marc Merchat Chicago IL 60660
## 8 8 Monica Merchat Chicago IL 60660
This table has four columns or fields of information. Notice the important data type property for each column. The student_id column is designated “PRI” which indicates that it is a primary key for the database.
print(dbGetQuery(mysqlconnection, "Describe reviews;"))
## Field Type Null Key Default Extra
## 1 student_id int(10) unsigned NO PRI <NA>
## 2 reviewTime datetime YES <NA>
## 3 meetingLength int(11) NO <NA>
## 4 score decimal(10,0) YES <NA>
Four of the students have scheduled review appointments. Notice that the Student with ID Number 4 has a review score of 75.
print(dbGetQuery(mysqlconnection, "select * from reviews;"))
## student_id reviewTime meetingLength score
## 1 1 2017-05-23 09:10:00 10 88
## 2 2 2017-05-23 09:30:00 10 91
## 3 3 2017-05-23 10:30:00 20 91
## 4 4 2017-05-30 09:00:00 30 80
## 5 5 2017-05-30 09:45:00 30 NA
## 6 6 2017-05-30 10:30:00 30 NA
## 7 7 2017-05-30 11:15:00 30 NA
## 8 8 2017-05-31 09:00:00 20 NA
The score of Student Number 4 has been changed from 75 to 80.
dbSendQuery(mysqlconnection, "Update reviews set score=80 where student_id = 4;")
## <MySQLResult:2,0,4>
print(dbGetQuery(mysqlconnection, "select * from reviews;"))
## student_id reviewTime meetingLength score
## 1 1 2017-05-23 09:10:00 10 88
## 2 2 2017-05-23 09:30:00 10 91
## 3 3 2017-05-23 10:30:00 20 91
## 4 4 2017-05-30 09:00:00 30 80
## 5 5 2017-05-30 09:45:00 30 NA
## 6 6 2017-05-30 10:30:00 30 NA
## 7 7 2017-05-30 11:15:00 30 NA
## 8 8 2017-05-31 09:00:00 20 NA
We exploit basic statistics capability of the MySQL database to compute the average score. We could have imported the scores into the R-program to accomplish this too. But it best to use the database as much as possible for analysis so we would not have to be more concerned about protecting student privacy.
Review_Statistics <- dbGetQuery(mysqlconnection, "Select
Max(score) AS Maximum,
AVG(score) AS Average,
MIN(score) AS Minimum
from Reviews;")
print(Review_Statistics)
## Maximum Average Minimum
## 1 91 87.5 80
However, as needed we can use the R program to anaylse student data further as in the following histogram.
student_scores <- dbGetQuery(mysqlconnection, "select score from reviews;")
histogram_1
student_scores <- dbGetQuery(mysqlconnection, "select score from reviews;")
print(student_scores)
score 1 88 2 91 3 91 4 80 5 NA 6 NA 7 NA 8 NA
all_cons <- dbListConnections(MySQL())
for(con in all_cons)
dbDisconnect(con)