Importing Data into R Project

Question 1

For this question you will need to read the excel_data.xlsx file into R using the readxl package. There are two sheets in this spreadsheet file named “Sheet1” and “Sheet2”. Use the readxl package to read in “Sheet2” from the spreadsheet file directly and answer the question below. Recall that the [[ operator can be used to subset a column of tibble or data frame.

What is the mean of the column labeled “X12” in “Sheet2”? You can use the mean() function to compute the mean of a column. (Choose the closest answer.)

library(tidyverse)
library(readxl)
exceldata1 <- read_excel("excel_data.xlsx",sheet="Sheet1")
exceldata2 <- read_excel("excel_data.xlsx",sheet="Sheet2")
mean(exceldata2$X12)

## [1] -4.028386

Question 2

Continuing from Question 1 above, use the readxl package to read in both “Sheet1” and “Sheet2” from the excel_data.xlsx file.

What is the correlation between column “X5” in “Sheet1” and column “X8” in “Sheet2”? Use the cor() function to compute the correlation between two columns.

X <- exceldata1$X5
Y <- exceldata2$X8
cor(X,Y)

## [1] -0.03718773

Question 3

For this question you will need to read in the database file sqlite_data.db using the RSQLite package. In this database file there is table named “table1”. You will need to read that table to answer this question.

The “ID” column in “table1” serves as and identification number for elements in the database table. What is the correlation between columns “S2” and “S3” for rows with ID equal to 8 only? (Hint: There should be 100 rows where ID = 8.)

library(RSQLite)
hw_db <- dbConnect(RSQLite::SQLite(), "sqlite_data.db")
thisdata <- dbGetQuery(hw_db, "SELECT * FROM table1")

thisdata %>% group_by(ID) %>% summarise(mycor=cor(S2,S3))

## # A tibble: 20 x 2
##       ID    mycor
##    <dbl>    <dbl>
##  1     1  0.162  
##  2     2  0.0968 
##  3     3  0.203  
##  4     4  0.0546 
##  5     5  0.208  
##  6     6  0.199  
##  7     7  0.229  
##  8     8  0.219  
##  9     9  0.305  
## 10    10  0.310  
## 11    11  0.173  
## 12    12  0.387  
## 13    13  0.124  
## 14    14  0.282  
## 15    15  0.293  
## 16    16  0.00499
## 17    17  0.120  
## 18    18  0.276  
## 19    19 NA      
## 20    20 NA

The correct answer is 0.219

Question 4

For this question you need to read in “Sheet2” from the excel_data.xlsx file using the readxl package and the data from the table2.json file using the jsonlite package.

Then you need to inner join the two tables by their corresponding ID columns to create a new data frame. What is the mean of column “J2” in the joined data frame?

# install.packages("rjson")
library("rjson")
# Give the input file name to the function.
json_data <- fromJSON(file="table2.json")
# Print the result.

json_as_df <- enframe(unlist(json_data))
json_as_df <- json_as_df %>% pivot_wider(names_from = "name",values_from = "value",values_fn = list) %>% 
unnest(c(ID, J1, J2, J3, J4))

json_as_df %>% head(4)

## # A tibble: 4 x 5
##      ID     J1    J2    J3    J4
##   <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1     1  -8.22  7.41  9.28 3.33 
## 2     2 -10.8   7.30 10.5  1.77 
## 3     3  -9.34  8.40  9.52 0.336
## 4     4  -7.82  6.52  9.94 4.00

joindata <- inner_join(exceldata2,json_as_df,by="ID")

## Answer to Question 4
mean(joindata$J2)

## [1] 6.936054

Question 5

Continuing from Question 4 above, what is the correlation between column “X2” and column “J4” in the joined data frame?

## Answer to Question 5
cor(joindata$X2,joindata$J4)

## [1] 0.09077