This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
We will begin by reading the CSV file containing our Chicago Public Schools training data and identifying the variables needed for Question 5.
mydata = read.csv(file="Chicago_Public_Schools_Train.csv")
class(mydata)
## [1] "data.frame"
head(mydata)
safetyscore = mydata$Safety.Score
familyinvolve = mydata$Family.Involvement.Score
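Before computing the correlation, a quick sanity check for missing values in the two columns may be helpful; this is only a sketch and assumes the columns were read in as numeric vectors.
#Sanity check (sketch): count missing values in the Question 5 variables
sum(is.na(safetyscore))
sum(is.na(familyinvolve))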
Now I will add a chunk of code for Question 5, where we must calculate the correlation coefficient between Safety Score and Family Involvement Score.
#Question 5
cor(safetyscore, familyinvolve)
## [1] 0.7144638
From this output, we can see a fairly strong positive correlation (r ≈ 0.71) between our two variables. While this might hint at an underlying causal relationship, correlation alone cannot establish one; only the fairly strong linear association can be confirmed from this calculation.
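If we also wanted to gauge whether this correlation is statistically distinguishable from zero, base R's cor.test() can be used; this is just a sketch of an optional check, not a required part of the question.
#Optional check (sketch): significance test for the Question 5 correlation
cor.test(safetyscore, familyinvolve)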
Now I will insert the necessary code for calculating the signal-to-noise ratio for the variable Environment Score.
#Question 6
environscore = mydata$Environment.Score
envmean = mean(environscore)
envsd = sd(environscore)
signoiseenv = envmean/envsd
signoiseenv
## [1] 2.953136
This signal-to-noise ratio for Environment Score means that the mean of the variable is roughly three times as large as its standard deviation, so the typical value (the signal) is large relative to the spread of the data (the noise). This is a good signal-to-noise ratio, although it could be higher. The worst case would have been a ratio of less than 1, where the noise exceeds the signal.
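The same calculation can be wrapped in a small helper and applied to the other score variables already pulled out above; this is purely an illustrative sketch for comparison.
#Sketch: signal-to-noise ratio for the score variables used so far
snr = function(x) mean(x)/sd(x)
sapply(list(Safety = safetyscore, Family = familyinvolve, Environment = environscore), snr)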
Now for Question 7 I will identify the necessary variables for finding outliers in the variable ISAT Exceeding Math. Then I will calculate the upper and lower limits for the variable and use the min and max functions to see whether our minimum and maximum fall outside those limits.
#Question 7
exmath = mydata$ISAT.Exceeding.Math
avgexmath = mean(exmath)
sdexmath = sd(exmath)
upperexmath = avgexmath+(3*sdexmath)
lowerexmath = avgexmath-(3*sdexmath)
upperexmath
## [1] 73.77911
lowerexmath
## [1] -30.12311
max(exmath)
## [1] 92.8
min(exmath)
## [1] 3.2
After conducting our calculations, we can see that the maximum (92.8) exceeds the upper limit while the minimum stays above the lower limit, so there are outliers in the upper end of our data. To identify them, we can put the variable into a data frame and inspect it.
#Question 7 Continued
exmath_df = data.frame(exmath)
exmath_df
After manually checking the data frame, we can see that 75.1 and 92.8 were the only outliers in our data.
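As a cross-check on the manual inspection, a logical filter can pull the outliers out directly using the limits computed above; this is just a sketch of an alternative approach.
#Question 7 cross-check (sketch): extract values beyond the 3-standard-deviation limits
exmath[exmath < lowerexmath | exmath > upperexmath]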
Now for Question 8, we will identify all necessary variables, create the two linear regression models, and then pull the necessary values for comparison.
#Question 8
#Reuse exmath from Question 7 and safetyscore from Question 5
penvscore = mydata$Parent.Environment.Score
rmisconducts = mydata$Rate.of.Misconducts
teacherscore = mydata$Teachers.Score
regression1 = lm(exmath ~ safetyscore+penvscore)
regression2 = lm(exmath ~ rmisconducts+teacherscore)
summary(regression1)
##
## Call:
## lm(formula = exmath ~ safetyscore + penvscore)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -23.032  -7.241  -1.196   5.276  47.839
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18.40039   14.27842   1.289   0.2006
## safetyscore  0.60722    0.05855  10.371   <2e-16 ***
## penvscore   -0.55923    0.28658  -1.951   0.0539 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.04 on 97 degrees of freedom
## Multiple R-squared: 0.5264, Adjusted R-squared: 0.5166
## F-statistic: 53.9 on 2 and 97 DF, p-value: < 2.2e-16
summary(regression2)
##
## Call:
## lm(formula = exmath ~ rmisconducts + teacherscore)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -22.319  -8.709  -2.169   3.127  58.882
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  11.83466    5.33293   2.219  0.02881 *
## rmisconducts -0.25621    0.07865  -3.258  0.00155 **
## teacherscore  0.28462    0.08754   3.251  0.00158 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.18 on 97 degrees of freedom
## Multiple R-squared: 0.2476, Adjusted R-squared: 0.2321
## F-statistic: 15.96 on 2 and 97 DF, p-value: 1.018e-06
From the above data, we can now create our regression equations:
Regression Equation 1: y= 18.40039+0.60722(xsafetyscore)-0.55923(xpenvscore) R-Squared: 0.5264, Adjusted R-Squared: 0.5166
Regression Equation 2: y= 11.83466-0.25621(xrmisconducts)+0.28462(xteacherscore) R-Squared: 0.2476, Adjusted R-Squared: 0.2321
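The R-squared values used in these equations can also be pulled straight from the model summaries rather than copied by hand; a small sketch of that comparison:
#Sketch: pull the R-squared values from each model for a side-by-side comparison
summary(regression1)$r.squared
summary(regression1)$adj.r.squared
summary(regression2)$r.squared
summary(regression2)$adj.r.squared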
#This code will be used for question 8.c.
exmath_predict = coef(regression1)[1] + coef(regression1)[2]*(42) + coef(regression1)[3]*(55)
exmath_predict
## (Intercept)
## 13.14597
#Answer in this case is 13.14597
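An equivalent way to obtain this prediction is to use predict() with the same input values; this is a sketch assuming, as in the manual calculation above, a Safety Score of 42 and a Parent Environment Score of 55.
#Sketch: same Question 8.c prediction using predict()
newschool = data.frame(safetyscore = 42, penvscore = 55)
predict(regression1, newdata = newschool)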
Extra Credit a) From the R-squared values and the correlation coefficients, we cannot claim a strong causal relationship between either model and ISAT Exceeding Math. Model 2 explains only about 25% of the variance in the outcome, and Model 1 explains just over half (R-squared of roughly 0.53), which indicates a moderate linear association at best. These are observational data, so even the better-fitting model provides little evidence of a causal relationship.
#Extra Credit plots: Safety Score against ISAT Exceeding Math, the same scatter with a smoothed fit, and ISAT Exceeding Math by observation
plot(exmath, safetyscore)
scatter.smooth(exmath, safetyscore)
plot(exmath)
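Another plot that supports this discussion is a residual check on the stronger model; this is only an illustrative sketch of one common diagnostic.
#Sketch: residuals versus fitted values for regression1
plot(fitted(regression1), residuals(regression1))
abline(h = 0)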