library(readr)
currentStudy <- read_csv("D:/Education/Statistics/Exam3/currentStudy.csv")
## New names:
## * `` -> ...1
## Rows: 60 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): Ins1, Ins2
## dbl (2): ...1, leakage
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(currentStudy)
hookeSubset <- read_csv("D:/Education/Statistics/Exam3/hookeSubset.csv")
## New names:
## * `` -> ...1
## Rows: 83 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): spring
## dbl (3): ...1, F, x
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hookeSubset)
PorterWoodwardMa5e_1 <- read_csv("D:/Education/Statistics/Exam3/PorterWoodwardMa5e-1.csv")
## New names:
## * `` -> ...1
## Rows: 14 Columns: 5
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (5): ...1, y, OneOverN, OneOverN3, OneOverN4
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(PorterWoodwardMa5e_1)
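Per the hint repeated in the readr messages above, the column-specification banner can be silenced; a minimal illustration on the first file (shown only to demonstrate the option):
currentStudy <- read_csv("D:/Education/Statistics/Exam3/currentStudy.csv", show_col_types = FALSE)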
currStud = aov(leakage ~ as.factor(Ins1), data = currentStudy)
summary(currStud)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Ins1) 2 0.608 0.30411 3.164 0.0498 *
## Residuals 57 5.478 0.09611
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I compared the leakage column across the levels of insulation 1 and found a p-value just under 0.05 (0.0498), so we reject the null hypothesis that there is no difference in leakage across insulation types. We need further testing to find out where the differences are.
First, we assumed that the mean difference between insulation types was zero and tested that with the analysis of variance. The p-value below alpha then told us that there most definitely is a difference in means somewhere. So if our expectation was that a difference in means exists, hoping the ANOVA null would be rejected, then that expectation has been met.
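One ANOVA assumption worth checking before leaning on that p-value is equal variance across the insulation groups; a minimal sketch using Bartlett's test from base R:
bartlett.test(currentStudy$leakage, as.factor(currentStudy$Ins1))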
TukeyHSD(currStud)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = leakage ~ as.factor(Ins1), data = currentStudy)
##
## $`as.factor(Ins1)`
## diff lwr upr p adj
## PE-Nylon 0.2465 0.01058507 0.4824149 0.0386589
## PVC-Nylon 0.1165 -0.11941493 0.3524149 0.4649239
## PVC-PE -0.1300 -0.36591493 0.1059149 0.3867701
plot(TukeyHSD(currStud), las=1)
Wow, R did exactly what I wanted it to do. Impressive.
The TukeyHSD output reports the pairwise differences in means directly: for PE-Nylon the mean difference is 0.2465, for PVC-Nylon it is 0.1165, and for PVC-PE it is -0.1300. The plot shows the same information; the x-axis displays how far each mean difference lies from zero, with its family-wise confidence interval.
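The same numbers can also be pulled out programmatically; TukeyHSD() returns a list with one matrix per model term, so a small sketch like this grabs just the differences:
tukeyTable <- TukeyHSD(currStud)$`as.factor(Ins1)`
tukeyTable[, "diff"] # the pairwise mean differences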
bothIns = aov(leakage ~ as.factor(Ins1)*as.factor(Ins2), data = currentStudy)
summary(bothIns)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Ins1) 2 0.608 0.3041 4.169 0.0207 *
## as.factor(Ins2) 1 1.444 1.4443 19.798 4.34e-05 ***
## as.factor(Ins1):as.factor(Ins2) 2 0.095 0.0473 0.648 0.5270
## Residuals 54 3.939 0.0730
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This analysis tells us that both the inner and outer insulations have an effect on the current leakage, but that there is no interaction effect between them. We can see that they don't interact by looking at the as.factor(Ins1):as.factor(Ins2) p-value of 0.5270, which is much greater than the alpha of 0.05.
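A quick visual cross-check of that no-interaction conclusion: in an interaction plot, roughly parallel traces agree with a non-significant interaction term. A minimal sketch with base R:
with(currentStudy, interaction.plot(Ins1, Ins2, leakage, xlab = "Inner insulation (Ins1)", trace.label = "Ins2"))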
plot(bothIns)
##library(ggplot2)
##ggplot(currentStudy)+stat_qq(aes(sample = Ins2*Ins1, col = leakage))+stat_qq_line(aes(sample = Ins2*Ins1, col ##=leakage))
Couldn't get this to work; I tried switching the x and y variables, but Ins1 and Ins2 are character columns, so Ins2*Ins1 is non-numeric and stat_qq has nothing to sample.
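One workaround, since the factors themselves can't feed stat_qq: Q-Q plot the residuals of the fitted model instead. A minimal sketch of that idea:
library(ggplot2)
resids <- data.frame(res = residuals(bothIns))
ggplot(resids, aes(sample = res)) + stat_qq() + stat_qq_line()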
library(ggplot2)
ggplot(hookeSubset)+stat_qq(aes(sample = x, col = F))+stat_qq_line(aes(sample = x, col =F)) + labs(x = "Theoretical quantiles", y = "Distance in meters (sample quantiles)", title = "Q-Q plot of distance x, colored by force F")
b. Compute your estimate of the spring constant k for the first spring. Report numerical estimates of how well this model seems to fit the data set.
firstSpring <- hookeSubset[c(1:31),c(1:4)]
secondSpring <- hookeSubset[c(32:62),c(1:4)]
thirdSpring <- hookeSubset[c(63:83),c(1:4)]
SpringConstant <- firstSpring[,2]/firstSpring[,3]
##View(SpringConstant)
print(c("Estimation of First Spring Constant K", mean(SpringConstant$F)))
## [1] "Estimation of First Spring Constant K"
## [2] "-1.85927446906265"
Tried putting the spring constant into the firstSpring data set, but got errors.
I may have made a mistake somewhere; the spring constant is negative, which it shouldn't be. But this makes sense, since a good portion of our distance values x are negative. Hooke's law can be written as F = -kx, in which case the output force comes out positive. With a little time, we could take an absolute value of our distance variable or of the spring constant k.
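A minimal sketch of that fix: take the absolute value of the mean ratio, and tapply() extends the same calculation to all three springs at once via the spring column:
abs(mean(firstSpring$F / firstSpring$x))
with(hookeSubset, abs(tapply(F / x, spring, mean)))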
To provide numerical results, we can use the spring constant k found above to compute the forces associated with given distances. I will keep it simple and pick values out of the data set by hand rather than writing row-and-column code.
Force and distance for row 1 of hookeSubset are 0.15 and -0.1, respectively. Let's see how close our estimated spring constant comes to the actual values in the data set.
print(c("distance times spring constant is equal to", -0.1 * -1.85927446906265))
## [1] "distance times spring constant is equal to"
## [2] "0.185927446906265"
Our answer deviates about 0.036 Newtons from the actual value (0.1859 vs. 0.15), which may or may not be acceptable depending on the quality requirements for this spring.
We could also just use linear modeling to do this work.
Hooke = lm(F ~ x, firstSpring)
summary(Hooke)
##
## Call:
## lm(formula = F ~ x, data = firstSpring)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62240 -0.12314 0.02217 0.15489 0.63063
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01492 0.11489 -0.130 0.898
## x -1.84859 0.26219 -7.051 9.36e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2611 on 29 degrees of freedom
## Multiple R-squared: 0.6316, Adjusted R-squared: 0.6189
## F-statistic: 49.71 on 1 and 29 DF, p-value: 9.36e-08
Linear modeling gives us F = -1.84859x - 0.01492; the slope closely matches the estimate of k found earlier (-1.8593).
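For the numerical fit measures the question asks about, the summary above already reports R-squared (0.6316) and the residual standard error (0.2611 N); a small sketch pulling the same kind of quantities straight from the fit:
summary(Hooke)$r.squared # proportion of variance explained
sqrt(mean(residuals(Hooke)^2)) # root-mean-square error, in Newtons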
I tried solving this problem by going into Excel to create a new data frame including a column with the distance x = 0.3 m. That didn't work out too well, but here is my code, as well as what I would have used to predict the force required to displace the spring 0.3 m.
##Book1 <- read_excel("D:/Education/Statistics/Exam3/Book1.xlsx")
##View(Book1)
##predict.lm(Hooke, Book1, interval = "c")
Just watched the linear modeling crash course.
newData = data.frame(x = 0.3, Kvalue.F = -1.84859)
##predict.lm(Hooke, newdata, interval = "c")
Not exactly sure why this wasn't working at first. Two issues, in hindsight: the call says newdata while the object is named newData (R is case-sensitive), and predict() doesn't need a K value at all -- the Hooke object already holds the coefficients, so the new data frame only needs the predictor x. I was probably overdoing this whole problem.
firstSpring$Kvalue <- firstSpring$F / firstSpring$x # $ extraction returns a plain numeric vector
View(firstSpring)
What I tried ^ (originally written as firstSpring[,2]/firstSpring[,3]; on a tibble, [,j] returns a one-column tibble rather than a vector, which is what caused the errors).
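With those two issues fixed (matching the object's case exactly and giving predict() just the predictor x), the prediction runs; a minimal sketch:
newData <- data.frame(x = 0.3)
predict(Hooke, newData, interval = "confidence")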
Comment on the assumptions of the model developed for part b.
The model in part b assumes Hooke's law holds as a straight line, F = -kx, i.e. the y = mx + b found above with b near zero, and that the residuals are well behaved. I have shown two ways to find the spring constant k, one more exact than the other, but both reasonably acceptable, with |k| of about 1.84859. The intercept's p-value (0.898) is greater than 0.05, so the data are consistent with a zero intercept, just as F = -kx predicts. The slope's tiny p-value does not test whether k equals 1.84859; it rejects the hypothesis that the slope is zero, i.e. there really is a linear relationship between F and x.
Construct a new model and perform a hypothesis test to assess if the different springs in the dataset have different spring constants to a α = 0.01 level.
newModel = lm(F ~ x, hookeSubset)
summary(newModel)
##
## Call:
## lm(formula = F ~ x, data = hookeSubset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.74926 -0.21953 -0.00601 0.19426 0.89764
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.04218 0.08678 -0.486 0.628
## x -1.88737 0.19793 -9.536 6.95e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3238 on 81 degrees of freedom
## Multiple R-squared: 0.5289, Adjusted R-squared: 0.5231
## F-statistic: 90.93 on 1 and 81 DF, p-value: 6.949e-15
As with the single first-spring model, in both fits the intercept has a p-value greater than our alpha, so we fail to reject an intercept of zero. The y-intercept here (-0.04218) is also relatively close to the one in the equation from part b (-0.01492). For all springs combined, the k-value (-1.88737) doesn't deviate too far from the k-value found in part b (-1.84859).
ggplot(hookeSubset, aes(x=F, y = x )) + geom_smooth(method = "lm" ) + geom_point()
## `geom_smooth()` using formula 'y ~ x'
ggplot(firstSpring, aes(x=F, y = x )) + geom_smooth(method = "lm" ) + geom_point()
## `geom_smooth()` using formula 'y ~ x'
Depending on how exact we want to be, I would accept that the spring constants are roughly the same. Looking at both plots separately and then together, we see very similar behavior, which is more evidence for that claim; a formal test at the requested α = 0.01 is sketched below.
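One standard way to make that formal at α = 0.01, sketched here as one possible approach rather than the intended one: give each spring its own slope through an x:spring interaction, then F-test whether those extra slope terms improve on the common-slope model.
springModel <- lm(F ~ x * spring, data = hookeSubset)
anova(newModel, springModel) # reject 'equal spring constants' only if p < 0.01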
PorterWoodwardMa5e_1$Nexp3 <- PorterWoodwardMa5e_1$OneOverN^3 # $ extraction, see note below
PorterWoodwardMa5e_1$Nexp4 <- PorterWoodwardMa5e_1$OneOverN^4
View(PorterWoodwardMa5e_1)
##pWood1 = lm(y ~ Nexp3 + Nexp4, PorterWoodwardMa5e_1)
## - Mine didn't work at first: written as PorterWoodwardMa5e_1[,3]^3, the [,3] indexing returns a one-column tibble, so the new columns came out nested with names like Nexp3.OneOverN3. I assume that's why the OneOverN3 and OneOverN4 columns were provided, and I use them directly below.
pWood = lm(y ~ OneOverN3 + OneOverN4, PorterWoodwardMa5e_1)
summary(pWood)
##
## Call:
## lm(formula = y ~ OneOverN3 + OneOverN4, data = PorterWoodwardMa5e_1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0133141 -0.0073105 0.0001073 0.0070423 0.0132717
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.046e-02 3.125e-03 3.347 0.00652 **
## OneOverN3 2.084e+02 1.571e+01 13.270 4.11e-08 ***
## OneOverN4 -8.988e+02 9.632e+01 -9.331 1.47e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.009252 on 11 degrees of freedom
## Multiple R-squared: 0.9899, Adjusted R-squared: 0.988
## F-statistic: 538.4 on 2 and 11 DF, p-value: 1.063e-11
coef(pWood)
## (Intercept) OneOverN3 OneOverN4
## 0.01045945 208.42488133 -898.76713999
confint(pWood)
## 2.5 % 97.5 %
## (Intercept) 3.580482e-03 0.01733842
## OneOverN3 1.738565e+02 242.99329879
## OneOverN4 -1.110756e+03 -686.77789326
For our intercept, we have a confidence interval of 3.58e-03 to 0.01734; likewise 173.86 to 242.99 for OneOverN3 and -1110.76 to -686.78 for OneOverN4.
We can also pass the confidence level explicitly; the 95% that was asked for is in fact the default.
confint(pWood, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 3.580482e-03 0.01733842
## OneOverN3 1.738565e+02 242.99329879
## OneOverN4 -1.110756e+03 -686.77789326
– 97.5 - 2.5 = 95: the column headers are the two tail quantiles, and their difference is exactly the confidence level. That is also why this output is identical to the call above -- confint() already defaults to level = 0.95.
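To see the level argument actually change something, one could request a different level, say 99% (output not shown; the headers would become 0.5 % and 99.5 %):
confint(pWood, level = 0.99)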
plot(pWood)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
Wow. These diagnostic plots look pretty wild. You mentioned that for the fit to be reasonable we would like the red reference lines to be relatively straight and horizontal, and here none of them are. With only 14 observations, though, those smoothers are very sensitive to individual points, and the regressors are strongly non-linear functions of N, so some wiggle is not surprising. Our Q-Q plot looks very nice and contained, so at least the normality assumption seems to hold.
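plot.lm can also draw the panels one at a time; with so few observations, the Q-Q panel alone is often the fairest single look:
plot(pWood, which = 2) # normal Q-Q plot of the standardized residuals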