library(readr)
currentStudy <- read_csv("D:/Education/Statistics/Exam3/currentStudy.csv")
## New names:
## * `` -> ...1
## Rows: 60 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (2): Ins1, Ins2
## dbl (2): ...1, leakage
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(currentStudy)
hookeSubset <- read_csv("D:/Education/Statistics/Exam3/hookeSubset.csv")
## New names:
## * `` -> ...1
## Rows: 83 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): spring
## dbl (3): ...1, F, x
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(hookeSubset)
PorterWoodwardMa5e_1 <- read_csv("D:/Education/Statistics/Exam3/PorterWoodwardMa5e-1.csv")
## New names:
## * `` -> ...1
## Rows: 14 Columns: 5
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (5): ...1, y, OneOverN, OneOverN3, OneOverN4
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(PorterWoodwardMa5e_1)
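Per the hint repeated in the readr messages above, the column-specification banner can be silenced; a minimal illustration on the first file (shown only to demonstrate the option):
currentStudy <- read_csv("D:/Education/Statistics/Exam3/currentStudy.csv", show_col_types = FALSE)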
currStud = aov(leakage ~ as.factor(Ins1), data = currentStudy)
summary(currStud)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Ins1) 2 0.608 0.30411 3.164 0.0498 *
## Residuals 57 5.478 0.09611
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I compared the leakage column across the levels of insulation 1 and found a p-value just under 0.05 (0.0498), so we reject the null hypothesis that there is no difference in leakage across insulation types. We need further testing to find out where the differences are.
First, we assumed that the mean difference between insulation types was zero and tested that with the analysis of variance. The p-value below alpha then told us that there most definitely is a difference in means somewhere. So if our expectation was that a difference in means exists, hoping the ANOVA null would be rejected, then that expectation has been met.
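One ANOVA assumption worth checking before leaning on that p-value is equal variance across the insulation groups; a minimal sketch using Bartlett's test from base R:
bartlett.test(currentStudy$leakage, as.factor(currentStudy$Ins1))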
TukeyHSD(currStud)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = leakage ~ as.factor(Ins1), data = currentStudy)
##
## $`as.factor(Ins1)`
## diff lwr upr p adj
## PE-Nylon 0.2465 0.01058507 0.4824149 0.0386589
## PVC-Nylon 0.1165 -0.11941493 0.3524149 0.4649239
## PVC-PE -0.1300 -0.36591493 0.1059149 0.3867701
plot(TukeyHSD(currStud), las=1)
Wow, R did exactly what I wanted it to do. Impressive.
The TukeyHSD output reports the pairwise differences in means directly: for PE-Nylon the mean difference is 0.2465, for PVC-Nylon it is 0.1165, and for PVC-PE it is -0.1300. The plot shows the same information; the x-axis displays how far each mean difference lies from zero, with its family-wise confidence interval.
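The same numbers can also be pulled out programmatically; TukeyHSD() returns a list with one matrix per model term, so a small sketch like this grabs just the differences:
tukeyTable <- TukeyHSD(currStud)$`as.factor(Ins1)`
tukeyTable[, "diff"] # the pairwise mean differences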
bothIns = aov(leakage ~ as.factor(Ins1)*as.factor(Ins2), data = currentStudy)
summary(bothIns)
## Df Sum Sq Mean Sq F value Pr(>F)
## as.factor(Ins1) 2 0.608 0.3041 4.169 0.0207 *
## as.factor(Ins2) 1 1.444 1.4443 19.798 4.34e-05 ***
## as.factor(Ins1):as.factor(Ins2) 2 0.095 0.0473 0.648 0.5270
## Residuals 54 3.939 0.0730
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This analysis tells us that both the inner and outer insulations have an effect on the current leakage, but that there is no interaction effect between them. We can see that they don't interact by looking at the as.factor(Ins1):as.factor(Ins2) p-value of 0.5270, which is much greater than the alpha of 0.05.
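A quick visual cross-check of that no-interaction conclusion: in an interaction plot, roughly parallel traces agree with a non-significant interaction term. A minimal sketch with base R:
with(currentStudy, interaction.plot(Ins1, Ins2, leakage, xlab = "Inner insulation (Ins1)", trace.label = "Ins2"))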
plot(bothIns)
##library(ggplot2)
##ggplot(currentStudy)+stat_qq(aes(sample = Ins2*Ins1, col = leakage))+stat_qq_line(aes(sample = Ins2*Ins1, col ##=leakage))
Couldn't get this to work; I tried switching the x and y variables, but Ins1 and Ins2 are character columns, so Ins2*Ins1 is non-numeric and stat_qq has nothing to sample.
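One workaround, since the factors themselves can't feed stat_qq: Q-Q plot the residuals of the fitted model instead. A minimal sketch of that idea:
library(ggplot2)
resids <- data.frame(res = residuals(bothIns))
ggplot(resids, aes(sample = res)) + stat_qq() + stat_qq_line()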
library(ggplot2)
ggplot(hookeSubset)+stat_qq(aes(sample = x, col = F))+stat_qq_line(aes(sample = x, col =F)) + labs(x = "Theoretical quantiles", y = "Distance in meters (sample quantiles)", title = "Q-Q plot of distance x, colored by force F")
b. Compute your estimate of the spring constant k for the first spring. Report numerical estimates of how well this model seems to fit the data set.
firstSpring <- hookeSubset[c(1:31),c(1:4)]
secondSpring <- hookeSubset[c(32:62),c(1:4)]
thirdSpring <- hookeSubset[c(63:83),c(1:4)]
SpringConstant <- firstSpring[,2]/firstSpring[,3]
##View(SpringConstant)
print(c("Estimation of First Spring Constant K", mean(SpringConstant$F)))
## [1] "Estimation of First Spring Constant K"
## [2] "-1.85927446906265"
Tried putting the spring constant into the firstSpring data set, but got errors.
I may have made a mistake somewhere; the spring constant is negative, which it shouldn't be. But this makes sense, since a good portion of our distance values x are negative. Hooke's law can be written as F = -kx, in which case the output force comes out positive. With a little time, we could take an absolute value of our distance variable or of the spring constant k.
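A minimal sketch of that fix: take the absolute value of the mean ratio, and tapply() extends the same calculation to all three springs at once via the spring column:
abs(mean(firstSpring$F / firstSpring$x))
with(hookeSubset, abs(tapply(F / x, spring, mean)))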
To provide numerical results, we can use the spring constant k found above to compute the forces associated with given distances. I will keep it simple and pick values out of the data set by hand rather than writing row-and-column code.
Force and distance for row 1 of hookeSubset are 0.15 and -0.1, respectively. Let's see how close our estimated spring constant comes to the actual values in the data set.
print(c("distance times spring constant is equal to", -0.1 * -1.85927446906265))
## [1] "distance times spring constant is equal to"
## [2] "0.185927446906265"
Our answer deviates about 0.036 Newtons from the actual value (0.1859 vs. 0.15), which may or may not be acceptable depending on the quality requirements for this spring.
We could also just use linear modeling to do this work.
Hooke = lm(F ~ x, firstSpring)
summary(Hooke)
##
## Call:
## lm(formula = F ~ x, data = firstSpring)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62240 -0.12314 0.02217 0.15489 0.63063
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.01492 0.11489 -0.130 0.898
## x -1.84859 0.26219 -7.051 9.36e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2611 on 29 degrees of freedom
## Multiple R-squared: 0.6316, Adjusted R-squared: 0.6189
## F-statistic: 49.71 on 1 and 29 DF, p-value: 9.36e-08
Linear modeling gives us F = -1.84859x - 0.01492; the slope closely matches the estimate of k found earlier (-1.8593).
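For the numerical fit measures the question asks about, the summary above already reports R-squared (0.6316) and the residual standard error (0.2611 N); a small sketch pulling the same kind of quantities straight from the fit:
summary(Hooke)$r.squared # proportion of variance explained
sqrt(mean(residuals(Hooke)^2)) # root-mean-square error, in Newtons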
I tried solving this problem by going into Excel to create a new data frame including a column with the distance x = 0.3 m. That didn't work out too well, but here is my code, as well as what I would have used to predict the force required to displace the spring 0.3 m.
##Book1 <- read_excel("D:/Education/Statistics/Exam3/Book1.xlsx")
##View(Book1)
##predict.lm(Hooke, Book1, interval = "c")
Just watched the linear modeling crash course.
newData = data.frame(x = 0.3, Kvalue.F = -1.84859)
##predict.lm(Hooke, newdata, interval = "c")
Not exactly sure why this wasn't working at first. Two issues, in hindsight: the call says newdata while the object is named newData (R is case-sensitive), and predict() doesn't need a K value at all -- the Hooke object already holds the coefficients, so the new data frame only needs the predictor x. I was probably overdoing this whole problem.
firstSpring$Kvalue <- firstSpring$F / firstSpring$x # $ extraction returns a plain numeric vector
View(firstSpring)
What I tried ^ (originally written as firstSpring[,2]/firstSpring[,3]; on a tibble, [,j] returns a one-column tibble rather than a vector, which is what caused the errors).
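With those two issues fixed (matching the object's case exactly and giving predict() just the predictor x), the prediction runs; a minimal sketch:
newData <- data.frame(x = 0.3)
predict(Hooke, newData, interval = "confidence")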
Comment on the assumptions of the model developed for part b.
The model in part b assumes Hooke's law holds as a straight line, F = -kx, i.e. the y = mx + b found above with b near zero, and that the residuals are well behaved. I have shown two ways to find the spring constant k, one more exact than the other, but both reasonably acceptable, with |k| of about 1.84859. The intercept's p-value (0.898) is greater than 0.05, so the data are consistent with a zero intercept, just as F = -kx predicts. The slope's tiny p-value does not test whether k equals 1.84859; it rejects the hypothesis that the slope is zero, i.e. there really is a linear relationship between F and x.
Construct a new model and perform a hypothesis test to assess if the different springs in the dataset have different spring constants to a α = 0.01 level.
newModel = lm(F ~ x, hookeSubset)
summary(newModel)
##
## Call:
## lm(formula = F ~ x, data = hookeSubset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.74926 -0.21953 -0.00601 0.19426 0.89764
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.04218 0.08678 -0.486 0.628
## x -1.88737 0.19793 -9.536 6.95e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3238 on 81 degrees of freedom
## Multiple R-squared: 0.5289, Adjusted R-squared: 0.5231
## F-statistic: 90.93 on 1 and 81 DF, p-value: 6.949e-15
As with the single first-spring model, in both fits the intercept has a p-value greater than our alpha, so we fail to reject an intercept of zero. The y-intercept here (-0.04218) is also relatively close to the one in the equation from part b (-0.01492). For all springs combined, the k-value (-1.88737) doesn't deviate too far from the k-value found in part b (-1.84859).
ggplot(hookeSubset, aes(x=F, y = x )) + geom_smooth(method = "lm" ) + geom_point()
## `geom_smooth()` using formula 'y ~ x'
ggplot(firstSpring, aes(x=F, y = x )) + geom_smooth(method = "lm" ) + geom_point()
## `geom_smooth()` using formula 'y ~ x'
Depending on how exact we want to be, I would accept that the spring constants are roughly the same. Looking at both plots separately and then together, we see very similar behavior, which is more evidence for that claim; a formal test at the requested α = 0.01 is sketched below.
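One standard way to make that formal at α = 0.01, sketched here as one possible approach rather than the intended one: give each spring its own slope through an x:spring interaction, then F-test whether those extra slope terms improve on the common-slope model.
springModel <- lm(F ~ x * spring, data = hookeSubset)
anova(newModel, springModel) # reject 'equal spring constants' only if p < 0.01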
PorterWoodwardMa5e_1$Nexp3 <- PorterWoodwardMa5e_1$OneOverN^3 # $ extraction, see note below
PorterWoodwardMa5e_1$Nexp4 <- PorterWoodwardMa5e_1$OneOverN^4
View(PorterWoodwardMa5e_1)
##pWood1 = lm(y ~ Nexp3 + Nexp4, PorterWoodwardMa5e_1)
## - Mine didn't work at first: written as PorterWoodwardMa5e_1[,3]^3, the [,3] indexing returns a one-column tibble, so the new columns came out nested with names like Nexp3.OneOverN3. I assume that's why the OneOverN3 and OneOverN4 columns were provided, and I use them directly below.
pWood = lm(y ~ OneOverN3 + OneOverN4, PorterWoodwardMa5e_1)
summary(pWood)
##
## Call:
## lm(formula = y ~ OneOverN3 + OneOverN4, data = PorterWoodwardMa5e_1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0133141 -0.0073105 0.0001073 0.0070423 0.0132717
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.046e-02 3.125e-03 3.347 0.00652 **
## OneOverN3 2.084e+02 1.571e+01 13.270 4.11e-08 ***
## OneOverN4 -8.988e+02 9.632e+01 -9.331 1.47e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.009252 on 11 degrees of freedom
## Multiple R-squared: 0.9899, Adjusted R-squared: 0.988
## F-statistic: 538.4 on 2 and 11 DF, p-value: 1.063e-11
coef(pWood)
## (Intercept) OneOverN3 OneOverN4
## 0.01045945 208.42488133 -898.76713999
confint(pWood)
## 2.5 % 97.5 %
## (Intercept) 3.580482e-03 0.01733842
## OneOverN3 1.738565e+02 242.99329879
## OneOverN4 -1.110756e+03 -686.77789326
For our intercept, we have a confidence interval of 3.58e-03 to 0.01734; likewise 173.86 to 242.99 for OneOverN3 and -1110.76 to -686.78 for OneOverN4.
We can also pass the confidence level explicitly; the 95% that was asked for is in fact the default.
confint(pWood, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) 3.580482e-03 0.01733842
## OneOverN3 1.738565e+02 242.99329879
## OneOverN4 -1.110756e+03 -686.77789326
– 97.5 - 2.5 = 95: the column headers are the two tail quantiles, and their difference is exactly the confidence level. That is also why this output is identical to the call above -- confint() already defaults to level = 0.95.
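To see the level argument actually change something, one could request a different level, say 99% (output not shown; the headers would become 0.5 % and 99.5 %):
confint(pWood, level = 0.99)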
plot(pWood)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
Wow. These diagnostic plots look pretty wild. You mentioned that for the fit to be reasonable we would like the red reference lines to be relatively straight and horizontal, and here none of them are. With only 14 observations, though, those smoothers are very sensitive to individual points, and the regressors are strongly non-linear functions of N, so some wiggle is not surprising. Our Q-Q plot looks very nice and contained, so at least the normality assumption seems to hold.
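plot.lm can also draw the panels one at a time; with so few observations, the Q-Q panel alone is often the fairest single look:
plot(pWood, which = 2) # normal Q-Q plot of the standardized residuals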