Author: Madison DeMaio

Class: Spring 2020

Institution: Roger Williams University

Project Overview

In this project, students are going to use basic R functionality to calculate statistical measures and perform basic data visualization.

Project Expectations

The following are the key expectations for this project:

Question 1: A tire manufacturer wants to determine the inner diameter of a certain grade of tire. Ideally, the diameter would be 570 mm. The data are as follows:

572 572 573 568 569 575 565 570

data<- c(572, 572, 573, 568, 569, 575, 565, 570)

data
## [1] 572 572 573 568 569 575 565 570
print("The sample mean of the data provided is")
## [1] "The sample mean of the data provided is"
mean(data)
## [1] 570.5
data<- c(572, 572, 573, 568, 569, 575, 565, 570)

data
## [1] 572 572 573 568 569 575 565 570
print("The sample variance of the data provided is")
## [1] "The sample variance of the data provided is"
var(data)
## [1] 10
data
## [1] 572 572 573 568 569 575 565 570
print("The standard deviation of the data provided is")
## [1] "The standard deviation of the data provided is"
sd(data)
## [1] 3.162278
data
## [1] 572 572 573 568 569 575 565 570
print("The range of the data provided is")
## [1] "The range of the data provided is"
max(data)-min(data)
## [1] 10
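
The four measures above can also be gathered in one place. The following is a minimal sketch, not part of the original submission: the names `diam`, `target`, and `summary_q1` are introduced here for illustration, and the variance is computed from its definition so it can be compared against `var()`.

```r
# Inner-diameter measurements (mm) from Question 1
diam <- c(572, 572, 573, 568, 569, 575, 565, 570)
target <- 570  # ideal diameter stated in the question

n <- length(diam)
xbar <- mean(diam)

# Sample variance from its definition: squared deviations divided by n - 1
s2 <- sum((diam - xbar)^2) / (n - 1)

# Collect the four measures in one named vector
summary_q1 <- c(mean = xbar, variance = s2, sd = sqrt(s2),
                range = max(diam) - min(diam))
summary_q1

# Distance of the sample mean from the ideal diameter
xbar - target
```

The last line shows how far the sample mean sits from the 570 mm target, which is the comparison the question is motivated by.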

Question 2: Given the following sample of fuel efficiencies of 6-cylinder vehicles (in miles/gallon)

21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0

mpg<- c(21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0)

mpg
##  [1] 21.0 15.0 21.0 21.4 18.1 19.2 17.8 19.7 13.0 35.0
print("The sample range of the data is")
## [1] "The sample range of the data is"
max(mpg)-min(mpg)
## [1] 22
mpg
##  [1] 21.0 15.0 21.0 21.4 18.1 19.2 17.8 19.7 13.0 35.0
print("The minimum is")
## [1] "The minimum is"
min(mpg)
## [1] 13
print("The first quartile from the data is")
## [1] "The first quartile from the data is"
quantile(mpg, .25)
##    25% 
## 17.875
print("The median of the data is")
## [1] "The median of the data is"
median(mpg)
## [1] 19.45
print("The third quartile of the data is")
## [1] "The third quartile of the data is"
quantile(mpg, .75)
## 75% 
##  21
print("The maximum is")
## [1] "The maximum is"
max(mpg)
## [1] 35
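print("The five-number summary can also be obtained in a single call with fivenum(), shown below. It reports hinges, so its second value can differ slightly from the first quartile returned by quantile()")
## [1] "The five-number summary can also be obtained in a single call with fivenum(), shown below. It reports hinges, so its second value can differ slightly from the first quartile returned by quantile()"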
fivenum(mpg)
## [1] 13.00 17.80 19.45 21.00 35.00
mpg
##  [1] 21.0 15.0 21.0 21.4 18.1 19.2 17.8 19.7 13.0 35.0
print("The interquartile range of the data is")
## [1] "The interquartile range of the data is"
IQR(mpg)
## [1] 3.125
mpg
##  [1] 21.0 15.0 21.0 21.4 18.1 19.2 17.8 19.7 13.0 35.0
print("The mild lower and upper bounds respectively are")
## [1] "The mild lower and upper bounds respectively are"
(17.875-(1.5*3.125))
## [1] 13.1875
(21+(1.5*3.125))
## [1] 25.6875
print("The extreme lower and upper bounds respectively are")
## [1] "The extreme lower and upper bounds respectively are"
(17.875-(3*3.125))
## [1] 8.5
(21+(3*3.125))
## [1] 30.375
print("Therefore, 13 is a mild outlier and 35 is an extreme outlier")
## [1] "Therefore, 13 is a mild outlier and 35 is an extreme outlier"
print("The boxplot of the mpg data is")
## [1] "The boxplot of the mpg data is"
boxplot(mpg)
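
The fences can also be computed from the data instead of typing the quartiles in by hand. The sketch below is an illustration only: it assumes the common convention of mild fences at 1.5 times the IQR and extreme fences at 3 times the IQR, and the variable names are introduced here rather than taken from the assignment.

```r
# Fuel-efficiency sample (miles/gallon) from Question 2
mpg <- c(21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0)

# Quartiles as returned by quantile() (R's default type 7)
q1 <- unname(quantile(mpg, 0.25))
q3 <- unname(quantile(mpg, 0.75))
iqr <- q3 - q1

# Assumed convention: mild fences at 1.5 * IQR, extreme fences at 3 * IQR
mild_lower <- q1 - 1.5 * iqr
mild_upper <- q3 + 1.5 * iqr
extreme_lower <- q1 - 3 * iqr
extreme_upper <- q3 + 3 * iqr

# Anything outside the mild fences is an outlier;
# anything also outside the extreme fences is an extreme outlier
outliers <- mpg[mpg < mild_lower | mpg > mild_upper]
extreme_outliers <- mpg[mpg < extreme_lower | mpg > extreme_upper]

outliers
extreme_outliers
```

Computing the fences this way avoids copying rounded values between steps and makes the same check reusable for other samples.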

Question 3: The data for blood mercury concentration \((\mu g/g)\) for adult females near contaminated rivers in Virginia are as follows:

0.20 , 0.22 , 0.25 , 0.30, 0.34 , 0.41 , 0.55 , 0.56 , 1.42 , 1.70, 1.83 , 2.20 , 2.25, 3.07 , 3.25

bmerc<- c(0.20 , 0.22 ,  0.25 , 0.30,  0.34 , 0.41 , 0.55 , 0.56 , 1.42 , 1.70,  1.83 , 2.20 , 2.25,  3.07 , 3.25)



bmerc
##  [1] 0.20 0.22 0.25 0.30 0.34 0.41 0.55 0.56 1.42 1.70 1.83 2.20 2.25 3.07 3.25
print("The sample variance of the blood mercury data is")
## [1] "The sample variance of the blood mercury data is"
var(bmerc)
## [1] 1.167552
print("The sample standard deviation of the blood mercury data is")
## [1] "The sample standard deviation of the blood mercury data is"
sd(bmerc)
## [1] 1.080533
bmerc
##  [1] 0.20 0.22 0.25 0.30 0.34 0.41 0.55 0.56 1.42 1.70 1.83 2.20 2.25 3.07 3.25
print("The five number summary of the blood mercury data is")
## [1] "The five number summary of the blood mercury data is"
fivenum(bmerc)
## [1] 0.200 0.320 0.560 2.015 3.250
print("The box plot of the blood mercury data is")
## [1] "The box plot of the blood mercury data is"
boxplot(bmerc)

Using the values from the five-number summary, we can apply the fence formulas to check for outliers.

print("Here we show the five number summary for the blood mercury data found in the previous problem")
## [1] "Here we show the five number summary for the blood mercury data found in the previous problem"
fivenum(bmerc)
## [1] 0.200 0.320 0.560 2.015 3.250
print("From the five number summary, we know that Q1 is .32 and Q3 is 2.015. Now, we will find the IQR and then apply the formulas to find the outliers" )
## [1] "From the five number summary, we know that Q1 is .32 and Q3 is 2.015. Now, we will find the IQR and then apply the formulas to find the outliers"
print("The IQR of the blood mercury is")
## [1] "The IQR of the blood mercury is"
IQR(bmerc)
## [1] 1.695
print("The mild lower and upper bounds respectively are")
## [1] "The mild lower and upper bounds respectively are"
(.32-(1.5*1.695))
## [1] -2.2225
(2.015+(1.5*1.695))
## [1] 4.5575
print("The extreme lower and upper bounds respectively are")
## [1] "The extreme lower and upper bounds respectively are"
(.32-(3*1.695))
## [1] -4.765
(2.015+(3*1.695))
## [1] 7.1
print("Therefore, there are no outliers in this data")
## [1] "Therefore, there are no outliers in this data"
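
As a cross-check on the hand calculation, base R's `boxplot.stats()` applies the 1.5 times IQR rule itself, using the hinges from `fivenum()`. This is only a sketch of one way to confirm the result; the name `stats` is introduced here for illustration.

```r
# Blood mercury concentrations from Question 3
bmerc <- c(0.20, 0.22, 0.25, 0.30, 0.34, 0.41, 0.55, 0.56,
           1.42, 1.70, 1.83, 2.20, 2.25, 3.07, 3.25)

# coef = 1.5 is the default multiplier for the whisker length
stats <- boxplot.stats(bmerc, coef = 1.5)

stats$stats        # lower whisker, lower hinge, median, upper hinge, upper whisker
stats$out          # observations beyond the whiskers
length(stats$out)  # number of flagged observations
```

An empty `stats$out` agrees with the conclusion that no observation falls outside the mild fences.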

Question 4: Exposure to microbial products, especially endotoxin, may have an impact on vulnerability to allergic diseases. The following are data on concentration (EU/mg) in settled dust for one sample of urban homes and another of farm homes:

Urban: 6.0 5.0 11.0 33.0 4.0 5.0 86.0 18.0 35.0 17.0 23.0

Farm: 4.0 14.0 11.0 9.0 9.0 8.0 4.0 20.0 5.0 8.9 21.0 9.2 3.0 2.0 0.3

Urban<- c(6.0,  5.0,    11.0,   33.0,   4.0,    5.0,    86.0,   18.0,   35.0,   17.0,   23.0)

Farm<- c(4.0,   14.0,   11.0,   9.0,    9.0,    8.0,    4.0,    20.0,   5.0,    8.9,    21.0,   9.2,    3.0,    2.0,    0.3)

Urban 
##  [1]  6  5 11 33  4  5 86 18 35 17 23
print("The standard deviation of the Urban sample is")
## [1] "The standard deviation of the Urban sample is"
sd(Urban)
## [1] 23.88914
Farm
##  [1]  4.0 14.0 11.0  9.0  9.0  8.0  4.0 20.0  5.0  8.9 21.0  9.2  3.0  2.0  0.3
print("The standard deviation of the Farm sample is")
## [1] "The standard deviation of the Farm sample is"
sd(Farm)
## [1] 6.087669
print("The Urban standard deviation is much larger than the Farm standard deviation, so the Urban concentrations are spread over a wider range while the Farm concentrations lie closer together and closer to their mean. In other words, endotoxin concentrations are less consistent across the Urban homes than across the Farm homes.")
## [1] "The Urban standard deviation is much larger than the Farm standard deviation, so the Urban concentrations are spread over a wider range while the Farm concentrations lie closer together and closer to their mean. In other words, endotoxin concentrations are less consistent across the Urban homes than across the Farm homes."
print("The interquartile range of the Urban data is")
## [1] "The interquartile range of the Urban data is"
IQR(Urban)
## [1] 22.5
print("The interquartile range of the Farm data is")
## [1] "The interquartile range of the Farm data is"
IQR(Farm)
## [1] 6.1
print("The standard deviation and the interquartile range convey the same message: the Urban sample is considerably more spread out than the Farm sample")
## [1] "The standard deviation and the interquartile range convey the same message: the Urban sample is considerably more spread out than the Farm sample"
boxplot(Urban)

boxplot(Farm)
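
Drawing both samples on a shared axis makes the contrast easier to see than two separate plots. The following is a minimal sketch under that assumption; the list `samples`, the matrix `spread`, and the axis label are introduced here for illustration.

```r
# Endotoxin concentrations (EU/mg) from Question 4
Urban <- c(6.0, 5.0, 11.0, 33.0, 4.0, 5.0, 86.0, 18.0, 35.0, 17.0, 23.0)
Farm <- c(4.0, 14.0, 11.0, 9.0, 9.0, 8.0, 4.0, 20.0, 5.0, 8.9,
          21.0, 9.2, 3.0, 2.0, 0.3)

samples <- list(Urban = Urban, Farm = Farm)

# Both spread measures for both samples in one small table
spread <- sapply(samples, function(x) c(sd = sd(x), IQR = IQR(x)))
spread

# Side-by-side boxplots on a common scale
boxplot(samples, ylab = "Endotoxin concentration (EU/mg)")
```

Because the two boxes share a vertical scale, the wider Urban box and its distant high value are directly comparable with the Farm sample.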

Question 5: The following statistics are obtained from a sample of size 17

\(\sum_{i =1}^{17} x_i^2 = 197.91\) and sample mean \(\bar{x} = 3.342\). Compute the sample standard deviation.

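n <- 17            # sample size
sum_sq <- 197.91   # sum of the squared observations

xbar <- 3.342      # sample mean

total <- n*xbar    # sum of the observations, recovered from the mean

variance <- (sum_sq - total^2/n)/(n-1)   # shortcut formula for the sample variance

Std <- sqrt(variance)

Std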
## [1] 0.7087671
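
The calculation above rests on the shortcut form of the sample variance, which needs only the sum of squares and the sum of the observations. Written out with the given values (the intermediate figures are rounded):

\[ s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right), \qquad \sum_{i=1}^{n} x_i = n\bar{x} \]

\[ \sum_{i=1}^{17} x_i = 17 \times 3.342 = 56.814, \qquad s^2 = \frac{197.91 - 56.814^2/17}{16} \approx 0.5024, \qquad s = \sqrt{s^2} \approx 0.7088 \]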

Question 6: An article reported the following data on oxidation-induction time (min) for various commercial oils:

85 101 130 160 180 195 135 145 211 105 145 152 154 136 87 99 95 119 129

Oilmin<- c(85,  101,    130,    160,    180, 195,  135,  145,  211, 105,  145,  152,  154,      136,  87,  99,  95,  119,   129)

print("The sample variance of the oil data is")
## [1] "The sample variance of the oil data is"
var(Oilmin)
## [1] 1274.988
print("The standard deviation of the oil data is")
## [1] "The standard deviation of the oil data is"
sd(Oilmin)
## [1] 35.70698
print("To re-express the results in hours, each observation is divided by 60. The sample variance is therefore divided by 60^2 = 3600, and the standard deviation is divided by 60")
## [1] "To re-express the results in hours, each observation is divided by 60. The sample variance is therefore divided by 60^2 = 3600, and the standard deviation is divided by 60"
newvariance <- var(Oilmin)/60^2
newvariance
## [1] 0.3541634
newstandard <- sd(Oilmin)/60
newstandard
## [1] 0.5951163
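
One way to check a unit conversion like this is to convert the raw observations first and recompute the statistics from the converted data. The sketch below does that; the name `Oilhr` is introduced here for illustration and is not part of the original work.

```r
# Oxidation-induction times (minutes) from Question 6
Oilmin <- c(85, 101, 130, 160, 180, 195, 135, 145, 211, 105,
            145, 152, 154, 136, 87, 99, 95, 119, 129)

# Convert every observation from minutes to hours
Oilhr <- Oilmin / 60

var(Oilhr)  # sample variance in hours^2
sd(Oilhr)   # sample standard deviation in hours

# Rescaling by a constant c multiplies the variance by c^2 and the sd by |c|
all.equal(var(Oilhr), var(Oilmin) / 60^2)
all.equal(sd(Oilhr), sd(Oilmin) / 60)
```

Both comparisons should return `TRUE`, which reflects the general rule that multiplying data by a constant scales the standard deviation by that constant and the variance by its square.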