Quiz 3: Basic R Programming

Project Expectations

The following are the key expectations for this project:

Become familiar with basic R functions and data types
Calculate measures of location (central tendency) such as the sample mean, median, and quartiles
Calculate the range, interquartile range, variance, and standard deviation
Draw histograms and boxplots

Question 1: A tire manufacturer wants to determine the inner diameter of a certain grade of tire. Ideally, the diameter would be 570 mm. The data are as follows:

572 572 573 568 569 575 565 570

Compute the sample mean and median.

data<- c(572, 572, 573, 568, 569, 575, 565, 570)

data

## [1] 572 572 573 568 569 575 565 570

print("mean")

## [1] "mean"

mean(data)

## [1] 570.5
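
The question also asks for the median, which the output above does not show. A minimal sketch using base R's `median()` on the same vector (redefined here so the snippet runs on its own):

```r
# Inner diameters (mm) from Question 1
data <- c(572, 572, 573, 568, 569, 575, 565, 570)

# With an even number of observations, median() averages the two middle values
# (570 and 572 after sorting)
median(data)
## [1] 571
```

The median (571 mm) is close to the mean (570.5 mm), which suggests the sample is not strongly skewed.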

Compute the sample variance, standard deviation and range.

data<- c(572, 572, 573, 568, 569, 575, 565, 570)

data

## [1] 572 572 573 568 569 575 565 570

print("The sample variance of the data provided is")

## [1] "The sample variance of the data provided is"

var(data)

## [1] 10

data

## [1] 572 572 573 568 569 575 565 570

print("The standard deviation of the data provided is")

## [1] "The standard deviation of the data provided is"

sd(data)

## [1] 3.162278

data

## [1] 572 572 573 568 569 575 565 570

print("The range of the data provided is")

## [1] "The range of the data provided is"

max(data)-min(data)

## [1] 10

Using the calculated statistics in parts (a) and (b), can you comment on the quality of the tires?
Based on these statistics, the tires appear to be of good quality. The sample mean (570.5 mm) is within 0.5 mm of the 570 mm target, and the standard deviation (about 3.16 mm) indicates that individual diameters typically fall only a few millimeters from the mean.

Question 2: Given the following sample of fuel efficiencies of 6-cylinder vehicles (in miles/gallon):

21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0

Compute the sample range, variance(\(s^2\)), and standard deviation(\(s\)).

mpg<- c(21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0)

mpg

##  [1] 21.0 15.0 21.0 21.4 18.1 19.2 17.8 19.7 13.0 35.0

print("The sample range of the data is")

## [1] "The sample range of the data is"

max(mpg)-min(mpg)

## [1] 22
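
The output above covers only the range. The variance and standard deviation the question asks for can be obtained the same way as in Question 1; this is a sketch using base R's `var()` and `sd()`, with the intermediate names `s2` and `s` chosen here for illustration:

```r
# Fuel efficiencies (miles/gallon) from Question 2
mpg <- c(21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0)

# Sample variance s^2 (var() divides by n - 1)
s2 <- var(mpg)
s2

# Sample standard deviation s, the square root of s^2
s <- sd(mpg)
s
```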

Give the five number summary (minimum, first quartile, median, third quartile, maximum).

mpg

##  [1] 21.0 15.0 21.0 21.4 18.1 19.2 17.8 19.7 13.0 35.0

print("The minimum is")

## [1] "The minimum is"

min(mpg)

## [1] 13

print("The first quartile from the data is")

## [1] "The first quartile from the data is"

quantile(mpg, .25)

##    25% 
## 17.875

print("The median of the data is")

## [1] "The median of the data is"

median(mpg)

## [1] 19.45

print("The third quartile of the data is")

## [1] "The third quartile of the data is"

quantile(mpg, .75)

## 75% 
##  21

print("The maximum is")

## [1] "The maximum is"

max(mpg)

## [1] 35

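print("Although each value was computed separately above, fivenum() returns all five at once. It reports hinges rather than quantile() quartiles, so its second value (17.80) differs slightly from 17.875")

## [1] "Although each value was computed separately above, fivenum() returns all five at once. It reports hinges rather than quantile() quartiles, so its second value (17.80) differs slightly from 17.875"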

fivenum(mpg)

## [1] 13.00 17.80 19.45 21.00 35.00
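
The second value from `fivenum()` (17.80) does not match `quantile(mpg, .25)` (17.875) because the two functions use different definitions: `fivenum()` returns Tukey's hinges (the medians of the lower and upper halves), while `quantile()` defaults to linear interpolation (`type = 7`). A short side-by-side sketch, with `hinges` and `quartiles` as illustrative names:

```r
# Fuel efficiencies (miles/gallon) from Question 2
mpg <- c(21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
hinges <- fivenum(mpg)

# Default quantile() (type 7) at the same five probabilities
quartiles <- quantile(mpg, probs = c(0, 0.25, 0.5, 0.75, 1))

# Compare the two definitions row by row
rbind(fivenum = hinges, quantile = unname(quartiles))
```

Either definition is acceptable as long as the same one is used consistently when computing the IQR and the outlier bounds.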

Compute the interquartile range (\(IQR\)).

mpg

##  [1] 21.0 15.0 21.0 21.4 18.1 19.2 17.8 19.7 13.0 35.0

print("The interquartile range of the data is")

## [1] "The interquartile range of the data is"

IQR(mpg)

## [1] 3.125
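
`IQR()` is the difference between the third and first quartiles as computed by `quantile()`. The sketch below makes that explicit; `iqr_manual` is a name introduced here for illustration:

```r
# Fuel efficiencies (miles/gallon) from Question 2
mpg <- c(21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0)

# First and third quartiles (default type 7 interpolation)
q <- quantile(mpg, probs = c(0.25, 0.75))

# IQR by hand: Q3 - Q1
iqr_manual <- unname(q[2] - q[1])
iqr_manual

# Built-in equivalent
IQR(mpg)
```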

Compute the mild and extreme outlier boundaries. Identify any mild and extreme outliers in the sample data. To find the outliers in RStudio, I use the values of Q1, Q3, and the IQR calculated above: each lower bound is Q1 minus a multiple of the IQR and each upper bound is Q3 plus the same multiple (1.5 for mild outliers and a larger multiplier for extreme outliers).

mpg

##  [1] 21.0 15.0 21.0 21.4 18.1 19.2 17.8 19.7 13.0 35.0

print("The mild lower and upper bounds respectively are")

## [1] "The mild lower and upper bounds respectively are"

(17.875-(1.5*3.125))

## [1] 13.1875

(21+(1.5*3.125))

## [1] 25.6875

print("The extreme lower and upper bounds respectively are")

## [1] "The extreme lower and upper bounds respectively are"

(17.875-(3*3.125))

## [1] 8.5

(21+(3*3.125))

## [1] 30.375

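print("Therefore, 13 is a mild outlier (below the mild lower bound of 13.1875) and 35 is an extreme outlier (above the extreme upper bound)")

## [1] "Therefore, 13 is a mild outlier (below the mild lower bound of 13.1875) and 35 is an extreme outlier (above the extreme upper bound)"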
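
The same bounds can be computed from the data instead of retyping rounded statistics. The sketch below uses a hypothetical helper, `outlier_fences()`, which is not part of base R, and assumes the common convention of 1.5 × IQR for mild and 3 × IQR for extreme outliers:

```r
# Fuel efficiencies (miles/gallon) from Question 2
mpg <- c(21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0)

# Hypothetical helper: lower and upper bounds at k * IQR from the quartiles
outlier_fences <- function(x, k) {
  q <- unname(quantile(x, probs = c(0.25, 0.75)))
  c(lower = q[1] - k * IQR(x), upper = q[2] + k * IQR(x))
}

mild    <- outlier_fences(mpg, 1.5)  # assumed multiplier for mild outliers
extreme <- outlier_fences(mpg, 3)    # assumed multiplier for extreme outliers

# Extreme outliers lie beyond the 3 * IQR bounds
extreme_out <- mpg[mpg < extreme["lower"] | mpg > extreme["upper"]]

# Mild outliers lie beyond the 1.5 * IQR bounds but are not extreme
mild_out <- setdiff(mpg[mpg < mild["lower"] | mpg > mild["upper"]], extreme_out)

mild_out     # 13
extreme_out  # 35
```

Computing the bounds from `quantile()` and `IQR()` directly avoids transcription errors and lets the same helper be reused for the other samples in this quiz.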

Construct a box plot for the sample.

print("The boxplot of the mpg data is")

## [1] "The boxplot of the mpg data is"

boxplot(mpg)
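
A histogram gives a complementary view of the same sample. This is a sketch using base R's `hist()` with its default bins; the title and axis label are illustrative choices, not part of the original assignment:

```r
# Fuel efficiencies (miles/gallon) from Question 2
mpg <- c(21.0, 15.0, 21.0, 21.4, 18.1, 19.2, 17.8, 19.7, 13.0, 35.0)

# Draw the histogram; hist() also returns the bin breaks and counts
h <- hist(mpg,
          main = "Fuel efficiency of 6-cylinder vehicles",
          xlab = "Miles per gallon")

# Number of observations falling in each bin
h$counts
```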

Question 3: The data for blood mercury concentration (\(\mu g/g\)) for adult females near contaminated rivers in Virginia are as follows:

0.20 , 0.22 , 0.25 , 0.30, 0.34 , 0.41 , 0.55 , 0.56 , 1.42 , 1.70, 1.83 , 2.20 , 2.25, 3.07 , 3.25

Calculate the sample variance(\(s^2\)) and sample standard deviation(\(s\)).

bmerc<- c(0.20 , 0.22 ,  0.25 , 0.30,  0.34 , 0.41 , 0.55 , 0.56 , 1.42 , 1.70,  1.83 , 2.20 , 2.25,  3.07 , 3.25)



bmerc

##  [1] 0.20 0.22 0.25 0.30 0.34 0.41 0.55 0.56 1.42 1.70 1.83 2.20 2.25 3.07 3.25

print("The sample variance of the blood mercury data is")

## [1] "The sample variance of the blood mercury data is"

var(bmerc)

## [1] 1.167552

print("The sample standard deviation of the blood mercury data is")

## [1] "The sample standard deviation of the blood mercury data is"

sd(bmerc)

## [1] 1.080533

Give the five number summary and construct a box plot.

bmerc

##  [1] 0.20 0.22 0.25 0.30 0.34 0.41 0.55 0.56 1.42 1.70 1.83 2.20 2.25 3.07 3.25

print("The five number summary of the blood mercury data is")

## [1] "The five number summary of the blood mercury data is"

fivenum(bmerc)

## [1] 0.200 0.320 0.560 2.015 3.250

print("The box plot of the blood mercury data is")

## [1] "The box plot of the blood mercury data is"

boxplot(bmerc)

Compute mild and extreme outliers (if they exist).

Using the values from the five-number summary, we can apply the formulas to find the outliers.

print("Here we show the five number summary for the blood mercury data found in the previous problem")

## [1] "Here we show the five number summary for the blood mercury data found in the previous problem"

fivenum(bmerc)

## [1] 0.200 0.320 0.560 2.015 3.250

print("From the five number summary, we know that Q1 is .32 and Q3 is 2.015. Now, we will find the IQR and then apply the formulas to find the outliers" )

## [1] "From the five number summary, we know that Q1 is .32 and Q3 is 2.015. Now, we will find the IQR and then apply the formulas to find the outliers"

print("The IQR of the blood mercury is")

## [1] "The IQR of the blood mercury is"

IQR(bmerc)

## [1] 1.695

print("The mild lower and upper bounds respectively are")

## [1] "The mild lower and upper bounds respectively are"

(.32-(1.5*1.695))

## [1] -2.2225

(2.015+(1.5*1.695))

## [1] 4.5575

print("The extreme lower and upper bounds respectively are")

## [1] "The extreme lower and upper bounds respectively are"

(.32-(3*1.695))

## [1] -4.765

(2.015+(3*1.695))

## [1] 7.1

print("Therefore there are no outliers in this data")

## [1] "Therefore there are no outliers in this data"

Question 4: Exposure to microbial products, especially endotoxin, may have an impact on vulnerability to allergic diseases. The following are data on concentration (EU/mg) in settled dust for one sample of urban homes and another of farm homes

Urban: 6.0 5.0 11.0 33.0 4.0 5.0 86.0 18.0 35.0 17.0 23.0

Farm: 4.0 14.0 11.0 9.0 9.0 8.0 4.0 20.0 5.0 8.9 21.0 9.2 3.0 2.0 0.3

Calculate the sample standard deviation for each sample, interpret these values, and then contrast variability in the two samples.

Urban<- c(6.0,  5.0,    11.0,   33.0,   4.0,    5.0,    86.0,   18.0,   35.0,   17.0,   23.0)

Farm<- c(4.0,   14.0,   11.0,   9.0,    9.0,    8.0,    4.0,    20.0,   5.0,    8.9,    21.0,   9.2,    3.0,    2.0,    0.3)

Urban

##  [1]  6  5 11 33  4  5 86 18 35 17 23

print("The standard deviation of the Urban sample is")

## [1] "The standard deviation of the Urban sample is"

sd(Urban)

## [1] 23.88914

Farm

##  [1]  4.0 14.0 11.0  9.0  9.0  8.0  4.0 20.0  5.0  8.9 21.0  9.2  3.0  2.0  0.3

print("The standard deviation of the Farm sample is")

## [1] "The standard deviation of the Farm sample is"

sd(Farm)

## [1] 6.087669

print("The Urban standard deviation (about 23.9 EU/mg) is roughly four times the Farm standard deviation (about 6.1 EU/mg). Urban concentrations are spread over a much wider range, while Farm concentrations cluster more closely around their mean, so endotoxin levels are less consistent across the Urban homes than across the Farm homes.")

## [1] "The Urban standard deviation (about 23.9 EU/mg) is roughly four times the Farm standard deviation (about 6.1 EU/mg). Urban concentrations are spread over a much wider range, while Farm concentrations cluster more closely around their mean, so endotoxin levels are less consistent across the Urban homes than across the Farm homes."

Compute the interquartile range of each sample and compare. Does the interquartile range convey the same message about variability as the standard deviation?

print("The interquartile range of the Urban data is")

## [1] "The interquartile range of the Urban data is"

IQR(Urban)

## [1] 22.5

print("The interquartile range of the Farm data is")

## [1] "The interquartile range of the Farm data is"

IQR(Farm)

## [1] 6.1

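print("Yes. The interquartile range conveys the same message as the standard deviation: both are much larger for the Urban sample (IQR 22.5 vs 6.1), so the Urban data are more variable than the Farm data")

## [1] "Yes. The interquartile range conveys the same message as the standard deviation: both are much larger for the Urban sample (IQR 22.5 vs 6.1), so the Urban data are more variable than the Farm data"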
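
One way to make the comparison concrete is to look at the ratio of Urban to Farm spread under each measure. This is an illustrative sketch (the names `sd_ratio` and `iqr_ratio` are introduced here); it also drops the largest Urban value to show how each measure responds to a single extreme observation:

```r
# Endotoxin concentrations (EU/mg) from Question 4
Urban <- c(6.0, 5.0, 11.0, 33.0, 4.0, 5.0, 86.0, 18.0, 35.0, 17.0, 23.0)
Farm  <- c(4.0, 14.0, 11.0, 9.0, 9.0, 8.0, 4.0, 20.0, 5.0, 8.9, 21.0, 9.2,
           3.0, 2.0, 0.3)

# Ratio of Urban to Farm spread under each measure
sd_ratio  <- sd(Urban) / sd(Farm)
iqr_ratio <- IQR(Urban) / IQR(Farm)
c(sd = sd_ratio, IQR = iqr_ratio)

# Dropping the largest Urban observation (86) reduces the standard
# deviation proportionally more than it reduces the IQR
sd(Urban[Urban != 86])
IQR(Urban[Urban != 86])
```

Both ratios are well above 1, which supports the conclusion that the two measures agree for these samples, even though the standard deviation is more sensitive to the value 86.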

Construct comparative box plots for the two samples.

boxplot(Urban, Farm, names = c("Urban", "Farm"))

Question 5: The following statistics are obtained from a sample of size 17

\(\sum_{i =1}^{17} x_i^2 = 197.91\) and sample mean \(\bar{x} = 3.342\). Compute the sample standard deviation.

n <- 17
sum_sq <- 197.91

xbar <- 3.342

total <- n*xbar

variance <- ((sum_sq - total^2/n)/(n-1))

Std <- sqrt(variance)

Std

## [1] 0.7087671
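
The code above relies on the computational (shortcut) formula for the sample variance, together with the identity \(\sum x_i = n\bar{x}\). Written out with the given values, as a worked check rather than a new method:

```latex
\[
s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}
    = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}
    = \frac{197.91 - 17(3.342)^2}{16}
    \approx \frac{8.0376}{16}
    \approx 0.5024,
\qquad
s = \sqrt{s^2} \approx 0.7088
\]
```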

Question 6: An article reported the following data on oxidation-induction time (min) for various commercial oils:

85 101 130 160 180 195 135 145 211 105 145 152 154 136 87 99 95 119 129

Calculate the sample variance and standard deviation

Oilmin<- c(85,  101,    130,    160,    180, 195,  135,  145,  211, 105,  145,  152,  154,      136,  87,  99,  95,  119,   129)

print("The sample variance of the oil data is")

## [1] "The sample variance of the oil data is"

var(Oilmin)

## [1] 1274.988

print("The standard deviation of the oil data is")

## [1] "The standard deviation of the oil data is"

sd(Oilmin)

## [1] 35.70698

If the observations were re-expressed in hours, what would be the resulting values of the sample variance and sample standard deviation? Answer without actually performing the re-expression (round your answer to three decimal places).

print("Converting minutes to hours divides every observation by 60, so the standard deviation is divided by 60 and the variance by 60^2 = 3600")

## [1] "Converting minutes to hours divides every observation by 60, so the standard deviation is divided by 60 and the variance by 60^2 = 3600"

newvariance <- round(1274.988/60^2, 3)
newvariance

## [1] 0.354

newstandard <- round(35.70698/60, 3)
newstandard

## [1] 0.595
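
Rescaling data by a constant \(c\) multiplies the standard deviation by \(|c|\) and the variance by \(c^2\). The question asks for the answer without re-expressing the data, so the following is only a numerical check of that rule; `Oilhr` is a name introduced here for the converted values:

```r
# Oxidation-induction times (min) from Question 6
Oilmin <- c(85, 101, 130, 160, 180, 195, 135, 145, 211, 105,
            145, 152, 154, 136, 87, 99, 95, 119, 129)

# Re-express the observations in hours (done here only to verify the rule)
Oilhr <- Oilmin / 60

# Var(X / 60) = Var(X) / 60^2 and SD(X / 60) = SD(X) / 60
all.equal(var(Oilhr), var(Oilmin) / 60^2)
all.equal(sd(Oilhr), sd(Oilmin) / 60)

# Values in hours, rounded to three decimal places
round(c(variance = var(Oilhr), sd = sd(Oilhr)), 3)
```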
