1. Load the concrete data set from the "Data Sets" shared Dropbox folder. The measurements are in kg/m^3 and Pa.

(a) Create (and turn in) a box plot of Water density in this sample.

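# Read the data; the path points to the shared Dropbox folder on this machine
ConcreteData <- read.csv("C:/Users/heinsenj/Dropbox/Class Info and shared files/Data Sets/ConcreteData.csv")
View(ConcreteData)
attach(ConcreteData)  # lets columns such as Water be referenced by name
summary(Water)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   121.8   164.9   185.0   181.6   192.0   247.0
boxplot(Water, horizontal = TRUE, main = "Water density", xlab = "Water (kg/m^3)")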

(b) What is the minimum measurement of Water density in this sample?

Minimum: 121.8 kg/m^3

(c) What is the average (mean) measurement of Water density in this sample?

Mean: 181.6 kg/m^3

(d) What is the interquartile range of Water density?

Interquartile range: 192.0 - 164.9 = 27.1 kg/m^3
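As a cross-check, base R's `IQR()` returns the 75th percentile minus the 25th percentile, using the same default quantile method (type 7) as `quantile()`. The sketch below uses a short hypothetical vector `x` in place of the concrete data; with the data attached, `IQR(Water)` gives the unrounded value, which may differ slightly from the 27.1 read off the rounded `summary()` output.

```r
# Hypothetical toy vector standing in for Water (not the real data)
x <- c(121.8, 150.0, 164.9, 185.0, 192.0, 200.0, 247.0)

# Interquartile range by hand: 3rd quartile minus 1st quartile (type 7 quantiles)
q <- quantile(x, c(0.25, 0.75), names = FALSE)
iqr_manual <- q[2] - q[1]

# Built-in equivalent; with the concrete data attached, use IQR(Water)
iqr_builtin <- IQR(x)
iqr_manual
iqr_builtin
```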

(e) How big is the range between the 10th and 90th percentiles for Water density?

quantile(Water,c(0.1,0.9))
##   10%   90% 
## 154.6 203.5

Range: 203.5 - 154.6 = 48.9 kg/m^3
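The same range can be computed in one step by applying `diff()` to the two percentiles, which avoids subtracting rounded printed values by hand. Again `x` is a hypothetical stand-in; with the concrete data, replace it with `Water`.

```r
# Hypothetical toy vector standing in for Water
x <- c(121.8, 150.0, 164.9, 185.0, 192.0, 200.0, 247.0)

# 10th and 90th percentiles (R's default type 7 quantiles)
p <- quantile(x, c(0.1, 0.9))

# Width of the middle 80% of the data: 90th minus 10th percentile
range_10_90 <- unname(diff(p))
range_10_90
```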

(f) What is the sample standard deviation for Water density?

sd(Water)
## [1] 21.35422
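`sd()` computes the *sample* standard deviation, dividing the sum of squared deviations by n - 1 rather than n. The sketch below checks this identity on a hypothetical vector `x`; the same holds for `Water`.

```r
x <- c(121.8, 150.0, 164.9, 185.0, 192.0, 200.0, 247.0)  # hypothetical values
n <- length(x)

# Sample standard deviation: square root of the sum of squared deviations over (n - 1)
sd_manual <- sqrt(sum((x - mean(x))^2) / (n - 1))
sd_manual
sd(x)
```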

(g) Create (and turn in) a scatter plot where the Y axis is compressive strength and the X axis is water density.

head(cbind(Water,CompressiveStrength))
##      Water CompressiveStrength
## [1,]   162               79.99
## [2,]   162               61.89
## [3,]   228               40.27
## [4,]   228               41.05
## [5,]   192               44.30
## [6,]   228               47.03
plot(Water,CompressiveStrength,main="Water vs Compressive Strength",ylab="Compressive Strength",xlab="Water")

(h) Find the correlation between water density and compressive strength.

cor(CompressiveStrength,Water)
## [1] -0.2896334
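`cor()` returns the Pearson correlation by default: the covariance scaled by both standard deviations, so it is unitless and does not depend on argument order. A sketch with a hypothetical data frame `toy` whose columns mimic `Water` and `CompressiveStrength` (the values are made up):

```r
# Hypothetical paired observations (not the concrete data)
toy <- data.frame(
  Water = c(125, 150, 165, 175, 185, 192, 205, 228),
  CompressiveStrength = c(52, 49, 47, 44, 40, 41, 35, 33)
)

# Pearson correlation: covariance divided by the product of the standard deviations
r_manual <- with(toy, cov(Water, CompressiveStrength) /
                   (sd(Water) * sd(CompressiveStrength)))
r_manual
with(toy, cor(CompressiveStrength, Water))  # same value in either argument order
```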

(i) What is the equation of the least squares line that predicts compressive strength (Y) as a function of water density (X)?

lm(CompressiveStrength~Water)
## 
## Call:
## lm(formula = CompressiveStrength ~ Water)
## 
## Coefficients:
## (Intercept)        Water  
##     76.9583      -0.2266

Predicted CompressiveStrength = 76.9583 - 0.2266 * Water
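The least-squares coefficients can also be recovered from summary statistics: the slope is r * s_y / s_x and the intercept is mean(y) - slope * mean(x). A sketch on the same kind of hypothetical `toy` data frame, comparing these formulas with `coef()` from `lm()`:

```r
# Hypothetical toy data standing in for the concrete data
toy <- data.frame(
  Water = c(125, 150, 165, 175, 185, 192, 205, 228),
  CompressiveStrength = c(52, 49, 47, 44, 40, 41, 35, 33)
)

# Slope and intercept from the correlation, standard deviations, and means
b1 <- with(toy, cor(Water, CompressiveStrength) * sd(CompressiveStrength) / sd(Water))
b0 <- with(toy, mean(CompressiveStrength) - b1 * mean(Water))

# The same coefficients fitted directly
fit <- lm(CompressiveStrength ~ Water, data = toy)
c(b0, b1)
coef(fit)
```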

(j) Using that line, what would you predict is the compressive strength of a batch of concrete that has 130 kg of water per cubic meter?

-0.2266*130+76.9583
## [1] 47.5003

Predicted compressive strength: about 47.5
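Plugging 130 into the rounded coefficients works, but `predict()` uses the fitted model's full-precision coefficients. A sketch, assuming the model is fit with a `data` argument so that `newdata` can supply a column named `Water` (with the real data, something like `lm(CompressiveStrength ~ Water, data = ConcreteData)`); the `toy` data frame here is hypothetical.

```r
# Hypothetical toy data standing in for the concrete data
toy <- data.frame(
  Water = c(125, 150, 165, 175, 185, 192, 205, 228),
  CompressiveStrength = c(52, 49, 47, 44, 40, 41, 35, 33)
)
fit <- lm(CompressiveStrength ~ Water, data = toy)

# Predicted strength for a batch with 130 kg of water per cubic meter
pred <- predict(fit, newdata = data.frame(Water = 130))
pred

# Identical to intercept + slope * 130
sum(coef(fit) * c(1, 130))
```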

2. Consider the following frequency table, which summarizes the number of years that employees have worked at a company.

YearsOfService <- c("1-5","6-10","11-15","16-20")
Frequency <- c(6,24,31,4)
head(cbind(YearsOfService, Frequency))
##      YearsOfService Frequency
## [1,] "1-5"          "6"      
## [2,] "6-10"         "24"     
## [3,] "11-15"        "31"     
## [4,] "16-20"        "4"

(a) How many people work at the company?

sum(Frequency)
## [1] 65

(b) Explain why you cannot calculate the exact mean number of years worked. What can you say about the possible values for the mean number of years worked?

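The exact mean cannot be calculated because the data are grouped into ranges: we know how many employees fall in each range, but not their individual years of service. Assuming years are whole numbers, the mean is smallest when everyone is at the bottom of their range, (1*6 + 6*24 + 11*31 + 16*4)/65 = 555/65, about 8.54, and largest when everyone is at the top, (5*6 + 10*24 + 15*31 + 20*4)/65 = 815/65, about 12.54. So the mean lies between 8.54 and 12.54 years.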
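The two bounds can be computed in R by pairing each frequency with the smallest and largest value in its range. This sketch assumes whole years, so the ranges run 1-5, 6-10, 11-15, and 16-20.

```r
Frequency <- c(6, 24, 31, 4)
lower <- c(1, 6, 11, 16)   # smallest value in each range
upper <- c(5, 10, 15, 20)  # largest value in each range

# Mean if every employee sits at the bottom (or top) of their range
min_mean <- sum(lower * Frequency) / sum(Frequency)
max_mean <- sum(upper * Frequency) / sum(Frequency)
c(min_mean, max_mean)  # about 8.54 and 12.54
```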

(c) What is the smallest possible median number of years worked?

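Smallest possible median: 11 years. With 65 employees, the median is the 33rd ordered value. The first two ranges hold only 6 + 24 = 30 employees, so the 33rd value falls in the 11-15 range, and the smallest it can be is 11.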
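The same reasoning can be written as a short cumulative-count lookup. This sketch assumes an odd number of employees (true here, 65), so the median is the single value at position (n + 1) / 2.

```r
Frequency <- c(6, 24, 31, 4)
lower_bounds <- c(1, 6, 11, 16)

n <- sum(Frequency)          # 65 employees
pos <- (n + 1) / 2           # median position: the 33rd ordered value (n is odd)

# First range whose cumulative count reaches the median position
range_idx <- which(cumsum(Frequency) >= pos)[1]
lower_bounds[range_idx]      # smallest possible median
```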

3. Consider the following frequency table, which summarizes the number of pets that employees at a company have.

NumberOfPets <- c(0,1,2,3,"4+")
Frequency <- c(6,24,31,4,0)
head(cbind(NumberOfPets, Frequency))
##      NumberOfPets Frequency
## [1,] "0"          "6"      
## [2,] "1"          "24"     
## [3,] "2"          "31"     
## [4,] "3"          "4"      
## [5,] "4+"         "0"

(a) How many people work at the company?

sum(Frequency)
## [1] 65

(b) Calculate the mean number of pets that employees have.

(0*6+1*24+2*31+3*4+4*0)/sum(Frequency)
## [1] 1.507692
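Base R's `weighted.mean()` computes the same frequency-weighted mean. The "4+" row is left out in this sketch because its frequency is 0 and its exact value is unknown.

```r
pets <- c(0, 1, 2, 3)          # "4+" row omitted: frequency 0
Frequency <- c(6, 24, 31, 4)

weighted.mean(pets, Frequency)  # 98/65, about 1.507692
```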

(c) Calculate the median number of pets that employees have.

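Median: 2. With 65 employees, the median is the 33rd ordered value. The cumulative counts are 6 (0 pets), 30 (up to 1 pet), and 61 (up to 2 pets), so the 33rd value is 2.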
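One way to check this in R is to expand the table into one value per employee with `rep()` and take the ordinary median.

```r
# One entry per employee: six 0s, twenty-four 1s, thirty-one 2s, four 3s
pets_each <- rep(c(0, 1, 2, 3), times = c(6, 24, 31, 4))
length(pets_each)  # 65 employees
median(pets_each)
```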

(d) If instead of 0, there was 1 person who had "4+" pets, could you still calculate the mean and/or the median?

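The mean cannot be calculated exactly, because "4+" could be 4 or any larger number, and the mean depends on that value. The median can still be calculated: with 66 employees it is the average of the 33rd and 34th ordered values, and both are 2 (the cumulative counts are 30 up to 1 pet and 61 up to 2 pets), so the median is still 2.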
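A quick way to see this is to try different values for the unknown "4+" count. The helper `pets_with()` below is hypothetical: it rebuilds the table with one extra employee whose pet count is `big`.

```r
# Hypothetical helper: the pet table plus one employee with `big` pets
pets_with <- function(big) rep(c(0, 1, 2, 3, big), times = c(6, 24, 31, 4, 1))

# The median does not depend on how large the "4+" value is...
median(pets_with(4))
median(pets_with(100))

# ...but the mean does
mean(pets_with(4))
mean(pets_with(100))
```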

4. For each question, write down a list of 5 numbers that have the following property. (Include R code to show that you are correct.)

(a) The median equals the mean.

AA<-c(1,2,3,4,5)
summary(AA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       2       3       3       4       5

(b) The median is larger than the mean.

AB<-c(1,2,4,4,5)
summary(AB)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     2.0     4.0     3.2     4.0     5.0

(c) The mean is 50 and the sample standard deviation is at least 30.

AC<-c(1,40,50,60,99)
summary(AC)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1      40      50      50      60      99
sd(AC)
## [1] 35.36241

(d) The sample standard deviation is zero.

AD<-c(1,1,1,1,1)
summary(AD)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       1       1       1       1       1
sd(AD)
## [1] 0

(e) The standard deviation is larger than the mean.

AE<-c(1,40,50,60,999)
summary(AE)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1      40      50     230      60     999
sd(AE)
## [1] 430.4654
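Instead of reading each property off the printed summaries, all five answers can be verified at once with logical tests that evaluate to TRUE or FALSE. This sketch redefines the vectors so it runs on its own.

```r
AA <- c(1, 2, 3, 4, 5)
AB <- c(1, 2, 4, 4, 5)
AC <- c(1, 40, 50, 60, 99)
AD <- c(1, 1, 1, 1, 1)
AE <- c(1, 40, 50, 60, 999)

# One logical check per part of question 4
checks <- c(
  a = median(AA) == mean(AA),
  b = median(AB) > mean(AB),
  c = mean(AC) == 50 && sd(AC) >= 30,
  d = sd(AD) == 0,
  e = sd(AE) > mean(AE)
)
checks
```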

5. The probability that a certain car airbag is defective is 1/1000. You have designed a test to find defective airbags. If the airbag is actually defective, then your test warns you of the defect with probability 999/1000. On the other hand, if the airbag is not defective, then your test incorrectly warns you of the defect anyway with probability 1/100.

(a) You pick a random airbag. What is the probability that it is both defective and that your test will warn you of the defect?

P(Defective and Warns) = (1/1000)(999/1000) = 999/1,000,000 = 0.000999

(b) You pick a random airbag. What is the probability that your test will warn you of a defect (real or not)?

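P(Warns) = P(Defective and Warns) + P(Not defective and Warns) = (1/1000)(999/1000) + (999/1000)(1/100) = 999/1,000,000 + 9,990/1,000,000 = 10,989/1,000,000 = 0.010989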

(c) You test a random airbag, and your test warns you that it is defective. With that information, what is the probability that it actually is defective?

P(Defective|Warns) = P(Defective and Warns)/P(Warns) = 999/10,989 = 1/11, about 0.0909
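The calculations in (a) through (c) can be checked numerically in R: the multiplication rule for (a), the law of total probability for (b), and Bayes' rule for (c). The variable names are illustrative.

```r
p_def      <- 1 / 1000     # P(Defective)
p_warn_def <- 999 / 1000   # P(Warns | Defective)
p_warn_ok  <- 1 / 100      # P(Warns | Not defective)

# (a) Multiplication rule
p_def_and_warn <- p_def * p_warn_def
# (b) Law of total probability
p_warn <- p_def_and_warn + (1 - p_def) * p_warn_ok
# (c) Bayes' rule
p_def_given_warn <- p_def_and_warn / p_warn

c(p_def_and_warn, p_warn, p_def_given_warn)  # 0.000999, 0.010989, about 0.0909
```

Even with a very accurate test, only about 1 in 11 warnings points to a truly defective airbag, because non-defective airbags vastly outnumber defective ones.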