Data 606 Homework 7

Practice: 7.23, 7.25, 7.29, 7.39

Graded: 7.24, 7.26, 7.30, 7.40

7.23 a) There is a strong, positive association between the number of tourists and spending.

b) Explanatory: number of tourists. Response: spending.

c) We want to fit a regression line to these data to see whether we should encourage more tourism and how much spending to expect as the number of tourists rises.

d) The scatterplot looks linear, but the residual plot and the histogram of the residuals suggest that the relationship between spending and the number of tourists is not linear.

7.24 a) There is a weak, positive association between the number of calories and the amount of carbs in the Starbucks food menu.

b) Explanatory: calories. Response: carbs.

c) We want to fit a regression line to these data so that, if we know the calories in a Starbucks food item, we can predict its amount of carbs.

d) The data show linearity and the residual plot shows no obvious pattern, but the histogram of the residuals does not look normally distributed.

7.25 a)

meantravel <- 129
sdtravel <- 113
meanstop <- 108
sdstop <- 99
correlation <- .636

b1 <- (sdtravel/sdstop)*correlation
b0 <- meantravel - b1 * meanstop
b0
## [1] 50.59855
b1
## [1] 0.7259394

The equation is: travel time = b0 + b1 * distance, or about 50.60 + 0.726 * distance.

b) For each additional mile of distance, the model predicts about 0.73 more minutes of travel time.

c)

R2 = (correlation^2)
R2
## [1] 0.404496

About 40% of the variability in travel time is accounted for by this model.

d)

traveltime <- b0 + b1 * 103
traveltime
## [1] 125.3703

The predicted travel time is about 125 minutes.

e)

i <- 103
yi <- 168
ei <- yi - traveltime
ei
## [1] 42.6297

A positive residual means that the model underestimates the travel time.

f) No. There is not enough information: using the model here would require extrapolating beyond the range of the observed data.
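The steps in 7.25 (slope from the standard deviations and the correlation, intercept from the means, then a prediction and a residual) can be collected into one helper. This is only a minimal sketch: the function name `fit_from_summary` and its argument names are hypothetical and do not come from the textbook.

```r
# Hypothetical helper: least squares line from summary statistics only.
fit_from_summary <- function(mean_x, sd_x, mean_y, sd_y, r) {
  slope <- (sd_y / sd_x) * r            # b1 = (s_y / s_x) * R
  intercept <- mean_y - slope * mean_x  # the line passes through (mean_x, mean_y)
  list(intercept = intercept, slope = slope, r_squared = r^2)
}

# 7.25 values: x = distance (miles), y = travel time (minutes)
fit <- fit_from_summary(mean_x = 108, sd_x = 99,
                        mean_y = 129, sd_y = 113, r = 0.636)

predicted <- fit$intercept + fit$slope * 103  # predicted travel time at 103 miles
residual <- 168 - predicted                   # observed minus predicted
```

The same helper should work for 7.26 by passing shoulder girth as x and height as y.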

7.26 a)

meanshoulder <- 107.20
sdshoulder <- 10.37
meanheight <- 171.14
sdheight <- 9.41
corr <- .67

b1 <- (sdheight/sdshoulder)*corr
b0 <- meanheight - b1 * meanshoulder
b0
## [1] 105.9651
b1
## [1] 0.6079749

The equation is: height = 105.97 + 0.608 * shoulder girth

b) For each additional cm of shoulder girth, the model predicts about 0.608 cm of additional height.

c)

r2 <- (corr^2)
r2
## [1] 0.4489

The model explains about 45% of the variation in height.

d)

student <- b0 + b1 * 100
student
## [1] 166.7626

We predict that the student will be about 167 centimeters tall.

e)

i <- 100
yi <- 160
ei <- yi - student
ei
## [1] -6.762581

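The residual is negative, which means the model overestimated the height of the student.

f) No. There is not enough information to predict: using the model here would require extrapolating beyond the range of shoulder girths in the data.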

7.29 a)

yhat = -29.901 + 2.559 * x

  1. The intercept is -29.901, which tells us that this model predicts a negative murder rate for areas with no poverty. This is not realistic.
  2. The slope tells us that for each additional percentage point of poverty, we expect on average 2.559 more murders per million.
  3. R^2 is 70.52%, which tells us that poverty explains 70.52% of the variation in murder rate.
sqrt(.7052)
## [1] 0.8397619

7.30 a)

yhat = -0.357 + 4.034 * x

  1. The intercept is -0.357, which tells us that this model predicts a negative heart weight for a cat with a body weight of zero. This is not realistic.
  2. The slope is 4.034, which means that the predicted weight of the cat's heart increases by 4.034 grams for every additional 1 kg of body weight.
  3. R^2 is 64.66%, which tells us that body weight explains 64.66% of the variation in cat heart weight.
sqrt(.6466)
## [1] 0.8041144

7.39 a) The correlation is -0.529. Its magnitude is the square root of R^2, and based on the graph we can tell that the sign is negative.

sqrt(.28)
## [1] 0.5291503
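Because `sqrt()` always returns a non-negative number, the sign of the correlation has to come from the direction of the association. A minimal sketch of that step follows; the helper name `cor_from_r2` and its `direction` argument are hypothetical.

```r
# Hypothetical helper: recover the correlation from R^2 and the direction
# of the association (+1 for positive, -1 for negative).
cor_from_r2 <- function(r_squared, direction) {
  sign(direction) * sqrt(r_squared)
}

r_739 <- cor_from_r2(0.28, direction = -1)   # negative trend seen in the graph
r_729 <- cor_from_r2(0.7052, direction = 1)  # positive slope in 7.29
```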

b) There seems to be no pattern in the data. It is not appropriate to apply the least squares fit to these data.

7.40 a)

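b0 <- 4.010    # intercept from the regression output

x <- -0.0883   # mean standardized beauty score
y <- 3.9983    # mean teaching evaluation

# the least squares line passes through (x, y), so y = b0 + b1 * x
b1 <- (y - b0) / x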
b1
## [1] 0.1325028

The slope in this case is 0.1325.
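The slope calculation relies on the fact that a least squares line passes through the point of means. As a quick sanity check, plugging the mean beauty score back into the fitted line should return the mean evaluation. The variable names below are chosen for illustration only.

```r
intercept_740 <- 4.010     # intercept from the regression output
mean_beauty <- -0.0883     # mean standardized beauty score
mean_eval <- 3.9983        # mean teaching evaluation

# Solve mean_eval = intercept + slope * mean_beauty for the slope
slope_740 <- (mean_eval - intercept_740) / mean_beauty

# Plug the mean beauty score back in; this should reproduce mean_eval
check_740 <- intercept_740 + slope_740 * mean_beauty
```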

  1. Since the slope is positive, the estimated relationship is positive. We can set up a hypothesis test with H0: b1 = 0 and HA: b1 > 0. The p-value for the slope is close to zero, which leads us to reject the null hypothesis. It appears that there is a positive relationship between beauty and teaching evaluation.
  2. Linearity: there is a weak, positive linear relationship. No correlation coefficient or R^2 is given, but we accept that this condition is met.

Nearly normal residuals: based on the histogram, it appears that the residual distribution is nearly normal.

Constant variability: the scatterplot does not appear to show constant variability.

independent observations: Observations are assumed to be independent.
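The slope test and the condition checks above can be sketched in R with `lm()`. The data below are simulated purely for illustration (the true intercept 4, slope 0.5, noise level, and sample size are assumptions) and are not the textbook's beauty and evaluation data.

```r
set.seed(606)  # arbitrary seed, only for reproducibility

# Simulated data: NOT the textbook's beauty/evaluation data
n <- 200
beauty <- rnorm(n)
evaluation <- 4 + 0.5 * beauty + rnorm(n, sd = 0.5)

fit <- lm(evaluation ~ beauty)

# The "beauty" row of the coefficient table holds the slope and its p-value
coefs <- summary(fit)$coefficients
slope_p_two_sided <- coefs["beauty", "Pr(>|t|)"]
# One-sided p-value for HA: b1 > 0 (valid when the estimated slope is positive)
slope_p_one_sided <- slope_p_two_sided / 2

res <- resid(fit)
# Diagnostics: uncomment to draw the plots
# plot(fitted(fit), res); abline(h = 0)  # linearity and constant variability
# hist(res)                              # nearly normal residuals
```

With the real data, the same calls would apply once the evaluation scores and beauty scores are loaded into two numeric vectors.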