Chapter 7: Practice homework.

7.23 Tourism spending.

The Association of Turkish Travel Agencies reports the number of foreign tourists visiting Turkey and tourist spending by year. Three plots are provided: a scatterplot showing the relationship between these two variables along with the least squares fit, a residuals plot, and a histogram of residuals.

7.23

  1. Describe the relationship between number of tourists and spending.

Ans: The scatterplot shows a positive, linear relationship between number of tourists and spending, and the correlation looks strong.

  2. What are the explanatory and response variables?

The explanatory variable is the number of tourists; the response variable is spending.

  3. Why might we want to fit a regression line to these data?

The scatterplot shows a linear relationship, so a regression line lets us predict spending for a new observation from the number of tourists.

  4. Do the data meet the conditions required for fitting a least squares line? In addition to the scatterplot, use the residual plot and histogram to answer this question.

Linear trend: The scatterplot shows a linear trend, so this condition appears to be met.

Nearly normal residuals: The histogram of the residuals is nearly normal, so this condition is met.

The residual plot should also be checked for constant variability and for any remaining pattern before the fit is accepted.
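The three diagnostic plots used above can be reproduced for any fitted line. The sketch below uses simulated data: the values, the seed, and the names `tourists` and `spending` are made up for illustration and are not the Turkish tourism data.

```r
# Simulated data for illustration only; NOT the tourism data set.
set.seed(1)
tourists <- runif(40, min = 1000, max = 25000)           # hypothetical predictor
spending <- 500 + 0.7 * tourists + rnorm(40, sd = 1500)  # hypothetical response

fit <- lm(spending ~ tourists)

# Scatterplot with the least squares line (check for a linear trend)
plot(tourists, spending, main = "Scatterplot with least squares fit")
abline(fit)

# Residuals against the predictor (check for patterns and constant spread)
plot(tourists, resid(fit), main = "Residual plot")
abline(h = 0, lty = 2)

# Histogram of residuals (check for a nearly normal shape)
hist(resid(fit), main = "Histogram of residuals")
```

A curved pattern or a fan shape in the residual plot would suggest that the linearity or constant variability condition fails, even when the scatterplot looks roughly linear.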

7.25 The Coast Starlight, Part II.

Exercise 7.13 introduces data on the Coast Starlight Amtrak train that runs from Seattle to Los Angeles. The mean travel time from one stop to the next on the Coast Starlight is 129 mins, with a standard deviation of 113 minutes. The mean distance traveled from one stop to the next is 108 miles with a standard deviation of 99 miles. The correlation between travel time and distance is 0.636.

  1. Write the equation of the regression line for predicting travel time.

The slope is calculated as slope = correlation * (sd_y / sd_x), where y is travel time and x is distance.

corr <- 0.636
sd_y <- 113
sd_x <- 99

m <- corr * (sd_y/sd_x)
m
## [1] 0.7259394

Now we have the slope. Let's write the equation.

y = mx + c

We need to find c. The least squares line always passes through the point of means (mean of x, mean of y), so c = m_y - m * m_x.

m_y <- 129
m_x <- 108

c <- m_y - m * m_x 
c
## [1] 50.59855

So, the final equation is

y = 0.726 * x + 50.6

where x is the distance in miles and y is the predicted travel time in minutes.
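The slope and intercept steps can be wrapped into one small helper. `line_from_summary` is a hypothetical name, not a function from any package; it only reuses the two formulas slope = correlation * (sd_y / sd_x) and intercept = mean_y - slope * mean_x.

```r
# Hypothetical helper: least squares line from summary statistics only.
line_from_summary <- function(mean_x, sd_x, mean_y, sd_y, corr) {
  slope <- corr * (sd_y / sd_x)
  intercept <- mean_y - slope * mean_x   # line passes through (mean_x, mean_y)
  c(intercept = intercept, slope = slope)
}

# Coast Starlight values from the exercise statement
coefs <- line_from_summary(mean_x = 108, sd_x = 99,
                           mean_y = 129, sd_y = 113, corr = 0.636)
round(coefs, 3)
```

Returning a named vector keeps both coefficients together, so later steps can refer to `coefs["slope"]` and `coefs["intercept"]` instead of retyping rounded values.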

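  2. Interpret the slope and the intercept in this context.

Slope: for each additional mile traveled, the model predicts about 0.726 additional minutes of travel time, on average.

Intercept: for a distance of 0 miles, the model predicts a travel time of about 50.6 minutes. A distance of 0 miles between stops makes no sense in this context, so the intercept only serves to adjust the height of the line.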

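  3. Calculate R2 of the regression line for predicting travel time from distance traveled for the Coast Starlight, and interpret R2 in the context of the application.

R2 <- round(corr^2,3)
R2
## [1] 0.404

About 40% of the variability in travel time is explained by the linear model, that is, by the distance traveled.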
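Squaring the correlation works because, for a least squares line with a single predictor, R2 equals the squared correlation. A quick check on simulated data (made-up numbers and variable names, not the Amtrak data) can be sketched as:

```r
# Simulated data for illustration only.
set.seed(42)
distance <- runif(30, min = 10, max = 350)
travel_time <- 50 + 0.7 * distance + rnorm(30, sd = 60)

fit <- lm(travel_time ~ distance)
r2_from_fit <- summary(fit)$r.squared     # R2 reported by the fitted model
r2_from_cor <- cor(distance, travel_time)^2   # squared sample correlation

# The two quantities agree up to floating point error
all.equal(r2_from_fit, r2_from_cor)
```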
  4. The distance between Santa Barbara and Los Angeles is 103 miles. Use the model to estimate the time it takes for the Starlight to travel between these two cities.

x_santa <- 103
# use the unrounded slope and intercept computed above
y_santa <- m * x_santa + c
y_santa
## [1] 125.3703

  5. It actually takes the Coast Starlight about 168 mins to travel from Santa Barbara to Los Angeles. Calculate the residual and explain the meaning of this residual value.

y_actual <- 168
residual <- y_actual - y_santa
residual
## [1] 42.6297

The residual is positive, which means the model underestimates the travel time for this trip by about 43 minutes.
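The prediction and residual steps can be sketched as two small functions. The names `predict_time` and `residual_for` are made up here; the coefficients are recomputed from the summary statistics in the exercise so the block runs on its own.

```r
# Coefficients recomputed from the exercise's summary statistics
slope <- 0.636 * (113 / 99)
intercept <- 129 - slope * 108

# Hypothetical helpers for this exercise
predict_time <- function(miles) intercept + slope * miles
residual_for <- function(miles, observed_minutes) {
  observed_minutes - predict_time(miles)   # observed minus predicted
}

residual_for(103, 168)   # positive: the model underestimates this trip
```

A negative result from `residual_for` would mean the opposite: the model overestimates the travel time for that pair of stops.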

  6. Suppose Amtrak is considering adding a stop to the Coast Starlight 500 miles away from Los Angeles. Would it be appropriate to use this linear model to predict the travel time from Los Angeles to this point?

No. A distance of 500 miles is about 4 standard deviations above the mean distance between stops (mean 108 miles, sd 99 miles), so it lies well outside the bulk of the data used to fit the model. Predicting at that distance would be extrapolation, and we cannot be sure the linear relationship still holds there.
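One rough way to see how far outside the data 500 miles lies is to express it in standard deviations from the mean distance. This is only a heuristic based on the summary statistics given in the exercise; the actual range of observed distances is not stated.

```r
mean_dist <- 108   # mean distance between stops (miles), from the exercise
sd_dist <- 99      # standard deviation of distance (miles), from the exercise
new_dist <- 500    # proposed stop

# Number of standard deviations above the mean
z <- (new_dist - mean_dist) / sd_dist
round(z, 2)
## [1] 3.96
```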

7.29 Murders and poverty, Part I.

The following regression output is for predicting annual murders per million from percentage living in poverty in a random sample of 20 metropolitan areas.

7.29

  1. Write out the linear model.

y = 2.559 * x - 29.901

where x is the percentage living in poverty and y is the predicted annual murders per million.

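  2. Interpret the intercept.

The intercept is -29.901. For a metropolitan area with 0% of residents living in poverty, the model predicts -29.901 murders per million. A negative murder rate is impossible, so the intercept has no meaningful interpretation here; it only sets the height of the line.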

  3. Interpret the slope.

For each additional percentage point of the population living in poverty, the model predicts about 2.559 more annual murders per million, on average.
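The intercept and slope interpretations can be checked numerically by evaluating the fitted line. `predict_murders` is a hypothetical helper name, and the poverty values 0, 20, and 21 are arbitrary inputs chosen for illustration.

```r
# Coefficients from the regression output in the exercise
b0 <- -29.901
b1 <- 2.559

# Hypothetical helper: predicted annual murders per million
predict_murders <- function(poverty_pct) b0 + b1 * poverty_pct

predict_murders(0)                          # equals the intercept (negative)
predict_murders(21) - predict_murders(20)   # equals the slope, up to rounding
```

The first call shows why the intercept is not meaningful on its own, and the second shows that a one percentage point increase in poverty changes the prediction by exactly the slope.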

  4. Interpret R2.

R2 is 70.52%.

This means that about 70.52% of the variability in annual murders per million is explained by the percentage living in poverty.

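  5. Calculate the correlation coefficient.

The correlation is the square root of R2, with the same sign as the slope. The slope is positive, so we take the positive root.

correlation <- sqrt(0.7052)
correlation
## [1] 0.8397619

The correlation is about 0.84, which indicates a strong positive linear association.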
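Because a square root is never negative, the sign of the correlation has to come from the slope. A minimal sketch, using the R2 and slope values from the regression output:

```r
r_squared <- 0.7052   # R2 from the regression output
slope <- 2.559        # slope from the regression output

# The correlation has the same sign as the slope
r <- sign(slope) * sqrt(r_squared)
round(r, 2)
## [1] 0.84
```

With a negative slope the same line would return a negative correlation, which `sqrt()` alone would miss.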