The Association of Turkish Travel Agencies reports the number of foreign tourists visiting Turkey and tourist spending by year. Three plots are provided: a scatterplot showing the relationship between these two variables along with the least squares fit, a residuals plot, and a histogram of the residuals.
7.23
Ans: The scatterplot shows that the relationship is linear, and the correlation looks strong.
What are the explanatory and response variables? The explanatory variable is ‘number of tourists’ and the response variable is ‘spending’.
Why might we want to fit a regression line to these data?
The scatterplot shows a linear relationship, so we fit a regression line in order to predict spending for a new observation from the number of tourists.
Linear trend: The first plot (the scatterplot) shows a linear trend, so this condition is met. Nearly normal residuals: The histogram of the residuals is nearly normal, so this condition is met.
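The three plots used for these checks can be produced along the following lines. This is only a sketch: the exercise data are not reproduced here, so `tourists` and `spending` below are simulated placeholder values (an assumption made purely for illustration), and the numbers they produce say nothing about the real data.

```r
# Simulated placeholder data, NOT the Turkish tourism data from the exercise
set.seed(1)
tourists <- runif(40, min = 1, max = 25)
spending <- 500 + 700 * tourists + rnorm(40, sd = 800)

# Least squares fit of spending on number of tourists
fit <- lm(spending ~ tourists)

# Scatterplot with fitted line, residuals plot, histogram of residuals
par(mfrow = c(1, 3))
plot(tourists, spending, main = "Scatterplot")
abline(fit)
plot(tourists, resid(fit), main = "Residuals")
abline(h = 0, lty = 2)
hist(resid(fit), main = "Histogram of residuals")
```

In the residuals plot, look for curvature (a problem for the linear trend condition); in the histogram, look for strong skew or outliers (a problem for the nearly normal condition).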
7.25 The Coast Starlight, Part II.
Exercise 7.13 introduces data on the Coast Starlight Amtrak train that runs from Seattle to Los Angeles. The mean travel time from one stop to the next on the Coast Starlight is 129 minutes, with a standard deviation of 113 minutes. The mean distance traveled from one stop to the next is 108 miles, with a standard deviation of 99 miles. The correlation between travel time and distance is 0.636.
The slope is calculated as slope = correlation * (sd_y / sd_x), where y is travel time and x is distance.
corr <- 0.636
sd_y <- 113
sd_x <- 99
m <- corr * (sd_y/sd_x)
m
## [1] 0.7259394
Now we have the slope. Let's write the equation.
y = mx + c
We still need to find c. The least squares line always passes through the point of means (m_x, m_y), so c = m_y - m * m_x.
m_y <- 129
m_x <- 108
c <- m_y - m * m_x
c
## [1] 50.59855
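So the final equation is
y = 0.726 * x + 50.6
where x is the distance in miles and y is the predicted travel time in minutes. The slope means that each additional mile travelled is associated with an additional 0.726 minutes of travel time, on average.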
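The slope and intercept steps can be wrapped into one small function. This is a minimal sketch; `line_from_summary` is a hypothetical helper name introduced here for illustration, not something from the exercise or from an R package.

```r
# Hypothetical helper: least squares line from summary statistics only
line_from_summary <- function(r, mean_x, sd_x, mean_y, sd_y) {
  slope <- r * sd_y / sd_x                # slope = correlation * (sd_y / sd_x)
  intercept <- mean_y - slope * mean_x    # line passes through (mean_x, mean_y)
  c(intercept = intercept, slope = slope)
}

# Coast Starlight: x = distance (miles), y = travel time (minutes)
coast <- line_from_summary(r = 0.636, mean_x = 108, sd_x = 99,
                           mean_y = 129, sd_y = 113)
round(coast, 3)
```

Plugging the mean distance back in, `coast["intercept"] + coast["slope"] * 108` should return the mean travel time of 129, which is a quick check that the line passes through the point of means.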
R2 <- round(corr^2,3)
R2
## [1] 0.404
x_santa <- 103
y_santa <- m * x_santa + c  # use the unrounded slope and intercept
y_santa
## [1] 125.3703
y_actual <- 168
residual <- y_actual - y_santa
residual
## [1] 42.6297
The residual is positive, which means the model underestimates the travel time for this trip.
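A distance of 500 miles lies far outside the bulk of the observed data: it is about 4 standard deviations above the mean distance of 108 miles (sd = 99), well beyond 2 standard deviations. Using the linear model at this distance would be extrapolation, so we should not rely on it to predict the response variable there.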
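The "how far out is 500 miles" argument can be made concrete with a z-score. This is a rough gauge only, since the actual range of observed distances is not given here; the variable names are introduced for this sketch.

```r
mean_dist <- 108   # mean distance between stops (miles)
sd_dist   <- 99    # standard deviation of distance (miles)
new_dist  <- 500   # distance being asked about (miles)

# Number of standard deviations the new distance lies above the mean
z <- (new_dist - mean_dist) / sd_dist
round(z, 2)
## [1] 3.96
```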
The following regression output is for predicting annual murders per million from percentage living in poverty in a random sample of 20 metropolitan areas.
7.29
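y = 2.559 * x - 29.901
where x is the percentage living in poverty and y is the predicted annual murders per million.
The intercept is -29.901. Taken literally, the model predicts -29.901 murders per million in a metropolitan area with 0% poverty. A negative count is not meaningful, so here the intercept only sets the height of the line.
The slope means that each additional percentage point living in poverty is associated with an additional 2.559 annual murders per million, on average.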
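Both interpretations can be checked by evaluating the fitted line directly. In this sketch, `predict_murders` is a hypothetical helper and the poverty values 20 and 21 are arbitrary illustration points, not values taken from the sample.

```r
# Fitted line from the regression output: murders = -29.901 + 2.559 * poverty
predict_murders <- function(poverty_pct) {
  -29.901 + 2.559 * poverty_pct
}

# Intercept: prediction at 0% poverty (negative, so not meaningful on its own)
predict_murders(0)

# Slope: change in prediction for a one percentage point increase in poverty
predict_murders(21) - predict_murders(20)
```

The first call returns the intercept, and the difference between any two predictions one percentage point apart returns the slope, whichever starting value is used.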
R2 is 70.52%.
This means that 70.52% of the variability in annual murders per million is explained by the explanatory variable (percentage living in poverty).
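The correlation is about 0.84, which indicates a strong positive relationship. We take the positive square root of R2 because the slope is positive.
correlation <- sqrt(0.7052)
correlation
## [1] 0.8397619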
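Since R2 alone does not carry the direction of the relationship, one way to recover the correlation is to take its sign from the slope. A minimal sketch, with variable names chosen here for illustration:

```r
r_squared <- 0.7052   # R2 from the regression output
slope     <- 2.559    # fitted slope from the regression output

# The correlation has the same sign as the slope
r <- sign(slope) * sqrt(r_squared)
round(r, 2)
## [1] 0.84
```

With a negative slope the same code would return a negative correlation, which a plain `sqrt()` would miss.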