setwd("/Users/ginaocchipinti/Documents/ADEC 7310  Data Analytics/Week 6") 

# then pull up the data from our WD and assign it to variables resorts and snow for merging
resorts <- read.csv("resorts.csv")
snow <- read.csv("snow.csv")

# apply the merge function, joining the two data sets on their shared ID column
resort_snowfall <- merge(resorts, snow, by = "ID")

price <- resort_snowfall$Price
total_lifts <- resort_snowfall$Total.lifts
  1. I am going to continue using the data set from the previous assignment, the merged snow and resorts data, focusing on the price and total lifts variables.

A. Tell us what the dependent and independent variables are.

Dependent variable: price. The price at a ski resort is influenced by various factors, such as the total number of lifts. This is the value that we want to predict using total lifts.

Independent variable: total lifts. This is the predictor we use to explain variation in price. In this model, total lifts is treated as given; we do not model what influences it.

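Our estimation equation:

\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]

Where \(Y_i\) is the price at resort \(i\), \(X_i\) is its total number of lifts, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon_i\) is the error term.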

B. Estimate the linear regression in R using the lm() command.

reg1 <- lm(price ~ total_lifts, data = resort_snowfall)
summary(reg1)
## 
## Call:
## lm(formula = price ~ total_lifts, data = resort_snowfall)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -50.877 -12.306  -3.740   4.824  92.301 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 46.60689    1.28690  36.216   <2e-16 ***
## total_lifts  0.08715    0.03499   2.491   0.0131 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21.61 on 497 degrees of freedom
## Multiple R-squared:  0.01233,    Adjusted R-squared:  0.01034 
## F-statistic: 6.205 on 1 and 497 DF,  p-value: 0.01306

C. Interpret the slope and intercept parameters

The slope represents the change in predicted price for a one-unit change in total lifts. In this case, the slope is 0.08715, so each additional lift is associated with a price increase of about 0.09 Euros.

The intercept represents the predicted price when total lifts is zero. So when total lifts is zero, the predicted price is 46.61 Euros.
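
To make the interpretation concrete, here is a worked prediction using the fitted coefficients from the output above. The value of 10 lifts is an arbitrary illustrative choice, not a figure taken from the data:

\[ \hat{Y} = 46.60689 + 0.08715 \times 10 = 47.47839 \approx 47.48 \text{ Euros} \]

In other words, a resort with 10 lifts is predicted to charge about 0.87 Euros more than the intercept value.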

D. Replicate the slope and intercept parameter using the covariance/variance formulas
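
For reference, the formulas being replicated are the standard least-squares results for a regression with a single predictor, written here with \(X\) for total lifts and \(Y\) for price:

\[ \hat{\beta}_1 = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}, \qquad \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]

R's cov() and var() both use the sample (n - 1) denominator, which cancels in the ratio, so the slope does not depend on that choice.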

#slope 

slope <- cov(total_lifts, price)/var(total_lifts)
print(slope)
## [1] 0.08715285
#intercept

beta_0 <- mean(price) - mean(total_lifts) * slope
print(beta_0)
## [1] 46.60689
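
As a self-contained sanity check that runs without the resort files, the same identity can be verified on R's built-in mtcars data. This is only a sketch: the choice of mtcars and the variable names below are assumptions made for illustration and are not part of the assignment data.

```r
# Illustrative check on R's built-in mtcars data (NOT the resort data):
# for a single predictor, the lm() coefficients should match the
# covariance/variance formulas.
x <- mtcars$wt   # predictor (car weight)
y <- mtcars$mpg  # response (miles per gallon)

fit <- lm(y ~ x)

# Manual estimates from the covariance/variance formulas
slope_manual <- cov(x, y) / var(x)
intercept_manual <- mean(y) - slope_manual * mean(x)

# all.equal() compares with a small numeric tolerance
slope_matches <- isTRUE(all.equal(unname(coef(fit)["x"]), slope_manual))
intercept_matches <- isTRUE(
  all.equal(unname(coef(fit)["(Intercept)"]), intercept_manual)
)

print(c(slope = slope_matches, intercept = intercept_matches))
```

Comparing with all.equal() rather than == avoids false mismatches caused by floating-point rounding.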