setwd("/Users/ginaocchipinti/Documents/ADEC 7310 Data Analytics/Week 6")
# read the data from the working directory into resorts and snow for merging
resorts <- read.csv("resorts.csv")
snow <- read.csv("snow.csv")
# merge the two data sets on their shared ID column
resort_snowfall <- merge(resorts, snow, by = "ID")
price <- resort_snowfall$Price
total_lifts <- resort_snowfall$Total.lifts
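Because merge() performs an inner join by default, rows whose ID appears in only one table are silently dropped. The sketch below shows one way to check for that. It uses two small made-up tables (resorts_toy, snow_toy, and the Snowfall column are hypothetical stand-ins, not the real files).

```r
# Hypothetical toy tables standing in for resorts.csv and snow.csv
resorts_toy <- data.frame(ID = c(1, 2, 3), Price = c(40, 55, 62))
snow_toy <- data.frame(ID = c(2, 3, 4), Snowfall = c(120, 300, 90))

# merge() keeps only IDs found in both tables by default (an inner join)
merged_toy <- merge(resorts_toy, snow_toy, by = "ID")

# Compare row counts to see whether any rows were dropped
nrow(resorts_toy)
nrow(snow_toy)
nrow(merged_toy)

# IDs in resorts_toy that have no match in snow_toy
unmatched_ids <- setdiff(resorts_toy$ID, snow_toy$ID)
print(unmatched_ids)
```

The same row-count and setdiff() checks could be run on resorts, snow, and resort_snowfall to confirm that no resorts were lost in the merge.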
A. Tell us what the dependent and independent variables are.
Dependent variable: Price. Price is influenced by various factors at ski resorts, such as the total number of lifts. This is the value that we want to predict using total lifts.
Independent variable: Total lifts. This is the predictor whose influence on price we are assessing, and we use it to explain variation in price. In this model, total lifts is treated as given rather than as something influenced by price.
Our estimation equation:
\[ Y_i = \beta_0 + \beta_1 X_i + \epsilon_i \]
Where:
\(Y_i\) is the dependent variable, Price
\(X_i\) is the independent variable, Total Lifts
\(\beta_0\) is the y-intercept, representing the predicted value of \(Y_i\) when \(X_i\) = 0
\(\beta_1\) is the slope of the regression line representing the change in \(Y_i\) for a one-unit change in \(X_i\)
\(\epsilon_i\) is the error term for the ith observation, which accounts for variability in \(Y_i\) not explained by the relationship with \(X_i\)
B. Estimate the linear regression in R using the lm() command.
reg1 <- lm(price ~ total_lifts, data = resort_snowfall)
summary(reg1)
##
## Call:
## lm(formula = price ~ total_lifts, data = resort_snowfall)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -50.877 -12.306  -3.740   4.824  92.301
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.60689    1.28690  36.216   <2e-16 ***
## total_lifts  0.08715    0.03499   2.491   0.0131 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 21.61 on 497 degrees of freedom
## Multiple R-squared: 0.01233, Adjusted R-squared: 0.01034
## F-statistic: 6.205 on 1 and 497 DF, p-value: 0.01306
C. Interpret the slope and intercept parameters
The slope represents the predicted change in price for a one-unit change in total lifts. Here the slope is 0.08715, so each additional lift is associated with an average price increase of about 0.09 Euros.
The intercept represents the predicted price when total lifts is zero. So for a resort with zero lifts, the predicted price is 46.61 Euros.
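To make the interpretation concrete, the following sketch plugs the estimated coefficients into the fitted line for a few lift counts. The coefficients are copied from the summary(reg1) output above. The lift counts (0, 10, 50) and the variable names are chosen only for illustration.

```r
# Coefficients copied from the summary(reg1) output above
intercept <- 46.60689
slope_lifts <- 0.08715

# Hypothetical lift counts, chosen only for illustration
lifts <- c(0, 10, 50)

# Predicted price = intercept + slope * lifts
predicted_price <- intercept + slope_lifts * lifts
print(predicted_price)
```

With zero lifts the prediction equals the intercept. Moving from 0 to 10 lifts raises the predicted price by 10 times the slope, or about 0.87 Euros.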
D. Replicate the slope and intercept parameters using the covariance/variance formulas
#slope
slope <- cov(total_lifts, price)/var(total_lifts)
print(slope)
## [1] 0.08715285
#intercept
beta_0 <- mean(price) - mean(total_lifts) * slope
print(beta_0)
## [1] 46.60689
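Both values match the lm() estimates, as expected: in a simple regression the least-squares slope is \(\hat\beta_1 = \mathrm{Cov}(X, Y) / \mathrm{Var}(X)\) and the intercept is \(\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}\). As a minimal sketch, the comparison can also be made programmatically rather than by eye. The example uses simulated data (x, y, and all parameter values are hypothetical, not the resort data) so that it runs on its own.

```r
# Simulated data (hypothetical values, not the resort data)
set.seed(1)
x <- runif(100, min = 0, max = 50)
y <- 45 + 0.1 * x + rnorm(100, sd = 5)

# Slope and intercept from the covariance/variance formulas
slope_manual <- cov(x, y) / var(x)
intercept_manual <- mean(y) - slope_manual * mean(x)

# Coefficients from lm(), with names removed for comparison
fit <- lm(y ~ x)
coef_lm <- unname(coef(fit))

# all.equal() compares within a small numerical tolerance
all.equal(c(intercept_manual, slope_manual), coef_lm)
```

Applied to this report, the analogous check would presumably be all.equal(c(beta_0, slope), unname(coef(reg1))).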