This analysis fits a simple linear regression of Days (Y) on Index (X). Simple linear regression models the relationship between two variables: one independent variable, the predictor, and one dependent variable, the response. It assumes the relationship between them is linear, so the fitted line can be used to predict the response from the predictor. Meteorologists care about the relationship between Days and Index because it helps them understand how meteorological factors relate to the likelihood of precipitation.
Days<-c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
Index<-c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
weather<-data.frame(Days,Index)
“Index” is the x variable (the predictor) and “Days” is the y variable (the response). The sample size is 16.
plot(Index,Days)
The scatterplot of Days against Index does not show a clear linear relationship. The points are widely scattered with no real pattern, and from what I can tell there are no outliers. The spread of the points looks roughly constant across the range of Index.
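As a rough numeric companion to the scatterplot, the sample correlation can be computed from the same two vectors. This is only a sketch: the variable name `r` is introduced here for illustration and is not part of the original analysis.

```r
Days <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
Index <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)

# Pearson correlation between the predictor and the response
# (r is an illustrative name, not from the original report)
r <- cor(Index, Days)
print(r)

# In simple linear regression, the squared correlation equals R-squared
print(r^2)
```

A correlation near zero would agree with the weak pattern in the plot; the squared value should match the Multiple R-squared reported by `summary()` for the fitted model.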
model<-lm(Days~Index)
summary(model)
##
## Call:
## lm(formula = Days ~ Index)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -41.70 -21.54   2.12  18.56  36.42
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## Index         15.296      9.421   1.624    0.127
##
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared: 0.1585, Adjusted R-squared: 0.09835
## F-statistic: 2.636 on 1 and 14 DF, p-value: 0.1267
Intercept: -192.984
Slope: 15.296
Standard errors: intercept = 163.503, slope = 9.421
P-values: intercept = 0.258, slope = 0.127
R^2: 0.1585
The intercept, -192.984, is the model’s predicted value of “Days” when “Index” is zero. Because the observed Index values only range from 16.0 to 18.2, this is an extrapolation and has no practical meaning on its own. The slope, 15.296, is the expected change in “Days” for each one-unit increase in “Index”. It is positive, so Days tends to increase as Index increases. The standard errors measure the variability in the estimated coefficients, and a standard error that is small relative to its coefficient indicates a precise estimate. Here the slope’s standard error (9.421) is large relative to the slope, so the estimate is not very precise. The p-value tests whether the true slope is zero. A small p-value would be evidence that Index is linearly associated with Days, but the slope’s p-value of 0.127 is above 0.05. Finally, the R^2 value is the proportion of variability in Days that is explained by Index: 0.1585, or about 16%.
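To make these quantities concrete, the following sketch recomputes them from the usual least-squares formulas rather than reading them off `summary()`. All helper names (`Sxx`, `Sxy`, `b0`, `b1`, `s`, `se_b1`, `r2`) are chosen here for illustration.

```r
Days <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
Index <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
n <- length(Days)

# Sums of squares and cross-products about the means
Sxx <- sum((Index - mean(Index))^2)
Sxy <- sum((Index - mean(Index)) * (Days - mean(Days)))

# Least-squares slope and intercept
b1 <- Sxy / Sxx
b0 <- mean(Days) - b1 * mean(Index)

# Residual standard error on n - 2 degrees of freedom
fitted_days <- b0 + b1 * Index
sse <- sum((Days - fitted_days)^2)
s <- sqrt(sse / (n - 2))

# Standard error of the slope and R-squared
se_b1 <- s / sqrt(Sxx)
r2 <- 1 - sse / sum((Days - mean(Days))^2)

print(c(intercept = b0, slope = b1, se_slope = se_b1, r_squared = r2))
```

The printed values should agree with the Coefficients table and the Multiple R-squared line up to rounding.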
plot(model)
After fitting the simple linear regression model, these plots helped me assess whether the assumptions of linear regression are reasonably met.
The Residuals vs Fitted plot checks the linearity assumption. I looked for a random scatter of residuals around zero.
The Q-Q plot assesses whether the residuals are approximately normally distributed. Points falling close to the reference line indicate normality; systematic deviations suggest skewness or heavy tails.
The Scale-Location plot is used to evaluate constant variance. A horizontal band of points suggests homoscedasticity, while a funnel shape indicates non-constant variance.
Finally, the Residuals vs Leverage plot helps to identify influential observations. Points with high leverage or large Cook’s distance may have a strong impact on the fitted model.
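The quantities behind these plots can also be tabulated directly, which may help when the plots are hard to read with only 16 points. This is a sketch rather than part of the original analysis: the object names `fit` and `diagnostics` are assumptions made here, and sorting by Cook’s distance is just one convenient way to look at the table.

```r
Days <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
Index <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)

# Refit the same model under an illustrative name
fit <- lm(Days ~ Index)

res <- resid(fit)           # raw residuals (Residuals vs Fitted)
lev <- hatvalues(fit)       # leverage of each observation
cd <- cooks.distance(fit)   # influence of each observation

# One row per observation, most influential first
diagnostics <- data.frame(Index, Days,
                          residual = round(res, 2),
                          leverage = round(lev, 3),
                          cooks_d = round(cd, 3))
print(diagnostics[order(-diagnostics$cooks_d), ])
```

Observations at the top of the table are the ones to check against the Residuals vs Leverage plot.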
To test whether Index is a useful predictor of Days, I examined the hypothesis test on the slope from the regression output in RStudio. The summary of the fitted model provides the estimated slope, its standard error, the t‑statistic, and the p‑value.
Null hypothesis: B_1=0
Alternative hypothesis: B_1 does not equal zero
The p-value for the slope appears in the regression table under the “Pr(>|t|)” column for the Index variable. If the p-value is below 0.05, I conclude that the slope is statistically significant. Here the p-value is 0.127, which is above 0.05, so I fail to reject the null hypothesis. The data do not provide sufficient evidence of a linear relationship between Index and Days at the 5% level, so Index cannot be called a meaningful predictor of Days based on this sample. In practical terms, the meteorologist should be cautious about using Index alone to forecast the number of Days.
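The test on the slope can be reproduced step by step from the coefficient table. The sketch below assumes the same model refit under the illustrative name `fit`; the other names (`t_stat`, `resid_df`, `p_value`, `t_crit`) are also introduced here.

```r
Days <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
Index <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
fit <- lm(Days ~ Index)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))

# t statistic for H0: B_1 = 0 is the estimate divided by its standard error
t_stat <- coefs["Index", "Estimate"] / coefs["Index", "Std. Error"]
resid_df <- df.residual(fit)   # n - 2

# Two-sided p-value and the 5% critical value from the t distribution
p_value <- 2 * pt(-abs(t_stat), resid_df)
t_crit <- qt(0.975, resid_df)

print(c(t = t_stat, df = resid_df, p = p_value, critical = t_crit))

# TRUE would mean rejecting H0 at the 5% level
print(abs(t_stat) > t_crit)
```

The t statistic and p-value should match the Index row of the summary output, and the final comparison gives the same decision as comparing the p-value with 0.05.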
predict(model,interval="confidence",level=0.95)
##         fit      lwr       upr
## 1  62.46546 44.24504  80.68589
## 2  68.58401 54.90761  82.26042
## 3  85.41001 63.91295 106.90708
## 4  83.88038 63.97338 103.78737
## 5  70.11365 57.02842  83.19887
## 6  85.41001 63.91295 106.90708
## 7  51.75801 21.75791  81.75811
## 8  70.11365 57.02842  83.19887
## 9  82.35074 63.94915 100.75233
## 10 70.11365 57.02842  83.19887
## 11 65.52474 49.93042  81.11906
## 12 68.58401 54.90761  82.26042
## 13 85.41001 63.91295 106.90708
## 14 71.64328 58.85392  84.43265
## 15 74.70256 61.55896  87.84616
## 16 60.93583 41.22205  80.64961
The output above gives a 95% confidence interval for the mean response at each observed Index value. A 95% prediction interval for an individual response can be obtained from the same function with interval="prediction". The CI gives a range of plausible values for the average Days for all observations with that Index value. The PI gives a range of plausible values for a single new observation at that Index value. Because individual observations vary more than the mean, the PI is always wider than the CI.
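A minimal sketch of both intervals at a single Index value follows. The value 17.0 is a hypothetical choice inside the observed range, and `fit`, `new_obs`, `ci`, and `pi95` are names introduced here for illustration.

```r
Days <- c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
Index <- c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
fit <- lm(Days ~ Index)

# Hypothetical new Index value; the column name must match the predictor
new_obs <- data.frame(Index = 17.0)

# Interval for the mean Days at this Index value
ci <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)

# Interval for a single new observation at this Index value
pi95 <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)

print(rbind(confidence = ci[1, ], prediction = pi95[1, ]))
```

Both rows share the same fitted value; only the lower and upper limits differ, with the prediction row spanning the wider range.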
The analysis does not show that Index is a useful predictor of Days: the slope is positive (15.296) but not statistically significant (p = 0.127). The R^2 value shows how much of the variation in Days is explained by Index, and a higher R^2 indicates a stronger linear relationship. Here R^2 = 0.1585, so Index explains only about 16% of the variation. Based on the diagnostic plots, the assumptions of linearity, constant variance, and approximate normality of residuals appear reasonably satisfied. The fitted regression equation, Days = -192.984 + 15.296 * Index, allows the meteorologist to estimate the expected number of Days for a given Index value, along with uncertainty ranges through confidence and prediction intervals, but the wide intervals show how uncertain those estimates are. As with any simple linear model, predictions rely on the assumption of a linear relationship and stable variance. Extreme Index values or conditions not represented in the data may lead to less reliable predictions.