1 Introduction

This analysis fits a simple linear regression of Days (Y) on Index (X). Simple linear regression models the relationship between two variables: one independent variable (the predictor) and one dependent variable (the response). It assumes the relationship between the variables is linear, which makes the fitted line straightforward to interpret and to use for prediction. Meteorologists care about the relationship between Days and Index because it helps them understand how meteorological factors relate to the likelihood of precipitation.

Days<-c(91,105,106,108,88,91,58,82,81,65,61,48,61,43,33,36)
Index<-c(16.7,17.1,18.2,18.1,17.2,18.2,16.0,17.2,18.0,17.2,16.9,17.1,18.2,17.3,17.5,16.6)
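# Store both variables in a data frame; it gets its own name so the fitted model does not overwrite it
dat<-data.frame(Days,Index)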

“Index” is the x variable (the predictor) and “Days” is the y variable (the response). The sample size is n = 16.

2 Scatterplot

# Predictor (Index) on the x-axis, response (Days) on the y-axis
plot(Index,Days)

The scatterplot does not show a clear linear relationship between Days and Index. The points are widely scattered, and from what I can tell there are no outliers. The variability of Days looks roughly constant across the range of Index, and the plot shows no real pattern.
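To put a number on the visual impression, the sample correlation can be computed alongside a labelled scatterplot. The following is a minimal sketch that re-enters the data so it runs on its own; the axis labels, the plot title, and the dashed least-squares line are illustrative choices, not part of the original analysis.

```r
# Data as entered in the Introduction
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2,
           18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

# Pearson correlation between predictor and response
r <- cor(Index, Days)
print(r)

# Scatterplot with the predictor on the x-axis and the response on the y-axis
plot(Index, Days, xlab = "Index", ylab = "Days", main = "Days vs Index")

# Overlay the least-squares line as a visual guide (illustrative addition)
abline(lm(Days ~ Index), lty = 2)
```

A correlation near zero would match a plot with no visible trend, while values closer to 1 or -1 would indicate a tighter linear pattern.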

3 Regression Model

model<-lm(Days~Index)
summary(model)
## 
## Call:
## lm(formula = Days ~ Index)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -41.70 -21.54   2.12  18.56  36.42 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -192.984    163.503  -1.180    0.258
## Index         15.296      9.421   1.624    0.127
## 
## Residual standard error: 23.79 on 14 degrees of freedom
## Multiple R-squared:  0.1585, Adjusted R-squared:  0.09835 
## F-statistic: 2.636 on 1 and 14 DF,  p-value: 0.1267

Intercept: -192.984

Slope: 15.296

Standard Errors: Intercept SE=163.503, Slope SE=9.421

P-values: Intercept p-value=0.258, Slope p-value=0.127

R^2: 0.1585

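The intercept is the model's predicted value of “Days” when “Index” is zero. Because the observed Index values range only from 16.0 to 18.2, this is an extrapolation, and the value of -192.984 has no practical meaning on its own. The slope indicates how much “Days” is expected to change for each one-unit increase in “Index”; the positive estimate of 15.296 means Days tends to increase as Index increases. The standard errors measure the variability in the estimated coefficients. A standard error that is small relative to its coefficient suggests a precise estimate; here both standard errors are large relative to their coefficients, so the estimates are imprecise. The p-value for the slope tests whether the true slope is zero. A small p-value would provide evidence that Index is linearly associated with Days, but the slope p-value of 0.127 is above 0.05, so that evidence is lacking here. Finally, the R^2 value is the proportion of variability in Days that is explained by Index, which is about 15.9% for this model.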
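The quantities in the summary table can also be reproduced from the standard least-squares formulas, which may make the interpretation above more concrete. This is a sketch using only base R arithmetic; the variable names (Sxx, Sxy, b0, b1, and so on) are my own and do not come from the original analysis.

```r
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2,
           18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

n     <- length(Days)
x_bar <- mean(Index)
y_bar <- mean(Days)

# Sums of squares and cross-products around the means
Sxx <- sum((Index - x_bar)^2)
Sxy <- sum((Index - x_bar) * (Days - y_bar))

# Least-squares estimates of the slope and intercept
b1 <- Sxy / Sxx
b0 <- y_bar - b1 * x_bar

# Residual and total sums of squares, then R^2
fitted_days <- b0 + b1 * Index
sse <- sum((Days - fitted_days)^2)
sst <- sum((Days - y_bar)^2)
r2  <- 1 - sse / sst

# Residual standard error (n - 2 degrees of freedom) and coefficient standard errors
s     <- sqrt(sse / (n - 2))
se_b1 <- s / sqrt(Sxx)
se_b0 <- s * sqrt(1 / n + x_bar^2 / Sxx)

print(c(intercept = b0, slope = b1, r_squared = r2))
print(c(se_intercept = se_b0, se_slope = se_b1, resid_se = s))
```

These values should agree with the output of summary() up to rounding.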

4 Model Adequacy Checks

plot(model)

After fitting the simple linear regression model, these plots helped me assess whether the assumptions of linear regression are reasonably met.

In the Residuals vs Fitted plot, I checked the linearity assumption by looking for a random scatter of residuals around zero.

In the Q-Q plot, I assessed whether the residuals are approximately normally distributed. Points falling close to the reference line indicate normality; systematic deviations suggest skewness or heavy tails.

The Scale-Location plot is used to evaluate constant variance. A horizontal band of points suggests homoscedasticity, while a funnel shape indicates non-constant variance.

Finally, the Residuals vs Leverage plot helps to identify influential observations. Points with high leverage or large Cook’s distance may have a strong impact on the fitted model.
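The visual checks above can be complemented with a few numbers. The sketch below computes leverage values, Cook's distances, and a Shapiro-Wilk test on the residuals. The cutoff of 2p/n for "high" leverage is a common rule of thumb that I am assuming here, not something taken from the original analysis, and the object name fit is my own.

```r
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2,
           18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

fit <- lm(Days ~ Index)

# Leverage (diagonal of the hat matrix) and Cook's distance for each observation
lev  <- hatvalues(fit)
cook <- cooks.distance(fit)

# Flag observations whose leverage exceeds the 2p/n rule of thumb (assumed cutoff)
n <- length(Days)
p <- length(coef(fit))
high_leverage <- which(lev > 2 * p / n)
print(high_leverage)

# Observation with the largest Cook's distance
print(which.max(cook))

# Shapiro-Wilk test of normality on the residuals (numeric companion to the Q-Q plot)
shapiro <- shapiro.test(residuals(fit))
print(shapiro$p.value)
```

With only 16 observations these checks have limited power, so they are best read together with the plots rather than as a replacement for them.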

5 Hypothesis Test

To test whether Index is a useful predictor of Days, I examined the hypothesis test on the slope from the regression output in RStudio. The summary of the fitted model provides the estimated slope, its standard error, the t‑statistic, and the p‑value.

Null hypothesis: B_1=0

Alternative hypothesis: B_1 does not equal zero

The p-value for the slope appears in the regression table under the “Pr(>|t|)” column in the Index row. A p-value below 0.05 would lead me to conclude that the slope is statistically significant. Here the slope has t = 1.624 and a p-value of 0.127, which is above 0.05, so I fail to reject the null hypothesis. The data do not provide sufficient evidence of a linear relationship between Index and Days at the 5% level. In practical terms, although the estimated slope is positive, this sample does not establish Index as a reliable predictor of Days, so the meteorologist should be cautious about relying on it alone when making forecasts.
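The test on the slope can be carried out step by step to show where the t-statistic and p-value come from. This is a sketch assuming a two-sided test at the 5% level; the object names (fit, t_stat, t_crit, and so on) are my own.

```r
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2,
           18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

fit   <- lm(Days ~ Index)
coefs <- summary(fit)$coefficients

# t-statistic: estimated slope divided by its standard error
t_stat <- coefs["Index", "Estimate"] / coefs["Index", "Std. Error"]

# Two-sided p-value from the t distribution with n - 2 degrees of freedom
df      <- df.residual(fit)
p_value <- 2 * pt(-abs(t_stat), df)

# Critical value and decision at the 5% level
t_crit      <- qt(0.975, df)
reject_null <- abs(t_stat) > t_crit

# 95% confidence interval for the slope; it contains 0 exactly when the test fails to reject
slope_ci <- confint(fit, "Index", level = 0.95)

print(c(t = t_stat, p = p_value, critical = t_crit))
print(reject_null)
print(slope_ci)
```

The confidence interval is an equivalent way to read the test: if zero lies inside the interval, the slope is not significantly different from zero at that level.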

6 Confidence and Prediction Intervals

# With no newdata argument, predict() returns intervals at the 16 observed Index values
predict(model,interval="confidence",level=0.95)
##         fit      lwr       upr
## 1  62.46546 44.24504  80.68589
## 2  68.58401 54.90761  82.26042
## 3  85.41001 63.91295 106.90708
## 4  83.88038 63.97338 103.78737
## 5  70.11365 57.02842  83.19887
## 6  85.41001 63.91295 106.90708
## 7  51.75801 21.75791  81.75811
## 8  70.11365 57.02842  83.19887
## 9  82.35074 63.94915 100.75233
## 10 70.11365 57.02842  83.19887
## 11 65.52474 49.93042  81.11906
## 12 68.58401 54.90761  82.26042
## 13 85.41001 63.91295 106.90708
## 14 71.64328 58.85392  84.43265
## 15 74.70256 61.55896  87.84616
## 16 60.93583 41.22205  80.64961

Using the fitted model, I computed 95% confidence intervals for the mean response at each of the 16 observed Index values; the output above shows these intervals. A 95% prediction interval for an individual response can be obtained the same way by setting interval="prediction". The CI gives a range of plausible values for the average Days across all observations with that Index value. The PI gives a range of plausible values for a single new observation at that Index value. Because individual observations vary more than the mean, the PI is always wider than the CI at the same Index value and confidence level.
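To compare the two kinds of interval directly, both can be requested at a single Index value through the newdata argument of predict(). The value Index = 17 below is a hypothetical choice for illustration, picked from within the observed range; it is not a value singled out in the original analysis.

```r
Days  <- c(91, 105, 106, 108, 88, 91, 58, 82, 81, 65, 61, 48, 61, 43, 33, 36)
Index <- c(16.7, 17.1, 18.2, 18.1, 17.2, 18.2, 16.0, 17.2,
           18.0, 17.2, 16.9, 17.1, 18.2, 17.3, 17.5, 16.6)

fit <- lm(Days ~ Index)

# Hypothetical Index value at which to predict (inside the observed range)
new_obs <- data.frame(Index = 17)

# 95% confidence interval for the mean Days at this Index
ci_mean <- predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)

# 95% prediction interval for a single new observation at this Index
pi_new <- predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)

# Compare the interval widths
ci_width <- ci_mean[, "upr"] - ci_mean[, "lwr"]
pi_width <- pi_new[, "upr"] - pi_new[, "lwr"]

print(ci_mean)
print(pi_new)
print(c(ci_width = ci_width, pi_width = pi_width))
```

Both intervals share the same center (the fitted value); the prediction interval is wider because it adds the residual variance of a single observation to the uncertainty in the estimated mean.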

7 Conclusion

The analysis does not show that Index is a useful predictor of Days: the estimated slope is positive (15.296), but it is not statistically significant at the 5% level (p = 0.127). The value of R^2 shows how much of the variation in Days is explained by Index, and a higher R^2 indicates a stronger linear relationship; here R^2 = 0.1585, so Index explains only about 16% of the variation. Based on the diagnostic plots, the assumptions of linearity, constant variance, and approximate normality of residuals appear reasonably satisfied. The fitted regression equation still allows the meteorologist to estimate the expected number of Days for a given Index value, along with uncertainty ranges through confidence and prediction intervals, but the weak fit means these estimates carry considerable uncertainty. As with any simple linear model, predictions rely on the assumption of a linear relationship and stable variance. Extreme Index values or conditions not represented in the data may lead to less reliable predictions.