This paper summarizes the key points and R code from Lessons 6 to 8 of the swirl course on Regression Models. These lessons continue from Lesson 5 and work through different kinds of examples of multivariable regression.
Lesson 6: MultiVar Examples
In Lesson 6 of Regression Models, the swiss dataset is used to explore multivariable regression models. The dataset contains six variables measured for 47 French-speaking Swiss provinces, with Fertility as the dependent variable and the other five socioeconomic indicators as candidate predictors. Key R functions include lm() for fitting linear models, summary() for inspecting them, and cor() for calculating correlations between variables. The lesson highlights how correlations among predictors (e.g., between Agriculture and Education) affect regression coefficients, to the point that a coefficient can change sign depending on which other variables are in the model. The makelms() function supplied by the lesson demonstrates this by adding predictors one at a time and showing how the Agriculture coefficient shifts: it is positive when Agriculture is the only predictor and negative once the other predictors are included. Additionally, adding a redundant variable (ec), defined as the sum of Examination and Catholic, illustrates that a regressor that is an exact linear combination of others carries no new information: R reports NA for its coefficient and the coefficients of the other predictors are unchanged. This emphasizes that regressors need to be linearly independent for every coefficient to be estimable. Overall, the lesson shows how multicollinearity and the choice of included variables affect the interpretation of a regression.
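The sign change and the redundant-variable behaviour described above can be reproduced with a short sketch on the swiss data that ships with base R. The object names (fit_agri, fit_all, swiss_ec, fit_ec) are illustrative and not taken from the lesson, and the redundant column is added to a copy of the data frame here rather than created as a separate vector, which is an assumption made to keep the example self-contained.

```r
# swiss is part of the base R 'datasets' package: 47 provinces, 6 variables.
data(swiss)

# Agriculture as the only predictor: its coefficient is positive.
fit_agri <- lm(Fertility ~ Agriculture, data = swiss)

# All five predictors: the Agriculture coefficient becomes negative.
fit_all <- lm(Fertility ~ ., data = swiss)

print(coef(fit_agri)["Agriculture"])
print(coef(fit_all)["Agriculture"])

# Correlations among predictors that lie behind the shift.
print(cor(swiss$Agriculture, swiss$Education))
print(cor(swiss$Examination, swiss$Education))

# Add a redundant regressor: an exact sum of two existing columns.
# (swiss_ec is a hypothetical copy used only for this illustration.)
swiss_ec <- transform(swiss, ec = Examination + Catholic)
fit_ec <- lm(Fertility ~ ., data = swiss_ec)

# lm() cannot estimate a separate effect for ec, so its coefficient is NA,
# and the remaining coefficients match the model without ec.
print(coef(fit_ec)["ec"])
print(all.equal(coef(fit_all), coef(fit_ec)[names(coef(fit_all))]))
```

Because ec is the last column, it is the term lm() drops; the estimates for the original five predictors are the same with or without it.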
Lesson 7: MultiVar Examples2
In this lesson, we worked with regression models in which the predictor is a factor with more than two levels. Using R commands, we explored some properties of the InsectSprays data. We then considered how a multilevel factor enters a linear model and how to interpret the resulting coefficients.
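The equation relating an outcome to a factor with several levels is written in terms of binary (dummy) variables, one for each level other than the reference level. Using the lm function, we fit the linear model in which count is the dependent variable and spray is the independent variable. In this model the intercept estimates the mean count for the reference spray (A by default), and each remaining coefficient estimates the difference between that spray's mean count and the reference mean. Omitting the intercept changes the parameterization: each coefficient is then the mean count for its own spray. The R function relevel changes which level of a factor serves as the reference; refitting the model with the releveled factor gives coefficients measured against the new reference group. It is important to understand how the choice of reference group affects the interpretation of the results.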
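A minimal sketch of these three parameterizations, using the InsectSprays data from base R, is given below. The object names (fit, fit_noint, sprays_c, fit_c) are illustrative, and the choice of spray C as the new reference is an assumption made for the example.

```r
# InsectSprays is part of the base R 'datasets' package: count and spray (A-F).
data(InsectSprays)

# Mean count per spray, for comparison with the fitted coefficients.
group_means <- sapply(split(InsectSprays$count, InsectSprays$spray), mean)

# Default coding: spray A is the reference level.
# Intercept = mean of A; sprayB..sprayF = difference from the mean of A.
fit <- lm(count ~ spray, data = InsectSprays)

# No intercept: each coefficient is the mean count of its own spray.
fit_noint <- lm(count ~ spray - 1, data = InsectSprays)

# relevel() only reorders the factor levels; the model must be refit.
# (sprays_c is a hypothetical copy with C as the reference level.)
sprays_c <- transform(InsectSprays, spray = relevel(spray, ref = "C"))
fit_c <- lm(count ~ spray, data = sprays_c)

print(group_means)
print(coef(fit))
print(coef(fit_noint))
print(coef(fit_c))
```

All three fits describe the same six group means; only the quantities reported as coefficients, and therefore the hypotheses tested in summary(), differ.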
Lesson 8: MultiVar Examples3
This lesson covers the use of multiple regression models in R to analyze hunger rates among children under 5, using WHO data. The dataset, contained in the hunger data frame, includes variables such as “Year,” “Sex,” and “Numeric,” the last of which is the percentage of underweight children. The lesson begins by building a simple linear regression model (lm(hunger$Numeric ~ hunger$Year)) that shows a decrease in hunger rates over time, evidenced by a negative coefficient for Year. Separate models fit to the male and female observations reveal that, while the rate declines for both sexes, the two fitted lines start from different levels.
The analysis is then extended to a multiple regression model that includes both Year and Sex as predictors (lm(hunger$Numeric ~ hunger$Year + hunger$Sex)), which fits a common slope for Year and different intercepts for the two sexes. To let the slope differ by sex as well, an interaction model (lm(hunger$Numeric ~ hunger$Year + hunger$Sex + hunger$Year * hunger$Sex)) is introduced; in R's formula syntax the * operator already includes both main effects, so this is equivalent to hunger$Numeric ~ hunger$Year * hunger$Sex. The interaction coefficient shows that the decline in hunger rates is slightly steeper for males. The lesson concludes with a discussion of how to interpret interactions between continuous predictors, where the effect of one variable on the outcome depends on the value of the other.
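The hunger data frame is distributed with the swirl course and is not part of base R, so the sketch below uses a small simulated stand-in. The data frame hunger_sim, its size, and the numbers used to generate it are all made up for illustration and are not the WHO figures; only the model formulas mirror the lesson.

```r
# Simulated stand-in for the swirl 'hunger' data (values are invented).
set.seed(1)
n <- 200
hunger_sim <- data.frame(
  Year = sample(1970:2011, n, replace = TRUE),
  Sex  = factor(sample(c("Female", "Male"), n, replace = TRUE))
)
is_male <- as.numeric(hunger_sim$Sex == "Male")
# Assumed generating process: declining rate, slightly steeper for males.
hunger_sim$Numeric <- 600 - 0.29 * hunger_sim$Year +
  is_male * (60 - 0.03 * hunger_sim$Year) + rnorm(n, sd = 2)

# Additive model: one slope for Year, separate intercepts by Sex.
lm_both <- lm(Numeric ~ Year + Sex, data = hunger_sim)

# Interaction model: Year * Sex expands to Year + Sex + Year:Sex,
# so the longer formula used in the lesson fits the same model.
lm_inter <- lm(Numeric ~ Year * Sex, data = hunger_sim)
lm_long  <- lm(Numeric ~ Year + Sex + Year * Sex, data = hunger_sim)

# Slopes implied by the interaction model.
female_slope <- coef(lm_inter)["Year"]
male_slope   <- coef(lm_inter)["Year"] + coef(lm_inter)["Year:SexMale"]

# A separate fit on the male rows recovers the same male slope.
lm_male <- lm(Numeric ~ Year, data = subset(hunger_sim, Sex == "Male"))

print(coef(lm_both))
print(coef(lm_inter))
print(c(female = unname(female_slope), male = unname(male_slope)))
print(coef(lm_male)["Year"])
```

In the interaction model the Year coefficient is the slope for the reference level (Female), and the Year:SexMale coefficient is the difference between the male and female slopes, which is why adding the two reproduces the slope from the male-only fit.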