library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)

# NOTE: machine-specific path; the data(msleep) call further down replaces this object
msleep <- read.csv("C:/Users/ABHIRAM/Downloads/msleep.csv")

Let’s consider building a linear model to predict the amount of REM sleep (“sleep_rem”) based on some other variables in the dataset.

Selecting Response and Explanatory Variables:

Response Variable:

“sleep_rem” (Amount of REM sleep in hours)

Explanatory Variables:

“sleep_total” (Total sleep time in hours)

“awake” (Amount of time awake in hours)

“brainwt” (Brain weight in kg)

“bodywt” (Body weight in kg)

We can use these variables to build our linear model.

Now, let’s proceed to build the linear model using R:

# Install lmtest only if it is not already available
if (!requireNamespace("lmtest", quietly = TRUE)) {
  install.packages("lmtest", repos = "https://cloud.r-project.org")
}
# Loading the required libraries
library(dplyr)
library(lmtest)
## Warning: package 'lmtest' was built under R version 4.3.2
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.3.2
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# Loading the msleep dataset
data(msleep)

# Creating a linear model
model <- lm(sleep_rem ~ sleep_total + awake + brainwt + bodywt, data = msleep)

# Displaying the summary of the model
summary(model)
## 
## Call:
## lm(formula = sleep_rem ~ sleep_total + awake + brainwt + bodywt, 
##     data = msleep)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.8247 -0.5768 -0.1758  0.5116  2.5960 
## 
## Coefficients: (1 not defined because of singularities)
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.2809108  0.4000656  -0.702    0.486    
## sleep_total  0.2063069  0.0330030   6.251 1.44e-07 ***
## awake               NA         NA      NA       NA    
## brainwt      0.2300009  0.6476446   0.355    0.724    
## bodywt       0.0005355  0.0013135   0.408    0.685    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8528 on 44 degrees of freedom
##   (35 observations deleted due to missingness)
## Multiple R-squared:  0.5147, Adjusted R-squared:  0.4816 
## F-statistic: 15.55 on 3 and 44 DF,  p-value: 4.879e-07

Model Diagnosis:

Coefficient Interpretation:

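The coefficients represent the estimated effect of each explanatory variable on the response variable (sleep_rem), holding the other variables constant. Here the coefficient for “sleep_total” is about 0.206 (p = 1.44e-07), so each additional hour of total sleep is associated with roughly 0.21 more hours of REM sleep. The coefficients for “brainwt” (p = 0.724) and “bodywt” (p = 0.685) are not statistically distinguishable from zero. The coefficient for “awake” is NA because R could not estimate it separately: the summary reports one coefficient “not defined because of singularities”, which means “awake” is an exact linear combination of the other terms. The model explains about 51% of the variance in REM sleep (adjusted R-squared 0.48) on the 48 complete observations.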
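The NA reported for “awake” in the summary can be investigated directly. The sketch below assumes that msleep is the copy shipped with ggplot2 and that “awake” is, up to rounding, 24 minus “sleep_total”; the name model_reduced is introduced here for illustration only.

```r
# Sketch: investigate why `awake` gets an NA coefficient.
# Assumption: msleep is the dataset shipped with ggplot2.
data(msleep, package = "ggplot2")

# If awake is (up to rounding) 24 - sleep_total, this sum is close to 24 in every row
summary(msleep$sleep_total + msleep$awake)

# Same model as in the report
model <- lm(sleep_rem ~ sleep_total + awake + brainwt + bodywt, data = msleep)

# alias() lists terms that are linear combinations of other terms
alias(model)

# Hypothetical reduced model without the redundant term
model_reduced <- lm(sleep_rem ~ sleep_total + brainwt + bodywt, data = msleep)
coef(model_reduced)
```

Because lm() already drops the redundant column internally, the reduced model should reproduce the estimates shown in the summary; removing “awake” from the formula only makes the specification explicit.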

Standard Errors and Confidence Intervals:

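The standard errors of the coefficients can be used to calculate confidence intervals. A 95% confidence interval is the estimate plus or minus the critical t value times the standard error; with 44 residual degrees of freedom the critical value is about 2.015, so for “sleep_total” the interval is roughly 0.206 ± 2.015 × 0.033, or about 0.14 to 0.27. The interval gives a range of plausible values for the true population coefficient: if the sampling were repeated many times, about 95% of intervals built this way would contain it.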
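A minimal sketch of how such intervals could be obtained in R. It assumes msleep from ggplot2 and, for simplicity, leaves “awake” out of the formula, since the summary reports it as not estimable; the object names ci and manual_ci are illustrative.

```r
# Sketch: 95% confidence intervals for the model coefficients.
# Assumption: msleep from ggplot2; `awake` omitted because it is not estimable.
data(msleep, package = "ggplot2")
model <- lm(sleep_rem ~ sleep_total + brainwt + bodywt, data = msleep)

# Built-in: t-based intervals for every coefficient
ci <- confint(model, level = 0.95)
ci

# The same interval for sleep_total computed by hand: estimate +/- t * SE
coefs  <- coef(summary(model))
est    <- coefs["sleep_total", "Estimate"]
se     <- coefs["sleep_total", "Std. Error"]
t_crit <- qt(0.975, df = df.residual(model))
manual_ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
manual_ci
```

An interval that excludes zero, as the one for “sleep_total” should here, corresponds to a coefficient that is significant at the 5% level.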

Variable Transformation:

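Depending on the results, we might need to consider transformations (e.g., a log transformation) if assumptions like linearity, normality, and homoscedasticity are not met. Body and brain weight span several orders of magnitude across mammals, which makes them natural candidates for a log transformation. Whether a transformation is needed will become apparent during the model diagnosis.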
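As an illustration of what a transformed model could look like, the sketch below log-transforms the two weight variables. It assumes msleep from ggplot2; the choice of which variables to transform and the name model_log are assumptions made for this example, not a result of the analysis above.

```r
# Sketch: refit with log-transformed weights and compare fit.
# Assumption: msleep from ggplot2; brainwt and bodywt are strictly positive.
data(msleep, package = "ggplot2")

model_lin <- lm(sleep_rem ~ sleep_total + brainwt + bodywt, data = msleep)
model_log <- lm(sleep_rem ~ sleep_total + log(brainwt) + log(bodywt), data = msleep)

summary(model_log)

# Both models use the same rows, so adjusted R-squared values are comparable
c(linear = summary(model_lin)$adj.r.squared,
  logged = summary(model_log)$adj.r.squared)
```

A higher adjusted R-squared alone does not justify the transformation; the residual plots of both models should be compared as well.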

Scatter Plots:

Scatter plots can be used to visualize the relationships between the response and explanatory variables. These can help identify potential issues like outliers or non-linearity.
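A possible way to draw such plots with ggplot2, shown as a sketch. It assumes msleep from ggplot2; the plot object names p_total and p_body and the log scale on body weight are choices made for this example.

```r
# Sketch: scatter plots of the response against two explanatory variables.
# Assumption: msleep from ggplot2.
library(ggplot2)
data(msleep, package = "ggplot2")

# REM sleep against total sleep, with a fitted straight line for reference
p_total <- ggplot(msleep, aes(x = sleep_total, y = sleep_rem)) +
  geom_point(na.rm = TRUE) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, na.rm = TRUE) +
  labs(x = "Total sleep (hours)", y = "REM sleep (hours)")

# REM sleep against body weight; the log scale spreads out the small animals
p_body <- ggplot(msleep, aes(x = bodywt, y = sleep_rem)) +
  geom_point(na.rm = TRUE) +
  scale_x_log10() +
  labs(x = "Body weight (kg, log scale)", y = "REM sleep (hours)")

print(p_total)
print(p_body)
```

Points far from the bulk of the data in either plot are candidates for outliers or high-leverage observations.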

Residual Analysis:

Checking the model residuals for normality, homoscedasticity, and independence is essential, because the standard errors and p-values in the summary rely on these assumptions.
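These checks could be sketched as follows, using base R together with the lmtest package loaded earlier. The sketch assumes msleep from ggplot2 and omits “awake” from the formula, because a redundant regressor can cause problems in some of the tests; the choice of tests is an assumption for illustration.

```r
# Sketch: residual diagnostics for the fitted model.
# Assumption: msleep from ggplot2; `awake` omitted because it is redundant.
library(lmtest)
data(msleep, package = "ggplot2")
model <- lm(sleep_rem ~ sleep_total + brainwt + bodywt, data = msleep)
res <- residuals(model)

# Normality: Shapiro-Wilk test on the residuals
sw <- shapiro.test(res)
sw

# Homoscedasticity: Breusch-Pagan test (lmtest)
bp <- bptest(model)
bp

# Independence: Durbin-Watson test (lmtest); of limited meaning here,
# since the rows are species and have no natural ordering
dw <- dwtest(model)
dw

# Standard diagnostic plots: residuals vs fitted, Q-Q, scale-location, leverage
par(mfrow = c(2, 2))
plot(model)
par(mfrow = c(1, 1))
```

Small p-values in the Shapiro-Wilk or Breusch-Pagan tests would suggest non-normal or heteroscedastic residuals, in which case a transformation of the variables may be worth trying.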