practice

Author

Mfundo

Code
library(ggplot2)
Warning: package 'ggplot2' was built under R version 4.2.3
Code
ggplot(mtcars, aes(factor(carb), fill=factor(cyl))) + geom_bar()

Loading libraries

Data visualisation

Code
ggplot(mtcars, aes(x = mpg, y = hp)) +
  geom_point() +
  xlab("Miles per Gallon") +
  ylab("Horsepower") +
  theme_minimal()

Modelling

Code
model <- lm(mpg ~ hp, data = mtcars)
summary(model)

Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,    Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07
Note

Based on this model summary, the coefficient for hp is statistically significant (p = 1.79e-07), so there is strong evidence of a linear association between horsepower and miles per gallon. The negative coefficient (-0.06823) means that each additional unit of horsepower is associated with a decrease of about 0.068 mpg on average. The model explains roughly 60% of the variance in mpg (multiple R-squared = 0.6024).
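The numbers quoted in the note can also be pulled out of the fitted object directly. The following is a minimal sketch using base R only; the 100 hp step is a hypothetical value chosen for illustration and does not come from the original analysis.

```r
# Refit the simple regression from the source (mtcars ships with base R).
model <- lm(mpg ~ hp, data = mtcars)

# Coefficient table: estimates, standard errors, t values and p-values.
coef_table <- coef(summary(model))
print(coef_table)

# 95% confidence intervals for the intercept and the hp slope.
ci <- confint(model, level = 0.95)
print(ci)

# Expected change in mpg for a 100 hp increase (hypothetical step size).
delta_100hp <- unname(coef(model)["hp"]) * 100
print(delta_100hp)
```

A confidence interval for hp that lies entirely below zero carries the same message as the small p-value, and it also shows how precisely the slope is estimated.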

Code
par(mfrow = c(1,2))
plot(model)

Code
library(performance)
Warning: package 'performance' was built under R version 4.2.3
Code
library(see)
Warning: package 'see' was built under R version 4.2.3
Code
library(patchwork)
Warning: package 'patchwork' was built under R version 4.2.3
Code
theme_set(theme_classic(base_size = 2))
check_model(model)
Not enough model terms in the conditional part of the model to check for
  multicollinearity.

Code
check_outliers(model)
1 outlier detected: case 31.
- Based on the following method and threshold: cook (0.709).
- For variable: (Whole model).
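The flagged case can be cross-checked with base R. This is a sketch under one assumption: the reported threshold of 0.709 coincides with the median of an F distribution with p and n - p degrees of freedom (p = number of coefficients, n = number of observations). That rule is inferred from the reported value, not taken from the package documentation.

```r
# Refit the simple regression from the source.
model <- lm(mpg ~ hp, data = mtcars)

# Cook's distance for every observation.
cd <- cooks.distance(model)

# Assumed threshold: median of F(p, n - p); see the hedge in the lead-in.
n <- nobs(model)
p <- length(coef(model))
threshold <- qf(0.5, df1 = p, df2 = n - p)
print(round(threshold, 3))

# Row indices (and car names) whose Cook's distance exceeds the threshold.
flagged <- which(cd > threshold)
print(flagged)
```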
Code
check_normality(model)
Warning: Non-normality of residuals detected (p = 0.022).
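A comparable normality check can be sketched in base R. The assumption here is that the check amounts to a Shapiro-Wilk test on the standardized residuals; the exact procedure used by check_normality() may differ, so the p-value is not guaranteed to match the one reported above.

```r
# Refit the simple regression from the source.
model <- lm(mpg ~ hp, data = mtcars)

# Standardized residuals (residuals scaled by their estimated standard deviation).
std_res <- rstandard(model)

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(std_res)
print(sw)
```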
Code
check_distribution(model)
# Distribution of Model Family

Predicted Distribution of Residuals

 Distribution Probability
       normal         53%
       cauchy         38%
      tweedie          6%

Predicted Distribution of Response

  Distribution Probability
       tweedie         44%
           chi         28%
 beta-binomial         16%
Code
model_1 <- lm(mpg ~ hp + drat, data = mtcars)
summary(model_1)

Call:
lm(formula = mpg ~ hp + drat, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.0369 -2.3487 -0.6034  1.1897  7.7500 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.789861   5.077752   2.125 0.042238 *  
hp          -0.051787   0.009293  -5.573 5.17e-06 ***
drat         4.698158   1.191633   3.943 0.000467 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.17 on 29 degrees of freedom
Multiple R-squared:  0.7412,    Adjusted R-squared:  0.7233 
F-statistic: 41.52 on 2 and 29 DF,  p-value: 3.081e-09
Tip
  1. Residuals: These are the differences between the observed mpg values and the values predicted by the model. The summary reports their minimum, 1st quartile, median, 3rd quartile, and maximum; here they range from -5.0369 to 7.7500 with a median of -0.6034.

  2. Coefficients: The coefficients section presents the estimates, standard errors, t-values, and p-values for each predictor variable, as well as the intercept. Here’s the interpretation of the coefficients:

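    • Intercept: The estimated intercept is 10.789861, the expected value of mpg when both hp and drat are zero. No car in the data has zero horsepower or a zero axle ratio, so this value is an extrapolation and has little practical meaning on its own. The p-value (0.042238) indicates that the intercept differs significantly from zero at the 0.05 level.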

    • hp: The estimated coefficient is -0.051787, indicating that, holding drat constant, each unit increase in horsepower (hp) is associated with an average decrease of 0.051787 in the predicted mpg value. The low p-value (5.17e-06) indicates that this coefficient is statistically significant.

    • drat: The estimated coefficient is 4.698158, indicating that, holding hp constant, each unit increase in the rear axle ratio (drat) is associated with an average increase of 4.698158 in the predicted mpg value. The low p-value (0.000467) indicates that this coefficient is statistically significant.

  3. Residual standard error: This is the estimated standard deviation of the residuals, a measure of the typical distance between the observed mpg values and the values predicted by the model. Here it is 3.17 mpg on 29 degrees of freedom (32 observations minus 3 estimated coefficients).

  4. Multiple R-squared and Adjusted R-squared: These values measure the goodness of fit of the model. The multiple R-squared is the proportion of variance in the response variable (mpg) explained by the predictor variables (hp and drat). The adjusted R-squared penalises that proportion for the number of predictors relative to the sample size. Here the multiple R-squared is 0.7412, so approximately 74.12% of the variability in mpg is explained by hp and drat together, and the adjusted R-squared is 0.7233.

  5. F-statistic and p-value: The F-statistic tests the overall significance of the model, comparing the variance explained by the model to the residual variance. The low p-value (3.081e-09) indicates that the model as a whole is statistically significant, suggesting that at least one of the predictor variables is significantly associated with the mpg values.
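The quantities described in the list above are all stored in the object returned by summary(). The sketch below extracts them with base R and recomputes the adjusted R-squared from its usual definition, 1 - (1 - R²)(n - 1)/(n - k - 1); the variable names are illustrative only.

```r
# Refit the two-predictor model from the source.
model_1 <- lm(mpg ~ hp + drat, data = mtcars)
s <- summary(model_1)

rse    <- s$sigma          # residual standard error
r2     <- s$r.squared      # multiple R-squared
adj_r2 <- s$adj.r.squared  # adjusted R-squared

# F statistic is stored as a named vector: value, numdf, dendf.
f   <- s$fstatistic
f_p <- pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE)

# Adjusted R-squared recomputed by hand (n observations, k predictors).
n <- nobs(model_1)
k <- length(coef(model_1)) - 1
adj_manual <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(c(rse = rse, r2 = r2, adj_r2 = adj_r2, adj_manual = adj_manual))
print(unname(f_p))
```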

Code
check_outliers(model_1)
OK: No outliers detected.
- Based on the following method and threshold: cook (0.808).
- For variable: (Whole model)
Code
check_collinearity(model_1)
# Check for Multicollinearity

Low Correlation

 Term  VIF   VIF 95% CI Increased SE Tolerance Tolerance 95% CI
   hp 1.25 [1.04, 2.42]         1.12      0.80     [0.41, 0.96]
 drat 1.25 [1.04, 2.42]         1.12      0.80     [0.41, 0.96]
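The VIF in the table can be reproduced by hand. For a predictor j, VIF is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on the remaining predictors. With only two predictors the value is the same for both, which is why hp and drat share a VIF of 1.25. The following is a minimal sketch in base R, not the package's own implementation.

```r
# Auxiliary regression: one predictor explained by the other.
aux <- lm(hp ~ drat, data = mtcars)
r2_aux <- summary(aux)$r.squared

# Variance inflation factor and the derived quantities shown in the table.
vif_hp       <- 1 / (1 - r2_aux)
tolerance_hp <- 1 / vif_hp      # tolerance is the reciprocal of the VIF
increased_se <- sqrt(vif_hp)    # factor by which the standard error is inflated

print(round(c(VIF = vif_hp, Tolerance = tolerance_hp, Increased_SE = increased_se), 2))
```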
Code
check_distribution(model_1)
# Distribution of Model Family

Predicted Distribution of Residuals

 Distribution Probability
       normal         50%
       cauchy         41%
            F          3%

Predicted Distribution of Response

  Distribution Probability
       tweedie         44%
           chi         28%
 beta-binomial         16%
Code
check_normality(model_1)
Warning: Non-normality of residuals detected (p = 0.024).
Code
check_predictions(model_1)

Code
check_posterior_predictions(model_1)
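The idea behind a predictive check is to simulate new responses from the fitted model and see whether they resemble the observed data. The sketch below uses base R's simulate() as a stand-in and compares standard deviations numerically instead of plotting; the seed and the number of simulations are arbitrary choices, and this is not how the performance package implements its check.

```r
# Refit the two-predictor model from the source.
model_1 <- lm(mpg ~ hp + drat, data = mtcars)

set.seed(123)  # arbitrary seed, for reproducibility only
# Each column is one simulated mpg vector drawn from the fitted model.
sims <- simulate(model_1, nsim = 50)

# Compare a summary statistic of the observed response with its simulated spread.
obs_sd <- sd(mtcars$mpg)
sim_sd <- sapply(sims, sd)

print(obs_sd)
print(quantile(sim_sd, c(0.025, 0.5, 0.975)))
```

If the observed statistic falls far outside the simulated range, the model is probably missing some feature of the data.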