This analysis uses built-in datasets in R to demonstrate how a range of variables relate to one another.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
The data come from the built-in cars dataset, which records the speed of cars (speed, in mph) and their stopping distance (dist, in ft).
cars %>% head(5)
## speed dist
## 1 4 2
## 2 4 10
## 3 7 4
## 4 7 22
## 5 8 16
cars %>% lm(dist~speed,data=.) %>% summary()
##
## Call:
## lm(formula = dist ~ speed, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -29.069 -9.525 -2.272 9.215 43.201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.5791 6.7584 -2.601 0.0123 *
## speed 3.9324 0.4155 9.464 1.49e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15.38 on 48 degrees of freedom
## Multiple R-squared: 0.6511, Adjusted R-squared: 0.6438
## F-statistic: 89.57 on 1 and 48 DF, p-value: 1.49e-12
The p-value associated with the F-statistic is very small (1.49e-12), showing that the model fits significantly better than an intercept-only model, i.e. at least one predictor is a significant predictor of the outcome variable.
The individual p-value for speed (1.49e-12) is also significant, showing that speed is a significant predictor of stopping distance (dist). With a single predictor, this t-test is equivalent to the overall F-test.
Multiple R-squared = 0.65, indicating that 65% of the variance in stopping distance is explained by speed.
Model: dist = -17.58 + 3.93(speed). Each 1 mph increase in speed is associated with an average increase of 3.93 ft in stopping distance.
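The interpretation above can be checked directly against the fitted model. The following is a minimal sketch, not part of the original analysis: the object names (fit_cars, new_speeds, r_squared) and the two example speeds are illustrative assumptions. It recomputes R-squared from its definition and uses the fitted equation to predict stopping distance.

```r
# Refit the simple regression so the model object can be reused
fit_cars <- lm(dist ~ speed, data = cars)

# Coefficients: intercept and slope of dist = b0 + b1 * speed
coef(fit_cars)

# R-squared by hand: 1 - residual sum of squares / total sum of squares
rss <- sum(residuals(fit_cars)^2)
tss <- sum((cars$dist - mean(cars$dist))^2)
r_squared <- 1 - rss / tss
r_squared

# Predicted mean stopping distance (ft) at two illustrative speeds (mph),
# with 95% confidence intervals for the mean response
new_speeds <- data.frame(speed = c(10, 20))
predict(fit_cars, newdata = new_speeds, interval = "confidence")
```

The hand-computed r_squared should agree with the Multiple R-squared value reported by summary(), and the fit column of the predict() output should match what the model equation gives for each speed.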
The data come from the built-in trees dataset, which records the diameter (Girth, in inches), Height (in ft) and Volume (in cubic ft) of 31 felled black cherry trees. The analysis investigates whether volume depends on both the girth and the height of the tree.
trees %>% head(5)
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
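Before fitting a model with two predictors, it can help to look at the pairwise correlations among the three variables. This is a small sketch added for illustration; the object name cor_trees is an assumption and does not appear in the original analysis.

```r
# Pairwise Pearson correlations among Girth, Height and Volume
cor_trees <- cor(trees)

# Round for readability; the diagonal is always 1
round(cor_trees, 2)
```

A strong correlation between Girth and Volume, and a weaker one between Height and Volume, would be consistent with both variables carrying information about volume, with girth carrying more.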
trees %>% lm(Volume~Girth+Height,data=.) %>% summary()
##
## Call:
## lm(formula = Volume ~ Girth + Height, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4065 -2.6493 -0.2876 2.2003 8.4847
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
## Girth 4.7082 0.2643 17.816 < 2e-16 ***
## Height 0.3393 0.1302 2.607 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.882 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
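The p-value associated with the F-statistic is less than 0.05 (p < 2.2e-16), showing that at least one of the predictor variables is significant.
The individual p-values of the coefficients are all below 0.05 (Girth: p < 2e-16; Height: p = 0.0145), indicating that both girth and height are significant predictors of volume.
Adjusted R-squared = 0.94, showing that about 94% of the variance in volume is explained by girth and height together, after adjusting for the number of predictors.
Model: Volume = -57.99 + 4.71(Girth) + 0.34(Height). Holding height constant, each additional inch of girth is associated with an average increase of 4.71 cubic ft in volume; holding girth constant, each additional foot of height is associated with an average increase of 0.34 cubic ft.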
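The adjusted R-squared and the fitted equation can be reproduced from the model object. This sketch is an illustration only: fit_trees, new_tree, adj_r2 and pred are assumed names, and the example tree (12 in girth, 75 ft height) is hypothetical.

```r
# Refit the two-predictor model so the object can be reused
fit_trees <- lm(Volume ~ Girth + Height, data = trees)

# Adjusted R-squared by hand: 1 - (1 - R^2) * (n - 1) / (n - p - 1),
# where n is the number of trees and p the number of predictors
r2 <- summary(fit_trees)$r.squared
n <- nrow(trees)
p <- 2
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2

# 95% confidence intervals for the intercept and both slopes
confint(fit_trees)

# Predicted volume (cubic ft) for one hypothetical tree,
# with a 95% prediction interval for an individual tree
new_tree <- data.frame(Girth = 12, Height = 75)
pred <- predict(fit_trees, newdata = new_tree, interval = "prediction")
pred
```

A prediction interval is used here rather than a confidence interval because the question is about the volume of a single new tree, not the mean volume of all trees with those dimensions.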