[COMPLETED] (CW) Open an R Markdown file to use for today’s classwork
(CW) Load the bike sharing data from last class
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
bike_sharing <- read_csv("~/Downloads/bikesharing.csv")
## Rows: 731 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (4): season, month, weekday, weather
## dbl (7): year, temperature_F, casual, registered, count, humidity, windspeed
## lgl (2): holiday, workingday
## date (2): date, date_noyear
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
bike_sharing %>%
  ggplot(aes(reorder(season, count), count)) +
  geom_boxplot()
bike_sharing %>%
  ggplot(aes(reorder(season, windspeed), windspeed, fill = month)) +
  geom_boxplot()
# Example: t.test(count ~ workingday, data = bike_sharing)
ais <- read_csv("~/Downloads/ais.csv")
## Rows: 202 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): sex, sport
## dbl (7): rcc, wcc, hc, hg, ferr, ht, wt
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Answer: There is a statistically significant difference in mean ht by sex: the p-value (< 2.2e-16) is less than 0.05, so I reject the null hypothesis that the two group means are equal.
t.test(ht ~ sex, data = ais)
##
## Welch Two Sample t-test
##
## data: ht by sex
## t = -9.6009, df = 199.24, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group f and group m is not equal to 0
## 95 percent confidence interval:
## -13.153090 -8.670675
## sample estimates:
## mean in group f mean in group m
## 174.5940 185.5059
Answer: I do not detect a difference in mean wcc by sex: the p-value (0.3699) is greater than 0.05, so I fail to reject the null hypothesis that the two group means are equal.
t.test(wcc ~ sex, data = ais)
##
## Welch Two Sample t-test
##
## data: wcc by sex
## t = -0.8988, df = 198.28, p-value = 0.3699
## alternative hypothesis: true difference in means between group f and group m is not equal to 0
## 95 percent confidence interval:
## -0.7268643 0.2717271
## sample estimates:
## mean in group f mean in group m
## 6.994000 7.221569
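The decision rule used in the two answers above (compare the p-value to 0.05) can also be applied in code rather than read off the printed output. This is a minimal sketch, not the classwork's own method: it uses the built-in `mtcars` data as a stand-in for `ais.csv` so it runs anywhere, and `decide()` is a hypothetical helper named here only for illustration.

```r
# Stand-in data: built-in mtcars replaces ais.csv (an assumption for runnability).
alpha <- 0.05  # significance level used in the answers above

# Welch two-sample t-test (the default for t.test with a formula).
result <- t.test(mpg ~ am, data = mtcars)

# t.test() returns an "htest" list; the p-value and CI are stored as elements.
p_value <- result$p.value
ci <- result$conf.int

# Hypothetical helper: turn a p-value into the wording of the decision.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject the null hypothesis" else "fail to reject the null hypothesis"
}

reject_null <- p_value < alpha
decision <- decide(p_value, alpha)
decision
```

With the classwork data, the same pattern would presumably be `t.test(ht ~ sex, data = ais)$p.value`, assuming `ais` is loaded as shown earlier.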
cor(select(ais, rcc, wcc, hc, hg, ferr, ht, wt))
## rcc wcc hc hg ferr ht wt
## rcc 1.0000000 0.14706422 0.9249639 0.8887998 0.2508655 0.35885396 0.4037635
## wcc 0.1470642 1.00000000 0.1533326 0.1347199 0.1320729 0.07681056 0.1556625
## hc 0.9249639 0.15333265 1.0000000 0.9507567 0.2582395 0.37119150 0.4237113
## hg 0.8887998 0.13471992 0.9507567 1.0000000 0.3083911 0.35232222 0.4552628
## ferr 0.2508655 0.13207288 0.2582395 0.3083911 1.0000000 0.12325468 0.2737023
## ht 0.3588540 0.07681056 0.3711915 0.3523222 0.1232547 1.00000000 0.7809321
## wt 0.4037635 0.15566247 0.4237113 0.4552628 0.2737023 0.78093207 1.0000000
pairs(select(ais, rcc, wcc, hc, hg, ferr, ht, wt))
Answer: From the correlation matrix, the two variables with the highest correlation are hg and hc (r = 0.9507567). The p-value from the correlation test is less than 2.2e-16, which is smaller than our alpha of 0.05, so we reject the null hypothesis that the true correlation is 0. The 95% confidence interval for the correlation coefficient is 0.9354917 to 0.9624795. From this, we can see that there is a very strong positive linear correlation between hg and hc.
cor.test(ais$hg, ais$hc)
##
## Pearson's product-moment correlation
##
## data: ais$hg and ais$hc
## t = 43.382, df = 200, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.9354917 0.9624795
## sample estimates:
## cor
## 0.9507567
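Picking the most correlated pair by eye works for a 7 x 7 matrix, but the same step can be done in code. The sketch below is an illustration under assumptions: it uses base R only and four columns of the built-in `mtcars` data as a stand-in for the `ais` variables, and the variable names `cors`, `pair`, and `result` are chosen here for the example.

```r
# Stand-in data: four numeric columns of mtcars replace the ais measurements.
vars <- mtcars[, c("mpg", "wt", "hp", "disp")]

cors <- cor(vars)
diag(cors) <- NA  # ignore each variable's correlation with itself

# Locate the off-diagonal entry with the largest absolute correlation.
# which() skips the NA diagonal; the matrix is symmetric, so take the first hit.
idx <- which(abs(cors) == max(abs(cors), na.rm = TRUE), arr.ind = TRUE)[1, ]
pair <- c(rownames(cors)[idx["row"]], colnames(cors)[idx["col"]])

# Test that pair and pull out the pieces quoted in a written answer.
result <- cor.test(vars[[pair[1]]], vars[[pair[2]]])
estimate <- unname(result$estimate)  # sample correlation coefficient
ci <- result$conf.int                # 95% confidence interval by default
p_value <- result$p.value
```

Using `abs()` means a strong negative correlation would be picked up as well, which may or may not be what a given question intends.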
lm(count ~ temperature_F, data = bike_sharing)
##
## Call:
## lm(formula = count ~ temperature_F, data = bike_sharing)
##
## Coefficients:
## (Intercept) temperature_F
## -1663.15 89.96
summary(lm(count ~ temperature_F, data = bike_sharing))
##
## Call:
## lm(formula = count ~ temperature_F, data = bike_sharing)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4616 -1135 -105 1046 3741
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1663.154 288.972 -5.755 1.27e-08 ***
## temperature_F 89.957 4.135 21.753 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1510 on 729 degrees of freedom
## Multiple R-squared: 0.3936, Adjusted R-squared: 0.3928
## F-statistic: 473.2 on 1 and 729 DF, p-value: < 2.2e-16
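One way to read the fitted line is to plug a temperature into it. The sketch below copies the two coefficient estimates by hand from the `summary()` output above, so it runs without the data file; `predict_count()` is a hypothetical helper, and 70 degrees F is an arbitrary example value, not one taken from the classwork.

```r
# Coefficients copied from the summary() output above.
intercept <- -1663.154
slope <- 89.957  # estimated change in count per 1 degree F increase

# Hypothetical helper: predicted daily count at a given temperature (degrees F).
predict_count <- function(temperature_F) {
  intercept + slope * temperature_F
}

predicted_70 <- predict_count(70)
predicted_70  # -1663.154 + 89.957 * 70 = 4633.836

# With the fitted model object available, the equivalent call would be:
# model <- lm(count ~ temperature_F, data = bike_sharing)
# predict(model, newdata = data.frame(temperature_F = 70))
```

The Multiple R-squared of 0.3936 in the output means temperature accounts for roughly 39% of the variation in count, so a single prediction like this carries considerable uncertainty.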