# Install and load tidyverse
if (!require("tidyverse"))
  install.packages("tidyverse")
## Loading required package: tidyverse
## Warning: package 'tidyverse' was built under R version 4.3.3
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Install and load gmodels, which provides CrossTable()
if (!require("gmodels"))
  install.packages("gmodels")
## Loading required package: gmodels
## Warning: package 'gmodels' was built under R version 4.3.3
library(gmodels)
library(tidyverse)
mydata <- read.csv("https://raw.githubusercontent.com/drkblake/Data/main/DormTemps.csv")
head(mydata,10)
## DormID RoomTemp Range
## 1 1 61.0 Out of range
## 2 2 72.9 In range
## 3 3 67.0 In range
## 4 4 64.2 Out of range
## 5 5 62.2 Out of range
## 6 6 70.4 In range
## 7 7 62.7 Out of range
## 8 8 62.3 Out of range
## 9 9 62.2 Out of range
## 10 10 64.2 Out of range
# Specify the variable and test value
mydata$V1 <- mydata$RoomTemp
test_value <- 70.0
# Histogram of room temperatures with a vertical line at the sample mean
ggplot(mydata, aes(x = V1)) +
  geom_histogram(color = "black", fill = "#1f78b4") +
  geom_vline(aes(xintercept = mean(V1)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# Descriptive statistics plus a Shapiro-Wilk normality test
mydata %>%
  select(V1) %>%
  summarise(
    count = n(),
    mean = mean(V1, na.rm = TRUE),
    sd = sd(V1, na.rm = TRUE),
    min = min(V1, na.rm = TRUE),
    max = max(V1, na.rm = TRUE),
    `W Statistic` = shapiro.test(V1)$statistic,
    `p-value` = shapiro.test(V1)$p.value)
## count mean sd min max W Statistic p-value
## 1 175 62.73486 3.957923 53 76 0.9884747 0.1650714
# One-sample t-test of the sample mean against the claimed 70 degrees
t.test(mydata$V1, mu = test_value)
##
## One Sample t-test
##
## data: mydata$V1
## t = -24.283, df = 174, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 70
## 95 percent confidence interval:
## 62.14435 63.32537
## sample estimates:
## mean of x
## 62.73486
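As a cross-check, the t statistic and confidence interval above can be reproduced by hand with t = (mean - mu) / (sd / sqrt(n)). This is a minimal sketch: the inputs are copied from the printed summary output rather than recomputed from `mydata`, and the variable names (`xbar`, `se`, `dof`, and so on) are illustrative.

```r
# Summary values copied from the output above (illustrative, not recomputed)
n    <- 175          # count
xbar <- 62.73486     # sample mean
s    <- 3.957923     # sample standard deviation
mu0  <- 70           # the university's claimed mean (test_value)

# Standard error of the mean and the one-sample t statistic
se     <- s / sqrt(n)
t_stat <- (xbar - mu0) / se
dof    <- n - 1

# Two-sided p-value and 95 percent confidence interval
p_value <- 2 * pt(-abs(t_stat), dof)
ci      <- xbar + c(-1, 1) * qt(0.975, dof) * se

round(t_stat, 3)  # should agree with t = -24.283 from t.test()
round(ci, 5)      # should agree with the 95 percent interval from t.test()
```

Working from the summary numbers makes it clear that the huge t value comes from a small standard error (about 0.3 degrees with n = 175) relative to a gap of more than 7 degrees.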
# Specify the variable (the claimed proportions are set in chisq.test() below)
mydata$V1 <- mydata$Range
ggplot(mydata, aes(x = V1)) +
  geom_bar(fill = "royalblue")
# Make the crosstab table
CrossTable(
  mydata$V1,
  prop.chisq = FALSE,
  prop.t = FALSE,
  prop.r = FALSE)
##
##
## Cell Contents
## |-------------------------|
## | N |
## |-------------------------|
##
##
## Total Observations in Table: 175
##
##
## | In range | Out of range |
## |--------------|--------------|
## | 26 | 149 |
## | 0.149 | 0.851 |
## |--------------|--------------|
##
##
##
##
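# Run the chi-squared goodness-of-fit test
options(scipen = 999)  # print p-values in fixed rather than scientific notation
test <- chisq.test(table(mydata$V1),
                   p = c(.60, .40))  # "In range" = .60, "Out of range" = .40 (alphabetical level order)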
test
##
## Chi-squared test for given probabilities
##
## data: table(mydata$V1)
## X-squared = 148.6, df = 1, p-value < 0.00000000000000022
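The goodness-of-fit statistic above can also be worked by hand: compare each observed count with the count expected under the claimed 60/40 split and sum (O - E)^2 / E. This is a minimal sketch; the counts are copied from the `CrossTable()` output, and the names `observed`, `claimed_p`, and `expected` are illustrative.

```r
# Observed counts copied from the CrossTable() output above
observed <- c("In range" = 26, "Out of range" = 149)

# Claimed proportions, in the same (alphabetical) order as the table levels
claimed_p <- c("In range" = 0.60, "Out of range" = 0.40)

# Expected counts under the university's claim: 105 and 70
expected <- sum(observed) * claimed_p

# Pearson chi-squared statistic: sum of (O - E)^2 / E over both categories
x_sq    <- sum((observed - expected)^2 / expected)
dof     <- length(observed) - 1
p_value <- pchisq(x_sq, dof, lower.tail = FALSE)

round(x_sq, 1)  # should agree with X-squared = 148.6 from chisq.test()
```

Both categories miss their expected counts by the same 79 readings, but the "Out of range" cell contributes more to the statistic because its expected count (70) is smaller.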
Investigate what the dorm temp sample’s average temperature is and whether the sample average differs significantly from the 70-degree average the university says the system is producing. A significant difference, of course, would indicate that the university’s claim is incorrect.
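For the first half of this analysis, I found that the sample's mean room temperature was about 62.7 degrees, roughly 7.3 degrees below the university's claimed 70-degree average. The one-sample t-test shows this difference is significant (t = -24.283, df = 174, p < 2.2e-16), and the 95 percent confidence interval (62.14 to 63.33 degrees) does not include 70. The Shapiro-Wilk test (W = 0.988, p = 0.165) found no significant departure from normality, so the t-test's normality assumption appears reasonable.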
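The size of the gap and the confidence-interval check can be verified with a few lines of arithmetic. This sketch assumes the values copied from the `t.test()` output; the variable names are illustrative.

```r
# Values copied from the t.test() output above
sample_mean  <- 62.73486
claimed_mean <- 70
ci_95        <- c(62.14435, 63.32537)

# How far the sample mean falls below the claimed average, in degrees
gap <- claimed_mean - sample_mean
round(gap, 2)  # about 7.27 degrees

# A two-sided test at alpha = 0.05 rejects mu = 70 exactly when 70 lies
# outside the 95 percent confidence interval
inside_ci <- claimed_mean >= ci_95[1] && claimed_mean <= ci_95[2]
inside_ci      # FALSE: 70 is well above the upper bound
```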
Investigate what percentage of the dorm temp sample’s readings are “In range,” what percentage are “Out of range,” and whether those percentages differ significantly from the 60 percent “In range” and 40 percent “Out of range” split claimed by the university. A significant difference, of course, would indicate that the university’s claim is incorrect.
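About 15 percent (26 of 175) of the dorm temp samples are in range, while about 85 percent (149 of 175) are out of range. The chi-squared goodness-of-fit test shows that this split differs significantly from the university's claimed 60/40 split (X-squared = 148.6, df = 1, p < 2.2e-16), so the data contradict the university's claim.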
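The rounded percentages quoted above, and how far each one sits from the claimed split, follow directly from the counts. A minimal sketch, assuming the counts copied from the `CrossTable()` output; the names are illustrative.

```r
# Counts copied from the CrossTable() output above
counts <- c("In range" = 26, "Out of range" = 149)

# Observed percentages versus the university's claimed 60/40 split
observed_pct <- round(100 * counts / sum(counts), 1)  # 14.9 and 85.1
claimed_pct  <- c("In range" = 60, "Out of range" = 40)

# Gap in percentage points for each category
pct_gap <- observed_pct - claimed_pct
pct_gap  # about -45.1 and +45.1
```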