Save this file as “Math_2305_data_science_assignment_your_first_name_last_name.Rmd” (e.g., assignment4_jonny_appleseed.Rmd).
For each task, provide appropriate R command(s) in the code chunk, and execute the code chunk to generate an outcome.
After completing all tasks, save your Rmd file and produce an HTML report.
3a. Make sure to delete all intermediate code chunks before creating an HTML report.
Submit your Rmd file and the rendered HTML report to D2L by its due date.
Import Math 2305 Data Science Assignment_data.csv and
save it in an R object so that you can use it in the subsequent analysis.
Use the tidyverse package in this Rmd file, as you will use data
science tools and techniques when analyzing the data.
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.5.2
## Warning: package 'ggplot2' was built under R version 4.5.2
## Warning: package 'tibble' was built under R version 4.5.2
## Warning: package 'tidyr' was built under R version 4.5.2
## Warning: package 'readr' was built under R version 4.5.2
## Warning: package 'purrr' was built under R version 4.5.2
## Warning: package 'dplyr' was built under R version 4.5.2
## Warning: package 'stringr' was built under R version 4.5.2
## Warning: package 'forcats' was built under R version 4.5.2
## Warning: package 'lubridate' was built under R version 4.5.2
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df <- read.csv("D:/LS 2025/Discrete/Week 13 RStudio Assignments/Data Assignment/D_Assignment_data.csv")
df
dim(df)
## [1] 77 17
head(df)
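The absolute path in the read.csv() call above only resolves on the machine where it was written. As a minimal sketch of the same import-and-inspect step with a path built from a folder and a file name, assume a tiny made-up CSV written to a temporary directory (the file name toy_assignment_data.csv and all values are hypothetical, not the assignment data):

```r
# Hypothetical toy data standing in for the assignment CSV.
toy <- data.frame(
  teachers = c(10, 20, 30),
  compstu  = c(0.10, 0.20, 0.30),
  testscr  = c(640, 650, 660)
)

# Write it to a temporary folder so the example is self-contained.
csv_path <- file.path(tempdir(), "toy_assignment_data.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read it back into an R object and inspect its shape.
toy_df <- read.csv(csv_path)
dim(toy_df)   # number of rows, number of columns
head(toy_df)  # first rows
```

In an Rmd file, a path relative to the folder that holds the Rmd would play the role of csv_path, which should let the report knit on another computer as long as the CSV travels with it.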
Compute the mean and standard deviation of the teachers,
compstu, and testscr values (4 points)
library(tidyverse)
df2 <- df %>%
  summarize(mean_teachers = mean(teachers),
            sd_teachers = sd(teachers),
            mean_compstu = mean(compstu),
            sd_compstu = sd(compstu),
            mean_testscr = mean(testscr),
            sd_testscr = sd(testscr))
df2
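The six summaries above repeat the same two functions over three columns. One possible shorthand, sketched here on made-up values rather than the assignment data, is dplyr's across(), which applies a list of functions to each selected column (the toy data frame and the name toy_stats are illustrative assumptions):

```r
library(dplyr)

# Hypothetical toy values; the real columns come from the assignment data.
toy <- data.frame(
  teachers = c(10, 20, 30),
  compstu  = c(0.10, 0.20, 0.30),
  testscr  = c(640, 650, 660)
)

# Apply mean() and sd() to each selected column in a single call.
# .names controls the output column names, e.g. mean_teachers, sd_teachers.
toy_stats <- toy %>%
  summarize(across(
    c(teachers, compstu, testscr),
    list(mean = mean, sd = sd),
    .names = "{.fn}_{.col}"
  ))
toy_stats
```

The result is a one-row data frame with the same column naming as the explicit version, so either form should work for the report.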
Create a scatterplot of readscr
vs. mathscr (5 points)
Use readscr for the X axis
Use mathscr for the Y axis
df %>%
  ggplot(aes(x = readscr, y = mathscr)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Reading Score", y = "Math Score", title = "Reading Score vs Math Score")
## `geom_smooth()` using formula = 'y ~ x'
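geom_smooth(method = "lm") draws an ordinary least-squares line using the formula y ~ x, so the same line can be inspected numerically with lm(). A minimal sketch on made-up scores (the toy values below are hypothetical, not the assignment data):

```r
# Hypothetical reading and math scores for five districts.
toy <- data.frame(
  readscr = c(640, 650, 660, 670, 680),
  mathscr = c(641, 648, 662, 668, 681)
)

# Fit the straight line mathscr = intercept + slope * readscr.
fit <- lm(mathscr ~ readscr, data = toy)

# Named vector holding the intercept and the slope of the fitted line.
line_coefs <- coef(fit)
line_coefs
```

On the real data, replacing toy with df should give the intercept and slope of the line shown in the plot.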
Test the correlation between readscr and
mathscr (30 points)
Check whether the correlation between readscr and mathscr is statistically
significant and not a result of random variations in our
sample.
The p-value is the probability that a correlation as strong as the one observed between readscr and mathscr would happen due to
random occurrences in the dataset alone. A small p-value (like less than 0.05)
suggests that the observed relationship has ____statistical
significance____; this means that we can reasonably
generalize this relationship to the broader population, implying it’s
____not____ just a result of random occurrences in our sample.
cor.test(df$readscr, df$mathscr)
##
## Pearson's product-moment correlation
##
## data: df$readscr and df$mathscr
## t = 21.843, df = 75, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8911715 0.9547822
## sample estimates:
## cor
## 0.9295991
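To see where the numbers in this output come from, the t statistic and the confidence interval can be recomputed by hand from the reported correlation and sample size. This is a sketch using the standard formulas for a Pearson correlation test (t with n - 2 degrees of freedom, and a Fisher z-transformation for the interval); n = 77 is taken from the reported df = 75, and the variable names are illustrative:

```r
r <- 0.9295991   # sample correlation reported by cor.test()
n <- 77          # number of pairs, since df = n - 2 = 75

# t statistic for the null hypothesis that the true correlation is 0.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution with n - 2 degrees of freedom.
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

# 95% confidence interval via the Fisher z-transformation:
# transform r, add and subtract the margin on the z scale, transform back.
z  <- atanh(r)
se <- 1 / sqrt(n - 3)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) * se)

t_stat
p_value
ci
```

The recomputed values should land close to the reported t = 21.843 and the interval from about 0.891 to 0.955, with small differences possible because r is rounded here.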
You will have a total of 100 points for this assignment.