1. Save this file as “Math_2305_data_science_assignment_your_first_name_last_name.Rmd” (e.g., assignment4_jonny_appleseed.Rmd).
2. For each task, provide appropriate R command(s) in the code chunk, and execute the code chunk to generate an outcome.
3. After completing all tasks, save your Rmd file and produce an HTML report.
   3a. Make sure to delete all intermediate code chunks before creating an HTML report.
4. Submit your Rmd file and the rendered HTML report to D2L by its due date.
Read Math 2305 Data Science Assignment_data.csv and
save it in an R object so that you can use it in the subsequent analysis.
Use the tidyverse package in this Rmd file, as you will use data
science tools and techniques when analyzing the data.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df <- read_csv("Math 2305 Data Science Assignment_data.csv")
## Rows: 77 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): county, district, grspan
## dbl (14): distcod, teachers, calwpct, mealpct, computer, testscr, compstu, e...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df
dim(df)
## [1] 77 17
head(df)
Compute the mean and standard deviation of the teachers,
compstu, and testscr values (4 points)
df2 <- df %>%
  summarize(mean_teachers = mean(teachers),
            sd_teachers = sd(teachers),
            mean_compstu = mean(compstu),
            sd_compstu = sd(compstu),
            mean_testscr = mean(testscr),
            sd_testscr = sd(testscr))
df2
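The six nearly identical `summarize()` arguments above can also be generated with `across()`. The following is only a sketch on a small made-up tibble (`toy` and its values are hypothetical stand-ins, not the assignment data); the `.names` pattern is chosen so the output columns match the names used in `df2`.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data standing in for the assignment's data frame
toy <- tibble(
  teachers = c(10, 20, 30),
  compstu  = c(0.1, 0.2, 0.3),
  testscr  = c(650, 660, 670)
)

# Apply mean() and sd() to each selected column in one call;
# "{.fn}_{.col}" yields names such as mean_teachers and sd_teachers
toy_summary <- toy %>%
  summarize(across(c(teachers, compstu, testscr),
                   list(mean = mean, sd = sd),
                   .names = "{.fn}_{.col}"))

toy_summary
```

Replacing `toy` with `df` should give the same columns as `df2`, assuming the three variables contain no missing values.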
Scatter plot of readscr
vs. mathscr (5 points)
- readscr for the X axis
- mathscr for the Y axis
df %>%
  ggplot(aes(x = readscr, y = mathscr)) +
  geom_point() +
  geom_smooth(method = lm) +
  labs(x = "Reading Score", y = "Math Score", title = "readscr versus mathscr")
## `geom_smooth()` using formula = 'y ~ x'
Correlation between readscr and
mathscr (30 points)
Test whether the correlation between readscr and
mathscr is statistically significant and is not a result of
random variation in our sample.
The p-value is the probability that a correlation between readscr and mathscr
at least as strong as the observed one would occur through random variation
alone if the true correlation were zero. A small p-value (e.g.,
less than 0.05) suggests that the observed relationship is statistically
significant; this means that we can reasonably generalize this
relationship to the broader population, implying it’s not just a result
of random occurrences in our sample.
cor.test(df$readscr, df$mathscr)
##
## Pearson's product-moment correlation
##
## data: df$readscr and df$mathscr
## t = 21.843, df = 75, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8911715 0.9547822
## sample estimates:
## cor
## 0.9295991
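To see where the reported t statistic and p-value come from, the sketch below recomputes them by hand from the printed correlation and the sample size. It relies on the standard formula for testing a Pearson correlation against zero, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; the variable names are illustrative only.

```r
# Values copied from the output above
r <- 0.9295991   # sample correlation reported by cor.test()
n <- 77          # number of rows reported by dim(df)

# t statistic for H0: true correlation = 0, with n - 2 degrees of freedom
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = n - 2, lower.tail = FALSE)

round(t_stat, 3)    # should agree with the t value printed by cor.test()
p_value < 2.2e-16   # cor.test() prints such small p-values as "< 2.2e-16"
```

Because the p-value is far below 0.05 and the 95 percent confidence interval (0.89 to 0.95) excludes zero, the output supports a strong positive correlation between readscr and mathscr.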
You will have a total of 100 points for this assignment.