Save this file as “Math_2305_data_science_assignment_your_first_name_last_name.Rmd” (e.g., assignment4_jonny_appleseed.Rmd).
For each task, provide appropriate R command(s) in the code chunk, and execute the code chunk to generate an outcome.
After completing all tasks, save your Rmd file and produce an HTML report.
3a. Make sure to delete all intermediate code chunks before creating an HTML report.
Submit your Rmd file and the rendered HTML report to D2L by its due date.
Read Math 2305 Data Science Assignment_data.csv and save it in an R object so that you can use it in the subsequent analysis.
Use the tidyverse package in this Rmd file, as you will use data science tools and techniques when analyzing the data.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
df <- read_csv("Math 2305 Data Science Assignment_data.csv")
## Rows: 77 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): county, district, grspan
## dbl (14): distcod, teachers, calwpct, mealpct, computer, testscr, compstu, e...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
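The message above suggests declaring column types explicitly. A minimal sketch of what that could look like follows; it writes a small made-up CSV to a temporary file so it runs on its own, and the column names and values in it are illustrative assumptions, not the assignment data.

```r
library(readr)

# Hypothetical two-column file, created only for this illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("district,testscr", "A,650.5", "B,700.1"), tmp)

# Declaring the column types up front makes the import explicit and
# suppresses the column specification message.
toy <- read_csv(
  tmp,
  col_types = cols(
    district = col_character(),
    testscr  = col_double()
  )
)
toy
```

If the guessed types are acceptable, passing show_col_types = FALSE to read_csv() silences the message without declaring anything.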
df
dim(df)
## [1] 77 17
head(df)
Mean and standard deviation of teachers, compstu, and testscr values (4 points)
df2 <- df %>%
  summarize(mean_teachers = mean(teachers),
            sd_teachers = sd(teachers),
            mean_compstu = mean(compstu),
            sd_compstu = sd(compstu),
            mean_testscr = mean(testscr),
            sd_testscr = sd(testscr))
df2
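The same six summaries can be written more compactly with dplyr's across(). The sketch below is one possible alternative, not a required approach; it uses a small made-up tibble (toy) in place of the assignment data so that it runs on its own.

```r
library(dplyr)

# Made-up values for illustration only; the real df has 77 rows.
toy <- tibble(
  teachers = c(10, 20, 30),
  compstu  = c(0.1, 0.2, 0.3),
  testscr  = c(600, 650, 700)
)

# Apply mean() and sd() to each selected column; .names controls the
# output column names so they match the mean_*/sd_* pattern used above.
toy_summary <- toy %>%
  summarize(across(c(teachers, compstu, testscr),
                   list(mean = mean, sd = sd),
                   .names = "{.fn}_{.col}"))
toy_summary
```

This form scales better when more columns are added, at the cost of being slightly less explicit than naming each summary by hand.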
Scatter plot of readscr vs. mathscr (5 points)
Use readscr for the X axis.
Use mathscr for the Y axis.
df %>%
  ggplot(aes(x = readscr, y = mathscr)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Reading score", y = "Math score", title = "READSCR versus MATHSCR")
## `geom_smooth()` using formula = 'y ~ x'
Correlation between readscr and mathscr (30 points)
Compute and interpret the correlation between readscr and mathscr.
Determine whether the relationship between readscr and mathscr is statistically significant and is not a result of random variations in our sample.
The p-value indicates how likely it is that a relationship at least as strong as the one observed between readscr and mathscr could have happened due to random occurrences in the dataset. A small p-value (for example, less than 0.05) suggests that the observed relationship is statistically significant; this means that we can reasonably generalize this relationship to the broader population, implying it is not just a result of random occurrences in our sample.
cor.test(df$readscr, df$mathscr)
##
## Pearson's product-moment correlation
##
## data: df$readscr and df$mathscr
## t = 21.843, df = 75, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.8911715 0.9547822
## sample estimates:
## cor
## 0.9295991
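To support a written interpretation, the individual pieces of a cor.test() result can be pulled out by name. The sketch below does this on made-up vectors (x and y are illustrative, not the assignment data) and then checks that the t statistic reported above is consistent with the reported correlation and degrees of freedom.

```r
# Toy data, made up for illustration only (not the assignment dataset).
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8, 12.2)

res <- cor.test(x, y)            # Pearson correlation test (the default method)
r_hat   <- unname(res$estimate)  # sample correlation coefficient
p_value <- res$p.value           # two-sided p-value
ci      <- res$conf.int          # 95% confidence interval for the correlation

# Consistency check on the output reported above: for Pearson's test,
# t = r * sqrt(df) / sqrt(1 - r^2), where df = n - 2.
r_reported  <- 0.9295991
df_reported <- 75
t_check <- r_reported * sqrt(df_reported) / sqrt(1 - r_reported^2)
t_check  # close to the reported t = 21.843
```

In the output above, the correlation of about 0.93 is strong and positive, the p-value is below 2.2e-16, and the 95 percent confidence interval (0.89 to 0.95) excludes zero, so the relationship between readscr and mathscr is statistically significant at the 0.05 level.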
You will have a total of 100 points for this assignment.