{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE)
# Gaussian Project - Elizabeth Rudolph
# precalc1 holds the Math 19500 section 1 grades: DWF rate = 18%, enrollment 47
# precalc2 holds the Math 19500 section 2 grades: DWF rate = 26%, enrollment 36
# Question: are the results similar or statistically different?
# I followed the DG text and reused some of his variable names. As I get more
# comfortable with R, I will rename the variables in a more logical way.
precalc1 <- c(94, 95, 94, 95, 94, 95, 94, 90, 91, 92, 93, 90, 85, 84, 84, 83,
              85, 85, 84, 80, 81, 82, 81, 88, 89, 74, 75, 74, 76, 74, 75, 74,
              76, 78, 79, 78, 79, 78, 50, 48, 52, 55, 50)
precalc2 <- c(81, 74, 100, 76, 55, 50, 80, 90, 99, 98, 66, 74, 75, 76, 91, 87,
              74, 85, 67, 94, 88, 100, 91, 66, 49, 88, 82, 92, 81, 100, 99)
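# Mean of each data set (precalc1 and precalc2)
mean(precalc1)
mean(precalc2)
# Standard deviation of each data set (precalc1 and precalc2)
sd(precalc1)
sd(precalc2)
# Store the results for later use; the printed values are noted alongside
mean1 <- mean(precalc1)  # 80.30233
mean2 <- mean(precalc2)  # 81.54839
sd1 <- sd(precalc1)      # 12.76628
sd2 <- sd(precalc2)      # 14.46107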
# Run a t-test comparing precalc1 and precalc2
# (t.test defaults to Welch's test, which does not assume equal variances)
t.test(precalc1, precalc2)
# Output:
# Welch Two Sample t-test
# data: precalc1 and precalc2
# t = -0.38388, df = 59.716, p-value = 0.7024
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
# -7.739526 5.247403
# sample estimates:
# mean of x mean of y
# 80.30233 81.54839
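# t = -0.38 lies well inside the rough cutoff of +2 to -2, and p = 0.70 is far
# above 0.05, so we fail to reject the null hypothesis:
# no statistically significant difference between the mean grades of the two data sets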
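The t statistic, degrees of freedom, p-value, and confidence interval in the output above can be reproduced by hand from the summary statistics. The sketch below assumes the Welch (unequal-variance) formulas that `t.test` reports, and it takes the sample sizes 43 and 31 from the lengths of `precalc1` and `precalc2`. The names `n1`, `n2`, `v1`, `v2`, `se`, `t_stat`, `welch_df`, `p_value`, and `ci` are illustrative and not part of the original analysis.

```r
# Summary statistics copied from the output above; n1 and n2 are the
# number of grades in precalc1 and precalc2
mean1 <- 80.30233; sd1 <- 12.76628; n1 <- 43
mean2 <- 81.54839; sd2 <- 14.46107; n2 <- 31

# Variance of each sample mean, and the standard error of their difference
v1 <- sd1^2 / n1
v2 <- sd2^2 / n2
se <- sqrt(v1 + v2)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat <- (mean1 - mean2) / se
welch_df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value and 95% confidence interval for the difference in means
p_value <- 2 * pt(-abs(t_stat), welch_df)
ci <- (mean1 - mean2) + c(-1, 1) * qt(0.975, welch_df) * se

print(c(t = t_stat, df = welch_df, p = p_value))
print(ci)
```

Because the inputs are rounded to the digits printed above, the results should agree with the `t.test` output to about three decimal places rather than exactly.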
# Now, to make a more appealing Gaussian distribution, I replaced my data set with
# a sequence of numbers from 40 to 140 (extended past 100 so the curves sit near
# the center of the plot), applied my calculated mean and SD, and plotted
# the two data sets
C1 <- seq(40,140,1)
C1
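dp <- dnorm(C1, mean1, sd1)   # normal density for section 1
dp
fdp <- dnorm(C1, mean2, sd2)  # normal density for section 2, using the stored mean and SD
# To better center my plot, I raised the upper limit of my sequence to 140
# Section 1 is drawn with the default symbol (o); section 2 uses pch = 4 (x)
plot(C1, dp, xlab = "grade", ylab = "density", ylim = c(0, 0.04), main = "Math 19500 sec 1 (o) and Math 19500 sec 2 (x) Grades Comparison")
points(C1, fdp, pch = 4)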
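As a possible variation on the plot above, the two densities can be drawn as lines with a legend, so the plot title does not have to encode which symbol belongs to which section. This is only a sketch: it reuses the means and standard deviations from the analysis, and the names `grades`, `dens1`, and `dens2`, the line types, and the shortened title are illustrative choices rather than part of the original.

```r
# Means and standard deviations copied from the analysis above
mean1 <- 80.30233; sd1 <- 12.76628
mean2 <- 81.54839; sd2 <- 14.46107

# Same grid of grade values as C1 in the original code
grades <- seq(40, 140, by = 1)

# Normal density for each section, evaluated on the grid
dens1 <- dnorm(grades, mean = mean1, sd = sd1)
dens2 <- dnorm(grades, mean = mean2, sd = sd2)

# Section 1 as a solid line, section 2 as a dashed line
plot(grades, dens1, type = "l", lty = 1,
     xlab = "grade", ylab = "density", ylim = c(0, 0.04),
     main = "Math 19500 Grades Comparison")
lines(grades, dens2, lty = 2)

# The legend identifies each curve
legend("topright", legend = c("Section 1", "Section 2"), lty = c(1, 2))
```

Section 2 has the larger standard deviation, so its curve is slightly lower and wider than the section 1 curve.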