heights that contains the heights, in inches, of yourself and two students near you. Print the contents of this vector.
heights <- c(62, 63, 74)
heights
## [1] 62 63 74
names that contains the names of these people. Print the contents of this vector.
names <- c("Adrienne", "Geraldine", "Saichandra")
names
## [1] "Adrienne" "Geraldine" "Saichandra"
cbind(heights, names). What did this command do? What class is this new object?
cbind(heights, names)
## heights names
## [1,] "62" "Adrienne"
## [2,] "63" "Geraldine"
## [3,] "74" "Saichandra"
class(cbind(heights,names))
## [1] "matrix"
#This command binds the two vectors, heights and names, together as the columns of a single object. Elements are paired by position, so each row holds one person's height next to that person's name: the first row pairs “62” with “Adrienne”, the first element of each vector. Because a matrix can hold only one data type, the numeric heights are coerced to character strings, which is why they are printed in quotes. The class of the new object is matrix.
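To check the coercion described above, the following sketch rebuilds the same two vectors and inspects the result. The data.frame() call at the end is not part of the lab; it is shown only as an assumed alternative for cases where the heights should stay numeric.

```r
# Rebuild the two vectors from the earlier questions.
heights <- c(62, 63, 74)
names <- c("Adrienne", "Geraldine", "Saichandra")

# cbind() returns a matrix, and a matrix holds a single type,
# so the numeric heights are converted to character strings.
combined <- cbind(heights, names)
is.matrix(combined)   # TRUE
typeof(combined)      # "character"

# Assumed alternative (not asked for in the lab): a data frame
# keeps each column's own type.
people <- data.frame(heights, names)
is.numeric(people$heights)   # TRUE
```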
births.csv from CCLE and read it into R (or RStudio). Name the data frame NCbirths.
NCbirths <- read.csv("births.csv")
head(NCbirths).
head(NCbirths)
weights <- NCbirths$weight
head(weights)
## [1] 124 177 107 144 117 98
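#The weights appear to be in ounces: values such as 124 and 177 are far too large to be a newborn's weight in pounds, but 124 ounces is 7.75 pounds, a typical birth weight.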
weights_in_pounds which are the weights of the babies in pounds. You can look up conversion factors on the internet. Demonstrate your success by typing weights_in_pounds[1:20].
weights.in.pounds <- weights/16
weights.in.pounds[1:20]
## [1] 7.7500 11.0625 6.6875 9.0000 7.3125 6.1250 9.1875 8.6250 6.5000
## [10] 7.6875 9.5625 8.0625 7.4375 6.7500 6.6250 7.8125 7.1875 8.0000
## [19] 8.2500 5.1875
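The conversion above relies on there being 16 ounces in a pound. As a small check that does not need births.csv, the sketch below applies the same division to the six values printed by head(weights); the names weights_oz, weights_lb, and OUNCES_PER_POUND are made up for this illustration.

```r
# First six birth weights in ounces, copied from head(weights) above.
weights_oz <- c(124, 177, 107, 144, 117, 98)

# There are 16 ounces in a pound.
OUNCES_PER_POUND <- 16
weights_lb <- weights_oz / OUNCES_PER_POUND
weights_lb
```

The results agree with the first six entries of weights.in.pounds[1:20].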
The functions summary(), mean(), sd(), max(), and min() help us understand how quantitative data are distributed. Each takes a numeric vector inside the parentheses. For example, max(NCbirths$weight) gives the maximum weight of all the babies in our sample.
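As a quick illustration of these functions, the sketch below applies them to the six weights (in ounces) shown by head(weights) rather than to the full data set, so the numbers describe only those six babies. The vector name w is made up for the example.

```r
# First six birth weights in ounces, copied from head(weights) above.
w <- c(124, 177, 107, 144, 117, 98)

summary(w)  # minimum, quartiles, median, mean, and maximum
mean(w)     # arithmetic average
sd(w)       # sample standard deviation
max(w)      # 177
min(w)      # 98
```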
We can also use the tally() function in the mosaic package to help us summarize categorical data. The tally() function requires a categorical vector inside the parentheses. For example, after you have installed and loaded the mosaic package, try typing tally(NCbirths$Racemom).
mean(weights.in.pounds)
## [1] 7.253691
#The mean weight of the babies is about 7.25 pounds.
### Question 9
tally() function with the format argument. Use the help screen for guidance.
tally(NCbirths$Habit, format = "percent")
## X
##             NonSmoker     Smoker
##  0.3003003 90.3403403  9.3593594
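The unlabeled first column in the output above (about 0.30%) suggests that a few rows have an empty Habit value, although the lab does not say so. The sketch below uses a small made-up vector, not the NCbirths data, and base R's table() and prop.table() rather than mosaic's tally(), to show how blanks can be turned into NA so that percentages are computed among known values only.

```r
# Hypothetical data for illustration only; not the NCbirths values.
habit <- c("NonSmoker", "Smoker", "", "NonSmoker", "NonSmoker")

# Percentages with the blank kept as its own category.
pct_all <- 100 * prop.table(table(habit))

# Treat empty strings as missing, then recompute among known values.
# table() leaves NA out of the counts by default.
habit[habit == ""] <- NA
pct_known <- 100 * prop.table(table(habit))
pct_known
```

When reading the file, read.csv() also accepts na.strings = c("NA", "") to make the same conversion at import time.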
#About 9.36% of the mothers in the sample are smokers.
## Simulating the Chance Model
According to the Centers for Disease Control, approximately 21% of adult Americans are smokers.
The following command simulates 10 repetitions of 200 coin tosses, where the probability (long-run proportion) of the coin landing on heads is 0.5. Each of the 10 results records the number of successes (heads) observed in one repetition.
set.seed(123)
output <- do(10) * rflip(200, prob = 0.5)
head(output)
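For readers without the mosaic package loaded, the same experiment can be sketched in base R. This is an assumed equivalent, not the lab's method: rbinom() draws the number of heads directly, and its values need not match those from do() and rflip() even with the same seed. The names heads and prop are chosen for this illustration.

```r
set.seed(123)

# 10 repetitions; each counts heads in 200 tosses of a fair coin.
heads <- rbinom(10, size = 200, prob = 0.5)

# Proportion of heads in each repetition.
prop <- heads / 200
data.frame(heads, prop)
```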
set.seed(123) command. This ensures that everyone will get the same “random” outcome. We will elaborate more on this command in a later lab.
set.seed(123)
output <- do(1000) * rflip(1998, prob = 0.21)
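The effect of set.seed() can be checked directly. The sketch below uses base R's rbinom() as a stand-in for rflip() (an assumption made so the example runs without mosaic) and shows that resetting the seed reproduces the same simulated proportions. The variable names are hypothetical.

```r
n_mothers <- 1998   # tosses per repetition, as in the command above
n_reps <- 1000      # number of repetitions

set.seed(123)
first_run <- rbinom(n_reps, size = n_mothers, prob = 0.21) / n_mothers

set.seed(123)
second_run <- rbinom(n_reps, size = n_mothers, prob = 0.21) / n_mothers

# Same seed, same call, same "random" results.
identical(first_run, second_run)   # TRUE
```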
output$prop contains the vector of simulated proportions of “smokers” under the chance model. Use the dotPlot() and histogram() functions to visualize the proportion of “smokers” under the chance model.
dotPlot(output$prop, cex = 5)
histogram(output$prop)
#Based on the plots for number 11 and my answer to question 9, there is strong evidence that the proportion of mothers who smoke is not 21%. In the dotplot and histogram, the null distribution is centered at 0.21, the proportion of smokers assumed under the chance model (the null hypothesis). The observed proportion, 0.0936, is so extreme that it falls outside the range of simulated values and is not even shown on the plots, which is inconsistent with the chance model. Values like 0.18 and 0.24, which are shown in the histogram, are also extreme, but not nearly as extreme as 0.0936.
#The simulated p-value is 0: none of the 1000 simulated proportions (0/1000) is as far from 0.21 as the observed statistic.
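The p-value reasoning above can be sketched numerically. The code below is an illustration under stated assumptions: it uses rbinom() in place of do() and rflip(), and it takes the observed count to be 187 smokers out of 1998, which is inferred from the 9.3593594% reported by tally() rather than stated in the lab.

```r
set.seed(123)
n_mothers <- 1998
null_prop <- 0.21                 # proportion under the chance model
observed_prop <- 187 / n_mothers  # about 0.0936 (inferred count)

# Simulated null distribution of the sample proportion.
sim_props <- rbinom(1000, size = n_mothers, prob = null_prop) / n_mothers

# Two-sided simulated p-value: share of simulated proportions at least
# as far from 0.21 as the observed proportion.
p_value <- mean(abs(sim_props - null_prop) >= abs(observed_prop - null_prop))
p_value

# How many null standard deviations the observed proportion lies from 0.21.
null_sd <- sqrt(null_prop * (1 - null_prop) / n_mothers)
(observed_prop - null_prop) / null_sd
```

A simulated p-value of 0 from 1000 repetitions is better reported as less than 1/1000, since a longer simulation could in principle produce a more extreme value.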