Create a vector named heights that contains the heights, in inches, of yourself and two students near you. Print the contents of this vector.
heights <- c(72, 68, 70)
print(heights)
## [1] 72 68 70
Create a vector named names that contains the names of these people. Print the contents of this vector.
names <- c("Gavin", "John", "Greg")
print(names)
## [1] "Gavin" "John" "Greg"
Run the command cbind(heights, names). What did this command do? What class is this new object?
cbind(heights, names)
##      heights names
## [1,] "72"    "Gavin"
## [2,] "68"    "John"
## [3,] "70"    "Greg"
class(cbind(heights,names))
## [1] "matrix" "array"
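It binds the two vectors, heights and names, side by side as the columns of a single object. The new object is a matrix (class "matrix" "array"). Because a matrix can hold only one type of value, the numeric heights were converted to character strings, which is why they appear in quotation marks.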
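The quotation marks around the heights in the output above come from type coercion. As a minimal sketch, the following contrasts cbind() with data.frame(), which keeps each column's own type. The data.frame() call and the name people are illustrative assumptions, not something the lab asks for.

```r
heights <- c(72, 68, 70)
names <- c("Gavin", "John", "Greg")

# cbind() builds a matrix, which can hold only one type of value,
# so the numeric heights are coerced to character strings.
m <- cbind(heights, names)
typeof(m)

# data.frame() stores each column separately, so heights stays numeric
# and can still be used in arithmetic.
people <- data.frame(heights, names)  # "people" is a hypothetical name
class(people$heights)
mean(people$heights)
```

If the heights need to stay usable as numbers, a data frame is usually the safer container for mixed columns like these.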
Download births.csv from CCLE and read it into R (or RStudio). Name the data frame NCbirths.
NCbirths <- read.csv("births.csv")
Demonstrate your success by typing head(NCbirths).
head(NCbirths)
Extract the weight variable from the data frame as a vector by typing weights <- NCbirths$weight.
weights <- NCbirths$weight
The weights are recorded in ounces.
Create a new vector named weights_in_pounds which contains the weights of the babies in pounds. You can look up conversion factors on the internet. Demonstrate your success by typing weights_in_pounds[1:20].
weights_in_pounds <- weights / 16
weights_in_pounds[1:20]
## [1] 7.7500 11.0625 6.6875 9.0000 7.3125 6.1250 9.1875 8.6250 6.5000
## [10] 7.6875 9.5625 8.0625 7.4375 6.7500 6.6250 7.8125 7.1875 8.0000
## [19] 8.2500 5.1875
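The conversion above relies on there being 16 ounces in a pound. As a small worked check that does not need births.csv, the sketch below uses five ounce values back-computed from the first five pounds values displayed above (for example, 7.75 × 16 = 124). The vector name ounces is an assumption introduced for illustration.

```r
# Ounce values back-computed from the displayed output (assumption:
# they match the first five entries of NCbirths$weight).
ounces <- c(124, 177, 107, 144, 117)

# Division is vectorized: every element is divided by 16.
pounds <- ounces / 16
pounds
```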
The functions summary(), mean(), sd(), max(), and min() all help us understand how quantitative data are distributed. Each takes a numeric vector inside the parentheses. For example, max(NCbirths$weight) gives the maximum weight of all the babies in our sample.
We can also use the tally() function in the mosaic package to help us summarize categorical data. The tally() function requires a categorical vector inside the parentheses. For example, after you have installed and loaded the mosaic package, try typing tally(NCbirths$Racemom).
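The functions described above can be tried on a small vector before applying them to the full data set. This sketch uses the first five pounds values from the output shown earlier, plus a made-up habit vector. Base R's table() stands in for tally() so that the example runs without the mosaic package; it is a substitute, not the function the lab asks for.

```r
# First five weights in pounds, copied from the earlier output.
w <- c(7.7500, 11.0625, 6.6875, 9.0000, 7.3125)

summary(w)  # minimum, quartiles, median, mean, maximum
mean(w)
sd(w)
max(w)
min(w)

# Hypothetical categorical data, for illustration only.
habit <- c("NonSmoker", "Smoker", "NonSmoker", "NonSmoker")

# table() counts how often each category occurs.
counts <- table(habit)
counts
```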
mean(weights_in_pounds)
## [1] 7.253691
The mean weight is about 7.25 pounds.
Find the percentage of smokers in the sample using the tally() function with the format argument. Use the help screen for guidance.
tally(NCbirths$Habit)
## X
##           NonSmoker    Smoker
##         6      1805       187
tally(NCbirths$Habit, format="percent")
## X
##             NonSmoker     Smoker
##  0.3003003 90.3403403  9.3593594
About 9.36% of the sample (187 of 1998) are smokers. Six records have a blank Habit value.
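The percentages can be checked by hand from the counts in the first tally() output. This sketch uses base R only; the label blank for the six records with an empty Habit value is an assumption introduced here for readability.

```r
# Counts copied from the tally() output above.
counts <- c(blank = 6, NonSmoker = 1805, Smoker = 187)

total <- sum(counts)  # sample size
total

# Each count as a percentage of the total, rounded to two decimals.
pct <- round(100 * counts / total, 2)
pct
```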
According to the Centers for Disease Control, approximately 21% of adult Americans are smokers.
The following command simulates 10 repetitions of 200 coin tosses, where the probability (long-run proportion) of the coin landing on heads is 0.5. The output has one row per repetition, and each row records the number of successes (heads) observed in that repetition.
set.seed(123)
output <- do(10) * rflip(200, prob = 0.5)
Note the set.seed(123) command. This ensures that everyone will get the same “random” outcome. We will elaborate more on this command in a later lab.
set.seed(123)
output <- do(1000) * rflip(1998, prob = 0.21)
head(output)
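For readers who want to see what such a simulation does without loading mosaic, here is a rough base R sketch. The helper sim_flips() is hypothetical, and its column names are chosen to resemble the mosaic output, which is an assumption. It draws from rbinom(), so it will not reproduce the same random numbers as rflip(), even with the same seed.

```r
# Hypothetical helper: each repetition tosses a coin n times and
# records the number of heads, tails, and the proportion of heads.
sim_flips <- function(reps, n, prob) {
  heads <- rbinom(reps, size = n, prob = prob)
  data.frame(n = n, heads = heads, tails = n - heads, prop = heads / n)
}

set.seed(123)
coin <- sim_flips(10, 200, 0.5)         # 10 repetitions of 200 fair tosses
smokers <- sim_flips(1000, 1998, 0.21)  # chance model with 21% "smokers"
head(smokers)
```

Each row of smokers plays the same role as a row of output: one simulated sample of 1998 people in which each person is a "smoker" with probability 0.21.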
output$prop contains the vector of simulated proportions of “smokers” under the chance model. Use the dotPlot() and histogram() functions to visualize the proportion of “smokers” under the chance model.
dotPlot(output$prop, cex = 5)
histogram(output$prop)
The observed proportion of smokers, about 9.4%, falls far to the left of this plot. It is much smaller than about 18%, which is the smallest simulated proportion in the output. This makes it an extreme value under the chance model and provides evidence that the true proportion differs from 21%.
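The visual comparison can also be made numerically by counting how many simulated proportions are at least as small as the observed one. This is a minimal sketch using base R's rbinom() as a stand-in for the mosaic simulation, so its simulated values will differ from those in output. The names sim_props and observed are assumptions introduced here.

```r
set.seed(123)

# 1000 simulated samples of 1998 people, each a "smoker" with chance 0.21.
sim_props <- rbinom(1000, size = 1998, prob = 0.21) / 1998

# Observed proportion of smokers, from the tally() counts (187 of 1998).
observed <- 187 / 1998
observed

# Smallest simulated proportion, for comparison with the observed value.
min(sim_props)

# Fraction of simulations at or below the observed proportion.
mean(sim_props <= observed)
```

A fraction at or near zero means the chance model with 21% smokers almost never produces a sample as low as the one observed, which is the same conclusion the plots suggest.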