Object Classes

Question 1

Create a vector named heights that contains the heights, in inches, of yourself and two students near you. Print the contents of this vector.

heights <- c(72, 68, 70)
print(heights)
## [1] 72 68 70

Question 2

Create a vector named names that contains the names of these people. Print the contents of this vector.

names <- c("Gavin", "John", "Greg")
print(names)
## [1] "Gavin" "John"  "Greg"

Question 3

Try typing cbind(heights, names). What did this command do? What class is this new object?

cbind(heights, names)
##      heights names  
## [1,] "72"    "Gavin"
## [2,] "68"    "John" 
## [3,] "70"    "Greg"
class(cbind(heights,names))
## [1] "matrix" "array"

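It binds the two vectors, heights and names, side by side as the columns of a single object. Because a matrix can hold only one data type, the numeric heights are coerced to character strings, which is why they print in quotes. The new object’s class is matrix (and array).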
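The coercion can be checked directly. The sketch below reuses the vectors from Questions 1 and 2; the data.frame() comparison and the name people are additions for contrast and are not part of the lab question.

```r
heights <- c(72, 68, 70)
names <- c("Gavin", "John", "Greg")

# cbind() builds a matrix, and a matrix holds a single type,
# so the numeric heights are converted to character strings.
m <- cbind(heights, names)
typeof(m)

# A data frame keeps a separate type for each column.
people <- data.frame(name = names, height = heights)
sapply(people, class)
```

If the heights need to stay numeric for later calculations, a data frame is usually the more convenient container.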

Reading Data Into RStudio

Question 4

Download the data set births.csv from CCLE and read it into R (or RStudio). Name the data frame NCbirths.

NCbirths <- read.csv("births.csv")

Question 5

Demonstrate that you have been successful by typing head(NCbirths).

head(NCbirths)
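Beyond head(), a few other base R calls help confirm that a CSV file was read as intended. The sketch below does not use births.csv; it writes a small stand-in file with two made-up rows to a temporary location, so the file contents and the name demo are assumptions for illustration only.

```r
# Hypothetical stand-in for births.csv: two made-up rows in a temporary file.
tmp <- tempfile(fileext = ".csv")
writeLines(c("weight,Habit", "124,NonSmoker", "177,Smoker"), tmp)

demo <- read.csv(tmp)
dim(demo)    # number of rows and columns
str(demo)    # column names and types
head(demo)   # first rows of the data frame
```

The same three calls can be applied to NCbirths once it has been read in.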

Question 6

Extract the weight variable as a vector from the data frame by typing the command,

weights <- NCbirths$weight

What units do you think the weights are in?

weights <- NCbirths$weight

Ounces

Question 7

Create a new vector named weights_in_pounds which are the weights of the babies in pounds. You can look up conversion factors on the internet. Demonstrate your success by typing weights_in_pounds[1:20].

weights_in_pounds <- weights / 16
weights_in_pounds[1:20]
##  [1]  7.7500 11.0625  6.6875  9.0000  7.3125  6.1250  9.1875  8.6250  6.5000
## [10]  7.6875  9.5625  8.0625  7.4375  6.7500  6.6250  7.8125  7.1875  8.0000
## [19]  8.2500  5.1875

Summarizing Data (One Variable)

The functions summary(), mean(), sd(), max(), and min() help us understand how quantitative data are distributed. Each of these functions takes a numeric vector as its argument. For example, max(NCbirths$weight) gives the maximum weight of all the babies in our sample.

We can also use the tally() function in the mosaic package to help us summarize categorical data. The tally() function requires a categorical vector inside the parentheses. For example, after you have installed and loaded the mosaic package, try typing tally(NCbirths$Racemom).
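For comparison, base R can produce similar counts and percentages without the mosaic package. This is a minimal sketch on a made-up vector (habit is hypothetical, not the lab data), using table() and prop.table() rather than tally().

```r
# Made-up categorical vector, for illustration only.
habit <- c("NonSmoker", "Smoker", "NonSmoker", "NonSmoker")

counts <- table(habit)                  # counts per category
counts

percents <- 100 * prop.table(counts)    # percentages per category
percents
```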

Question 8

What is the mean weight of the babies in pounds?

mean(weights_in_pounds)
## [1] 7.253691

7.253691

Question 9

What percentage of the mothers in the sample smoke?

Hint: Use the tally() function with the format argument. Use the help screen for guidance.

library(mosaic)  # provides tally()
tally(NCbirths$Habit)
## X
##           NonSmoker    Smoker 
##         6      1805       187
tally(NCbirths$Habit, format="percent") 
## X
##             NonSmoker     Smoker 
##  0.3003003 90.3403403  9.3593594

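About 9.36% of the mothers in the sample smoke (187 of 1998). Note that 6 records have a blank Habit value and are counted in the total.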
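The percentage can be reproduced by hand from the counts in the tally() output. The sketch below also shows the figure obtained if the 6 blank records are excluded, which comes to roughly 9.39%; whether to exclude them is a judgment call that the lab does not address.

```r
# Counts copied from the tally() output above.
blank <- 6
non_smoker <- 1805
smoker <- 187

total <- blank + non_smoker + smoker

pct_all <- 100 * smoker / total                      # share of all 1998 records
pct_known <- 100 * smoker / (non_smoker + smoker)    # share of records with a recorded habit

pct_all
pct_known
```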

Simulating the Chance Model

According to the Centers for Disease Control, approximately 21% of adult Americans are smokers.

The following command simulates 10 repetitions of 200 coin tosses, where the probability (long-run proportion) of the coin landing on heads is 0.5. The 10 values produced are the numbers of successes (heads) observed in each repetition.

set.seed(123)
output <- do(10) * rflip(200, prob = 0.5)
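To see what this simulation amounts to, here is a rough base R equivalent that uses rbinom() instead of mosaic's do() and rflip(). It is a sketch of the same chance model, not the lab's method, and it is not expected to reproduce the same random values as the command above even with the same seed. The names n_tosses, heads, and sim are assumptions for illustration.

```r
set.seed(123)
n_tosses <- 200

# Number of heads in each of 10 repetitions of 200 fair coin tosses.
heads <- rbinom(10, size = n_tosses, prob = 0.5)

# One row per repetition, with the proportion of heads alongside the counts.
sim <- data.frame(
  n = n_tosses,
  heads = heads,
  tails = n_tosses - heads,
  prop = heads / n_tosses
)
sim
```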

Question 10

Modify the code above to simulate selecting 1998 people so that the probability of selecting a smoker is 21%. Repeat this simulation 1000 times, each time saving the percent of “smokers” in your sample.

Note: Keep the set.seed(123) command. This assures that everyone will get the same “random” outcome. We will elaborate more on this command in a later lab.

set.seed(123)
output <- do(1000) * rflip(1998, prob = 0.21)
head(output)

Question 11

The object output$prop contains the vector of simulated proportions of “smokers” under the chance model. Use the dotPlot() and histogram() functions to visualize the proportion of “smokers” under the chance model.

dotPlot(output$prop, cex = 5)

histogram(output$prop)

Question 12

Based on your plots in Question 11 and your answer in Question 9, is there evidence that the proportion of mothers who smoke is different from 21%? Explain.

The observed 9.36% falls far to the left of this plot and is much smaller than 18%, which is the smallest simulated value. This makes it an extreme value under the chance model and provides evidence that the proportion of mothers who smoke is different from 21%.
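One way to put a number on "extreme" is to compare the observed proportion with the spread expected under the chance model. The sketch below is an addition, not part of the lab: it uses the standard error formula sqrt(p(1 - p)/n) and a base R simulation with rbinom() in place of do() and rflip(), so its simulated values will not match output$prop exactly.

```r
p0 <- 0.21               # proportion of smokers under the chance model
n <- 1998                # sample size
observed <- 187 / 1998   # observed proportion of smokers, from Question 9

# Standard error of a sample proportion under the chance model (roughly 0.009).
se <- sqrt(p0 * (1 - p0) / n)

# Distance of the observed proportion from 21%, in standard errors.
z <- (observed - p0) / se
z

# Share of 1000 simulated proportions at or below the observed one.
set.seed(123)
sim_props <- rbinom(1000, size = n, prob = p0) / n
tail_share <- mean(sim_props <= observed)
tail_share
```

The observed proportion sits more than 12 standard errors below 21%, so none of the simulated samples would be expected to fall that low.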