Object Classes

Question 1

Create a vector named heights that contains the heights, in inches, of yourself and two students near you. Print the contents of this vector.

heights <- c(62, 63, 74)
heights
## [1] 62 63 74

Question 2

Create a vector named names that contains the names of these people. Print the contents of this vector.

names <- c("Adrienne", "Geraldine", "Saichandra")
names
## [1] "Adrienne"   "Geraldine"  "Saichandra"

Question 3

Try typing cbind(heights, names). What did this command do? What class is this new object?

cbind(heights, names)
##      heights names       
## [1,] "62"    "Adrienne"  
## [2,] "63"    "Geraldine" 
## [3,] "74"    "Saichandra"
class(cbind(heights,names))
## [1] "matrix"

#cbind() takes the two vectors, heights and names, and binds them side by side as the columns of a single object, so each row pairs one person's height with that person's name. For example, "Adrienne" is the first element of names and 62 is the first element of heights, so they end up together in row 1. Because a matrix can hold only one data type, the numeric heights are converted to character strings, which is why they print in quotes. The class of the new object is "matrix".
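The type conversion can be checked directly. The sketch below reuses the vectors from Questions 1 and 2 and contrasts cbind() with data.frame(), which keeps each column's own type. The object names m and people are introduced here only for illustration.

```r
heights <- c(62, 63, 74)
names <- c("Adrienne", "Geraldine", "Saichandra")

# cbind() builds a matrix, and a matrix holds a single type,
# so the numeric heights are converted to character strings.
m <- cbind(heights, names)
is.matrix(m)   # TRUE
typeof(m)      # "character"

# A data frame stores each column separately, so heights stays numeric.
people <- data.frame(heights, names)
is.numeric(people$heights)   # TRUE
```

If the heights are needed for arithmetic later, the data frame is usually the more convenient container.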

Reading Data Into R Studio

Question 4

Download the data set births.csv from CCLE and read it into R (or RStudio). Name the data frame NCbirths.

NCbirths <- read.csv("births.csv")

Question 5

Demonstrate that you have been successful by typing head(NCbirths).

head(NCbirths)

Question 6

Extract the weight variable as a vector from the data frame by typing the command,

weights <- NCbirths$weight

What units do you think the weights are in?

weights <- NCbirths$weight
head(weights)
## [1] 124 177 107 144 117  98

#The weights are most likely in ounces: values such as 124 and 177 are far too large to be pounds and far too small to be grams for a newborn.

Question 7

Create a new vector named weights_in_pounds which are the weights of the babies in pounds. You can look up conversion factors on the internet. Demonstrate your success by typing weights_in_pounds[1:20].

weights.in.pounds <- weights/16
weights.in.pounds[1:20]
##  [1]  7.7500 11.0625  6.6875  9.0000  7.3125  6.1250  9.1875  8.6250  6.5000
## [10]  7.6875  9.5625  8.0625  7.4375  6.7500  6.6250  7.8125  7.1875  8.0000
## [19]  8.2500  5.1875

Summarizing Data (One Variable)

The functions summary(), mean(), sd(), max(), and min() all help us understand how quantitative data are distributed. Each takes a numeric vector as its argument. For example, max(NCbirths$weight) gives the maximum weight of all the babies in our sample.
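As a small illustration, these functions can be tried on the first six weights printed in Question 6. This is only a sketch on six values, not the full data set, and the name w is chosen here for convenience.

```r
# First six birth weights shown by head(weights), in ounces
w <- c(124, 177, 107, 144, 117, 98)

summary(w)   # minimum, quartiles, median, mean, maximum
mean(w)      # arithmetic mean
sd(w)        # sample standard deviation
max(w)       # 177
min(w)       # 98
```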

We can also use the tally() function in the mosaic package to help us summarize categorical data. The tally() function requires a categorical vector inside the parentheses. For example, after you have installed and loaded the mosaic package, try typing tally(NCbirths$Racemom).
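If mosaic is not available, a similar summary can be sketched with base R's table() and prop.table(). The vector habit below is hypothetical, made up for illustration, and is not taken from the births data.

```r
# Hypothetical categorical vector, for illustration only
habit <- c("NonSmoker", "Smoker", "NonSmoker", "NonSmoker", "Smoker")

# Count how many times each category occurs
counts <- table(habit)
counts

# Convert the counts to percentages of the total
percents <- 100 * prop.table(counts)
percents
```

Here 2 of the 5 made-up values are "Smoker", so that category accounts for 40 percent.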

Question 8

What is the mean weight of the babies in pounds?

mean(weights.in.pounds)
## [1] 7.253691

#The mean weight of the babies is about 7.25 pounds.

Question 9

What percentage of the mothers in the sample smoke?

Hint: Use the tally() function with the format argument. Use the help screen for guidance.

library(mosaic)  # provides tally(), do(), and rflip()
tally(NCbirths$Habit, format = "percent")
## X
##             NonSmoker     Smoker 
##  0.3003003 90.3403403  9.3593594

#From the table, about 9.36% of the mothers in the sample smoke.

Simulating the Chance Model

According to the Centers for Disease Control, approximately 21% of adult Americans are smokers.

The following command simulates 10 repetitions of 200 coin tosses, where the probability (long-run proportion) of the coin landing on heads is 0.5. The 10 values produced are the numbers of successes (heads) observed in each repetition.

set.seed(123)
output <- do(10) * rflip(200, prob = 0.5)
head(output)
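For readers without mosaic, a comparable simulation can be sketched in base R with rbinom(), which draws the number of successes in a fixed number of independent trials. This is an analogue rather than mosaic's implementation, so its values should not be expected to match the do() * rflip() output, even with the same seed. The names heads and prop are chosen here for illustration.

```r
set.seed(123)

# 10 repetitions, each counting the heads in 200 fair coin tosses
heads <- rbinom(10, size = 200, prob = 0.5)
heads

# Convert each count to a proportion of heads
prop <- heads / 200
prop
```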

Question 10

Modify the code above to simulate selecting 1998 people so that the probability of selecting a smoker is 21%. Repeat this simulation 1000 times, each time saving the percent of “smokers” in your sample.

Note: Keep the set.seed(123) command. This ensures that everyone gets the same “random” outcome. We will elaborate more on this command in a later lab.

set.seed(123)
output <- do(1000) * rflip(1998, prob = 0.21)

Question 11

The object output$prop contains the vector of simulated proportions of “smokers” under the chance model. Use the dotPlot() and histogram() functions to visualize the proportion of “smokers” under the chance model.

dotPlot(output$prop, cex = 5)

histogram(output$prop)

Question 12

Based on your plots in Question 11 and your answer in Question 9, is there evidence that the proportion of mothers who smoke is different from 21%? Explain.

#Yes. Combined with the answer to Question 9, the plots from Question 11 give strong evidence that the proportion of mothers who smoke differs from 21%.

#The value 0.21 is the proportion of smokers under the chance model (the null hypothesis), and the simulated proportions in the dot plot and histogram center around it. Simulated values such as 0.18 and 0.24 already lie in the tails of that distribution. The observed proportion, 0.0936, is far more extreme: it falls outside the range shown in either plot, so it is inconsistent with the chance model.

#The approximate p-value is 0/1000 = 0, because none of the 1000 simulated proportions is as far from 0.21 as the observed statistic.
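The p-value reasoning can be sketched in base R. The code below assumes rbinom() as a stand-in for do() * rflip(), so the simulated values differ from those in output$prop, and the names n, null_p, observed, sim_prop, p_value, and se are introduced here for illustration.

```r
set.seed(123)

n <- 1998           # sample size used in Question 10
null_p <- 0.21      # proportion of smokers under the chance model
observed <- 0.0936  # observed proportion of smoking mothers (Question 9)

# 1000 simulated sample proportions under the chance model
sim_prop <- rbinom(1000, size = n, prob = null_p) / n

# Two-sided simulated p-value: the share of simulated proportions
# at least as far from 0.21 as the observed proportion is
p_value <- mean(abs(sim_prop - null_p) >= abs(observed - null_p))
p_value

# Distance of the observed proportion from 0.21, in standard errors
se <- sqrt(null_p * (1 - null_p) / n)
(observed - null_p) / se
```

A simulated p-value of 0 means only that no simulated proportion was this extreme in 1000 repetitions, so the true p-value is very small rather than exactly zero.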