#Source the source code
source("spring2018_class6_source_code.R")
#Import the data
spring2018_class6_data <- read.csv("spring2018_class6_data.csv", header = TRUE)
#Attach the data
attach(spring2018_class6_data)
Exercise 1: Make the list “val” and generate the new list “laura.”
The value of val is {2, 5, 5, 5, 7, 7, 10, 31}. Its mean is 9. The value of laura is {6, 15, 15, 15, 21, 21, 30, 93}.
val <- c(2, 5, 5, 5, 7, 7, 10, 31)
mean(val)
## [1] 9
laura <- val*3
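As a quick check on the values stated for this exercise, the sketch below prints laura and confirms that multiplying every element by 3 also multiplies the mean by 3. It only reuses val as given above; no new data is introduced.

```r
# Rebuild val exactly as given in the exercise
val <- c(2, 5, 5, 5, 7, 7, 10, 31)

# Multiplying a vector by a scalar is applied element by element
laura <- val * 3
print(laura)

# The mean scales by the same factor as the values
mean_scales <- isTRUE(all.equal(mean(laura), 3 * mean(val)))
print(mean_scales)
```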
Exercise 2: The categorical variable I will examine is handedness in BHSEC Stats students.
table(handedness)
## handedness
## Left Right
## 6 47
barplot(table(handedness), main="Handedness Across BHSEC Stats Students", xlab="Dominant Hand", ylab="Number of Students")
Observation: There are almost 8 times as many right-handed BHSEC Stats students (47) as left-handed ones (6).
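The "almost 8 times" figure can be checked directly from the counts in the table output. This is a minimal sketch that retypes those two counts by hand; the name hand_counts is introduced here for illustration and is not part of the class source code.

```r
# Counts copied from the table(handedness) output above
hand_counts <- c(Left = 6, Right = 47)

# Ratio of right-handed to left-handed students
ratio <- hand_counts[["Right"]] / hand_counts[["Left"]]
print(round(ratio, 2))

# Share of the class in each group
hand_props <- prop.table(hand_counts)
print(round(hand_props, 3))
```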
Exercise 3: The two categorical variables I will be comparing are preferences in drinks (Coke vs. Pepsi) and borough of residency of BHSEC Stats students.
table(borough, coke.pepsi)
## coke.pepsi
## borough Coke Pepsi
## Brooklyn 2 4
## Manhattan 2 1
## Queens 21 18
## The Bronx 2 0
plot(table(borough, coke.pepsi), main="Coke vs. Pepsi Across NYC Boroughs", xlab="Borough of Residence", ylab="Number of People who Prefer Coke or Pepsi")
Observation: Coke appears to be preferred over Pepsi in the Bronx (2 to 0), while Queens shows little preference between the two drinks (21 to 18). It is hard to tell whether drink preference is associated with borough. One drink may look more popular in a given borough, such as Coke in the Bronx, but every borough other than Queens contributes 6 or fewer students, so the differences in counts are too small to support a conclusion.
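One way to make the comparison across boroughs easier to read is to convert the counts to within-borough proportions. The sketch below retypes the counts from the table output into a matrix (drink_counts is an illustrative name, not from the class source code) and uses prop.table with margin = 1 so that each row sums to 1.

```r
# Counts copied from the table(borough, coke.pepsi) output above
drink_counts <- matrix(
  c( 2,  4,
     2,  1,
    21, 18,
     2,  0),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    borough    = c("Brooklyn", "Manhattan", "Queens", "The Bronx"),
    coke.pepsi = c("Coke", "Pepsi")
  )
)

# Row totals show how few students most boroughs contribute
borough_totals <- rowSums(drink_counts)
print(borough_totals)

# Proportion preferring each drink within each borough
row_props <- prop.table(drink_counts, margin = 1)
print(round(row_props, 2))
```

With only 2 or 3 students in some rows, a proportion such as 100% Coke in the Bronx should be read alongside the row total rather than on its own.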
Exercise 4: The quantitative variable I will examine is hrs.exercise.
summary(hrs.exercise)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 2.000 4.000 5.575 8.000 20.000
hist(hrs.exercise, main="Hours of Exercise Among BHSEC Stats Students", xlab="Hours of Exercise", ylab="Number of Students")
queens.HrsExercise <- subset(hrs.exercise, borough=="Queens")
summary(queens.HrsExercise)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.000 4.000 5.869 8.500 20.000
hist(queens.HrsExercise, main="Hours of Exercise Among BHSEC Stats Students from Queens", xlab="Hours of Exercise", ylab="Number of Students")
The following data relates to BHSEC Stats students: The mean is 5.58 hours of exercise per week. The median is 4 hours per week. The hours of exercise range from 0 to 20 hours per week.
The large majority of BHSEC Stats students do less than 5 hours of exercise. Moreover, the shape of the histogram is similar to a staircase, but with a large drop between students doing 0 to 5 hours of exercise and students doing 5 to 10 hours of exercise. As the hours of exercise increase, the number of students who exercise that much decreases sharply.
When considering only BHSEC Stats students from Queens, the data changes little. The mean is 5.87 hours of exercise per week. The median is 4 hours per week. The hours of exercise range from 0 to 20 hours per week.
The histogram for Queens closely resembles the one for BHSEC Stats students across all boroughs in NYC, showing a large drop between students who exercise for 0 to 5 hours and students who exercise for 5 to 10 hours. It also shows the same decrease in the number of students as the number of hours increases.
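To see how little the Queens subset differs from the full class, the two summary() outputs can be placed side by side. This sketch retypes the printed summary values by hand (the object names are illustrative) and also computes the gap between mean and median, which is consistent with the right-skewed shape described above.

```r
# Values copied from the two summary() outputs above
all_students <- c(Min = 0, Q1 = 2, Median = 4, Mean = 5.575, Q3 = 8.0, Max = 20)
queens_only  <- c(Min = 0, Q1 = 3, Median = 4, Mean = 5.869, Q3 = 8.5, Max = 20)

# Stack the two summaries into one comparison table
comparison <- rbind(All = all_students, Queens = queens_only)
print(comparison)

# Difference between the Queens subset and the full class
queens_minus_all <- comparison["Queens", ] - comparison["All", ]
print(queens_minus_all)

# A mean well above the median is consistent with a right-skewed histogram
skew_gap <- comparison[, "Mean"] - comparison[, "Median"]
print(skew_gap)
```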
Exercise 5: The quantitative variable I will compare across years is height.
summary(height)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 59.00 64.00 67.00 66.93 69.25 76.00 1
year1Height <- subset(height, year=="Year 1")
year2Height <- subset(height, year=="Year 2")
boxplot(height~year, main="Heights in Year 1's vs. Year 2's", xlab="Year in School", ylab="Height of Student")
The following data relates to BHSEC Stats students: The average height is 66.93 inches. The heights range from 59.00 to 76.00 inches.
When dividing the data according to the students’ year, the following is found: The average height of a Year 1 is 65.71 inches, while the average height of a Year 2 is 67.12 inches. Heights among Year 1’s range from 63 to 70 inches, whereas heights among Year 2’s range from 59 to 76 inches, indicating much greater variation in height among Year 2’s than among Year 1’s.
The boxplot further clarifies that the range of heights is much wider among Year 2’s than among Year 1’s; however, height and year do not seem to be associated. Some Year 2’s are much taller than any Year 1, which could suggest an effect of age, but there are also many Year 2’s much shorter than the average Year 1.
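The per-year means and ranges quoted above are not produced by any code shown in this exercise. A minimal sketch of one way to compute them is given below, using tapply on a small toy data frame. The toy values are hypothetical and are NOT the class data set; only the column names height and year mirror the real variables. Because the real height variable contains a missing value, the sketch passes na.rm = TRUE.

```r
# Hypothetical toy data for illustration only, NOT the class data set
toy <- data.frame(
  height = c(63, 66, 70, NA, 59, 67, 76),
  year   = c("Year 1", "Year 1", "Year 1", "Year 1",
             "Year 2", "Year 2", "Year 2")
)

# Mean height per year; na.rm = TRUE skips missing heights
group_means <- tapply(toy$height, toy$year, mean, na.rm = TRUE)
print(round(group_means, 2))

# Spread (maximum minus minimum) of heights per year
group_spread <- tapply(
  toy$height, toy$year,
  function(h) diff(range(h, na.rm = TRUE))
)
print(group_spread)
```

Comparing the spread per year, rather than only the overall minimum and maximum, makes it clear that the difference lies in how variable each group is and not in a gap between the groups.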