Lab 2 - Introduction to data

Name: Sonora Williams

Section: 01L

Date: September 10, 2013

Exercises

Load data:

source("http://www.openintro.org/stat/data/cdc.R")

Exercise 1: There are 20,000 cases (observations) and 9 variables:

general health: ordinal categorical
recent exercise: regular categorical
health plan: regular categorical
smoker status: regular categorical
height: numerical, discrete
weight: numerical, discrete
desired weight: numerical, discrete
age: numerical, discrete
gender: regular categorical

Exercise 2:

IQR(Height) = 70 - 64 = 6
IQR(Age) = 57 - 31 = 26

# enter code for Ex2 below
summary(cdc$height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    48.0    64.0    67.0    67.2    70.0    93.0
70 - 64
## [1] 6
summary(cdc$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    18.0    31.0    43.0    45.1    57.0    99.0
57 - 31
## [1] 26
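
As a cross-check on the subtraction above, R's built-in IQR() function returns the same quantity without copying numbers out of summary() by hand. This is a minimal sketch on a small made-up vector; the `heights` values are hypothetical and are not taken from cdc.

```r
# Hypothetical heights in inches (illustration only, not the cdc data)
heights <- c(60, 64, 67, 70, 72)

# First and third quartiles, as reported by summary()/quantile()
q <- quantile(heights, c(0.25, 0.75))

# IQR by hand (Q3 - Q1) and via the built-in function
iqr_manual <- unname(q[2] - q[1])
iqr_builtin <- IQR(heights)

print(iqr_manual)
print(iqr_builtin)
```

On the lab data the equivalent calls would be IQR(cdc$height) and IQR(cdc$age).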

Exercise 3:

There are 9,569 males in the sample. About 23.3% of respondents (0.23285) report being in excellent health.

# enter code for Ex3 below
table(cdc$gender)/20000
## 
##      m      f 
## 0.4784 0.5215
table(cdc$genhlth)/20000
## 
## excellent very good      good      fair      poor 
##   0.23285   0.34860   0.28375   0.10095   0.03385
table(cdc$gender)
## 
##     m     f 
##  9569 10431
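
Dividing by 20000 works only because the sample size is known in advance. A sketch of the same relative frequency calculation using prop.table(), which divides by the table total automatically, is below; the `gender` vector is made up for illustration.

```r
# Hypothetical gender column (illustration only, not the cdc data)
gender <- factor(c("m", "f", "f", "m", "f"), levels = c("m", "f"))

counts <- table(gender)       # absolute frequencies
props <- prop.table(counts)   # relative frequencies (counts / total)

print(counts)
print(props)
```

On the lab data this would be prop.table(table(cdc$gender)) and prop.table(table(cdc$genhlth)).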

Exercise 4: The mosaic plot shows that a greater percentage of males than females have smoked at least 100 cigarettes in their lifetime.

# code for Ex4 already given in lab
mosaicplot(table(cdc$gender, cdc$smoke100))

Exercise 5: 620 respondents fit these criteria (under age 23 and having smoked at least 100 cigarettes).

# enter code for Ex5 below
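# smoke100 is coded 0/1, so compare with the number 1 rather than the string "1"
under23_and_smoke <- subset(cdc, age < 23 & smoke100 == 1)
nrow(under23_and_smoke)  # count the matching respondents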
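
The count can also be obtained without building a new data frame, because summing a logical vector counts its TRUE values. A minimal sketch, assuming a toy data frame with the same column names as cdc (the values are made up):

```r
# Hypothetical mini data frame (illustration only, not the cdc data)
toy <- data.frame(
  age      = c(19, 22, 35, 21, 40),
  smoke100 = c(1, 0, 1, 1, 0)
)

# TRUE for each respondent who meets both conditions
matches <- toy$age < 23 & toy$smoke100 == 1

# sum() treats TRUE as 1 and FALSE as 0, so this is the number of matches
n_matching <- sum(matches)
print(n_matching)

# Same answer as counting the rows of the subset
n_rows <- nrow(subset(toy, age < 23 & smoke100 == 1))
print(n_rows)
```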

Exercise 6: The box plot shows that, in this data set, the better an individual's general health, the lower their body mass index tends to be. For the variable of my choosing I picked exerany: people who have exercised in the last month probably have better general health, which suggests that their body mass index would be lower. The box plot shows a slightly lower median body mass index for individuals who exercised in the last month than for those who did not.

# code for bmi vs. genhlth already given in lab
bmi = (cdc$weight/cdc$height^2) * 703
boxplot(bmi ~ cdc$genhlth, main = "BMI vs. general health")

# enter code for Ex6 below (boxplot for bmi vs. your chosen variable)
boxplot(bmi ~ cdc$exerany, main = "BMI vs. exercised in the past month")
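
The medians that the box plots display as thick lines can also be computed numerically, which makes a small difference between groups easier to judge. The sketch below uses tapply() on a made-up data frame; the column names match cdc but every value is hypothetical.

```r
# Hypothetical respondents (illustration only, not the cdc data)
toy <- data.frame(
  weight  = c(150, 200, 180, 130, 220, 160),  # pounds
  height  = c(65, 70, 72, 62, 68, 66),        # inches
  exerany = c(1, 0, 1, 1, 0, 0)               # 1 = exercised in past month
)

# BMI from pounds and inches, same formula as in the lab
toy$bmi <- toy$weight / toy$height^2 * 703

# Median BMI within each exercise group
median_bmi <- tapply(toy$bmi, toy$exerany, median)
print(median_bmi)
```

On the lab data the equivalent call would be tapply(bmi, cdc$exerany, median).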

Exercise 7: There appears to be a positive association between the two variables: the less people weigh, the less they wish to weigh.

# enter code for Ex7 below
plot(x = cdc$weight, y = cdc$wtdesire)
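
The lab asks only for the scatterplot, but one way to put a number on the strength of a linear association is the correlation coefficient from cor(). This is a sketch on made-up values, not a result from the cdc data.

```r
# Hypothetical current and desired weights in pounds (illustration only)
weight   <- c(120, 150, 180, 210, 250)
wtdesire <- c(115, 140, 165, 185, 200)

# Pearson correlation: values near +1 indicate a strong positive
# linear association, values near 0 indicate little linear association
r <- cor(weight, wtdesire)
print(r)
```

On the lab data the equivalent call would be cor(cdc$weight, cdc$wtdesire).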

Exercise 8:

No text needed for this question, just code.

# enter code for Ex8 below
wdiff <- (cdc$weight - cdc$wtdesire)

Exercise 9: wdiff is a numerical variable. Because both weights are recorded as whole numbers, the differences are whole numbers too, so the variable is discrete rather than continuous; negative values are possible when a person wishes to weigh more than they currently do. A value of 0 means the individual is content with their current weight, since they desire to remain the same weight. Because I subtracted the desired weight from the current weight, a positive value represents someone who wishes to weigh less than they currently do, which is often the case, and a negative value represents someone who would like to weigh more than they currently do.
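
The sign interpretation above can be sketched in code by labelling each difference according to its sign. The weights below are made up, and the label names are my own choice rather than anything defined in the lab.

```r
# Hypothetical current and desired weights in pounds (illustration only)
weight   <- c(180, 150, 130, 200)
wtdesire <- c(160, 150, 140, 175)

# Current minus desired weight, as in Exercise 8
wdiff <- weight - wtdesire

# sign() returns -1, 0 or 1; turn those into readable labels
goal <- factor(sign(wdiff),
               levels = c(-1, 0, 1),
               labels = c("wants to gain", "content", "wants to lose"))

print(wdiff)
print(table(goal))
```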

Exercise 10: The summary and the boxplot show that more people want to lose weight than gain weight, which jibes with current societal norms. The median is 10, meaning the median respondent would prefer to be ten pounds lighter than they currently are. This may mean that most people are unhappy with their current weight and strive to be thinner, or it could mean that people simply desire to be healthier or to be able to drive smaller cars for the sake of the environment.

# enter code for Ex10 below - numerical summary
summary(wdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -500.0     0.0    10.0    14.6    21.0   300.0
# enter code for Ex10 below - plot(s)
boxplot(wdiff)

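Exercise 11: Women tend to want to lose more weight than men. The median difference between current and desired weight is 10 pounds for women versus 5 pounds for men, and the mean difference is 18.2 versus 10.7 pounds.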

# enter code for Ex11 below - numerical summary
malecdc = cdc[cdc$gender == "m", ]
malewtdiff = malecdc$weight - malecdc$wtdesire
femalecdc = cdc[cdc$gender == "f", ]
femalewtdiff = femalecdc$weight - femalecdc$wtdesire
summary(malewtdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -500.0     0.0     5.0    10.7    20.0   300.0
summary(femalewtdiff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   -83.0     0.0    10.0    18.2    27.0   300.0
# enter code for Ex11 below - side-by-side box plot
boxplot(wdiff ~ cdc$gender)
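
The per-gender summaries can also be produced in one step, without creating a separate data frame for each gender, by grouping with tapply(). A minimal sketch on made-up data with the same column names as cdc:

```r
# Hypothetical respondents (illustration only, not the cdc data)
toy <- data.frame(
  gender   = factor(c("m", "f", "m", "f", "m", "f"), levels = c("m", "f")),
  weight   = c(190, 140, 170, 155, 210, 130),
  wtdesire = c(185, 130, 170, 135, 190, 125)
)

# Current minus desired weight for every respondent
toy$wdiff <- toy$weight - toy$wtdesire

# Median difference within each gender
median_by_gender <- tapply(toy$wdiff, toy$gender, median)
print(median_by_gender)
```

Replacing median with summary in the tapply() call returns the full six-number summary for each group.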

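Exercise 12: The mean weight is about 170 pounds and the standard deviation is 40.08 pounds. 14,152 of the 20,000 weights, or about 70.8%, fall within one standard deviation of the mean.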

# enter code for Ex12 below
summary(cdc$weight)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      68     140     165     170     190     500
sd(cdc$weight)
## [1] 40.08
# count the weights within one standard deviation of the mean (bounds are mean - sd and mean + sd)
length(cdc$weight[cdc$weight < 209.78097 & cdc$weight > 129.61903])
## [1] 14152
14152/20000
## [1] 0.7076
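
The bounds typed in above can instead be computed from mean() and sd(), so nothing has to be copied by hand, and taking the mean of a logical vector gives the proportion directly. This is a sketch on a made-up vector of weights, not the cdc data.

```r
# Hypothetical weights in pounds (illustration only)
weights <- c(120, 150, 160, 170, 180, 190, 240)

m <- mean(weights)
s <- sd(weights)

# TRUE when a weight lies strictly within one standard deviation of the mean
within_one_sd <- abs(weights - m) < s

n_within <- sum(within_one_sd)      # how many weights qualify
prop_within <- mean(within_one_sd)  # proportion of weights that qualify

print(n_within)
print(prop_within)
```

On the lab data the equivalent call would be mean(abs(cdc$weight - mean(cdc$weight)) < sd(cdc$weight)).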