summary(pima_te)
## npreg glu bp skin
## Min. : 0.000 Min. : 65.0 Min. : 24.00 Min. : 7.00
## 1st Qu.: 1.000 1st Qu.: 96.0 1st Qu.: 64.00 1st Qu.:22.00
## Median : 2.000 Median :112.0 Median : 72.00 Median :29.00
## Mean : 3.485 Mean :119.3 Mean : 71.65 Mean :29.16
## 3rd Qu.: 5.000 3rd Qu.:136.2 3rd Qu.: 80.00 3rd Qu.:36.00
## Max. :17.000 Max. :197.0 Max. :110.00 Max. :63.00
## bmi ped age type
## Min. :19.40 Min. :0.0850 Min. :21.00 No :223
## 1st Qu.:28.18 1st Qu.:0.2660 1st Qu.:23.00 Yes:109
## Median :32.90 Median :0.4400 Median :27.00
## Mean :33.24 Mean :0.5284 Mean :31.32
## 3rd Qu.:37.20 3rd Qu.:0.6793 3rd Qu.:37.00
## Max. :67.10 Max. :2.4200 Max. :81.00
str(pima_te)
## 'data.frame': 332 obs. of 8 variables:
## $ npreg: int 6 1 1 3 2 5 0 1 3 9 ...
## $ glu : int 148 85 89 78 197 166 118 103 126 119 ...
## $ bp : int 72 66 66 50 70 72 84 30 88 80 ...
## $ skin : int 35 29 23 32 45 19 47 38 41 35 ...
## $ bmi : num 33.6 26.6 28.1 31 30.5 25.8 45.8 43.3 39.3 29 ...
## $ ped : num 0.627 0.351 0.167 0.248 0.158 0.587 0.551 0.183 0.704 0.263 ...
## $ age : int 50 31 21 26 53 51 31 33 27 29 ...
## $ type : Factor w/ 2 levels "No","Yes": 2 1 1 2 2 2 2 1 1 2 ...
names(pima_te)
## [1] "npreg" "glu" "bp" "skin" "bmi" "ped" "age" "type"
four_header<-c("Mean", "Median", "Max", "Min", "Range", "# of Observations")
# range() returns c(min, max), so diff() gives the spread; nrow() of a vector is NULL, so length() counts the observations
bmi_results<-c(mean(pima_te$bmi),median(pima_te$bmi), max(pima_te$bmi), min(pima_te$bmi), diff(range(pima_te$bmi)), length(pima_te$bmi))
age_results<-c(mean(pima_te$age),median(pima_te$age), max(pima_te$age), min(pima_te$age), diff(range(pima_te$age)), length(pima_te$age))
names(bmi_results)<-four_header
names(age_results)<-four_header
BMI Results:
print(bmi_results)
## Mean Median Max Min
## 33.23976 32.90000 67.10000 19.40000
## Range # of Observations
## 47.70000 332.00000
Age Results:
print(age_results)
## Mean Median Max Min
## 31.31627 27.00000 81.00000 21.00000
## Range # of Observations
## 60.00000 332.00000
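Since the same six statistics are computed for both `bmi` and `age`, they could be wrapped in a small helper. The following is only a sketch: `six_stats` and `example_values` are hypothetical names that do not appear in the original analysis, and the input values are made up for illustration.

```r
# Hypothetical helper (not part of the original analysis): returns the six
# labelled statistics for any numeric vector.
six_stats <- function(x) {
  c("Mean" = mean(x),
    "Median" = median(x),
    "Max" = max(x),
    "Min" = min(x),
    "Range" = diff(range(x)),          # range() returns c(min, max); diff() gives the spread
    "# of Observations" = length(x))   # a plain vector has no rows, so use length()
}

# Made-up values, for illustration only
example_values <- c(19.4, 26.6, 33.6, 45.8, 67.1)
example_stats <- six_stats(example_values)
print(example_stats)
```

With the real data, `six_stats(pima_te$bmi)` and `six_stats(pima_te$age)` would be the corresponding calls.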
totalWomen<-nrow(pima_frame)
print(totalWomen)
## [1] 332
pima_frame[1:5, 1:4]
## npreg glu bp skin
## 1 6 148 72 35
## 2 1 85 66 29
## 3 1 89 66 23
## 4 3 78 50 32
## 5 2 197 70 45
highBMI <-pima_te[which(pima_te$bmi>=50),]
print(highBMI)
## npreg glu bp skin bmi ped age type
## 55 0 162 76 56 53.2 0.759 25 Yes
## 57 1 88 30 42 55.0 0.496 26 Yes
## 70 7 152 88 44 50.0 0.337 36 Yes
## 79 0 129 110 46 67.1 0.319 26 Yes
## 107 0 165 90 33 52.3 0.427 23 No
## 198 0 180 78 63 59.4 2.420 25 Yes
## 292 3 123 100 35 57.3 0.880 22 No
numOfDiabetics<-nrow(pima_te[which(pima_te$type=='Yes'),])
percentOfDiabetics<-(numOfDiabetics/totalWomen)
percent(percentOfDiabetics)
## [1] "32.8%"
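As a cross-check on the percentage above, the same figure can be reproduced in base R without `percent()`. This sketch rebuilds the `type` column from the counts shown in the `summary()` output (223 No, 109 Yes); `type_example`, `prop_yes`, and `pct_label` are illustrative names, not part of the original analysis.

```r
# Rebuild the outcome column from the counts reported by summary(pima_te)
type_example <- factor(rep(c("No", "Yes"), times = c(223, 109)))

# The mean of a logical vector is the proportion of TRUE values
prop_yes <- mean(type_example == "Yes")

# Format as a percentage with one decimal place
pct_label <- sprintf("%.1f%%", 100 * prop_yes)
print(pct_label)
```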
hist(pima_te$bmi, col="blue")
bmi_names<-c("Mean", "Median")
bmi_DoubleMs<-c(mean(pima_te$bmi),median(pima_te$bmi))
names(bmi_DoubleMs)<-bmi_names
print(bmi_DoubleMs)
## Mean Median
## 33.23976 32.90000
The mean and median above are only 0.33976 apart.
hist(vlbw$hospstay, col="red")
Answer: Yes. A number of infants have negative values for length of stay. In addition, the histogram's range is stretched so widely that no useful analysis can be drawn from it.
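The negative lengths of stay can also be counted directly rather than read off the histogram. This is a minimal sketch that uses a made-up vector, `hospstay_example`, in place of `vlbw$hospstay`; the values are invented purely for illustration.

```r
# Invented values standing in for vlbw$hospstay (includes a missing value)
hospstay_example <- c(12, -5, 30, NA, 45, -2, 60)

# Count the negative entries, ignoring missing values
n_negative <- sum(hospstay_example < 0, na.rm = TRUE)

# Positions of the negative entries (which() drops NA automatically)
negative_rows <- which(hospstay_example < 0)

print(n_negative)
print(negative_rows)
```

Applied to `vlbw$hospstay`, the same two expressions would give the number and row positions of the suspect records.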
boxplot(vlbw$lowph)
Based on the boxplot above, the median appears to be approximately 7.2. The lower whisker, Q1, Q3, and upper whisker look to be around 6.8, 7.1, 7.3, and 7.5, respectively.
boxplot.stats(vlbw$lowph)
## $stats
## [1] 6.859997 7.129997 7.209999 7.309998 7.549999
##
## $n
## [1] 609
##
## $conf
## [1] 7.198475 7.221524
##
## $out
## [1] 6.829998 6.849998 6.529999 6.809998 6.779999 6.849998 6.759998
## [8] 6.699997 6.699997 6.820000 6.809998 6.719997 6.809998 6.739998
My estimates for question 14 are close to the values reported by boxplot.stats above.
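The comparison can also be made numerically. The sketch below copies the `$stats` values printed above and sets them against the visual estimates from the boxplot; per the R documentation, `$stats` holds the lower whisker, lower hinge, median, upper hinge, and upper whisker. The variable names are illustrative and not part of the original analysis.

```r
# Labels for the five positions returned in boxplot.stats()$stats
stat_labels <- c("lower whisker", "lower hinge", "median",
                 "upper hinge", "upper whisker")

# Values copied from the boxplot.stats output above
reported <- c(6.859997, 7.129997, 7.209999, 7.309998, 7.549999)
names(reported) <- stat_labels

# Visual estimates read off the boxplot
estimated <- c(6.8, 7.1, 7.2, 7.3, 7.5)
names(estimated) <- stat_labels

# Absolute gap between each estimate and the reported value
abs_diff <- abs(reported - estimated)
print(round(abs_diff, 3))

# TRUE if every estimate is within 0.1 pH units of the reported value
print(max(abs_diff) < 0.1)
```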
qplot(vlbw$lowph, geom="histogram",
main = "Histogram for lowph", binwidth=0.02 ,xlab = "lowest pH", fill=I("blue"), col=I("white"))
## Warning: Removed 62 rows containing non-finite values (stat_bin).