# create vector COHb (% change in carboxyhemoglobin)
COHb <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5, 3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)
summary(COHb)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.200 2.900 3.500 3.562 4.050 6.400
IQRCOHb <- IQR(COHb) # 3rd quartile minus 1st quartile (4.05 - 2.9)
print(IQRCOHb)
## [1] 1.15
sdCOHb <-sd(COHb)
print(sdCOHb)
## [1] 0.9533423
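As a check on the value above, the sample standard deviation can be computed directly from its definition, s = sqrt(Σ(x − x̄)² / (n − 1)). This is a minimal sketch; the names `n` and `sd_manual` are introduced here for illustration only.

```r
# Same COHb data as above
COHb <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5, 3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)

n <- length(COHb)                                        # 26 observations
sd_manual <- sqrt(sum((COHb - mean(COHb))^2) / (n - 1))  # n - 1 denominator, as sd() uses
all.equal(sd_manual, sd(COHb))                           # TRUE
```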
range(COHb)
## [1] 2.2 6.4
###create frequency distribution
breaks = seq(2, 7.04, by = 0.84) # upper limit 7.04 so the maximum (6.4) falls inside a bin
COHb.cut = cut(COHb, breaks, right = FALSE)
COHb.freq = table(COHb.cut)
cbind(COHb.freq)
## COHb.freq
## [2,2.84) 6
## [2.84,3.68) 10
## [3.68,4.52) 8
## [4.52,5.36) 0
## [5.36,6.2) 1
## [6.2,7.04) 1
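The frequency table can be extended with relative and cumulative frequencies using `prop.table()` and `cumsum()`. This sketch assumes break points running from 2 to 7.04 in steps of 0.84, so that every observation, including the maximum of 6.4, lands in a bin; `freq_table` is a name chosen here for illustration.

```r
COHb <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5, 3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)

breaks <- seq(2, 7.04, by = 0.84)              # last break lies above max(COHb) = 6.4
COHb.cut <- cut(COHb, breaks, right = FALSE)   # left-closed bins [a, b)
COHb.freq <- table(COHb.cut)

# Count, proportion of all 26 values, and running total per bin
freq_table <- cbind(Freq    = COHb.freq,
                    RelFreq = round(prop.table(COHb.freq), 3),
                    CumFreq = cumsum(COHb.freq))
freq_table
```

If the last cumulative frequency equals `length(COHb)`, no value fell outside the break points.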
boxplot(COHb, main = "Boxplot of % increase in COHb in pregnant women using tobacco", xlab = "% change COHb", horizontal = TRUE)
abline(v=summary(COHb), col= "blue", lwd = 0.5)
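The boxplot is drawn from `boxplot.stats()`, which uses Tukey's hinges from `fivenum()` rather than the `summary()` quartiles, so the upper hinge (4.1) differs slightly from Q3 (4.05). This sketch prints the whisker ends, hinges, median, and any points flagged as outliers with the default 1.5 × (hinge spread) rule; `b` is an illustrative name.

```r
COHb <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5, 3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)

b <- boxplot.stats(COHb)   # coef = 1.5 by default
b$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker: 2.2 2.9 3.5 4.1 5.5
b$out    # values beyond upper hinge + 1.5 * (4.1 - 2.9) = 5.9: 6.4
```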
hist(COHb, breaks = seq(2, 7.04, by = 0.84), right = FALSE, col = "maroon", main = "Histogram of % increase in COHb in pregnant smoking women", xlab = "% change COHb")
abline(v = summary(COHb), col = "blue", lwd = 0.5) # draws the six summary() values (min, Q1, median, mean, Q3, max) on the histogram
###interpretation - the COHb data are unimodal and skewed to the right. The median (ỹ = 3.5) is a better measure of central tendency than the mean (x̄ = 3.562), because the median is resistant to the rightward pull of the outlier (6.4), whereas the mean is not.
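The resistance claim can be checked by dropping the outlier and recomputing both measures. This is a minimal sketch; `COHb_no_outlier` is a name introduced here for illustration.

```r
COHb <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5, 3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)

COHb_no_outlier <- COHb[-which.max(COHb)]   # drop the single largest value, 6.4

c(mean_all = mean(COHb), mean_without = mean(COHb_no_outlier))          # about 3.56 vs 3.45
c(median_all = median(COHb), median_without = median(COHb_no_outlier))  # 3.5 vs 3.5
```

Removing one value moves the mean by about 0.11 while the median does not move at all.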
2. A plant physiologist investigated the effect of light on the growth of soybean plants. 13 different types of soybean seedlings were randomly allocated to two treatments: low light and moderate light. After 16 days of growth, the plants were harvested, and the total leaf area (cm²) of each plant was measured. In the space below, create a scatterplot of the data. Try to include a label on each axis. If you can, overlay a regression line on your scatterplot.
Generate a scatterplot with low light on the x-axis and moderate light on the y-axis, and add a least-squares regression line.
lowlight <- c(264,200,225,268,215,241,232,256,229,288,253,288,230)
mediumlight <- c(314,320,310,340,299,268,354,271,285,309,337,282,273)
#generate histograms to assess the distribution
hist(lowlight)
hist(mediumlight)
plot(x = lowlight, y = mediumlight, main = "Soybean total leaf area (cm^2) after 16 days: low vs moderate light", xlab = "Leaf area, low light (cm^2)", ylab = "Leaf area, moderate light (cm^2)")
abline(lm(mediumlight~lowlight), col="red") # regression line (y~x)
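To see the numbers behind the red line, the fitted model can be stored and its coefficients inspected. This sketch also checks two standard identities for simple linear regression: slope = r · s_y / s_x and R² = r². The names `fit` and `r` are illustrative.

```r
lowlight    <- c(264, 200, 225, 268, 215, 241, 232, 256, 229, 288, 253, 288, 230)
mediumlight <- c(314, 320, 310, 340, 299, 268, 354, 271, 285, 309, 337, 282, 273)

fit <- lm(mediumlight ~ lowlight)   # same model that abline() draws
coef(fit)                           # intercept and slope of the regression line

r <- cor(lowlight, mediumlight)
all.equal(unname(coef(fit)["lowlight"]), r * sd(mediumlight) / sd(lowlight))  # TRUE
all.equal(summary(fit)$r.squared, r^2)                                        # TRUE
```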
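Calculate the Pearson correlation coefficient with cor(x, y). The order of the arguments does not matter, because r is symmetric: cor(lowlight, mediumlight) equals cor(mediumlight, lowlight).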
cor(lowlight,mediumlight)
## [1] -0.02651767
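The r coefficient (-0.027) is negative but essentially zero, so it indicates no meaningful linear relationship between the two sets of measurements rather than an inverse (negative) relationship.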
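Pearson's r follows from the formula r = Σ(x − x̄)(y − ȳ) / ((n − 1) s_x s_y). The helper `pearson_r` below is hypothetical, written only to show that formula; it should agree with `cor()` in either argument order.

```r
lowlight    <- c(264, 200, 225, 268, 215, 241, 232, 256, 229, 288, 253, 288, 230)
mediumlight <- c(314, 320, 310, 340, 299, 268, 354, 271, 285, 309, 337, 282, 273)

# Hypothetical helper: sample covariance divided by the product of sample SDs
pearson_r <- function(x, y) {
  n <- length(x)
  sum((x - mean(x)) * (y - mean(y))) / ((n - 1) * sd(x) * sd(y))
}

r <- pearson_r(lowlight, mediumlight)
all.equal(r, cor(lowlight, mediumlight))          # TRUE
all.equal(pearson_r(mediumlight, lowlight), r)    # TRUE: argument order does not matter
r^2   # share of variance linearly explained
```

With r ≈ -0.027, r² is about 0.0007, so under 0.1% of the variation in one set of leaf areas is linearly explained by the other.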
3. The following histogram shows the same data that are shown in one of the four boxplots. Which boxplot (a, b, c, or d) goes with the histogram? Explain your answer.
ANS: Boxplot a) matches the histogram most closely. Both show the same range (min = 25, max = 65), and the center of the histogram lies at roughly the 35-40 bin, which agrees with the median of 35 in boxplot a). Therefore, boxplot a) is the best match.