library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ stringr 1.4.0
## ✓ tidyr   1.1.3     ✓ forcats 0.5.1
## ✓ readr   1.4.0
## Warning: package 'tibble' was built under R version 4.1.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

1. The carbon monoxide in cigarettes is thought to be hazardous to the fetus of a pregnant woman who smokes. In a study of this hypothesis, blood was drawn from pregnant women before and after smoking a cigarette. Measurements were made of the percent increase of blood hemoglobin bound to carbon monoxide (COHb). The results for 26 women are:

a <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5, 3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)

NROW(a)
## [1] 26
summary(a)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.200   2.900   3.500   3.562   4.050   6.400
sd(a)
## [1] 0.9533423
4.050-2.900
## [1] 1.15

a. Find the mean, median, sample standard deviation, and IQR.

Mean \(\overline{y}\) = 3.562
Median = 3.5
Sample N = 26
Standard Deviation s = 0.9533423
IQR = 1.15
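
The values above were read off `summary(a)` and a hand subtraction of the quartiles. As a cross-check, the sketch below computes each statistic directly with base R's `mean()`, `median()`, `sd()`, and `IQR()`. It assumes R's default quantile method (type 7), which is also what `summary()` uses; the name `stats` is only an illustrative variable.

```r
# COHb percent increases for the 26 women (same vector as above)
a <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5,
       3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)

# Collect the four requested statistics in one named vector
stats <- c(mean = mean(a), median = median(a), sd = sd(a), IQR = IQR(a))

# Round for reporting
round(stats, 3)
```

Using `IQR(a)` avoids retyping the rounded quartiles from the `summary()` printout.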

b. Create a boxplot of these observations.

boxplot(a)
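
To see which observations the boxplot draws as individual points, a minimal sketch with `boxplot.stats()` from base R's grDevices package may help. It relies on the default whisker coefficient of 1.5 and on the fact that `boxplot()` uses hinges from `fivenum()`, which can differ slightly from the quartiles reported by `summary()`. The name `bs` is only an illustrative variable.

```r
# COHb percent increases for the 26 women (same vector as above)
a <- c(6.4, 2.6, 3.5, 2.9, 3.9, 2.2, 5.5, 4.4, 3.5, 3.2, 2.8, 2.4, 3.5,
       3.3, 3.7, 2.6, 3.5, 4.5, 4.2, 2.9, 3.1, 3.3, 4.3, 2.6, 4.1, 3.7)

# Same statistics that boxplot(a) uses to draw the box and whiskers
bs <- boxplot.stats(a)

bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # observations beyond 1.5 * (hinge spread) from the box
```

With these data, only the largest reading (6.4) should fall beyond the upper fence.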

c. Create a histogram of these observations.

hist(a)

2. A plant physiologist investigated the effect of light on the growth of soybean plants. 13 different types of soybean seedlings were randomly allocated to two treatments: low light and moderate light. After 16 days of growth, plants were harvested, and the total leaf area of each plant was measured.

     Low Light   Moderate Light
 1      264           314
 2      200           320
 3      225           310
 4      268           340
 5      215           299
 6      241           268
 7      232           345
 8      256           271
 9      229           285
10      288           309
11      253           337
12      288           282
13      230           273

In the space below, create a scatterplot of the data. Try to include axes labels on each of the axes. If you can, overlay a regression line on your scatterplot.

low <- c(264, 200, 225, 268, 215, 241, 232, 256, 229, 288, 253, 288, 230)
moderate <- c(314, 320, 310, 340, 299, 268, 345, 271, 285, 309, 337, 282, 273)

soybean <- data.frame(low, moderate)
plot(low, moderate, main="Soybean Plant 16 Day Growth \nLow & Moderate Light", xlab="Leaf area, low light", ylab="Leaf area, moderate light")
abline(lm(moderate ~ low))
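
The line drawn by `abline(lm(moderate ~ low))` can also be inspected numerically. The sketch below refits the same model and prints its intercept, slope, and the sample correlation. It assumes the rows are paired exactly as listed in the table; the name `fit` is only an illustrative variable.

```r
# Leaf areas under each treatment (same vectors as above)
low <- c(264, 200, 225, 268, 215, 241, 232, 256, 229, 288, 253, 288, 230)
moderate <- c(314, 320, 310, 340, 299, 268, 345, 271, 285, 309, 337, 282, 273)

# Least-squares fit of moderate-light leaf area on low-light leaf area
fit <- lm(moderate ~ low)

coef(fit)            # intercept and slope of the overlaid regression line
cor(low, moderate)   # sample correlation between the two columns
```

A correlation near zero would suggest that the overlaid line describes little of the variation in the scatterplot.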

The following histogram shows the same data that are shown in one of the four boxplots. Which boxplot (a, b, c, or d) goes with the histogram? Explain your answer.

C is the answer. The histogram spans a range of approximately 25 to 65. Roughly 95% of the observations fall between 30 and 55, an interval of length 25 (55 - 30). Because about 95% of observations lie within 2 standard deviations of the mean, that interval is about 4s wide, so s is approximately 6.25 (25/4). An interval of ±3s, or ±18.75, around the mean then covers more than 99% of the observations. Boxplot C best fits this description because of its larger IQR.

(55-30)/4
## [1] 6.25
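
The arithmetic in the answer can be laid out step by step. This is only a sketch: the endpoints 30 and 55 are approximate values read off the histogram by eye, and `lower`, `upper`, `s_est`, and `three_s` are illustrative names.

```r
# Approximate bounds containing about 95% of the observations (read by eye)
lower <- 30
upper <- 55

# About 95% of observations lie within 2 s of the mean, so the interval is ~4 s wide
s_est <- (upper - lower) / 4

# Half-width of the interval expected to cover more than 99% of observations
three_s <- 3 * s_est

s_est
three_s
```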