Mode = 12-13 mm
550/1017
I would recommend using narrower intervals (bins) in the histogram between 14 and 17 mm so that a possible trend in that range is easier to explore.
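A minimal sketch of narrowing the bins, assuming base R's `hist()`. The measurements themselves are not shown in this section, so `sizes_mm` below is a hypothetical vector used only to show the mechanics: an explicit `breaks` vector sets the interval width, and halving it from 1 mm to 0.5 mm doubles the number of bins.

```r
# Hypothetical measurements in mm (illustration only, not the real data)
sizes_mm <- c(10.2, 11.8, 12.1, 12.4, 12.6, 12.9, 13.3, 14.2,
              14.6, 15.1, 15.4, 15.8, 16.3, 16.7, 17.5)

# Coarse bins: 1 mm wide from 10 to 18 mm
coarse <- hist(sizes_mm, breaks = seq(10, 18, by = 1), plot = FALSE)

# Finer bins: 0.5 mm wide over the same range, so detail between 14 and 17 mm shows up
fine <- hist(sizes_mm, breaks = seq(10, 18, by = 0.5), plot = FALSE)

coarse$counts
fine$counts
```

Equal-width bins are used here on purpose: with unequal widths, `hist()` switches to a density scale by default, which is correct but harder to read as counts.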
Bimodal frequency distribution
The two variables displayed are both numerical and continuous.
This is a scatter plot.
The relationship between flicker fusion frequency and temperature is nonlinear and positive.
The 20 measurements cannot be treated as a random sample because some fish were measured more than once while others were measured only once. Measurements taken from the same fish are not independent of one another.
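One way to spot this problem, and one simple way around it, sketched on a hypothetical data frame (`ff`, its fish labels and values are made up; the real data are not shown here): count measurements per fish, then collapse to one row per fish so each fish contributes a single independent point. Averaging is only one option; a mixed model that keeps every measurement is another.

```r
# Hypothetical data: some fish were measured more than once
ff <- data.frame(
  fish        = c("A", "A", "A", "B", "C", "C", "D"),
  temperature = c(10, 15, 20, 12, 14, 22, 18),
  fff         = c(20, 26, 31, 23, 25, 35, 30)   # flicker fusion frequency
)

# How many measurements came from each fish?
repeats <- table(ff$fish)
names(repeats)[repeats > 1]   # fish measured more than once

# Collapse to one row per fish (mean of each variable) so units are independent
per_fish <- aggregate(cbind(temperature, fff) ~ fish, data = ff, FUN = mean)
per_fish
```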
This is a histogram.
The distribution is skewed left.
This distribution is not bimodal. The mode is between the ages of 80 and 85.
Toxo_data <- read_csv(here::here("Data", "chapter02", "chap02q32ToxoplasmaAccidents.csv"))
## Rows: 308 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): driverType, infectionStatus
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Toxo_data)
## # A tibble: 6 × 2
## driverType infectionStatus
## <chr> <chr>
## 1 accidents infected
## 2 accidents infected
## 3 accidents infected
## 4 accidents infected
## 5 accidents infected
## 6 accidents infected
str(Toxo_data)
## spec_tbl_df [308 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ driverType : chr [1:308] "accidents" "accidents" "accidents" "accidents" ...
## $ infectionStatus: chr [1:308] "infected" "infected" "infected" "infected" ...
## - attr(*, "spec")=
## .. cols(
## .. driverType = col_character(),
## .. infectionStatus = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
Toxo_table <- table(Toxo_data$driverType, Toxo_data$infectionStatus)
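# The first dimension of the table (driver type) is split along the x-axis,
# so the table is plotted untransposed to match the axis labels
mosaicplot(Toxo_table,
           xlab = "Driver Type",
           ylab = "Infection Status",
           main = "",
           color = c("forestgreen", "goldenrod1"))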
This is a two-way contingency table.
Driver type and infection status are being compared. Neither variable is clearly the explanatory or the response variable in this table, and the data alone do not show that one has an effect on the other.
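There is a possible association in this data set because the mosaic plot does not form a “plus sign” pattern: the divisions inside the columns do not line up from one column to the next. If the two variables were independent, the proportions within each column would be the same and the divisions would align.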
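The “plus sign” idea can be made concrete with made-up counts (the `indep` table below is hypothetical, not the Toxoplasma data): when the row proportions are identical, the mosaic divisions align, the observed counts equal the expected counts, and the χ² statistic is exactly zero.

```r
# Made-up counts chosen so that infection status is independent of driver type
indep <- matrix(c(10, 20, 30, 60), nrow = 2,
                dimnames = list(driver    = c("accidents", "no accidents"),
                                infection = c("infected", "uninfected")))

# Row proportions: identical rows mean no association
props <- prop.table(indep, margin = 1)
props

# Observed counts equal expected counts, so the statistic is 0
stat <- chisq.test(indep, correct = FALSE)$statistic
stat
```

Applied to the real table, `prop.table(Toxo_table, margin = 1)` should show unequal row proportions if the divisions in the mosaic plot are misaligned.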
ADHD_data <- read_csv(here::here("Data", "chapter02", "chap02q33BirthMonthADHD.csv"))
## Rows: 4 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): birthMonth, diagnosis
## dbl (1): frequencies
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(ADHD_data)
## # A tibble: 4 × 3
## birthMonth diagnosis frequencies
## <chr> <chr> <dbl>
## 1 January ADHD 2219
## 2 January no ADHD 36917
## 3 December ADHD 2870
## 4 December no ADHD 36107
str(ADHD_data)
## spec_tbl_df [4 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ birthMonth : chr [1:4] "January" "January" "December" "December"
## $ diagnosis : chr [1:4] "ADHD" "no ADHD" "ADHD" "no ADHD"
## $ frequencies: num [1:4] 2219 36917 2870 36107
## - attr(*, "spec")=
## .. cols(
## .. birthMonth = col_character(),
## .. diagnosis = col_character(),
## .. frequencies = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
ADHD_matrix <- matrix(ADHD_data$frequencies,
byrow = FALSE,
ncol = 2,
dimnames = list(c("Diagnosed ADHD", "Not diagnosed"),
c("Jan", "Dec")))
barplot(ADHD_matrix, beside = TRUE,
        xlab = "Birth month", ylab = "Frequency",
        legend.text = rownames(ADHD_matrix))
There is an association between birth month and ADHD diagnosis: the calculated value of χ² is greater than the critical value (3.84 for 1 degree of freedom at α = 0.05).
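Because all four counts appear in the output above, the χ² comparison can be reproduced directly. A sketch, assuming a χ² contingency test without Yates' continuity correction is what was intended (the answer does not say which variant was used; with these large counts the conclusion is the same either way):

```r
# Counts from chap02q33BirthMonthADHD.csv, as printed above
adhd <- matrix(c(2219, 36917, 2870, 36107), ncol = 2,
               dimnames = list(c("Diagnosed ADHD", "Not diagnosed"),
                               c("Jan", "Dec")))

# Chi-square contingency test without continuity correction
adhd_test <- chisq.test(adhd, correct = FALSE)
adhd_test$statistic          # calculated chi-square value
qchisq(0.95, df = 1)         # critical value at alpha = 0.05, 1 df
adhd_test$p.value
```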
This is a violin plot.
Group A
Group B
Group C
FRL_data <- read_csv(here::here("Data", "chapter02", "chap02q35FoodReductionLifespan.csv"))
## Rows: 34 Columns: 3
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): sex, foodTreatment
## dbl (1): lifespan
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(FRL_data)
## # A tibble: 6 × 3
## sex foodTreatment lifespan
## <chr> <chr> <dbl>
## 1 female reduced 16.5
## 2 female reduced 18.9
## 3 female reduced 22.6
## 4 female reduced 27.8
## 5 female reduced 30.2
## 6 female reduced 30.7
str(FRL_data)
## spec_tbl_df [34 × 3] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ sex : chr [1:34] "female" "female" "female" "female" ...
## $ foodTreatment: chr [1:34] "reduced" "reduced" "reduced" "reduced" ...
## $ lifespan : num [1:34] 16.5 18.9 22.6 27.8 30.2 30.7 35.9 23.7 24.5 24.7 ...
## - attr(*, "spec")=
## .. cols(
## .. sex = col_character(),
## .. foodTreatment = col_character(),
## .. lifespan = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
# One strip of points for each sex x food treatment combination
stripchart(lifespan ~ sex * foodTreatment, data = FRL_data,
           vertical = TRUE, method = "jitter")
(b) According to your graph, which difference in life span is greater: that between the sexes or that between diet groups?
The difference in life span between diet groups is greater.
This is a scatter plot.
Line plot
122 years old.
Vaso_data <- read_csv(here::here("Data", "chapter03", "chap03q15VasopressinVoles.csv"))
## Rows: 31 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): treatment
## dbl (1): percent
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Vaso_data)
## # A tibble: 6 × 2
## treatment percent
## <chr> <dbl>
## 1 control 98
## 2 control 96
## 3 control 94
## 4 control 88
## 5 control 86
## 6 control 82
str(Vaso_data)
## spec_tbl_df [31 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ treatment: chr [1:31] "control" "control" "control" "control" ...
## $ percent : num [1:31] 98 96 94 88 86 82 77 74 70 60 ...
## - attr(*, "spec")=
## .. cols(
## .. treatment = col_character(),
## .. percent = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
boxplot(percent ~ treatment, data = Vaso_data)
Display these data in a graph. Explain your choice of graph.
I chose to represent the data in a box plot because there is one categorical variable (Control or Enhanced) and one numerical variable (Percent), and two box plots placed side by side make the groups easy to compare.
Which group has the higher mean percentage of time spent huddling with females?
The Enhanced group has the higher mean percentage (mean ~ 85%) compared to the Control group (mean ~ 60%).
Which group has the higher standard deviation in percentage of time spent huddling with females?
The Control group has the higher standard deviation (s ~ 23) compared to the Enhanced group (s ~ 10).
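The group means and standard deviations can be computed with `tapply()`. A minimal sketch on a hypothetical two-group data frame (`toy`, not the vole measurements, which are only partly printed above):

```r
# Hypothetical two-group data (not the vole measurements)
toy <- data.frame(
  treatment = c("control", "control", "control",
                "enhanced", "enhanced", "enhanced"),
  percent   = c(40, 60, 80, 80, 85, 90)
)

# Mean and standard deviation of percent within each treatment group
group_means <- tapply(toy$percent, toy$treatment, mean)
group_sds   <- tapply(toy$percent, toy$treatment, sd)
group_means
group_sds
```

For the real data, the same calls with `Vaso_data$percent` and `Vaso_data$treatment` should give the group means and standard deviations estimated above.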
Diet_data <- read_csv(here::here("Data", "chapter03", "chap03q16DietBreadthElVerde.csv"))
## Rows: 127 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): breadth
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Diet_data)
## # A tibble: 6 × 1
## breadth
## <dbl>
## 1 1
## 2 1
## 3 1
## 4 1
## 5 1
## 6 1
str(Diet_data)
## spec_tbl_df [127 × 1] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ breadth: num [1:127] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "spec")=
## .. cols(
## .. breadth = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
The median is 8.
First quartile = 3, third quartile = 17, interquartile range = 17 - 3 = 14.
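You cannot calculate the mean number of prey types in the diet because the highest category is recorded only as a diet breadth of "more than 20", so there is no way to know how many prey types those species actually eat. The median and quartiles can still be found because they depend only on the ranks of the values.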
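The difference between rank-based summaries and the mean can be shown with hypothetical values (`known`, `guess_low` and `guess_high` are made up): filling in the "more than 20" values with two very different guesses leaves the median and quartiles unchanged, but changes the mean.

```r
# Hypothetical diet breadths; two species are only known to be "more than 20"
known <- c(1, 1, 2, 3, 5, 8, 8, 12, 17)

# Two different guesses for the censored values
guess_low  <- c(known, 21, 21)
guess_high <- c(known, 50, 90)

probs <- c(0.25, 0.5, 0.75)
quantile(guess_low, probs)    # quartiles and median
quantile(guess_high, probs)   # same: ranks are unchanged
c(mean(guess_low), mean(guess_high))   # different: the mean needs the exact values
```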
This is a histogram.
Approximately 1000 yards per minute. The histogram is roughly bell-shaped and symmetric, and 1000 yards per minute lies near the center of that bell curve.
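The median is approximately 1000 yards per minute. Because the distribution is roughly symmetric, the median lies close to the mean; it is the middle value of the sorted speeds on the x-axis, not the middle of the frequencies on the y-axis.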
The mode is approximately 1100 yards per minute, the speed with the highest frequency.
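The standard deviation is approximately 250 yards per minute. For a roughly bell-shaped distribution, nearly all values lie within 3 standard deviations of the mean, so the range of the data (about 1500 yards per minute) spans roughly 6 standard deviations, and 1500 / 6 = 250.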
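The range-divided-by-six rule of thumb for approximating the standard deviation of a bell-shaped distribution can be checked on simulated data. The speeds below are drawn from a normal distribution with assumed mean 1000 and SD 250; they are not the data behind the histogram.

```r
set.seed(1)
# Simulated bell-shaped speeds (hypothetical): mean 1000, SD 250
speeds <- rnorm(1000, mean = 1000, sd = 250)

rough_sd  <- diff(range(speeds)) / 6   # range spans roughly 6 SDs
actual_sd <- sd(speeds)
c(rough = rough_sd, actual = actual_sd)
```

The rule works best for large samples. In smaller samples the range covers fewer standard deviations (about 4 for a sample of roughly 30), so dividing by 6 would underestimate the standard deviation.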
Yeast_data <- read_csv(here::here("Data", "chapter03", "chap03q22YeastMutantGrowth.csv"))
## Rows: 11 Columns: 1
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (1): mutantGrowthRate
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Yeast_data)
## # A tibble: 6 × 1
## mutantGrowthRate
## <dbl>
## 1 0.86
## 2 1.02
## 3 1.02
## 4 1.01
## 5 1.02
## 6 1
str(Yeast_data)
## spec_tbl_df [11 × 1] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ mutantGrowthRate: num [1:11] 0.86 1.02 1.02 1.01 1.02 1 0.99 1.01 0.91 0.83 ...
## - attr(*, "spec")=
## .. cols(
## .. mutantGrowthRate = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
Mean = 0.971
Two digits after the decimal point should be used because the measurements in the data set are recorded to two decimal places, so the reported value should match that precision.
Median = 1.01
Variance = 0.00488909
Standard deviation = 0.06992203
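The four statistics above can be produced with one small helper. `describe()` is a hypothetical name introduced here; it is checked on a toy vector rather than the yeast data.

```r
# Hypothetical helper: the descriptive statistics reported above
describe <- function(x) {
  c(mean     = mean(x),
    median   = median(x),
    variance = var(x),   # sample variance, n - 1 denominator
    sd       = sd(x))    # square root of the sample variance
}

# Toy vector for checking (not the yeast data)
toy_stats <- describe(c(1, 2, 3, 4))
round(toy_stats, 4)
```

Calling `describe(Yeast_data$mutantGrowthRate)` should reproduce the mean, median, variance and standard deviation listed above, and `round(mean(Yeast_data$mutantGrowthRate), 2)` applies the two-decimal rule.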
Zebra_data <- read_csv(here::here("Data", "chapter03", "chap03q23ZebraFishBoldness.csv"))
## Rows: 21 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): genotype
## dbl (1): secondsAggressiveActivity
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(Zebra_data)
## # A tibble: 6 × 2
## genotype secondsAggressiveActivity
## <chr> <dbl>
## 1 wild type 0
## 2 wild type 21
## 3 wild type 22
## 4 wild type 28
## 5 wild type 60
## 6 wild type 80
str(Zebra_data)
## spec_tbl_df [21 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ genotype : chr [1:21] "wild type" "wild type" "wild type" "wild type" ...
## $ secondsAggressiveActivity: num [1:21] 0 21 22 28 60 80 99 101 106 129 ...
## - attr(*, "spec")=
## .. cols(
## .. genotype = col_character(),
## .. secondsAggressiveActivity = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
boxplot(secondsAggressiveActivity ~ genotype, data = Zebra_data)
The Spd genotype has the higher aggression scores.
The wild type genotype has the higher range of values for aggression scores.
The wild type sample has the larger interquartile range.
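The vertical lines (whiskers) extend from the box to the smallest and largest values that are not extreme. In R's `boxplot()`, a value counts as extreme if it lies more than 1.5 box lengths (roughly 1.5 interquartile ranges) beyond the box; when there are no such values, the whiskers reach the minimum and maximum.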
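`boxplot.stats()` returns the numbers `boxplot()` draws, which makes the whisker rule visible. A sketch on a toy vector with one extreme value (not the zebrafish data):

```r
# Toy data with one extreme value (not the zebrafish data)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

bs <- boxplot.stats(x)   # default coef = 1.5, the rule boxplot() uses
bs$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out     # values beyond 1.5 box lengths, drawn as separate points
```

Here the upper whisker stops at 9 rather than the maximum of 100, because 100 is flagged as extreme.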
This is a cumulative frequency distribution graph.
Females have the earlier median emergence date: their cumulative relative frequency reaches 0.5 before that of males.
Females have the greater interquartile range. Estimating where each curve reaches cumulative relative frequencies of 0.25 and 0.75 and taking the difference, the females' interquartile range is approximately 9, while the males' is about 6.
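Reading the median and quartiles off a cumulative frequency curve corresponds to inverting the empirical cumulative distribution function. A sketch with hypothetical emergence days (`days` is made up); `type = 1` in `quantile()` is the inverse of the step function that `ecdf()` returns.

```r
# Hypothetical emergence days for one group (not the real data)
days <- c(3, 5, 6, 8, 9, 10, 12, 14, 15, 18, 20)

Fn <- ecdf(days)   # cumulative relative frequency function
Fn(10)             # fraction of observations on or before day 10

# Days where the curve first reaches 0.25, 0.5 and 0.75
q <- quantile(days, c(0.25, 0.5, 0.75), type = 1)
q
unname(q[3] - q[1])   # interquartile range
```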
SeaUrchin_data <- read_csv(here::here("Data", "chapter03", "chap03q28SeaUrchinBindin.csv"))
## Rows: 19 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): populationOfFemale
## dbl (1): percentAAfertilization
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(SeaUrchin_data)
## # A tibble: 6 × 2
## populationOfFemale percentAAfertilization
## <chr> <dbl>
## 1 AA 0.58
## 2 AA 0.59
## 3 AA 0.69
## 4 AA 0.72
## 5 AA 0.78
## 6 AA 0.78
str(SeaUrchin_data)
## spec_tbl_df [19 × 2] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ populationOfFemale : chr [1:19] "AA" "AA" "AA" "AA" ...
## $ percentAAfertilization: num [1:19] 0.58 0.59 0.69 0.72 0.78 0.78 0.81 0.85 0.85 0.92 ...
## - attr(*, "spec")=
## .. cols(
## .. populationOfFemale = col_character(),
## .. percentAAfertilization = col_double()
## .. )
## - attr(*, "problems")=<externalptr>
stripchart(percentAAfertilization ~ populationOfFemale, data = SeaUrchin_data, vertical = TRUE, method = "jitter")
There is an association in these data.
Using the median is best for comparing the locations of the frequency distributions because there is an outlier in the data. The median for females from the AA population is 0.795, much higher than the median for females from the BB population, 0.37. The outlier does not affect the median enough to change this comparison.
Standard deviation can be used to compare the spread of the frequency distributions: the standard deviation for females from the AA population is 0.1239, lower than 0.2639 for females from the BB population. Because outliers inflate the standard deviation, the interquartile range would be a more robust measure of spread here, consistent with using the median for location. The difference in spread is also not explained by the AA group's larger sample size: a larger sample does not systematically shrink the standard deviation; it reduces the standard error of the mean.
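The reason to pair a robust measure of spread with the median can be shown on toy data (not the sea urchin measurements): two vectors that differ only in one outlier have the same median and IQR, but very different means and standard deviations.

```r
# Toy data: identical except for one outlier
clean   <- c(2, 3, 4, 5, 6, 7, 8)
outlier <- c(2, 3, 4, 5, 6, 7, 80)

# Location: the median ignores the outlier, the mean does not
c(median(clean), median(outlier))
c(mean(clean), mean(outlier))

# Spread: the IQR ignores the outlier, the standard deviation does not
c(IQR(clean), IQR(outlier))
c(sd(clean), sd(outlier))
```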