library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Question 1

Using the class’ Ciliate population data, produce a scatterplot of population (y-axis) vs. week (x-axis) (do not alter the color of the points based on the ciliate species - leave them grouped together for this activity). Describe the direction (positive or negative), strength (weak, moderate, strong) and type (linear, quadratic, log, exponential, etc.) of the relationship you observe between population age and population size. Hint: You may need to change the data frame format and tell Program R to treat your week variable as numeric.

ciliategrowth <- read.csv("CiliateGrowth.csv")
str(ciliategrowth)
## 'data.frame':    48 obs. of  4 variables:
##  $ Group     : int  1 1 1 1 1 1 2 2 2 2 ...
##  $ Week      : int  1 2 3 4 5 6 1 2 3 4 ...
##  $ Ciliate   : chr  "Paramecium" "Paramecium" "Paramecium" "Paramecium" ...
##  $ Population: num  0.0667 0.6667 7.6667 14 18.4 ...
# Week is already an integer (see str() above), so it plots on a continuous axis.
# No color aesthetic is mapped, so all species are drawn as one group.
ciliategrowth %>%
  ggplot(aes(x = Week,
             y = Population)) +
  geom_point() +
  labs(x = "Time (Weeks)",
       y = "Average Population (Individuals/Drop)",
       caption = "Mean population size of the class ciliate cultures over six weeks, all species pooled") +
  theme_classic()
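
The str() output above shows that Week was read as an integer, so no conversion is needed for this data set. If it had been read as text or as a factor, which is what the hint in the question anticipates, a conversion along the following lines would be needed before plotting. This is a minimal sketch on a made-up data frame (`toy` and `week_factor` are illustrative names, not part of the class data):

```r
# Made-up rows for illustration only, not the class data
toy <- data.frame(Week = c("1", "2", "3"),
                  Population = c(0.5, 2.0, 8.0),
                  stringsAsFactors = FALSE)

# Character column: convert directly
toy$Week <- as.numeric(toy$Week)

# Factor column: go through character first, otherwise as.numeric()
# returns the internal level codes rather than the week labels
week_factor <- factor(c("2", "4", "6"))
week_from_factor <- as.numeric(as.character(week_factor))

str(toy)
```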

Answer:

Question 2

Determine the correlation between population age and population size and interpret this number.

ciliategrowth %>% with(cor.test(Week, Population))
## 
##  Pearson's product-moment correlation
## 
## data:  Week and Population
## t = 9.0364, df = 46, p-value = 9.219e-12
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.6672931 0.8832140
## sample estimates:
##       cor 
## 0.7997873

Answer: The correlation is r = 0.7998 (95% CI: 0.667 to 0.883). This indicates a moderately strong positive linear association: population size tends to increase as population age increases.
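
As a check on how the pieces of the cor.test() output fit together, the t statistic can be recovered from the correlation and its degrees of freedom. The sketch below copies r and df from the output above, so it only reproduces the reported values up to rounding:

```r
# Values copied from the cor.test() output above
r  <- 0.7997873   # sample correlation
df <- 46          # degrees of freedom (n - 2)

# t statistic for testing whether the true correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

round(t_stat, 3)
```

The result should agree with the reported t = 9.0364 to about three decimal places.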


Question 3

Fit a simple linear regression model to the population age (x) and population size (y) data. Provide the equation of the least squares regression line predicting population size from population age using good statistical notation.

lm(Population ~ Week, data=ciliategrowth) %>% summary()
## 
## Call:
## lm(formula = Population ~ Week, data = ciliategrowth)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.1085  -2.7644  -0.3963   2.2235  21.4370 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -6.9524     2.0739  -3.352  0.00161 ** 
## Week          4.8122     0.5325   9.036 9.22e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.301 on 46 degrees of freedom
## Multiple R-squared:  0.6397, Adjusted R-squared:  0.6318 
## F-statistic: 81.66 on 1 and 46 DF,  p-value: 9.219e-12
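
The coefficient table above gives the intercept and slope of the fitted line. A minimal sketch of evaluating that line at chosen weeks, using the rounded estimates from the output (`predict_population` is a hypothetical helper name, not an existing function):

```r
# Rounded estimates copied from the summary() coefficient table above
intercept <- -6.9524
slope     <- 4.8122

# Hypothetical helper: evaluate the fitted line at one or more weeks
predict_population <- function(week) {
  intercept + slope * week
}

predict_population(c(3, 10))
```

With these rounded coefficients the line gives 7.4842 at week 3 and 41.1696 at week 10. If the fitted model is stored in an object, for example `fit <- lm(Population ~ Week, data = ciliategrowth)`, then `predict(fit, newdata = data.frame(Week = c(3, 10)))` does the same with the unrounded coefficients.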

Answer: ŷ = 4.8122x - 6.9524, where x is population age (weeks) and ŷ is the predicted population size (individuals/drop).

Question 4

Interpret the slope and intercept of your least squares regression line from question #3.

Answer:

The y-intercept means the model predicts -6.9524 individuals per drop at week 0, before any time has passed. A negative population is not biologically possible, so the intercept has no practical interpretation here. The slope means the predicted population size increases by 4.8122 individuals per drop for each additional week of population age.

Question 5

Is there a statistically significant linear association between population age and population size? Explain, citing an appropriate test statistic and p-value from your analysis to support your answer.

Answer: Yes. There is a statistically significant positive linear association between population age and population size (t = 9.036, df = 46, p = 9.219e-12; adjusted R^2 = 0.6318). However, our scatterplot suggests exponential growth, so a straight line is not the best description of the relationship, and its fit would be expected to worsen as the populations continued to grow. As population age increased, population size increased roughly exponentially.
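
The summary() output reports two R-squared values, 0.6397 (multiple) and 0.6318 (adjusted). As a sketch of how they relate, both can be recovered from the correlation in Question 2 and the sample size; the values below are copied from the output above:

```r
r <- 0.7997873  # correlation from the cor.test() output
n <- 48         # observations, from str(ciliategrowth)
p <- 1          # number of predictors (Week)

# In simple linear regression, multiple R-squared is the squared correlation
r_squared <- r^2

# Adjusted R-squared penalizes for the number of predictors
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

round(c(multiple = r_squared, adjusted = adj_r_squared), 4)
```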

Question 6

Use the least-squares regression equation to predict the population size at 3 weeks. Show your work.

Answer:

ŷ = 4.8122x - 6.9524
ŷ = 4.8122(3) - 6.9524
ŷ = 14.4366 - 6.9524
ŷ = 7.4842 individuals per drop

Question 7

Use the least-squares regression equation to predict the population size at 10 weeks. What are you assuming about the relationship between population age and population size when using the least-squares regression line to extrapolate to this population size?

Answer:

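ŷ = 4.8122x - 6.9524
ŷ = 4.8122(10) - 6.9524
ŷ = 48.122 - 6.9524
ŷ = 41.1696 individuals per drop

This extrapolation assumes that the linear relationship observed over weeks 1 to 6 continues to hold out to week 10.

Question 8A

Make a publication quality figure of your lab 2 data (PVC in two locations), with figure caption.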

Weektwo <- read.csv("Week2.csv")
str(Weektwo)
## 'data.frame':    48 obs. of  2 variables:
##  $ Type: chr  "Grassland" "Grassland" "Grassland" "Grassland" ...
##  $ PVC : int  95 85 60 90 93 80 63 85 97 90 ...
Weektwo %>% 
  ggplot(aes(x = Type,
             y = PVC)) +
  geom_boxplot() +
  theme_classic() +
  labs(x = "Type of Environment",
       y = "Percent Vegetation Cover",
       caption = "Percent vegetation cover in a grassland and a riparian environment, measured by random quadrat sampling (boxes show the median and IQR; whiskers extend up to 1.5 x IQR)")


Question 8B

Perform an appropriate statistical test on your lab 2 data (PVC in two locations) and write a properly formatted results statement.

t.test(PVC ~ Type, data=Weektwo)
## 
##  Welch Two Sample t-test
## 
## data:  PVC by Type
## t = 0.068653, df = 42.746, p-value = 0.9456
## alternative hypothesis: true difference in means between group Grassland and group Riparian is not equal to 0
## 95 percent confidence interval:
##  -11.90776  12.74692
## sample estimates:
## mean in group Grassland  mean in group Riparian 
##                81.72727                81.30769

Results Statement: There was no statistically significant difference in mean percent vegetation cover between the grassland (mean = 81.73) and riparian (mean = 81.31) areas (Welch two-sample t-test, t = 0.069, df = 42.75, p = 0.946).

Question 9A

Make a publication quality figure of your lab 4 data (H_Prime in two locations), with figure caption.

HValues <- read.csv("HVALUES.csv")
str(HValues)
## 'data.frame':    41 obs. of  2 variables:
##  $ X        : chr  "Rocky Outcrop" "Rocky Outcrop" "Rocky Outcrop" "Rocky Outcrop" ...
##  $ H..Values: num  0.185 0.18 0.35 0.355 0.366 ...
HValues %>% ggplot(aes(x = X,
                       y = H..Values)) +
  geom_boxplot() +
  theme_classic() +
  labs(x = "Type of Environment",
       y = "Shannon Diversity Index (H')",
       caption = "Shannon Diversity Index (H') values in a rocky outcrop and a riparian area (boxes show the median and IQR; whiskers extend up to 1.5 x IQR)")


Question 9B

Perform an appropriate statistical test on your lab 4 data (H_Prime in two locations) and write a properly formatted results statement.

t.test(H..Values ~ X, data = HValues)
## 
##  Welch Two Sample t-test
## 
## data:  H..Values by X
## t = 2.071, df = 27.121, p-value = 0.048
## alternative hypothesis: true difference in means between group Riparian and group Rocky Outcrop is not equal to 0
## 95 percent confidence interval:
##  0.007477928 1.576099063
## sample estimates:
##      mean in group Riparian mean in group Rocky Outcrop 
##                   1.4588680                   0.6670795

Results Statement: Mean Shannon diversity (H') was significantly higher in the riparian area (mean = 1.46) than in the rocky outcrop area (mean = 0.67) (Welch two-sample t-test, t = 2.071, df = 27.12, p = 0.048).
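
The statistics in a results statement can be pulled from the test object rather than retyped, which avoids transcription errors. A minimal sketch, assuming the test is stored as an object of class "htest" (what t.test() returns); `format_t_result` is a hypothetical helper, and the two vectors are made-up values, not the lab data:

```r
# Hypothetical helper (not part of base R): build the statistics part of a
# results statement from an object of class "htest", as returned by t.test()
format_t_result <- function(test) {
  sprintf("t = %.2f, df = %.1f, p = %.3f",
          unname(test$statistic),
          unname(test$parameter),
          test$p.value)
}

# Made-up example values, not the lab data
site_a <- c(1.2, 1.5, 1.1, 1.8, 1.4)
site_b <- c(0.6, 0.9, 0.7, 0.5, 1.0)
format_t_result(t.test(site_a, site_b))
```

Applied to the lab data, the call would be `format_t_result(t.test(H..Values ~ X, data = HValues))`.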