Markdown Author: Jessie Bell, 2023

Libraries Used: none

Answers: green

caffeineData <- read.csv("caffeine.csv")

level_1  <-  caffeineData$exer[caffeineData$caff == 'yes']
level_2  <-  caffeineData$exer[caffeineData$caff == 'no']

If you are not able to get the data to read into your environment, double check your working directory under Session.
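A quick way to check the working directory from the console, as a minimal sketch (it assumes the file is named caffeine.csv as in the code above; `csv_name` is just an illustrative variable name):

```r
# Show the folder R is currently reading from.
getwd()

# List the .csv files R can see in that folder.
list.files(pattern = "\\.csv$")

# Check for the file before trying to read it.
csv_name <- "caffeine.csv"
if (file.exists(csv_name)) {
  caffeineData <- read.csv(csv_name)
} else {
  message("Could not find ", csv_name, " in ", getwd())
}
```

If the file is not listed, either move it into that folder or change the working directory under Session.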

—————————————————————————————————————————-

1. Create a histogram

The histogram we created below is skewed to the right (positive).

hist(level_1, col="#c77cff")
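One informal way to back up a skew description is to compare the mean and the median. The sketch below uses made-up values (not the lab data), and the variable names are only for illustration:

```r
# Hypothetical right-skewed values (made up for illustration, not the lab data).
example_values <- c(1, 2, 2, 3, 3, 3, 4, 5, 9, 15)

example_mean   <- mean(example_values)    # pulled toward the long right tail
example_median <- median(example_values)

# TRUE here is consistent with a right (positive) skew.
example_mean > example_median
```

This is a rule of thumb rather than a formal test, so the histogram should still be the main evidence.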

—————————————————————————————————————————-

2. Create a stripchart

a <- caffeineData$caff #this pulls out the caffeine column in the caffeineData
b <- caffeineData$exer #this pulls out the exercise column in the caffeineData
stripchart(b~a, vertical=T, col="#00bfc4") #you can also do this the way that is shown in the lab handout.
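When several individuals share the same value, their points can land on top of each other. A possible variation, sketched here with a made-up data frame that only borrows the column names from caffeineData, is to use the formula interface with `method = "jitter"`:

```r
# Hypothetical data frame with the same column names as caffeineData.
example_caffeine <- data.frame(
  caff = c("yes", "yes", "yes", "no", "no", "no"),
  exer = c(30, 30, 45, 20, 20, 25)
)

# method = "jitter" nudges points sideways so repeated values do not overlap.
stripchart(exer ~ caff, data = example_caffeine, vertical = TRUE,
           method = "jitter", pch = 16, col = "#00bfc4")
```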

—————————————————————————————————————————-

3. Create a boxplot

boxplot(b ~ a, col="#e68613") #boxplot() draws vertical boxes by default; it has a horizontal argument, not a vertical one

—————————————————————————————————————————-

4. Create a dot and whisker plot.

The code that you were given to complete this task did not work. Graph 4 will not count against you.

# a boxplot is used here as a stand-in, because the lab's dot and whisker code did not work

boxplot(caffeineData$exer ~ caffeineData$caff, col="#ff68a1")
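For anyone curious, one possible way to draw a dot and whisker plot in base R is sketched below. This is not the lab handout's code; it assumes the dot is the group mean and the whiskers are one standard deviation either side, and it uses made-up data with the same column names as caffeineData:

```r
# Hypothetical data with the same column names as caffeineData.
example_caffeine <- data.frame(
  caff = c("no", "no", "no", "yes", "yes", "yes"),
  exer = c(20, 25, 30, 35, 40, 45)
)

group_mean <- tapply(example_caffeine$exer, example_caffeine$caff, mean)
group_sd   <- tapply(example_caffeine$exer, example_caffeine$caff, sd)
x_pos <- seq_along(group_mean)

# Dots: one point per group mean, with room on the y axis for the whiskers.
plot(x_pos, group_mean, pch = 16, col = "#ff68a1", xaxt = "n",
     xlim = c(0.5, length(x_pos) + 0.5),
     ylim = range(group_mean - group_sd, group_mean + group_sd),
     xlab = "caff", ylab = "exer (mean +/- SD)")
axis(1, at = x_pos, labels = names(group_mean))

# Whiskers: arrows() with angle = 90 and code = 3 draws flat caps at both ends.
arrows(x_pos, group_mean - group_sd, x_pos, group_mean + group_sd,
       angle = 90, code = 3, length = 0.1, col = "#ff68a1")
```

Swapping `sd` for a standard error calculation would give narrower whiskers; which one to plot depends on what the question asks for.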

—————————————————————————————————————————-

5. Create a barplot

The code that you were given to complete this task did not work. Graph 5 will not count against you. Some of you used your island data to create bars anyway, and there is an example of how to do that with the tapply() function below.

—————————————————————————————————————————-

Jessie practicing tapply() function, NOT GRADED

islandData <- read.csv("island.csv")
algalDensity <- islandData$AlgalDensity
face <- islandData$Face
island <- islandData$Island

tapply(algalDensity, face, FUN = "mean") #This is incredibly useful! It builds a quick summary table: the first argument is the column you would like to summarize, the second is how to separate/categorize the data, and FUN is the function you would like to apply to each group.

##    North    South     West 
## 47.43733 42.11800 42.23533

tapply(algalDensity, face, FUN="sd") #tells you the SD really quick.

##    North    South     West 
## 4.288589 8.727338 8.173183

barplot(tapply(algalDensity, island, FUN = "mean"), col="#8494ff")
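To see what tapply() is doing in the calls above, here is a tiny sketch with made-up numbers that are small enough to check by hand (the `example_` names are only for illustration):

```r
# Hypothetical measurements and group labels (made up for illustration).
example_density <- c(10, 20, 30, 40)
example_face    <- c("North", "North", "South", "South")

# Split example_density by example_face, then take the mean of each group.
example_means <- tapply(example_density, example_face, FUN = mean)
example_means
# North is (10 + 20) / 2 = 15 and South is (30 + 40) / 2 = 35
```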

—————————————————————————–

Read in the finch data file (kenyanfinches.csv)

finchData <- read.csv("kenyanfinches.csv")

—————————————————————————————————————————-

6. Data Types

6a & 6b. What are the columns named? What type of data are these?

a. See data by using head() or other function.

b. **The columns are named species, mass, and beaklength. species is categorical (nominal) data, mass is numerical continuous data, and beaklength is also numerical continuous data.**

head(finchData)#tells you headers

##    species mass beaklength
## 1 WB.SPARW   40       10.6
## 2 WB.SPARW   43       10.8
## 3 WB.SPARW   37       10.9
## 4 WB.SPARW   38       11.3
## 5 WB.SPARW   43       10.9
## 6 WB.SPARW   33       10.1

summary(finchData) #tells you the data types: species is listed as character data (this is categorical); mass and beaklength are numerical.

##    species               mass         beaklength    
##  Length:45          Min.   : 6.00   Min.   : 6.700  
##  Class :character   1st Qu.: 8.00   1st Qu.: 7.600  
##  Mode  :character   Median :16.00   Median : 8.000  
##                     Mean   :20.44   Mean   : 8.744  
##                     3rd Qu.:36.00   3rd Qu.:10.700  
##                     Max.   :43.00   Max.   :11.400
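Other functions that report the type of each column are `sapply(..., class)` and `str()`. The sketch below retypes the six rows printed by head() above into a small data frame by hand (`finch_head` is an illustrative name); a file read with read.csv() may store whole-number columns as integer rather than numeric:

```r
# The six rows shown by head(finchData) above, typed in by hand.
finch_head <- data.frame(
  species    = rep("WB.SPARW", 6),
  mass       = c(40, 43, 37, 38, 43, 33),
  beaklength = c(10.6, 10.8, 10.9, 11.3, 10.9, 10.1),
  stringsAsFactors = FALSE
)

# One class per column.
sapply(finch_head, class)

# str() prints the class and the first few values of every column.
str(finch_head)
```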

—————————————————————————————————————————-

7. Separate

7a & 7b. What are each species' summary statistics (mean, sd, and cv)?

I don’t know why these were separated into a and b; in my opinion, both are asking for the same thing. Please note that there is a long way to do this and a short way. Both are done and labelled in the code below. There are most likely other ways to complete this task too.

#you can do this in many ways. I will try to show you two ways. 

whitebrow <- subset(finchData, species=="WB.SPARW")
crimson <- subset(finchData, species=="CRU.WAXB")
cutthroat <- subset(finchData, species=="CUTTHROA")


#THE LONG WAY TO DO SUMMARY STATS
whitebrow_mean_mass <- mean(whitebrow$mass, na.rm = F)
whitebrow_mean_beak <- mean(whitebrow$beaklength, na.rm = F)
whitebrow_sd_mass <- sd(whitebrow$mass, na.rm = F)
whitebrow_sd_beak <- sd(whitebrow$beaklength, na.rm = F)

crimson_mean_mass <- mean(crimson$mass, na.rm = F)
crimson_mean_beak <- mean(crimson$beaklength, na.rm = F)
crimson_sd_mass <- sd(crimson$mass, na.rm = F)
crimson_sd_beak <- sd(crimson$beaklength, na.rm = F)

cutthroat_mean_mass <- mean(cutthroat$mass, na.rm = F)
cutthroat_mean_beak <- mean(cutthroat$beaklength, na.rm = F)
cutthroat_sd_mass <- sd(cutthroat$mass, na.rm = F)
cutthroat_sd_beak <- sd(cutthroat$beaklength, na.rm = F)

#THE SHORTER WAY
mean_mass <- tapply(finchData$mass, finchData$species, FUN="mean")
sd_mass <- tapply(finchData$mass, finchData$species, FUN="sd")

cv_mass <- sd_mass/mean_mass #this is the coefficient of variation for finch mass

summarystatstable <- cbind(mean_mass, sd_mass, cv_mass) #bind your tapply data together from above using cbind() function
summarystatstable #print out your table.

##          mean_mass   sd_mass    cv_mass
## CRU.WAXB  7.529412 0.6242643 0.08291010
## CUTTHROA 15.416667 1.2401124 0.08043972
## WB.SPARW 37.937500 3.1084562 0.08193624

Summary stats for beak length are not provided, but they can be calculated the same way.
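As a sketch of how the same steps could be reused for any column, the tapply() calls can be wrapped in a small function. `summary_by_group` is a hypothetical helper, not something from the lab handout, and the numbers are made up so the result can be checked by hand:

```r
# Hypothetical helper (not from the lab handout): mean, sd, and cv per group.
summary_by_group <- function(values, groups) {
  group_mean <- tapply(values, groups, FUN = mean)
  group_sd   <- tapply(values, groups, FUN = sd)
  cbind(mean = group_mean, sd = group_sd, cv = group_sd / group_mean)
}

# Made-up beak lengths for two made-up species codes.
example_beak    <- c(2, 4, 6, 8)
example_species <- c("SP.A", "SP.A", "SP.B", "SP.B")
example_table   <- summary_by_group(example_beak, example_species)
example_table

# With the real data this would be:
# summary_by_group(finchData$beaklength, finchData$species)
```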

—————————————————————————————————————————-

8. Histograms

8a & 8b. Describe the distributions above. Are there outliers?

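Whitebrow finch mass has a right skew (toward positive). Outliers are points that lie more than 1.5 x IQR away from the bulk of the data (a boxplot measures this distance from the quartiles), and they are hard to see unless you run a boxplot or calculate the IQR. We made boxplots in problem 9, and you can see how to calculate outliers below. No outliers were shown in the boxplot or found by the calculation.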

hist(whitebrow$mass, breaks = "sturges", main = "Distribution of whitebrow finch mass", xlab = "mass", col="#00a9ff")

Crimson finch mass has a left skew (toward negative). No outliers are present.

hist(crimson$mass, main = "Distribution of crimson finch mass", xlab = "mass", col="#f8766d")

Cutthroat finch mass also appears to have a bit of a left skew. That apparent left skew is actually caused by outliers.

hist(cutthroat$mass, main = "Distribution of cutthroat finch mass", xlab = "mass", col="#aba300")

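Please note: the IQR (interquartile range) can be found by first determining the median of the data, then finding the median of each half on either side of that median. Those two values are the first and third quartiles (Q1 and Q3), and the IQR is the distance between them (Q3 - Q1). A boxplot flags points that lie more than 1.5 x IQR below Q1 or above Q3 as outliers. The calculation below is a rougher check that measures 1.5 x IQR from the median instead, so its range is narrower than the quartile-based range.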

IQR_whitebrow <- IQR(whitebrow$mass) * 1.5 
# this distance is applied on both sides of the median. So the upper bound is 
median(whitebrow$mass)+IQR_whitebrow #are there data points above this value? If yes, they are outliers. If no, there are no outliers on the upper side.

## [1] 43.375

# now let's look at the lower bound
median(whitebrow$mass)-IQR_whitebrow #are there any data points for whitebrow mass that lie below this value? If so, they are outliers!

## [1] 30.625

IQR_cutthroat <- IQR(cutthroat$mass)*1.5

median(cutthroat$mass)+IQR_cutthroat

## [1] 17.5

median(cutthroat$mass)-IQR_cutthroat

## [1] 14.5

# there are 3 values of 7 in the mass column for cutthroat finches. These are all outliers.
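For comparison, the quartile-based rule that boxplots use can be sketched as follows. The masses are made up for illustration (not the finch data), and `lower_fence` and `upper_fence` are just illustrative names:

```r
# Hypothetical masses (made up for illustration); 30 is far from the rest.
example_mass <- c(10, 12, 13, 14, 15, 16, 17, 30)

q1  <- quantile(example_mass, 0.25, names = FALSE)
q3  <- quantile(example_mass, 0.75, names = FALSE)
iqr <- q3 - q1                      # same value as IQR(example_mass)

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
flagged <- example_mass[example_mass < lower_fence | example_mass > upper_fence]
flagged

# boxplot.stats() reports the points a boxplot would draw as outliers.
# It uses hinges, which can differ slightly from quantile() in small samples.
boxplot.stats(example_mass)$out
```

Because the fences start at the quartiles rather than the median, this rule tends to flag fewer points than the median-based check.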

—————————————————————————————————————————-

9. Boxplots

Descriptions of the data should include mean or median, standard deviation or IQR, and outliers if they are present.

boxplot(whitebrow$mass, col="#00a9ff")

boxplot(crimson$mass, col="#f8766d")

boxplot(cutthroat$mass, col="#aba300")
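If it helps when writing the descriptions, the median and IQR for every species can be pulled out in one step, and the boxes can be drawn side by side. This is only a sketch with made-up values and made-up species codes; the column names are borrowed from finchData:

```r
# Hypothetical data with the same column names as finchData (values made up).
example_finch <- data.frame(
  species = rep(c("SP.A", "SP.B"), each = 5),
  mass    = c(7, 8, 8, 9, 10, 35, 37, 38, 40, 43)
)

# Center and spread for each species.
example_median <- tapply(example_finch$mass, example_finch$species, FUN = median)
example_iqr    <- tapply(example_finch$mass, example_finch$species, FUN = IQR)
example_median
example_iqr

# One panel with a box per species makes the groups easy to compare.
boxplot(mass ~ species, data = example_finch, col = "#00a9ff")
```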

Lab 03: Data Visualization and Summarization
