Download chickens.csv to your working directory. Make sure to set your working directory appropriately! This dataset was created by modifying the R built-in dataset chickwts.
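One way to check the working directory from R is sketched below. The path passed to `setwd()` is only a placeholder. Replace it with the folder where you saved chickens.csv before uncommenting that line.

```r
# Print the folder R currently reads files from
print(getwd())

# Point R at the folder that contains chickens.csv
# (placeholder path: replace it with your own before uncommenting)
# setwd("path/to/your/folder")

# TRUE if chickens.csv is visible from the current working directory
file_found <- file.exists("chickens.csv")
print(file_found)
```

If `file_found` is `FALSE`, `read.csv("chickens.csv")` will fail with a "cannot open file" error.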
Import the chickens.csv data into R. Store it in a data.frame named ch_df and print out the entire ch_df to the screen.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ch_df <- read.csv("chickens.csv")
print(ch_df)  # print the entire data frame, as the task asks
There are some missing values in this dataset. Unfortunately they are represented in a number of different ways.
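sum(is.na(ch_df))  # entries that read.csv() already recognized as NA
## [1] 7
# Note: this assignment replaces NA with NA, so it leaves ch_df unchanged.
# Non-numeric placeholder values in weight are still stored as text here.
ch_df[is.na(ch_df)] <- NA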
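Missing values stored as text codes can instead be converted to `NA` at import time with the `na.strings` argument of `read.csv()`. The sketch below is a minimal example that assumes hypothetical placeholder codes (`""`, `"?"` and `"missing"`). The codes actually used in chickens.csv are not listed here. You would need to find them first, for example with `unique(ch_df$weight)` and `unique(ch_df$feed)`.

```r
# Hypothetical file contents: missing values written as "NA", "", "?" and "missing"
csv_text <- "weight,feed
206,meatmeal
?,horsebean
318,
missing,sunflower
NA,casein"

# na.strings lists every code that should be read in as NA
demo_df <- read.csv(text = csv_text,
                    na.strings = c("NA", "", "?", "missing"))
str(demo_df)        # weight is now numeric, because all text codes became NA
sum(is.na(demo_df)) # 3 missing weights + 1 missing feed
```

Because every placeholder becomes `NA` while the file is read, the weight column is parsed as a number directly. No later `as.numeric()` coercion is needed.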
Next, let’s see what percentage of our data is missing, first for each column and then for the dataset as a whole.
mean(is.na(ch_df$weight)) * 100  # percentage of missing weight values
## [1] 5.633803
mean(is.na(ch_df$feed)) * 100  # percentage of missing feed values
## [1] 4.225352
mean(is.na(ch_df)) * 100  # percentage of all cells (both columns) that are missing
## [1] 4.929577
EXTRA CREDIT (Optional): Figure out how to create these print statements so that the name and percentage number are not hard-coded into the statement. In other words, so that the name and percentage number are read in dynamically (for example, from a variable, from a function call, etc.) instead of just written in the statement. Please ask me for clarification if necessary.
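One possible approach is sketched below, assuming `sprintf()` and `cat()` are acceptable for the print statements. The helper name `report_missing` and the small `toy_df` data frame are hypothetical and exist only for illustration. The loop reads each column name from `names(df)` and computes its percentage, so neither value is typed into the message.

```r
# Hypothetical helper: print the percentage of missing values in each column
# of a data frame, reading both the column name and the percentage from the data
report_missing <- function(df) {
  for (col in names(df)) {
    pct <- mean(is.na(df[[col]])) * 100
    cat(sprintf("Percentage of missing values in %s: %.2f%%\n", col, pct))
  }
  # Overall percentage across every cell of the data frame
  cat(sprintf("Percentage of missing values in the whole dataset: %.2f%%\n",
              mean(is.na(df)) * 100))
  invisible(NULL)
}

# Toy data frame for illustration (not the real chickens.csv)
toy_df <- data.frame(weight = c(206, NA, 318, 140),
                     feed = c("meatmeal", "horsebean", NA, NA))
report_missing(toy_df)
```

Calling `report_missing(ch_df)` would apply the same logic to the chickens data.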
# fill in your code here
str(ch_df)
## 'data.frame': 71 obs. of 2 variables:
## $ weight: chr "206" "140" NA "318" ...
## $ feed : chr "meatmeal" "horsebean" NA "sunflower" ...
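# weight was read in as character (see str() above) because some entries are text;
# as.numeric() turns every entry that is not a number into NA
ch_df$weight <- as.numeric(ch_df$weight)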
## Warning: NAs introduced by coercion
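The warning means that some weight entries were text rather than numbers. The sketch below shows one way to list the values that `as.numeric()` turned into `NA`. The values in `weight_chr` are hypothetical. To check the real column, you would run the same steps on the character version of `ch_df$weight` before converting it.

```r
# Hypothetical character column mixing numbers with text placeholders
weight_chr <- c("206", "140", NA, "318", "?", "missing")

# as.numeric() turns anything that is not a number into NA (with a warning);
# suppressWarnings() silences that warning for this check
weight_num <- suppressWarnings(as.numeric(weight_chr))

# Entries that were present as text but could not be converted
bad_values <- unique(weight_chr[is.na(weight_num) & !is.na(weight_chr)])
print(bad_values)
```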
# Mean and median weight for each feed
ch_df2 <- ch_df %>%
  group_by(feed) %>%
  summarise(weight_mean = mean(weight, na.rm = TRUE),
            weight_median = median(weight, na.rm = TRUE))
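# Median weight per feed; rows with a missing feed form their own <NA> group
feed_median_weights <- ch_df %>%
  group_by(feed) %>%
  summarize(median_weight = median(weight, na.rm = TRUE))
# Keep the feed(s) whose median weight is the largest
max_median_feed <- feed_median_weights %>%
  filter(median_weight == max(median_weight))
max_median_feed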
## # A tibble: 1 × 2
## feed median_weight
## <chr> <dbl>
## 1 <NA> 360
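In the output above, the winning row is the `<NA>` group. These are chicks whose feed value is missing, not an actual feed type. The sketch below drops those rows before grouping so that only real feeds compete. It runs on a small hypothetical data frame. `toy_feed_df` and `top_feed` are illustrative names, and the same pipeline could be applied to `ch_df`. `slice_max()` is used here in place of `filter(median_weight == max(median_weight))`. By default it also keeps ties.

```r
library(dplyr)

# Hypothetical data with one row whose feed is missing
toy_feed_df <- data.frame(
  weight = c(300, 320, 250, 260, 400),
  feed   = c("casein", "casein", "soybean", "soybean", NA)
)

top_feed <- toy_feed_df %>%
  filter(!is.na(feed)) %>%       # drop rows with no feed recorded
  group_by(feed) %>%
  summarise(median_weight = median(weight, na.rm = TRUE)) %>%
  slice_max(median_weight, n = 1)  # feed(s) with the largest median weight
top_feed
```

Without the `filter(!is.na(feed))` step, the `<NA>` group (median 400) would win in this toy example, just as it does above.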
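# Distribution of chick weights (missing values are ignored)
hist(ch_df$weight, main = "Histogram of chick weights", xlab = "Weight")
# Weight by feed type; rows with a missing weight or feed are dropped
boxplot(weight ~ feed, data = ch_df,
        main = "Chick weight by feed type", xlab = "Feed", ylab = "Weight")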
# fill in your code here
# fill in your code here