library(tidyverse)
library(scales)
library(lemon)
library(plotly)
library(lubridate)
library(dplyr)
library(forcats)
library(gapminder)
state_milk_production <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/state_milk_production.csv")
view(state_milk_production)
milkcow_facts <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/milkcow_facts.csv")
view(milkcow_facts)
milk_products_facts <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/milk_products_facts.csv")
view(milk_products_facts)
fluid_milk_sales <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/fluid_milk_sales.csv")
view(fluid_milk_sales)
clean_cheese <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-29/clean_cheese.csv")
view(clean_cheese)
# Reusable scatter plot of yearly milk production, coloured by region.
production_graph <- function(data = state_milk_production)
{
ggplot(data = data) +
aes(x = year, y = milk_produced, color = region) +
geom_point() +
# milk_produced is measured in pounds, so use comma labels rather than dollar labels
scale_y_continuous(labels = comma) +
labs(x = "Years", y = "Milk Production (lbs)", color = "Regions", title = "Milk Production By Region Over The Years")
}
production_graph()
A function was created to automate the above graph. The results show that the Pacific region has had the largest increase in milk production in recent years; however, this was not always the case. The graph shows that before the 1990s, the Lake States had the highest milk production.
state_milk_production %>%
ggplot() +
aes(x = region, y = milk_produced, fill = region) +
geom_boxplot(show.legend = FALSE) +
# milk_produced is in pounds, not dollars
scale_y_continuous(labels = comma) +
labs(x = "Regions", y = "Milk Production (lbs)", title = "Distribution of Milk Production by Region") +
coord_flip()
The visualization above depicts the variance in milk production over time by region. For example, we can see that production in the Pacific region has varied a lot. In fact, it has varied so much that a significant number of production years are considered outliers. The Delta States, on the other hand, have varied very little in their production: they produced a relatively equal amount over the years.
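To make the outlier observation concrete, the following is a minimal sketch of how points beyond the box plot whiskers could be counted per region, assuming the default 1.5 x IQR rule that `geom_boxplot()` uses. The helper name `count_boxplot_outliers` and the toy numbers are illustrative assumptions, not part of the original analysis.

```r
# Count values that fall outside the whiskers under the 1.5 x IQR rule.
# `count_boxplot_outliers` is a hypothetical helper written for this sketch.
count_boxplot_outliers <- function(values, coef = 1.5) {
  q <- quantile(values, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]            # interquartile range
  lower <- q[1] - coef * spread    # lower whisker limit
  upper <- q[2] + coef * spread    # upper whisker limit
  sum(values < lower | values > upper, na.rm = TRUE)
}

# Made-up toy data: region A has one extreme year, region B is tightly clustered.
toy <- data.frame(
  region = rep(c("A", "B"), each = 6),
  milk_produced = c(10, 11, 12, 13, 14, 60,
                    20, 21, 22, 23, 24, 25)
)

outliers_by_region <- tapply(toy$milk_produced, toy$region, count_boxplot_outliers)
print(outliers_by_region)
```

With the real data, the same helper could be applied as `tapply(state_milk_production$milk_produced, state_milk_production$region, count_boxplot_outliers)`.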
state_milk_production %>%
filter(region == "Pacific") %>%
ggplot() +
aes(x = year, y = milk_produced, fill = state) +
# geom_col() draws bars whose heights are the values in the data
geom_col() +
scale_y_continuous(labels = comma) +
labs(x = "Years", y = "Milk Production (lbs)", title = "Milk Production By State In The Pacific Region", fill = "States")
The above graph shows that California is responsible for the overwhelming majority of the milk produced within the Pacific region. It also shows that there has been little to no milk production in Alaska or Hawaii in the same time period. The following graph gives a better picture of the production of milk by state.
Pacific_Milk <- state_milk_production %>%
filter(region == "Pacific") %>%
ggplot() +
aes(x = year, y = milk_produced, fill = state) +
# geom_col() replaces geom_histogram(stat = "identity"), which ignored `bins` and raised a warning
geom_col() +
theme(axis.text.x = element_text(face = "bold", angle = 90)) +
scale_y_continuous(labels = comma) +
facet_rep_wrap(~state, repeat.tick.labels = TRUE) +
labs(x = "Years", y = "Milk Production (lbs)", fill = "States", title = "Milk Production By State")
ggplotly(Pacific_Milk)
These visualizations show Washington and Oregon as the second- and third-highest producers of milk in the Pacific region. The graphs also show the minuscule amount of milk produced by Hawaii and Alaska. Another observation is that milk production in both Oregon and Washington has nearly tripled since 1970.
state_milk_production %>%
ggplot(aes(x=milk_produced, y = state, fill=state))+
geom_boxplot(show.legend = FALSE)+
labs (x="Milk Production", y="States", title= "Variety in Milk Production by State")
The box plot above helps us visualize the change in production experienced by the individual states. Although California produces the most milk, it has also experienced the largest change in production over time. Looking at our “Milk Production By State” visualization, we can see that this variance can be explained by California quadrupling its milk production since 1970.
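A claim such as "quadrupling since 1970" can be checked by dividing each state's latest production by its earliest. The sketch below assumes a data frame with `state`, `year`, and `milk_produced` columns, as in `state_milk_production`; the helper name `growth_ratio` and the toy values are hypothetical.

```r
# Ratio of the most recent year's production to the earliest year's, per state.
# `growth_ratio` is a hypothetical helper written for this sketch.
growth_ratio <- function(df) {
  per_state <- split(df, df$state)
  vapply(per_state, function(d) {
    d <- d[order(d$year), ]                         # sort each state's rows by year
    d$milk_produced[nrow(d)] / d$milk_produced[1]   # last value / first value
  }, numeric(1))
}

# Made-up toy data: state X quadruples, state Y barely changes.
toy <- data.frame(
  state = c("X", "X", "X", "Y", "Y", "Y"),
  year = c(1970, 1990, 2017, 1970, 1990, 2017),
  milk_produced = c(100, 250, 400, 50, 50, 55),
  stringsAsFactors = FALSE
)

ratios <- growth_ratio(toy)
print(ratios)
```

Calling `growth_ratio(state_milk_production)` would give the corresponding ratio for every state in the real data.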
Note: “Total Production” was excluded from the data since it is not considered a type of milk.
fluid_milk_sales %>%
filter(milk_type != "Total Production") %>%
group_by(milk_type) %>%
# total pounds per milk type across all years (one row per type)
summarize(pounds = sum(pounds)) %>%
ggplot(aes(x = reorder(milk_type, pounds), y = pounds, fill = milk_type)) +
geom_col(show.legend = FALSE) +
labs(y = "Production of Milk (pounds)", x = "Types of Milk", title = "Amount of Milk Produced by Milk Type") +
coord_flip()
The graph above shows the amount of milk produced, in pounds, by type of milk. Whole and reduced fat (2%) are the most produced types. This observation supports the assumption that consumers have a preference for these two types of milk.
Note: “Total Production” was excluded from the data since it is not considered a type of milk.
Milk_Type_Ly <- fluid_milk_sales %>%
filter(milk_type !="Total Production")%>%
ggplot(aes(x=year, y= pounds, color=milk_type))+
geom_line()+
labs (x="Years", y="Pounds", title= "Pounds Of Milk Produced per Year", color= "Milk Type")
ggplotly(Milk_Type_Ly)
This graph illustrates the amount of milk produced over time by milk type. Production of reduced fat, skim, low fat, and flavored (not whole) milk has increased, while whole milk production has decreased over the same period. This insight, coupled with prior knowledge from the literature stating that milk production is on the rise, indicates a shift in consumer trends towards a healthier, low-fat diet.
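One way to quantify a shift between milk types is to express each type as a share of that year's total. The sketch below assumes columns named `year`, `milk_type`, and `pounds`, as in `fluid_milk_sales`, with the “Total Production” rows already removed; the helper name `type_share_by_year` and the toy values are hypothetical.

```r
# Add a `share` column: each row's pounds divided by that year's total pounds.
# `type_share_by_year` is a hypothetical helper written for this sketch.
type_share_by_year <- function(df) {
  yearly_total <- ave(df$pounds, df$year, FUN = sum)  # total repeated on each row of the year
  df$share <- df$pounds / yearly_total
  df
}

# Made-up toy data: whole milk falls from 80% to 30% of the total.
toy <- data.frame(
  year = c(1975, 1975, 2017, 2017),
  milk_type = c("Whole", "Skim", "Whole", "Skim"),
  pounds = c(80, 20, 30, 70),
  stringsAsFactors = FALSE
)

shares <- type_share_by_year(toy)
print(shares)
```

A falling share for whole milk alongside rising shares for the lower-fat types would be consistent with the trend described above.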
milkcow_facts %>%
ggplot() +
# refer to columns by name inside aes() instead of using milkcow_facts$...
aes(x = milk_production_lbs, y = avg_price_milk) +
geom_point() +
geom_smooth(se = FALSE, method = "lm") +
labs(x = "Milk Production", y = "Price of Milk", title = "Relationship between Milk Production and Milk Price")
## `geom_smooth()` using formula 'y ~ x'
The scatter plot above tells an interesting story. Unlike the usual supply and demand relationship, where the price decreases when production (and supply) is high, in the dairy business milk production is highest when the price of milk is high. I am no economist, but I would say this has something to do with milk being an inelastic commodity.
milkcow_facts %>%
ggplot() +
# refer to columns by name inside aes() instead of using milkcow_facts$...
aes(x = avg_price_milk, y = slaughter_cow_price) +
geom_point() +
geom_smooth(se = FALSE, method = "lm") +
labs(x = "Price of Milk", y = "Slaughter Cow Price", title = "Relationship between Price of Milk and Slaughter Cow Price")
## `geom_smooth()` using formula 'y ~ x'
The scatter plot above shows the positive correlation between the price of milk and the price of meat. Our hypothesis is that as the price of milk increases, so does the opportunity cost of slaughtering a cow. As a result, the price of meat must increase in order to “compete” with the dairy business.
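The strength of a relationship like this can be summarised numerically with a Pearson correlation and the slope of the fitted line, which is what `geom_smooth(method = "lm")` draws. The sketch below uses made-up toy values under the column names `avg_price_milk` and `slaughter_cow_price`; it illustrates the calculation only and does not report results from the real data.

```r
# Made-up toy values standing in for two columns of milkcow_facts.
toy <- data.frame(
  avg_price_milk = c(0.12, 0.13, 0.15, 0.16, 0.20),
  slaughter_cow_price = c(0.40, 0.42, 0.50, 0.49, 0.65)
)

# Pearson correlation coefficient between the two prices.
r <- cor(toy$avg_price_milk, toy$slaughter_cow_price)

# Simple linear regression; the slope is the change in cow price per unit of milk price.
fit <- lm(slaughter_cow_price ~ avg_price_milk, data = toy)
slope <- unname(coef(fit)["avg_price_milk"])

print(r)
print(slope)
```

A correlation close to 1 with a positive slope would support the description of a positive relationship, although correlation alone cannot confirm the opportunity-cost hypothesis.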
state_milk_production %>%
# number of rows (state-year records) in each region
count(region) %>%
ggplot() +
aes(x = region, y = n, fill = region) +
geom_col(show.legend = FALSE) +
theme(axis.text.x = element_text(face = "bold", angle = 90)) +
labs(x = "Regions", y = "Count", title = "Count of Milk Production Records per Region")
plotly_box <- fluid_milk_sales %>%
filter(milk_type !="Total Production")%>%
ggplot(aes(x=milk_type, y = pounds, fill=milk_type))+
geom_boxplot()+
labs(y="Pounds", x="Milk Type", title="Over Time Distribution of Milk Production by Type")+
coord_flip()
ggplotly(plotly_box)
The visualization above illustrates the variance in milk production by type. Since the data span 42 years, we can infer that eggnog, buttermilk, and flavored whole milk production have not increased or decreased significantly over time.
fluid_milk_sales %>%
group_by(year) %>%
summarize(median_pounds = median(pounds)) %>%
ggplot(aes(x = year, y =median_pounds)) +
geom_line() +
geom_smooth(se = FALSE)+
labs(x="Years", y="Pounds", title="Change in Median Milk Production Over Time")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The graph above shows the median amount of milk, in pounds, produced each year. The LOESS trend line shows a steady increase from the 1980s through the early 2000s, followed by a steady decline after 2010. Many factors could have contributed to this decrease.
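The curve drawn by `geom_smooth()` here is a LOESS smoother, as the message above reports, so it describes the observed years rather than predicting future ones. The sketch below fits `stats::loess()` directly to simulated values (the numbers are invented for illustration, and the span of 0.75 is an assumption for this sketch) and shows that a default fit returns `NA` when asked about a year outside the data.

```r
# Simulated yearly values: rise, then decline, plus a little noise (illustrative only).
set.seed(1)
year <- 1975:2017
pounds <- 100 + 0.5 * (year - 1975) - 0.02 * (year - 1975)^2 + rnorm(length(year))

# Local regression fit with an assumed span of 0.75.
fit <- loess(pounds ~ year, span = 0.75)

# Smoothed value for every observed year.
smoothed <- predict(fit)

# A default loess fit does not extrapolate beyond the range of the data.
out_of_range <- predict(fit, newdata = data.frame(year = 2030))

print(head(smoothed))
print(out_of_range)
```

Projecting production beyond the last observed year would need an explicit forecasting model rather than this smoother.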
Summarize_Mean_Milk <- milkcow_facts %>%
group_by(year) %>%
summarize(mean_price = mean(avg_price_milk))%>%
ggplot(aes(x = year, y =mean_price)) +
geom_line() +
geom_smooth(se = FALSE)+
labs(x="Years", y="Mean Price", title="Mean Price of Milk Over the Years")
ggplotly(Summarize_Mean_Milk)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The above graph shows that, overall, the price of milk has been increasing since the 1980s. More interesting, however, is that the mean price of milk was most constant through the 1980s before beginning to fluctuate frequently.
# Which states have produced more than 5,000,000,000 lbs of milk in a year, and how many times have they done so?
Over_FiveBillion <- state_milk_production %>%
filter(milk_produced > 5000000000) %>%
# number of years in which each state exceeded the threshold
count(state) %>%
mutate(state = fct_reorder(state, n)) %>%
ggplot(aes(x = state, y = n, fill = state)) +
geom_col() +
theme(axis.text.x = element_text(angle = 45, vjust = 0.9, hjust = 1)) +
labs(x = "States", y = "Number of Years", title = "Number of Years Each State Produced More Than 5 Billion lbs of Milk")
ggplotly(Over_FiveBillion)
The bar chart above shows that 13 states have produced more than 5,000,000,000 lbs of milk in a year since 1975. Of those states, 5 have done so 48 times.
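The counts behind this chart can also be read as a plain table. The sketch below counts, for each state, the number of years above a threshold, assuming columns named `state`, `year`, and `milk_produced`; the helper name `years_above` and the toy values are hypothetical.

```r
# Number of years in which each state's production exceeded `threshold`.
# `years_above` is a hypothetical helper written for this sketch.
years_above <- function(df, threshold) {
  above <- df[df$milk_produced > threshold, ]   # keep only state-years over the threshold
  sort(table(above$state), decreasing = TRUE)   # count rows per state, largest first
}

# Made-up toy data: P exceeds 5 billion in 3 years, Q in 2 years, R never.
toy <- data.frame(
  state = rep(c("P", "Q", "R"), each = 3),
  year = rep(2015:2017, times = 3),
  milk_produced = c(6e9, 7e9, 8e9, 4e9, 5.5e9, 6e9, 1e9, 2e9, 3e9),
  stringsAsFactors = FALSE
)

counts <- years_above(toy, 5e9)
print(counts)
```

Running `years_above(state_milk_production, 5e9)` would list the states in the chart with their counts, and `length()` of the result would give the number of states that ever crossed the threshold.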