1. Introduction

1.1

The Milk_sales data set indicates that annual milk sales have been in a downtrend since 2010. If we run a milk business and want to explore new opportunities in this industry, what are the potential opportunities, and what risks could there be? Which dairy product is most consumed in the U.S.? Does that mean this product could be a sunrise (fast-growing) segment of the dairy industry?

1.2

I will use the milk_products_facts dataset and the ggplot2 package in R to find market trends in the butter, cheese, milk, yogurt and ice cream businesses.

1.3

I will first wrangle the Milk_sales data set to clean and tidy it. Then I will use ggplot2 to visualize annual milk sales by type and test my assumption that annual milk sales have been in a downtrend. Next, I will wrangle the milkproductfacts dataset to create four new variables (cheese, evap_milk, ice_cream, dry_milk) by summing related columns. Finally, I will use ggplot2 to visualize the trends of multiple dairy products and identify a rising product that investors could step into.
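A minimal sketch of the milk sales plot described above, assuming the milksales columns shown by head(milksales) (year, milk_type, pounds). The helper names prepare_milk_sales and plot_milk_sales are hypothetical, and so is the assumption that milk_type contains an aggregate level called "Total Production"; check levels(milksales$milk_type) before relying on that filter. The demo uses only the six Whole-milk rows printed by head(milksales).

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: drop the assumed aggregate level and rescale pounds.
# Assumption: milk_type may contain an aggregate level "Total Production";
# filtering it out is harmless if that level does not exist.
prepare_milk_sales <- function(milksales) {
  milksales %>%
    filter(milk_type != "Total Production") %>%
    mutate(billion_pounds = pounds / 1e9)   # billions of lbs for readable axes
}

# Hypothetical helper: one line per milk type, dashed reference line at 2010,
# the year the suspected downtrend starts.
plot_milk_sales <- function(sales_by_type) {
  ggplot(sales_by_type, aes(x = year, y = billion_pounds, colour = milk_type)) +
    geom_line() +
    geom_vline(xintercept = 2010, linetype = "dashed") +
    labs(x = "Year", y = "Sales (billion lbs)", colour = "Milk type",
         title = "Annual fluid milk sales by type")
}

# Demo on the six rows printed by head(milksales)
whole_milk <- data.frame(
  year      = 1975:1980,
  milk_type = "Whole",
  pounds    = c(3.6188e10, 3.5241e10, 3.4036e10, 3.3235e10, 3.2480e10, 3.1253e10)
)
sales_demo <- prepare_milk_sales(whole_milk)
p_sales <- plot_milk_sales(sales_demo)
```

With the full data, plot_milk_sales(prepare_milk_sales(milksales)) would draw every milk type on one chart, and the dashed line makes it easy to see whether the lines bend downward after 2010.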

1.4

My analysis can give clients a clear view of the milk industry and support their decision-making and strategy design, helping to steer the company onto the right track.

2. Packages Required

## Data cleaning and tidying (dplyr, tidyr and the other tidyverse packages)
library(tidyverse)
## Quick summary statistics for each dataset
library(skimr)
## Data visualization (ggplot2 is also attached by library(tidyverse))
library(ggplot2)

3. Data

fluid_milk_sales.csv
milk_products_facts.csv

The data come from the USDA (United States Department of Agriculture). The raw datasets (Excel sheets) can be found here.

## Import the data (paths point to local copies of the CSV files; adjust them for your machine)
milksales <- read.csv("C:/Users/sijia/Desktop/BANA7025/fluid_milk_sales.csv")
milkproductfacts <- read.csv("C:/Users/sijia/Desktop/BANA7025/milk_products_facts.csv")

## Quick review of each dataset
skim(milkproductfacts)
## Skim summary statistics
##  n obs: 43 
##  n variables: 18 
## 
## -- Variable type:integer --------------------------------------------------------------------------------------------
##    variable missing complete  n    mean    sd   p0    p25  p50    p75 p100
##  fluid_milk       0       43 43  202.91 27.03  149  183    205  223.5  247
##        year       0       43 43 1996    12.56 1975 1985.5 1996 2006.5 2017
##      hist
##  ▃▂▅▆▃▂▇▃
##  ▇▇▇▇▇▇▇▇
## 
## -- Variable type:numeric --------------------------------------------------------------------------------------------
##                         variable missing complete  n  mean    sd     p0
##                           butter       0       43 43  4.71 0.43   4.19 
##                  cheese_american       0       43 43 11.95 1.5    8.15 
##                   cheese_cottage       0       43 43  3.13 0.86   2.07 
##                     cheese_other       0       43 43 14.71 4.82   6.13 
##                   dry_buttermilk       0       43 43  0.23 0.054  0.17 
##                  dry_nonfat_milk       0       43 43  3.02 0.53   2.12 
##                         dry_whey       0       43 43  3.05 0.66   1.89 
##                   dry_whole_milk       0       43 43  0.31 0.14   0.095
##  evap_cnd_bulk_and_can_skim_milk       0       43 43  4.32 0.82   3.02 
##         evap_cnd_bulk_whole_milk       0       43 43  0.81 0.29   0.44 
##       evap_cnd_canned_whole_milk       0       43 43  2.04 0.71   0.94 
##                     fluid_yogurt       0       43 43  7.16 4.34   1.97 
##     frozen_ice_cream_reduced_fat       0       43 43  6.4  0.43   5.67 
##         frozen_ice_cream_regular       0       43 43 15.63 1.65  12.47 
##                     frozen_other       0       43 43  3.13 1.37   1.35 
##                   frozen_sherbet       0       43 43  1.14 0.15   0.8  
##    p25   p50   p75  p100     hist
##   4.37  4.54  4.91  5.69 ▇▇▅▃▂▁▃▂
##  11.28 12.12 12.95 15.06 ▁▂▁▅▆▇▂▁
##   2.56  2.65  4.03  4.63 ▆▇▁▂▁▁▃▃
##  10.68 15.26 18.96 22.05 ▆▂▂▃▆▅▇▅
##   0.2   0.2   0.25  0.39 ▅▇▂▁▁▁▁▁
##   2.62  3.05  3.31  4.28 ▃▆▅▇▇▂▁▂
##   2.4   3.02  3.65  4.09 ▃▇▃▅▃▆▇▅
##   0.2   0.3   0.4   0.6  ▃▇▂▂▇▁▃▁
##   3.64  4.24  5.17  5.58 ▇▃▇▆▂▂▇▇
##   0.58  0.7   1.06  1.46 ▇▇▂▃▃▃▂▂
##   1.49  1.84  2.34  3.95 ▁▇▃▃▁▂▁▁
##   3.78  5.87 11.31 14.93 ▇▇▆▂▂▂▂▆
##   6.08  6.33  6.61  7.55 ▂▇▅▇▂▂▁▁
##  14.69 15.71 17.06 18.21 ▅▁▂▅▇▂▆▆
##   2.27  2.91  3.76  6.54 ▅▂▇▂▂▁▂▁
##   1.11  1.18  1.22  1.36 ▂▂▁▁▃▇▃▂
skim(milksales)
## Skim summary statistics
##  n obs: 387 
##  n variables: 3 
## 
## -- Variable type:factor ---------------------------------------------------------------------------------------------
##   variable missing complete   n n_unique
##  milk_type       0      387 387        9
##                          top_counts ordered
##  But: 43, Egg: 43, Fla: 43, Fla: 43   FALSE
## 
## -- Variable type:numeric --------------------------------------------------------------------------------------------
##  variable missing complete   n       mean       sd         p0        p25
##    pounds       0      387 387    1.2e+10  1.7e+10    7.6e+07    8.4e+08
##      year       0      387 387 1996       12.43    1975       1985      
##         p50        p75       p100     hist
##     3.9e+09    1.7e+10    5.6e+10 ▇▁▂▁▁▁▁▂
##  1996       2007       2017       ▇▇▇▇▇▇▇▇
# Boxplot of every variable except year, used to check for outliers (none were detected)
boxplot(subset(milkproductfacts, select = -c(year)))
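The boxplot above puts every column on one axis, so columns measured in a few lbs per person (butter, dry milk) are compressed next to fluid milk at roughly 200 lbs. A numeric counterpart is sketched below: for each column it counts the points that boxplot() would draw as outliers, using boxplot.stats(), which applies boxplot()'s default 1.5 × IQR rule. The helper name count_boxplot_outliers is hypothetical, and the demo uses a few columns from the six rows printed by head(milkproductfacts).

```r
# Hypothetical helper: for every column except year, count the values that
# boxplot() would flag as outliers. boxplot.stats() uses boxplot()'s default
# rule: points more than 1.5 * IQR beyond the lower or upper hinge.
count_boxplot_outliers <- function(df) {
  value_cols <- df[, setdiff(names(df), "year"), drop = FALSE]
  sapply(value_cols, function(x) length(boxplot.stats(x)$out))
}

# Demo on a few columns of the rows printed by head(milkproductfacts)
facts_head <- data.frame(
  year           = 1975:1980,
  fluid_milk     = c(247, 247, 244, 241, 238, 234),
  butter         = c(4.728193, 4.313202, 4.294180, 4.354593, 4.491231, 4.467509),
  dry_whole_milk = c(0.1, 0.2, 0.2, 0.3, 0.3, 0.3)
)
outlier_counts <- count_boxplot_outliers(facts_head)
outlier_counts
```

On the full data, count_boxplot_outliers(milkproductfacts) returning all zeros would confirm the visual "no outliers" reading column by column.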

# Quick look at the first rows of each dataset
head(milkproductfacts)
##   year fluid_milk fluid_yogurt   butter cheese_american cheese_other
## 1 1975        247     1.967839 4.728193        8.147222     6.126409
## 2 1976        247     2.132685 4.313202        8.883106     6.627872
## 3 1977        244     2.338369 4.294180        9.213005     6.781846
## 4 1978        241     2.448503 4.354593        9.525359     7.309603
## 5 1979        238     2.443847 4.491231        9.597205     7.567657
## 6 1980        234     2.503008 4.467509        9.620140     7.903713
##   cheese_cottage evap_cnd_canned_whole_milk evap_cnd_bulk_whole_milk
## 1       4.588537                   3.949932                1.2385962
## 2       4.632284                   3.791703                1.1008644
## 3       4.617711                   3.265569                1.0038023
## 4       4.600490                   3.148379                0.9007974
## 5       4.434472                   3.120396                0.9374522
## 6       4.408807                   2.885797                0.8767681
##   evap_cnd_bulk_and_can_skim_milk frozen_ice_cream_regular
## 1                        3.525306                 18.20505
## 2                        3.590506                 17.63845
## 3                        3.879376                 17.28895
## 4                        3.469461                 17.22533
## 5                        3.332083                 16.94341
## 6                        3.281817                 17.11750
##   frozen_ice_cream_reduced_fat frozen_sherbet frozen_other dry_whole_milk
## 1                     6.502202       1.348780     1.816894            0.1
## 2                     6.169193       1.364460     1.678171            0.2
## 3                     6.574222       1.356254     1.627777            0.2
## 4                     6.550307       1.294786     1.511782            0.3
## 5                     6.197152       1.202817     1.413432            0.3
## 6                     6.052010       1.190466     1.348990            0.3
##   dry_nonfat_milk dry_buttermilk dry_whey
## 1        3.261769            0.2      2.2
## 2        3.504864            0.2      2.4
## 3        3.308311            0.3      2.4
## 4        3.101835            0.2      2.4
## 5        3.282367            0.2      2.7
## 6        3.011035            0.2      2.7
head(milksales)
##   year milk_type     pounds
## 1 1975     Whole 3.6188e+10
## 2 1976     Whole 3.5241e+10
## 3 1977     Whole 3.4036e+10
## 4 1978     Whole 3.3235e+10
## 5 1979     Whole 3.2480e+10
## 6 1980     Whole 3.1253e+10

fluid_milk_sales

| variable | class | description |
|---|---|---|
| year | double | Year |
| milk_type | factor | Category of milk product |
| pounds | double | Pounds of milk product sold per year |

milk_products_facts

| variable | class | description |
|---|---|---|
| year | integer | Year |
| fluid_milk | integer | Average milk consumption in lbs per person |
| fluid_yogurt | double | Average yogurt consumption in lbs per person |
| butter | double | Average butter consumption in lbs per person |
| cheese_american | double | Average American cheese consumption in lbs per person |
| cheese_other | double | Average other cheese consumption in lbs per person |
| cheese_cottage | double | Average cottage cheese consumption in lbs per person |
| evap_cnd_canned_whole_milk | double | Average evaporated and canned whole milk consumption in lbs per person |
| evap_cnd_bulk_whole_milk | double | Average evaporated and canned bulk whole milk consumption in lbs per person |
| evap_cnd_bulk_and_can_skim_milk | double | Average evaporated and canned bulk and canned skim milk consumption in lbs per person |
| frozen_ice_cream_regular | double | Average regular frozen ice cream consumption in lbs per person |
| frozen_ice_cream_reduced_fat | double | Average reduced-fat frozen ice cream consumption in lbs per person |
| frozen_sherbet | double | Average frozen sherbet consumption in lbs per person |
| frozen_other | double | Average other frozen milk product consumption in lbs per person |
| dry_whole_milk | double | Average dry whole milk consumption in lbs per person |
| dry_nonfat_milk | double | Average dry nonfat milk consumption in lbs per person |
| dry_buttermilk | double | Average dry buttermilk consumption in lbs per person |
| dry_whey | double | Average dry whey (milk protein) consumption in lbs per person |

4. Proposed Exploratory Data Analysis

4.1 Data Exploration

I plan to combine the following columns from milkproductfacts into new variables:

cheese = cheese_american + cheese_other + cheese_cottage

evap_milk = evap_cnd_canned_whole_milk + evap_cnd_bulk_whole_milk + evap_cnd_bulk_and_can_skim_milk

ice_cream = frozen_ice_cream_regular + frozen_ice_cream_reduced_fat + frozen_sherbet + frozen_other

dry_milk = dry_whole_milk + dry_nonfat_milk + dry_buttermilk + dry_whey
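The four combined variables above could be created with dplyr::mutate(), as in this minimal sketch. The helper name add_product_groups is hypothetical; the demo rebuilds the 1975 and 1976 rows printed by head(milkproductfacts) so the sums can be checked by hand.

```r
library(dplyr)

# Hypothetical helper: sum related columns into four per-capita product groups
# (lbs per person), following the formulas listed above.
add_product_groups <- function(milkproductfacts) {
  milkproductfacts %>%
    mutate(
      cheese    = cheese_american + cheese_other + cheese_cottage,
      evap_milk = evap_cnd_canned_whole_milk + evap_cnd_bulk_whole_milk +
                  evap_cnd_bulk_and_can_skim_milk,
      ice_cream = frozen_ice_cream_regular + frozen_ice_cream_reduced_fat +
                  frozen_sherbet + frozen_other,
      dry_milk  = dry_whole_milk + dry_nonfat_milk + dry_buttermilk + dry_whey
    )
}

# Demo on the 1975 and 1976 rows printed by head(milkproductfacts)
facts_1975_76 <- data.frame(
  year                            = c(1975, 1976),
  fluid_milk                      = c(247, 247),
  fluid_yogurt                    = c(1.967839, 2.132685),
  butter                          = c(4.728193, 4.313202),
  cheese_american                 = c(8.147222, 8.883106),
  cheese_other                    = c(6.126409, 6.627872),
  cheese_cottage                  = c(4.588537, 4.632284),
  evap_cnd_canned_whole_milk      = c(3.949932, 3.791703),
  evap_cnd_bulk_whole_milk        = c(1.2385962, 1.1008644),
  evap_cnd_bulk_and_can_skim_milk = c(3.525306, 3.590506),
  frozen_ice_cream_regular        = c(18.20505, 17.63845),
  frozen_ice_cream_reduced_fat    = c(6.502202, 6.169193),
  frozen_sherbet                  = c(1.348780, 1.364460),
  frozen_other                    = c(1.816894, 1.678171),
  dry_whole_milk                  = c(0.1, 0.2),
  dry_nonfat_milk                 = c(3.261769, 3.504864),
  dry_buttermilk                  = c(0.2, 0.2),
  dry_whey                        = c(2.2, 2.4)
)
grouped <- add_product_groups(facts_1975_76)
grouped[, c("year", "cheese", "evap_milk", "ice_cream", "dry_milk")]
```

Because mutate() keeps the original columns, the component columns remain available if a later plot needs to break a group apart again.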

4.2 Plot

I will use ggplot2 to visualize the popularity (per-capita consumption) of each product from 1975 to 2017.
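One way to draw these trends is sketched below, assuming the four combined variables from 4.1 (cheese, evap_milk, ice_cream, dry_milk) already exist as columns. The data is reshaped to long format with tidyr::pivot_longer() and drawn with one facet per product. The names product_cols, to_long_products and plot_product_trends are hypothetical, and the demo values are hand-summed from the 1975 and 1976 rows printed by head(milkproductfacts).

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical column list: original per-product columns plus the four
# combined variables defined in 4.1 (assumed to exist already).
product_cols <- c("fluid_milk", "fluid_yogurt", "butter",
                  "cheese", "evap_milk", "ice_cream", "dry_milk")

# Reshape to long format: one row per (year, product) pair.
to_long_products <- function(df, products = product_cols) {
  df %>%
    select(year, all_of(products)) %>%
    pivot_longer(-year, names_to = "product", values_to = "lbs_per_person")
}

# One panel per product; free y-scales because fluid milk (about 200 lbs per
# person) would otherwise flatten products consumed at a few lbs per person.
plot_product_trends <- function(long_df) {
  ggplot(long_df, aes(x = year, y = lbs_per_person)) +
    geom_line() +
    facet_wrap(~ product, scales = "free_y") +
    labs(x = "Year", y = "Consumption (lbs per person)",
         title = "U.S. per-capita dairy consumption")
}

# Demo: 1975 and 1976 values, combined columns summed by hand
product_demo <- data.frame(
  year         = c(1975, 1976),
  fluid_milk   = c(247, 247),
  fluid_yogurt = c(1.967839, 2.132685),
  butter       = c(4.728193, 4.313202),
  cheese       = c(18.862168, 20.143262),
  evap_milk    = c(8.7138342, 8.4830734),
  ice_cream    = c(27.872926, 26.850274),
  dry_milk     = c(5.761769, 6.304864)
)
products_long <- to_long_products(product_demo)
p_trends <- plot_product_trends(products_long)
```

Free y-scales make each product's direction easy to read but hide the differences in absolute volume between products; a single shared-axis chart would be the complementary view for comparing market size.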

4.3 Challenge

I am not yet very familiar with ggplot2 and will need time to learn it.

4.4 Supporting Analysis Methods

Machine learning is not required for this project.