Note: this analysis was performed using the open-source software R and RStudio.
The objective of this basic project is to explain the price of avocados using some basic descriptive analysis. Producers, retailers, and grocers can use this analysis to inform decisions about pricing, advertising, and supply chain strategies, among others. Some additional analysis will follow after this episode. Your feedback is highly appreciated.
The data was downloaded from the Hass Avocado Board website in May 2018 and compiled into a single CSV file. Here’s how the Hass Avocado Board describes the data on its website:
"The table below represents weekly retail scan data for National retail volume (units) and price. Retail scan data comes directly from retailers’ cash registers based on actual retail sales of Hass avocados. Starting in 2013, the table below reflects an expanded, multi-outlet retail data set. Multi-outlet reporting includes an aggregation of the following channels: grocery, mass, club, drug, dollar and military. The Average Price (of avocados) in the table reflects a per unit (per avocado) cost, even when multiple units (avocados) are sold in bags. The Product Lookup codes (PLUs) in the table are only for Hass avocados. Other varieties of avocados (e.g. greenskins) are not included in this table."
data <- read.csv("avocado.csv")
head(data)
## X Date AveragePrice Total.Volume X4046 X4225 X4770 Total.Bags
## 1 0 12/27/2015 1.33 64236.62 1036.74 54454.85 48.16 8696.87
## 2 1 12/20/2015 1.35 54876.98 674.28 44638.81 58.33 9505.56
## 3 2 12/13/2015 0.93 118220.22 794.70 109149.67 130.50 8145.35
## 4 3 12/6/2015 1.08 78992.15 1132.00 71976.41 72.58 5811.16
## 5 4 11/29/2015 1.28 51039.60 941.48 43838.39 75.78 6183.95
## 6 5 11/22/2015 1.26 55979.78 1184.27 48067.99 43.61 6683.91
## Small.Bags Large.Bags XLarge.Bags type year region
## 1 8603.62 93.25 0 conventional 2015 Albany
## 2 9408.07 97.49 0 conventional 2015 Albany
## 3 8042.21 103.14 0 conventional 2015 Albany
## 4 5677.40 133.76 0 conventional 2015 Albany
## 5 5986.26 197.69 0 conventional 2015 Albany
## 6 6556.47 127.44 0 conventional 2015 Albany
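The `Date` column is read as text in month/day/year form (for example `12/27/2015`), so it will not sort or plot chronologically until it is parsed. Here is a minimal sketch of that step. It uses a toy data frame copied from the first three rows printed above rather than the full CSV, and the name `toy` is hypothetical.

```r
# Toy data frame copied from the first three rows of head(data) above.
toy <- data.frame(
  Date = c("12/27/2015", "12/20/2015", "12/13/2015"),
  AveragePrice = c(1.33, 1.35, 0.93),
  stringsAsFactors = FALSE
)

# Parse month/day/year text into Date objects.
toy$Date <- as.Date(toy$Date, format = "%m/%d/%Y")

# Sort chronologically and inspect the column types.
toy <- toy[order(toy$Date), ]
str(toy)
```

On the full data set the same call would presumably be `as.Date(data$Date, format = "%m/%d/%Y")`, assuming every row follows the format shown in the first six.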
#install.packages('plyr')
library(plyr)
count(data, 'region')
## region freq
## 1 Albany 338
## 2 Atlanta 338
## 3 BaltimoreWashington 338
## 4 Boise 338
## 5 Boston 338
## 6 BuffaloRochester 338
## 7 California 338
## 8 Charlotte 338
## 9 Chicago 338
## 10 CincinnatiDayton 338
## 11 Columbus 338
## 12 DallasFtWorth 338
## 13 Denver 338
## 14 Detroit 338
## 15 GrandRapids 338
## 16 GreatLakes 338
## 17 HarrisburgScranton 338
## 18 HartfordSpringfield 338
## 19 Houston 338
## 20 Indianapolis 338
## 21 Jacksonville 338
## 22 LasVegas 338
## 23 LosAngeles 338
## 24 Louisville 338
## 25 MiamiFtLauderdale 338
## 26 Midsouth 338
## 27 Nashville 338
## 28 NewOrleansMobile 338
## 29 NewYork 338
## 30 Northeast 338
## 31 NorthernNewEngland 338
## 32 Orlando 338
## 33 Philadelphia 338
## 34 PhoenixTucson 338
## 35 Pittsburgh 338
## 36 Plains 338
## 37 Portland 338
## 38 RaleighGreensboro 338
## 39 RichmondNorfolk 338
## 40 Roanoke 338
## 41 Sacramento 338
## 42 SanDiego 338
## 43 SanFrancisco 338
## 44 Seattle 338
## 45 SouthCarolina 338
## 46 SouthCentral 338
## 47 Southeast 338
## 48 Spokane 338
## 49 StLouis 338
## 50 Syracuse 338
## 51 Tampa 338
## 52 TotalUS 338
## 53 West 338
## 54 WestTexNewMexico 335
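Every region has 338 rows except `WestTexNewMexico`, which has 335. One way to flag such gaps without reading the whole table is sketched below with base R's `table()`. The `region` vector here is illustrative toy data, not the real column.

```r
# Illustrative region labels: two complete regions and one with a missing row.
region <- c(rep("Albany", 4), rep("Atlanta", 4), rep("WestTexNewMexico", 3))

# Count rows per region (a base R counterpart to plyr::count on one column).
n_per_region <- table(region)

# Flag regions with fewer rows than the largest count.
expected <- max(n_per_region)
short <- n_per_region[n_per_region < expected]
print(short)
```

Applied to the real data, `table(data$region)` should play the role of `n_per_region`. Why those three rows are missing is not something this check can answer.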
count(data, 'AveragePrice')
## AveragePrice freq
## 1 0.44 1
## 2 0.46 1
## 3 0.48 1
## 4 0.49 2
## 5 0.51 5
## 6 0.52 3
## 7 0.53 6
## 8 0.54 7
## 9 0.55 3
## 10 0.56 12
## 11 0.57 9
## 12 0.58 14
## 13 0.59 5
## 14 0.60 12
## 15 0.61 11
## 16 0.62 10
## 17 0.63 10
## 18 0.64 17
## 19 0.65 23
## 20 0.66 10
## 21 0.67 23
## 22 0.68 26
## 23 0.69 18
## 24 0.70 44
## 25 0.71 28
## 26 0.72 30
## 27 0.73 38
## 28 0.74 50
## 29 0.75 43
## 30 0.76 61
## 31 0.77 65
## 32 0.78 53
## 33 0.79 67
## 34 0.80 60
## 35 0.81 60
## 36 0.82 66
## 37 0.83 81
## 38 0.84 64
## 39 0.85 79
## 40 0.86 73
## 41 0.87 77
## 42 0.88 88
## 43 0.89 98
## 44 0.90 94
## 45 0.91 92
## 46 0.92 101
## 47 0.93 141
## 48 0.94 123
## 49 0.95 127
## 50 0.96 143
## 51 0.97 147
## 52 0.98 189
## 53 0.99 185
## 54 1.00 167
## 55 1.01 159
## 56 1.02 160
## 57 1.03 179
## 58 1.04 174
## 59 1.05 178
## 60 1.06 170
## 61 1.07 168
## 62 1.08 194
## 63 1.09 167
## 64 1.10 161
## 65 1.11 169
## 66 1.12 158
## 67 1.13 192
## 68 1.14 180
## 69 1.15 202
## 70 1.16 168
## 71 1.17 174
## 72 1.18 199
## 73 1.19 188
## 74 1.20 155
## 75 1.21 151
## 76 1.22 167
## 77 1.23 181
## 78 1.24 165
## 79 1.25 170
## 80 1.26 193
## 81 1.27 155
## 82 1.28 147
## 83 1.29 149
## 84 1.30 140
## 85 1.31 139
## 86 1.32 137
## 87 1.33 159
## 88 1.34 164
## 89 1.35 163
## 90 1.36 187
## 91 1.37 159
## 92 1.38 155
## 93 1.39 148
## 94 1.40 175
## 95 1.41 167
## 96 1.42 149
## 97 1.43 185
## 98 1.44 172
## 99 1.45 157
## 100 1.46 150
## 101 1.47 160
## 102 1.48 185
## 103 1.49 180
## 104 1.50 170
## 105 1.51 148
## 106 1.52 161
## 107 1.53 160
## 108 1.54 173
## 109 1.55 163
## 110 1.56 151
## 111 1.57 134
## 112 1.58 146
## 113 1.59 186
## 114 1.60 159
## 115 1.61 125
## 116 1.62 139
## 117 1.63 136
## 118 1.64 133
## 119 1.65 123
## 120 1.66 141
## 121 1.67 129
## 122 1.68 138
## 123 1.69 127
## 124 1.70 115
## 125 1.71 85
## 126 1.72 117
## 127 1.73 96
## 128 1.74 106
## 129 1.75 105
## 130 1.76 115
## 131 1.77 85
## 132 1.78 97
## 133 1.79 112
## 134 1.80 116
## 135 1.81 119
## 136 1.82 125
## 137 1.83 115
## 138 1.84 88
## 139 1.85 102
## 140 1.86 90
## 141 1.87 84
## 142 1.88 94
## 143 1.89 80
## 144 1.90 83
## 145 1.91 73
## 146 1.92 81
## 147 1.93 79
## 148 1.94 63
## 149 1.95 59
## 150 1.96 59
## 151 1.97 54
## 152 1.98 50
## 153 1.99 49
## 154 2.00 59
## 155 2.01 59
## 156 2.02 54
## 157 2.03 39
## 158 2.04 42
## 159 2.05 38
## 160 2.06 59
## 161 2.07 48
## 162 2.08 38
## 163 2.09 47
## 164 2.10 26
## 165 2.11 37
## 166 2.12 26
## 167 2.13 35
## 168 2.14 26
## 169 2.15 33
## 170 2.16 30
## 171 2.17 29
## 172 2.18 32
## 173 2.19 26
## 174 2.20 21
## 175 2.21 22
## 176 2.22 20
## 177 2.23 19
## 178 2.24 22
## 179 2.25 22
## 180 2.26 15
## 181 2.27 20
## 182 2.28 13
## 183 2.29 16
## 184 2.30 20
## 185 2.31 24
## 186 2.32 20
## 187 2.33 24
## 188 2.34 19
## 189 2.35 14
## 190 2.36 18
## 191 2.37 18
## 192 2.38 13
## 193 2.39 14
## 194 2.40 13
## 195 2.41 7
## 196 2.42 6
## 197 2.43 10
## 198 2.44 10
## 199 2.45 8
## 200 2.46 7
## 201 2.47 4
## 202 2.48 8
## 203 2.49 5
## 204 2.50 6
## 205 2.51 6
## 206 2.52 4
## 207 2.53 2
## 208 2.54 9
## 209 2.55 9
## 210 2.56 5
## 211 2.57 10
## 212 2.58 9
## 213 2.59 10
## 214 2.60 2
## 215 2.61 6
## 216 2.62 8
## 217 2.63 2
## 218 2.64 5
## 219 2.65 8
## 220 2.66 3
## 221 2.67 7
## 222 2.68 1
## 223 2.69 2
## 224 2.70 3
## 225 2.71 4
## 226 2.72 3
## 227 2.73 8
## 228 2.74 3
## 229 2.75 3
## 230 2.76 5
## 231 2.77 3
## 232 2.78 1
## 233 2.79 4
## 234 2.80 2
## 235 2.81 3
## 236 2.82 3
## 237 2.83 6
## 238 2.84 4
## 239 2.85 4
## 240 2.86 4
## 241 2.87 3
## 242 2.88 3
## 243 2.89 3
## 244 2.90 1
## 245 2.91 1
## 246 2.92 2
## 247 2.93 4
## 248 2.94 2
## 249 2.95 1
## 250 2.96 1
## 251 2.97 1
## 252 2.99 2
## 253 3.00 2
## 254 3.03 1
## 255 3.04 1
## 256 3.05 1
## 257 3.12 1
## 258 3.17 1
## 259 3.25 1
mean(data$AveragePrice)
## [1] 1.405978
median(data$AveragePrice)
## [1] 1.37
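# Now let's find the mode (the most frequent value) of the variable AveragePrice.
# mode(data$AveragePrice) does not work: in R, mode() returns the storage mode of an object ("numeric"), not the statistical mode.
# R has no built-in function for the statistical mode, so we compute it from a frequency table instead.
# See reference: https://stackoverflow.com/questions/2547402/is-there-a-built-in-function-for-finding-the-mode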
freq <- tapply(data$AveragePrice,data$AveragePrice,length)
as.numeric(names(freq)[which.max(freq)])
## [1] 1.15
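The frequency-table approach above can be wrapped in a small reusable helper. This is a sketch only: the name `stat_mode` is hypothetical, and unlike `which.max()`, which keeps only the first maximum, this version returns every tied value.

```r
# Hypothetical helper: returns the most frequent value(s) of a vector.
stat_mode <- function(x) {
  freq <- table(x)
  # Keep every value that reaches the maximum frequency, so ties stay visible.
  modes <- names(freq)[freq == max(freq)]
  # table() stores the values as character names; convert back for numeric input.
  if (is.numeric(x)) as.numeric(modes) else modes
}

# Toy examples (illustrative values, not the full data set).
print(stat_mode(c(1.15, 1.15, 1.08, 0.98)))
print(stat_mode(c("a", "b", "b", "a")))
```

On the real data, `stat_mode(data$AveragePrice)` should agree with the result above as long as no other price is tied with it for the highest count.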
hist(data$AveragePrice, xlab = "Average Price", main = "Frequency of Average Price")
library(ggplot2)
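# Date is read as text (month/day/year); parse it so the x-axis is chronological
ggplot(data = data, aes(x = as.Date(Date, format = "%m/%d/%Y"), y = AveragePrice)) +
  geom_point() +
  labs(x = "Date")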
# Let's measure the linear (Pearson) correlation between the average price and the total sales volume of avocados
cor(data$Total.Volume,data$AveragePrice)
## [1] -0.1927524
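The pooled correlation above mixes conventional and organic rows, and it also includes aggregate regions such as `TotalUS` alongside individual cities, so it may understate or distort the relationship within any one group. The sketch below shows one way to compare a pooled correlation with per-type correlations and a rank-based alternative. The data frame `toy` is made-up illustrative data, not values from `avocado.csv`.

```r
# Illustrative toy data (not from avocado.csv): two types with different price levels.
toy <- data.frame(
  type = rep(c("conventional", "organic"), each = 4),
  AveragePrice = c(1.0, 1.1, 1.2, 1.3, 1.6, 1.7, 1.8, 1.9),
  Total.Volume = c(400, 380, 350, 300, 60, 55, 50, 40)
)

# Pooled Pearson correlation across all rows.
pooled <- cor(toy$Total.Volume, toy$AveragePrice)

# Pearson correlation computed separately within each type.
by_type <- sapply(split(toy, toy$type), function(d) {
  cor(d$Total.Volume, d$AveragePrice)
})

# Rank-based (Spearman) alternative, less sensitive to very large volumes.
spearman <- cor(toy$Total.Volume, toy$AveragePrice, method = "spearman")

print(pooled)
print(by_type)
print(spearman)
```

For the real data, the same pattern would use `split(data, data$type)`, possibly after dropping the aggregate regions. Which regions count as aggregates is a judgment call the source does not settle.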