1-Data Exploration: This should include summary statistics, means, medians, quartiles, or any other relevant information about the data set. Please include some conclusions in the R Markdown text.
Question: What is the impact on Grocery Sales from item placement at a store?
Conclusion to question above written at the end.
Grocery <-read.csv(file="https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/Stat2Data/Grocery.csv", header = TRUE, sep = ",")
Grocery
## X Discount Store Display Sales Price
## 1 1 5.00% 1 Featured End of Aisl 240 8.96
## 2 2 5.00% 1 Featured Middle of A 264 9.19
## 3 3 5.00% 1 Not Featured 192 8.46
## 4 4 5.00% 2 Featured End of Aisl 216 8.58
## 5 5 5.00% 2 Featured Middle of A 174 8.31
## 6 6 5.00% 2 Not Featured 264 9.30
## 7 7 5.00% 3 Featured End of Aisl 176 8.04
## 8 8 5.00% 3 Featured Middle of A 220 8.80
## 9 9 5.00% 3 Not Featured 171 8.03
## 10 10 5.00% 4 Featured End of Aisl 199 8.39
## 11 11 5.00% 4 Featured Middle of A 180 8.17
## 12 12 5.00% 4 Not Featured 146 7.76
## 13 13 10.00% 5 Featured End of Aisl 244 8.91
## 14 14 10.00% 5 Featured Middle of A 173 8.07
## 15 15 10.00% 5 Not Featured 225 8.76
## 16 16 10.00% 6 Featured End of Aisl 252 8.99
## 17 17 10.00% 6 Featured Middle of A 192 8.29
## 18 18 10.00% 6 Not Featured 270 9.25
## 19 19 10.00% 7 Featured End of Aisl 202 8.37
## 20 20 10.00% 7 Featured Middle of A 261 9.15
## 21 21 10.00% 7 Not Featured 225 8.64
## 22 22 10.00% 8 Featured End of Aisl 179 8.06
## 23 23 10.00% 8 Featured Middle of A 222 8.59
## 24 24 10.00% 8 Not Featured 168 8.03
## 25 25 15.00% 9 Featured End of Aisl 234 8.73
## 26 26 15.00% 9 Featured Middle of A 233 8.78
## 27 27 15.00% 9 Not Featured 162 7.91
## 28 28 15.00% 10 Featured End of Aisl 220 8.49
## 29 29 15.00% 10 Featured Middle of A 209 8.41
## 30 30 15.00% 10 Not Featured 258 9.02
## 31 31 15.00% 11 Featured End of Aisl 215 8.50
## 32 32 15.00% 11 Featured Middle of A 199 8.22
## 33 33 15.00% 11 Not Featured 242 8.82
## 34 34 15.00% 12 Featured End of Aisl 179 8.11
## 35 35 15.00% 12 Featured Middle of A 206 8.37
## 36 36 15.00% 12 Not Featured 206 8.42
summary(Grocery)
## X Discount Store Display
## Min. : 1.00 Length:36 Min. : 1.00 Length:36
## 1st Qu.: 9.75 Class :character 1st Qu.: 3.75 Class :character
## Median :18.50 Mode :character Median : 6.50 Mode :character
## Mean :18.50 Mean : 6.50
## 3rd Qu.:27.25 3rd Qu.: 9.25
## Max. :36.00 Max. :12.00
## Sales Price
## Min. :146.0 Min. :7.760
## 1st Qu.:179.8 1st Qu.:8.207
## Median :212.0 Median :8.475
## Mean :211.6 Mean :8.524
## 3rd Qu.:235.5 3rd Qu.:8.805
## Max. :270.0 Max. :9.300
mean(Grocery$Sales)
## [1] 211.6111
median(Grocery$Price)
## [1] 8.475
quantile(Grocery$Price)
## 0% 25% 50% 75% 100%
## 7.7600 8.2075 8.4750 8.8050 9.3000
Preliminary conclusions from data:
The lowest price is 7.76 while the highest price is 9.30.
The median price is 8.47.
Data wrangling: Please perform some basic transformations. They will need to make sense but could include column renaming, creating a subset of the data, replacing values, or creating new columns with derived data (for example – if it makes sense you could sum two columns together)
requires package plyr
Grocery2 <- subset(Grocery, Sales > 230, c('Store','Display', 'Sales'))
Grocery2
## Store Display Sales
## 1 1 Featured End of Aisl 240
## 2 1 Featured Middle of A 264
## 6 2 Not Featured 264
## 13 5 Featured End of Aisl 244
## 16 6 Featured End of Aisl 252
## 18 6 Not Featured 270
## 20 7 Featured Middle of A 261
## 25 9 Featured End of Aisl 234
## 26 9 Featured Middle of A 233
## 30 10 Not Featured 258
## 33 11 Not Featured 242
library(plyr)
Grocery2<-rename(Grocery2, c("Store"="GroceryStore","Display"="DisplaySetup" ,"Sales"="TotalSales"))
Grocery2
## GroceryStore DisplaySetup TotalSales
## 1 1 Featured End of Aisl 240
## 2 1 Featured Middle of A 264
## 6 2 Not Featured 264
## 13 5 Featured End of Aisl 244
## 16 6 Featured End of Aisl 252
## 18 6 Not Featured 270
## 20 7 Featured Middle of A 261
## 25 9 Featured End of Aisl 234
## 26 9 Featured Middle of A 233
## 30 10 Not Featured 258
## 33 11 Not Featured 242
Grocery2$DisplaySetup<-revalue(Grocery2$DisplaySetup,c("Featured End of Aisl"="Featured"))
Grocery2$DisplaySetup<-revalue(Grocery2$DisplaySetup,c("Featured Middle of A"="Featured"))
Grocery2$DisplaySetup<-revalue(Grocery2$DisplaySetup,c("Not Featured"="Low Priority"))
Grocery2
## GroceryStore DisplaySetup TotalSales
## 1 1 Featured 240
## 2 1 Featured 264
## 6 2 Low Priority 264
## 13 5 Featured 244
## 16 6 Featured 252
## 18 6 Low Priority 270
## 20 7 Featured 261
## 25 9 Featured 234
## 26 9 Featured 233
## 30 10 Low Priority 258
## 33 11 Low Priority 242
3-Graphics: Please make sure to display at least one scatter plot, box plot and histogram. Don’t be limited to this. Please explore the many other options in R packages such as ggplot2.
plot(x = Grocery$Sales,y = Grocery$Price,xlab = "Sales",ylab = "Price",
xlim = c(146,270),ylim = c(7.00,10.00),main = "Sales vs Price")
boxplot(Grocery$Sales ~ Grocery$Discount, data = Grocery, xlab = "Discount",
ylab = "Sales", main = "Discount Vs Sales")
hist(Grocery$Price,main="Grocery Prices",xlab="Prices",breaks=4,col="yellow")
requires ggplot2 package requires gridExtra package
library(ggplot2)
library(gridExtra)
library(plyr)
x<-ggplot(Grocery, aes(x=Grocery$Sales, y=Grocery$Price)) +
geom_point(shape=1) + geom_smooth(method=lm)
x+ggtitle(" Sales Vs Price")
## Warning: Use of `Grocery$Sales` is discouraged.
## ℹ Use `Sales` instead.
## Warning: Use of `Grocery$Price` is discouraged.
## ℹ Use `Price` instead.
## Warning: Use of `Grocery$Sales` is discouraged.
## ℹ Use `Sales` instead.
## Warning: Use of `Grocery$Price` is discouraged.
## ℹ Use `Price` instead.
## `geom_smooth()` using formula = 'y ~ x'
Grocery4 <- subset(Grocery,Store<5, c('Store','Discount','Display', 'Sales'))
Grocery4$Display<-revalue(Grocery4$Display,c("Featured End of Aisl"="Featured"))
Grocery4$Display<-revalue(Grocery4$Display,c("Featured Middle of A"="Featured"))
Grocery4$Display<-revalue(Grocery4$Display,c("Not Featured"="Not Featured"))
x<-ggplot(data=Grocery4, aes(x=Grocery4$Store, y=Grocery4$Sales, fill=Grocery4$Display)) +
geom_bar(stat="identity", position=position_dodge(), colour="black")
p1<-x+ggtitle("Sales By Display 5% Discount")
Grocery5 <- subset(Grocery,Store>4 & Store<9, c('Store','Discount','Display', 'Sales'))
Grocery5$Display<-revalue(Grocery5$Display,c("Featured End of Aisl"="Featured"))
Grocery5$Display<-revalue(Grocery5$Display,c("Featured Middle of A"="Featured"))
Grocery5$Display<-revalue(Grocery5$Display,c("Not Featured"="Not Featured"))
x<-ggplot(data=Grocery5, aes(x=Grocery5$Store, y=Grocery5$Sales, fill=Grocery5$Display)) +
geom_bar(stat="identity", position=position_dodge(), colour="black")
p2<-x+ggtitle("Sales By Display 10% Discount")
Grocery6 <- subset(Grocery,Store>8, c('Store','Discount','Display', 'Sales'))
Grocery6$Display<-revalue(Grocery6$Display,c("Featured End of Aisl"="Featured"))
Grocery6$Display<-revalue(Grocery6$Display,c("Featured Middle of A"="Featured"))
Grocery6$Display<-revalue(Grocery6$Display,c("Not Featured"="Not Featured"))
x<-ggplot(data=Grocery6, aes(x=Grocery6$Store, y=Grocery6$Sales, fill=Grocery6$Display)) +
geom_bar(stat="identity", position=position_dodge(), colour="black")
p3<-x+ggtitle("Sales By Display 15% Discount")
grid.arrange(p1, p2,p3,nrow = 2)
## Warning: Use of `Grocery4$Store` is discouraged.
## ℹ Use `Store` instead.
## Warning: Use of `Grocery4$Sales` is discouraged.
## ℹ Use `Sales` instead.
## Warning: Use of `Grocery4$Display` is discouraged.
## ℹ Use `Display` instead.
Grocery7 <- subset(Grocery, Store>0,c('Store','Discount','Display', 'Sales'))
Grocery7$Display<-revalue(Grocery7$Display,c("Featured End of Aisl"="Featured"))
Grocery7$Display<-revalue(Grocery7$Display,c("Featured Middle of A"="Featured"))
Grocery7$Display<-revalue(Grocery7$Display,c("Not Featured"="Not Featured"))
Grocery7$Store<-mapvalues(Grocery7$Store, from=c(1,2,3,4), to =c("Discount5","Discount5","Discount5","Discount5"))
Grocery7$Store<-mapvalues(Grocery7$Store, from=c(5,6,7,8), to =c("Discount10","Discount10","Discount10","Discount10"))
Grocery7$Store<-mapvalues(Grocery7$Store, from=c(9,10,11,12), to =c("Discount15","Discount15","Discount15","Discount15"))
x<-ggplot(data=Grocery7, aes(x=Grocery7$Store, y=Grocery7$Sales, fill=Grocery7$Display)) +
geom_bar(stat="identity", position=position_dodge(), colour="black")
p4<-x+ggtitle("Sales By Display By Discount")
p4
## Warning: Use of `Grocery7$Store` is discouraged.
## ℹ Use `Store` instead.
## Warning: Use of `Grocery7$Sales` is discouraged.
## ℹ Use `Sales` instead.
## Warning: Use of `Grocery7$Display` is discouraged.
## ℹ Use `Display` instead.
4-Meaningful question for analysis: Please state at the beginning a meaningful question for analysis. Use the first three steps and anything else that would be helpful to answer the question you are posing from the data set you chose. Please write a brief conclusion paragraph in R markdown at the end. 5-BONUS – place the original .csv in a github file and have R read from the link. This will be a very useful skill as you progress in your data science education and career.
What is the impact on Grocery Sales from item placement at a store?
The conclusion that can be drawn from analysis of the data from the graphs particularly the graphs that break down sales by discount is that item placement has less of an impact on increasing sales for featured items as discounts get larger. When the discount is the highest at 15%, sales of non featured items overall is greater than featured items. At a majority of the stores that have 15% discounts the sales of non featured items have more sales than the items featured. And when looking at the store with 5% discount which is the lowest, sales of featured items are greater overall compared to non featured items. As well as the majority of the stores with 5% discounts on a store by store basis.