For Assignment 3, Part B, I use the dplyr library to work with the data. The dataset contains over 14,000 supermarket transactions. The head of the data is shown below.
The question for part B is: For each of the following, do whatever it takes to create a bar chart of counts for Units Sold and a histogram of Revenue for the given subpopulation of purchases.
##   Transaction Purchase.Date Customer.ID Gender Marital.Status Homeowner
## 1           1    12/18/2007        7223      F              S         Y
## 2           2    12/20/2007        7841      M              M         Y
## 3           3    12/21/2007        8374      F              M         N
## 4           4    12/21/2007        9619      M              M         Y
## 5           5    12/22/2007        1900      F              S         Y
## 6           6    12/22/2007        6696      F              M         Y
##   Children Annual.Income          City State.or.Province Country
## 1        2   $30K - $50K   Los Angeles                CA     USA
## 2        5   $70K - $90K   Los Angeles                CA     USA
## 3        2   $50K - $70K     Bremerton                WA     USA
## 4        3   $30K - $50K      Portland                OR     USA
## 5        3 $130K - $150K Beverly Hills                CA     USA
## 6        3   $10K - $30K Beverly Hills                CA     USA
##   Product.Family Product.Department     Product.Category Units.Sold
## 1           Food        Snack Foods          Snack Foods          5
## 2           Food            Produce           Vegetables          5
## 3           Food        Snack Foods          Snack Foods          3
## 4           Food             Snacks                Candy          4
## 5          Drink          Beverages Carbonated Beverages          4
## 6           Food               Deli          Side Dishes          3
##   Revenue
## 1   27.38
## 2   14.90
## 3    5.52
## 4    4.44
## 5   14.00
## 6    4.37
The subpopulations referred to in the question are:
- All purchases made during January and February of 2008.
- All purchases made by married female homeowners.
- All purchases made in the state of California.
- All purchases made in the Produce product department.
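Each of these subpopulations can be written as a dplyr `filter()` call. The sketch below is not the original assignment code. It builds a small made-up data frame called `toy` that reuses the column names from the data preview, and it assumes that `Marital.Status == "M"` means married and `Homeowner == "Y"` means homeowner. The helper `plot_subpop` is a hypothetical name, and base R `barplot()` and `hist()` stand in for whatever plotting code was actually used.

```r
library(dplyr)

# Made-up rows for illustration only; the real data has over 14,000 transactions.
toy <- data.frame(
  Purchase.Date      = c("12/18/2007", "1/5/2008", "2/14/2008",
                         "3/1/2008", "1/20/2008", "2/2/2008"),
  Gender             = c("F", "M", "F", "F", "M", "F"),
  Marital.Status     = c("S", "M", "M", "M", "S", "M"),
  Homeowner          = c("Y", "Y", "Y", "N", "Y", "Y"),
  State.or.Province  = c("CA", "CA", "WA", "OR", "CA", "WA"),
  Product.Department = c("Snack Foods", "Produce", "Produce",
                         "Deli", "Beverages", "Produce"),
  Units.Sold         = c(5, 4, 4, 3, 4, 5),
  Revenue            = c(27.38, 14.90, 5.52, 4.37, 14.00, 8.25),
  stringsAsFactors   = FALSE
)

# Parse the month/day/year strings so the date filter compares real dates.
toy <- toy %>%
  mutate(Purchase.Date = as.Date(Purchase.Date, format = "%m/%d/%Y"))

# One filter per subpopulation (2008 is a leap year, so February ends on the 29th).
jan_feb_2008 <- toy %>%
  filter(Purchase.Date >= as.Date("2008-01-01"),
         Purchase.Date <= as.Date("2008-02-29"))
married_female_owners <- toy %>%
  filter(Gender == "F", Marital.Status == "M", Homeowner == "Y")
california <- toy %>% filter(State.or.Province == "CA")
produce    <- toy %>% filter(Product.Department == "Produce")

# Hypothetical helper: bar chart of Units.Sold counts and histogram of Revenue.
plot_subpop <- function(df, label) {
  barplot(table(df$Units.Sold),
          main = paste("Units Sold:", label),
          xlab = "Units sold", ylab = "Number of purchases")
  hist(df$Revenue,
       main = paste("Revenue:", label),
       xlab = "Revenue ($)")
}

plot_subpop(jan_feb_2008, "Jan-Feb 2008")
```

Calling `plot_subpop()` once per filtered data frame would give the two charts for each of the four subpopulations. Using `table()` before `barplot()` treats Units Sold as a discrete count, which is why it gets a bar chart rather than a histogram.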
Across all four subpopulations analyzed, the most common number of units sold per transaction is 4, and the most frequent revenue per purchase falls between $5 and $10. All of the revenue histograms are skewed to the right.
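These observations can also be checked numerically rather than read off the plots. The following is a minimal sketch on made-up values, not the assignment data. The $5-wide bins are an assumption about the histogram's bin width, and comparing the mean with the median is only a rough heuristic for right skew, not a formal test.

```r
# Made-up values for illustration only, not the assignment data.
units   <- c(4, 4, 5, 3, 4, 2, 5, 4)
revenue <- c(6.5, 7.2, 9.8, 5.1, 14.0, 27.4, 8.3, 12.6)

# Most common Units.Sold value: the tallest bar in the bar chart.
unit_counts <- table(units)
modal_units <- as.numeric(names(unit_counts)[which.max(unit_counts)])

# Most frequent $5-wide revenue bin: the tallest bar in the histogram.
bins      <- cut(revenue, breaks = seq(0, 30, by = 5))
modal_bin <- names(which.max(table(bins)))

# Rough right-skew check: a long right tail pulls the mean above the median.
right_skewed <- mean(revenue) > median(revenue)

print(modal_units)
print(modal_bin)
print(right_skewed)
```

Running the same three checks on each filtered subpopulation would show whether the mode of 4 units and the $5 to $10 revenue bin hold in every case.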