A02a_G22358412

The file SupermarketTransactions.csv contains data on over 14,000 transactions. There are two numeric variables, Units Sold and Revenue. The first of these is discrete and the second is continuous.

Load tidyverse and dplyr and read the file

library(tidyverse)

## ── Attaching packages ───────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.3.4     ✔ dplyr   0.7.4
## ✔ tidyr   0.7.2     ✔ stringr 1.2.0
## ✔ readr   1.1.1     ✔ forcats 0.2.0

## ── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(dplyr, warn.conflicts = FALSE)
dtst <- read_csv("SupermarketTransactions.csv")

## Parsed with column specification:
## cols(
##   Transaction = col_integer(),
##   `Purchase Date` = col_character(),
##   `Customer ID` = col_integer(),
##   Gender = col_character(),
##   `Marital Status` = col_character(),
##   Homeowner = col_character(),
##   Children = col_integer(),
##   `Annual Income` = col_character(),
##   City = col_character(),
##   `State or Province` = col_character(),
##   Country = col_character(),
##   `Product Family` = col_character(),
##   `Product Department` = col_character(),
##   `Product Category` = col_character(),
##   `Units Sold` = col_integer(),
##   Revenue = col_double()
## )

For each of the following, do whatever it takes to create a bar chart of counts for Units Sold and a histogram of Revenue for each of the given subpopulations of purchases below.

a. All purchases made during January and February of 2008.

Subset the purchases made during January and February of 2008

# Parse the purchase dates once, then keep January and February of 2008
purchase_date <- as.Date(dtst$`Purchase Date`, "%m/%d/%Y")
spmt <- subset(dtst,
               purchase_date >= as.Date("2008-01-01") &
                 purchase_date <= as.Date("2008-02-29"),
               select = c(`Units Sold`, Revenue))
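The date filter can be checked on a small example before trusting it on the full file. The sketch below uses a made-up data frame (`toy`, with invented values) in place of the real data, and assumes the dates are month/day/year strings as in the code above. It keeps only the rows that fall inside the January to February 2008 window.

```r
# Toy data standing in for the real file (all values are made up)
toy <- data.frame(
  `Purchase Date` = c("12/31/2007", "1/1/2008", "2/29/2008", "3/1/2008"),
  `Units Sold`    = c(4L, 3L, 5L, 2L),
  Revenue         = c(10.5, 7.25, 12.0, 3.1),
  check.names     = FALSE
)

# Parse month/day/year strings into Date objects once
purchase_date <- as.Date(toy$`Purchase Date`, format = "%m/%d/%Y")

# TRUE for dates from 2008-01-01 through 2008-02-29 (2008 is a leap year)
in_window <- purchase_date >= as.Date("2008-01-01") &
  purchase_date <= as.Date("2008-02-29")

# Keep the matching rows and the two numeric columns
toy_jan_feb <- toy[in_window, c("Units Sold", "Revenue")]
print(toy_jan_feb)
```

Only the two rows dated inside the window remain, so the boundary dates behave as intended.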

Create a bar chart of counts for Units Sold during January and February of 2008

barplot(summary(factor(spmt$`Units Sold`)), xlab = "Units Sold", ylab="Counts", main = "Units Sold during January and February of 2008")
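The bar heights above are simply the number of transactions at each distinct value of Units Sold. As a minimal sketch on made-up values (the vector `units` is hypothetical, not taken from the file), `table()` produces the same counts as `summary(factor(...))` and can be passed to `barplot()` directly.

```r
# Made-up Units Sold values for illustration
units <- c(3L, 4L, 4L, 5L, 3L, 4L, 6L)

# Two ways to count how often each distinct value occurs
counts_summary <- summary(factor(units))
counts_table   <- table(units)
print(counts_table)

# Both give one count per distinct value, in the same order
stopifnot(all(as.vector(counts_table) == as.vector(counts_summary)))

# The table can be plotted the same way as the summary
barplot(counts_table, xlab = "Units Sold", ylab = "Counts")
```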

Create a histogram of Revenue during January and February of 2008

hist(spmt$Revenue, xlab = "Revenue", ylab = "Counts", main = "Revenue during January and February of 2008")

b. All purchases made by married female homeowners in the state of California.

Subset all purchases made by married female homeowners in the state of California

spmt1 <- subset(dtst, Gender == "F" & Homeowner == "Y" & `Marital Status` == "M" & `State or Province` == "CA",
                select = c(`Units Sold`, Revenue))
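Since dplyr is already loaded, the same subset could also be written with `filter()` and `select()`. The following is a sketch on a made-up tibble (`toy`, with invented rows), assuming the same column names and codes ("F", "M", "Y", "CA") used above. It is an alternative to the `subset()` call, not a replacement for it.

```r
library(dplyr, warn.conflicts = FALSE)

# Toy rows standing in for the real data (all values are made up)
toy <- tibble(
  Gender              = c("F", "F", "M", "F"),
  `Marital Status`    = c("M", "S", "M", "M"),
  Homeowner           = c("Y", "Y", "Y", "Y"),
  `State or Province` = c("CA", "CA", "CA", "WA"),
  `Units Sold`        = c(4L, 3L, 5L, 2L),
  Revenue             = c(9.5, 4.2, 11.0, 3.3)
)

# Conditions separated by commas in filter() are combined with AND
toy_subset <- toy %>%
  filter(Gender == "F",
         `Marital Status` == "M",
         Homeowner == "Y",
         `State or Province` == "CA") %>%
  select(`Units Sold`, Revenue)

print(toy_subset)
```

Only the first toy row satisfies all four conditions, so the result has a single row.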

Create a bar chart of counts for Units Sold to married female homeowners in the state of California

barplot(summary(factor(spmt1$`Units Sold`)), xlab = "Units Sold", ylab = "Counts", main = "Units Sold to married female homeowners \n in the state of California")

Create a histogram of Revenue from married female homeowners in the state of California

hist(spmt1$Revenue, xlab = "Revenue", ylab = "Counts", main = "Revenue from married female homeowners \n in the state of California")

Write a summary that is less than 100 words that describes your analysis.

The Revenue distributions for both subpopulations are right-skewed. This indicates that most transactions bring in relatively little revenue, while a small number of high-revenue transactions form the long right tail.

The distributions of Units Sold are more symmetric, with most transactions involving 3 to 5 units.
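A rough numeric check of the right-skew reading is to compare the mean with the median: in a right-skewed sample the few large values usually pull the mean above the median. The sketch below uses made-up revenue values (the vector `revenue` is hypothetical). The comparison is only a heuristic and does not replace looking at the histogram.

```r
# Made-up revenue values with a long right tail
revenue <- c(2, 3, 3, 4, 4, 5, 6, 8, 15, 40)

rev_mean   <- mean(revenue)    # pulled upward by the few large values
rev_median <- median(revenue)  # barely affected by the tail

cat("mean:", rev_mean, " median:", rev_median, "\n")

# A mean above the median is consistent with right skew
right_skew_hint <- rev_mean > rev_median
print(right_skew_hint)
```

The same comparison could be run on `spmt$Revenue` and `spmt1$Revenue` to back up the summary.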