Transform columns so that datatypes are appropriate. Specifically, ensure that the CustomerCode variable is formatted as character, any other categorical variable is set as a factor, and the date column is set as a date type (Date/POSIXlt/POSIXct).

#Convert the specific columns to the specified data type.
df$Department <- as.factor(df$Department)
df$Category <- as.factor(df$Category)
df$Date <- as.Date(df$Date, format = '%m/%d/%Y')
str(df)

tibble [34,432 x 6] (S3: tbl_df/tbl/data.frame)
 $ Date        : Date[1:34432], format: "2016-01-14" "2016-07-02" ...
 $ Department  : Factor w/ 3 levels "Entrees","Kabobs",..: 2 3 3 3 2 1 1 3 3 1 ...
 $ Category    : Factor w/ 10 levels "Beef","Beef and Broccoli",..: 7 8 8 8 1 6 2 10 8 6 ...
 $ CustomerCode: chr [1:34432] "CWM11331L8O" "CWM11331L8O" "CXP4593H7E" "CWM11331L8O" ...
 $ Price       : int [1:34432] 28 9 9 9 25 18 26 12 9 12 ...
 $ Quantity    : int [1:34432] 11 5 14 6 7 13 9 6 11 22 ...
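As a minimal sketch of the same conversions on made-up data, the following also coerces CustomerCode explicitly, which would matter if the column had been read in as a factor. The `toy` data frame and its values are hypothetical and are not taken from the dataset; only the column names mirror it.

```r
# Hypothetical toy data; column names mirror the real data, values are invented.
toy <- data.frame(
  Date         = c("01/14/2016", "07/02/2016"),
  Department   = c("Kabobs", "Entrees"),
  CustomerCode = factor(c("A1", "B2")),
  stringsAsFactors = FALSE
)

# Character for the identifier, factor for the categorical column,
# Date for the date column (input assumed to be month/day/year).
toy$CustomerCode <- as.character(toy$CustomerCode)
toy$Department   <- as.factor(toy$Department)
toy$Date         <- as.Date(toy$Date, format = "%m/%d/%Y")

# Collect the class of each column to confirm the conversions.
col_classes <- sapply(toy, class)
print(col_classes)
```

Checking the classes with `sapply()` is a compact alternative to reading the full `str()` output when only the types are of interest.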

Display and interpret the summaries for the Quantity and Price columns.

summary(df[ ,5:6])

     Price          Quantity
 Min.   : 3.00   Min.   : 1.00
 1st Qu.:12.00   1st Qu.: 8.00
 Median :25.00   Median :11.00
 Mean   :22.81   Mean   :11.31
 3rd Qu.:33.00   3rd Qu.:15.00
 Max.   :50.00   Max.   :24.00
 NA's   :10      NA's   :7

Price ranges from 3 to 50 with a median of 25 and a mean of 22.81; the mean sitting below the median suggests a mild left skew. Quantity ranges from 1 to 24, and its mean (11.31) is close to its median (11), so it is roughly symmetric. Price has 10 missing values and Quantity has 7.

Display the count of NA values in each column.

df %>% map(is.na) %>% map(sum)

$Date
[1] 0

$Department
[1] 0

$Category
[1] 0

$CustomerCode
[1] 0

$Price
[1] 10

$Quantity
[1] 7
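
The same per-column count can also be obtained in base R as a single named vector rather than a list. This is only a sketch on a hypothetical `toy` data frame with invented values, not the original data.

```r
# Hypothetical toy data with a few missing values.
toy <- data.frame(
  Price    = c(28, NA, 9, 25),
  Quantity = c(11, 5, NA, NA)
)

# is.na() returns a logical matrix; colSums() counts the TRUEs per column.
na_counts <- colSums(is.na(toy))
print(na_counts)
```

A named vector is often easier to scan or sort than the list returned by `map()`, although both carry the same information.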

Display a bar chart for the Category column.

#Get the total quantity for each category
category_df <- df %>%
  group_by(Category) %>%
  summarise(Total_Quantity = sum(Quantity, na.rm = TRUE)) %>%
  arrange(desc(Total_Quantity))

#Plot the data
ggplot(data = category_df, aes(x = reorder(Category, -Total_Quantity), y = Total_Quantity, fill = Total_Quantity)) +
  geom_col() +
  labs(x = 'Category', y = 'Quantity', title = 'Bar Chart of Quantity by Category') +
  theme(legend.position = 'none') +
  ggeasy::easy_rotate_x_labels(angle = 90)

Display the Departments and their revenue using a bar chart. Order the bars in a meaningful way. (Hint: You will need to create a new column Revenue by multiplying Price and Quantity.)

#Create a revenue column
revenue_df <- df %>% mutate(revenue = Price*Quantity)

#Group the data by Department
rev_by_dept <- revenue_df %>%
  group_by(Department) %>%
  summarise(Total_Revenue = sum(revenue, na.rm = TRUE))

#Plot the data
ggplot(rev_by_dept, aes(x = reorder(Department, -Total_Revenue), y = Total_Revenue, fill = Total_Revenue)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Bar Chart of Revenue by Department', x = 'Department', y = 'Revenue') +
  theme(legend.position = 'none')
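
Because Price and Quantity both contain missing values, any row with an NA in either column gets an NA revenue and contributes nothing to its department total. The sketch below illustrates this in base R on a hypothetical `toy` data frame (values invented); it uses `aggregate()` instead of the dplyr pipeline above.

```r
# Hypothetical toy data; one row has a missing Price.
toy <- data.frame(
  Department = c("Kabobs", "Kabobs", "Entrees"),
  Price      = c(10, NA, 20),
  Quantity   = c(4, 3, 1)
)

# NA * 3 is NA, so the second row has no revenue.
toy$Revenue <- toy$Price * toy$Quantity

# The formula interface of aggregate() drops rows containing NA by default
# (na.action = na.omit), so only complete rows are summed.
rev_by_dept <- aggregate(Revenue ~ Department, data = toy, FUN = sum)

# Largest department first, matching the descending order of the bar chart.
rev_by_dept <- rev_by_dept[order(-rev_by_dept$Revenue), ]
print(rev_by_dept)
```

Sorting the summary before plotting is one way to decide on a "meaningful" bar order; `reorder()` inside `aes()` achieves the same thing at plot time.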

Create a histogram and box and whisker plot of the Price and Quantity columns.

#Plot for histogram
#fill is set outside aes() so the bars are drawn in red rather than mapped to a default palette colour
plot1 <- ggplot(data = df, aes(x = Price)) +
  geom_histogram(binwidth = 10, stat = 'bin', col = 'black', fill = 'red') +
  theme(legend.position = 'none') +
  labs(title = 'Histogram of Price', y = 'Count', x = 'Price')

#Plot for boxplot
plot2 <- ggplot(data = df, aes(y = Price)) +
  geom_boxplot(fill = 'red') +
  theme(legend.position = 'none') +
  labs(title = 'Boxplot of Price', y = 'Price')

#Plot side by side
plot_grid(plot1, plot2, labels = 'AUTO')

Warning: Removed 10 rows containing non-finite values (stat_bin).

Warning: Removed 10 rows containing non-finite values (stat_boxplot).

#Plot for histogram
graph1 <- ggplot(data = df, aes(x = Quantity)) +
  geom_histogram(binwidth = 5, stat = 'bin', col = 'black', fill = 'blue') +
  theme(legend.position = 'none') +
  labs(title = 'Histogram of Quantity', y = 'Count', x = 'Quantity')

#Plot for boxplot
graph2 <- ggplot(data = df, aes(y = Quantity)) +
  geom_boxplot(fill = 'blue') +
  theme(legend.position = 'none') +
  labs(title = 'Boxplot of Quantity', y = 'Quantity')

#Combine the plots
plot_grid(graph1, graph2, labels = 'AUTO')

Warning: Removed 7 rows containing non-finite values (stat_bin).

Warning: Removed 7 rows containing non-finite values (stat_boxplot).
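
The warnings report the rows that the plotting layers drop because the plotted value is non-finite, which matches the NA counts found earlier (10 for Price, 7 for Quantity). A minimal sketch of checking that count and filtering up front, on a hypothetical `toy` data frame with invented values:

```r
# Hypothetical toy data with two missing prices.
toy <- data.frame(Price = c(28, NA, 9, NA, 25))

# Non-finite covers NA, NaN, and infinite values.
n_dropped <- sum(!is.finite(toy$Price))

# Keeping only finite values before plotting avoids the warning.
toy_clean <- toy[is.finite(toy$Price), , drop = FALSE]

print(n_dropped)
print(nrow(toy_clean))
```

Filtering explicitly makes the exclusion visible in the code instead of leaving it to a warning message.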

Write a short essay (150-200 words) comparing the strengths and weaknesses of (1) Power BI and (2) Alteryx with those of R for this kind of analysis. You may discuss how each of these fares in terms of replicability, ease of use, cost, ability to share results with others, scalability, etc.

R

Strengths

Weaknesses

Power BI

Strengths

Weaknesses

Alteryx

Strengths

Weaknesses