Transform columns so that datatypes are appropriate. Specifically, ensure that the CustomerCode variable is formatted as character, any other categorical variable is set as factor, and the date column is set as a date type (Date/POSIXlt/POSIXct).
#Convert the specific columns to the specified data type.
df$CustomerCode <- as.character(df$CustomerCode)
df$Department <- as.factor(df$Department)
df$Category <- as.factor(df$Category)
df$Date <- as.Date(df$Date, format = '%m/%d/%Y')
#Check the resulting column types
str(df)
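The conversions can be checked on a small made-up data frame before running them on the real data. This is a minimal sketch: the `toy` data frame and its values are hypothetical, and only the column names and the `%m/%d/%Y` date format are taken from the exercise.

```r
# Hypothetical toy data; only the column names mirror the exercise
toy <- data.frame(
  CustomerCode = c(1001, 1002, 1003),
  Department = c("Produce", "Bakery", "Produce"),
  Date = c("01/15/2021", "02/03/2021", "12/31/2021"),
  stringsAsFactors = FALSE
)

# Apply the same kind of conversions as in the exercise
toy$CustomerCode <- as.character(toy$CustomerCode)
toy$Department <- as.factor(toy$Department)
toy$Date <- as.Date(toy$Date, format = "%m/%d/%Y")

# Collect the class of every column in one named vector
col_classes <- sapply(toy, class)
print(col_classes)

# as.Date() returns NA when a string does not match the format,
# so counting NAs is a quick check that the format was right
print(sum(is.na(toy$Date)))
```

If the date count printed at the end is not zero, the `format` string probably does not match how the dates are stored.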
Display and interpret the summaries for the Quantity and Price columns.
#Select the columns by name so the result does not depend on column order
summary(df[, c('Quantity', 'Price')])
Display the count of NA values in each column.
df %>% map(is.na) %>% map(sum)
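The same count can also be produced in base R, which avoids the dependency on purrr. A minimal sketch on a hypothetical data frame (`toy` and its values are made up for illustration):

```r
# Hypothetical toy data with a few missing values
toy <- data.frame(
  Price = c(2.5, NA, 4),
  Quantity = c(1, 2, NA),
  Category = c("A", NA, NA),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

The result is a single named vector rather than a list, which is often easier to read or to sort.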
Display a bar chart for the Category column.
#Get the Total Quantity of each category
category_df <- df %>%
  group_by(Category) %>%
  summarise(Total_Quantity = sum(Quantity, na.rm = TRUE)) %>%
  arrange(desc(Total_Quantity))
#Plot the data
ggplot(data = category_df, aes(x = reorder(Category, -Total_Quantity), y = Total_Quantity, fill = Total_Quantity)) +
  geom_col() +
  labs(x = 'Category', y = 'Quantity', title = 'Bar Chart of Quantity by Category') +
  theme(legend.position = 'none') +
  ggeasy::easy_rotate_x_labels(angle = 90)
Display the Departments and their revenue using a bar chart. Order the bars in a meaningful way. (Hint: You will need to create a new column Revenue by multiplying Price and Quantity.)
#Create a revenue column
revenue_df <- df %>% mutate(revenue = Price*Quantity)
#Group the data by Department
rev_by_dept <- revenue_df %>%
  group_by(Department) %>%
  summarise(Total_Revenue = sum(revenue, na.rm = TRUE))
#Plot the data
ggplot(rev_by_dept, aes(x = reorder(Department, -Total_Revenue), y = Total_Revenue, fill = Total_Revenue)) +
  geom_bar(stat = 'identity') +
  labs(title = 'Bar Chart of Revenue by Department', x = 'Department', y = 'Revenue') +
  theme(legend.position = 'none')
Create a histogram and box and whisker plot of the Price and Quantity columns.
#Plot for histogram
#fill is set outside aes() so the bars are actually red
plot1 <- ggplot(data = df, aes(x = Price)) +
  geom_histogram(binwidth = 10, col = 'black', fill = 'red') +
  theme(legend.position = 'none') +
  labs(title = 'Histogram of Price', y = 'Count', x = 'Price')
#Plot for Boxplot
plot2 <- ggplot(data = df, aes(y = Price)) +
  geom_boxplot(fill = 'red') +
  theme(legend.position = 'none') +
  labs(title = 'Boxplot of Price', y = 'Price')
#Plot side by side (plot_grid comes from the cowplot package)
plot_grid(plot1, plot2, labels = 'AUTO')
#Plot for histogram
graph1 <- ggplot(data = df, aes(x = Quantity)) +
  geom_histogram(binwidth = 5, col = 'black', fill = 'blue') +
  theme(legend.position = 'none') +
  labs(title = 'Histogram of Quantity', y = 'Count', x = 'Quantity')
#Plot for boxplot
graph2 <- ggplot(data = df, aes(y = Quantity)) +
  geom_boxplot(fill = 'blue') +
  theme(legend.position = 'none') +
  labs(title = 'Boxplot of Quantity', y = 'Quantity')
#Combine the plots
plot_grid(graph1, graph2, labels = 'AUTO')
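To read a box and whisker plot numerically, base R's `boxplot.stats()` returns the five box values and the points flagged as outliers. The sketch below uses a hypothetical `quantity` vector, not the assignment data. Note that ggplot2 computes the box edges as quartiles, which can differ slightly from the hinges used by `boxplot.stats()`.

```r
# Hypothetical quantities with one unusually large order
quantity <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 40)

# coef = 1.5 (the default) flags points more than 1.5 box lengths from the box
stats <- boxplot.stats(quantity)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# Values drawn as individual outlier points
print(stats$out)
```

Running the same call on `df$Price` or `df$Quantity` gives numbers to quote when interpreting the plots.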
Write a short essay (150-200 words) to compare the strengths and weaknesses of (1) Power BI and (2) Alteryx with those of R, for this kind of analysis. You may discuss how each of these fares in terms of replicability, ease of use, cost, ability to share results with others, scalability, etc.
R
Strengths
One of the strengths of R is that it is open source. It's free to use.
The results are reproducible. Your code can be rerun and reused whenever required.
Weakness
Power BI
Strengths
It's free to get started. Power BI Desktop can be installed and used for free.
It's easy to learn. It uses a graphical interface to create visualizations.
Weakness
Alteryx
Strengths
It's easy to use. You drag the elements you want onto a canvas and connect them to create a workflow.
The results are reproducible. Because the analysis is saved as a workflow, others can rerun it and replicate your work.
Weakness