Enter your name here: Maojie Li

Step 1: Import the Global Orders 2016 data set using the read.csv function (5 points)

global<-read.csv("Global Superstore Orders 2016(1).csv",stringsAsFactors = FALSE)

Step 2 - Create a horizontal barplot of Total Sales by Market, ordered descending on total sales (i.e. the market with highest total sales should be on the top) (25 points)

D1<-aggregate.data.frame(global$Sales,by=list(global$Market),sum)
                         
names(D1)<-c("Market","Total_Sales")                         
D2<-D1[order(D1$Total_Sales,decreasing = FALSE),]
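# Set scipen before plotting so the axis labels are not drawn in scientific notation
options(scipen =1000)
# D2 is sorted ascending; a horizontal barplot draws bars bottom to top, so the largest market ends up on top
barplot(D2$Total_Sales,names.arg =D2$Market,col= "pink", horiz= TRUE,border =NA,xlim=c(0,5000000),las=1,
        cex.names=0.71)
title("Total sales by Market")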
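
The aggregate-then-sort pattern used in this step can be checked on a small made-up data frame. The sketch below assumes only base R; the `toy` data frame, its market names, and its sales figures are hypothetical and are not taken from the Global Superstore file. It illustrates why sorting ascending puts the largest bar on top when `horiz = TRUE`.

```r
# Hypothetical toy data, not taken from the Global Superstore file
toy <- data.frame(
  Market = c("A", "B", "A", "C", "B", "C", "C"),
  Sales  = c(100, 250, 50, 400, 25, 300, 10),
  stringsAsFactors = FALSE
)

# Sum Sales within each Market using the formula interface of aggregate()
totals <- aggregate(Sales ~ Market, data = toy, FUN = sum)
names(totals) <- c("Market", "Total_Sales")

# Sort ascending: barplot(horiz = TRUE) draws the first bar at the bottom,
# so the last (largest) row ends up at the top of the chart
totals <- totals[order(totals$Total_Sales), ]

barplot(totals$Total_Sales, names.arg = totals$Market,
        horiz = TRUE, las = 1, col = "pink", border = NA)
title("Total sales by market (toy data)")
```

Sorting with `decreasing = TRUE` instead would place the largest market at the bottom of a horizontal chart.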

Answer the following question: Does any market appear to be very different from the others for total sales?

Answer below: Africa appears to be an outlier because its total sales are much smaller than those of the other markets.

Can you visually indicate this market as a separate color from the others?

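# Color Africa by name rather than by position, so the highlight survives a change in sort order
barplot(D2$Total_Sales,names.arg = D2$Market,
        col=ifelse(D2$Market == "Africa","yellow","pink"),
        horiz= TRUE,border =NA,
        las=1,
        cex.names=0.71)
title("Total sales by market 2")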

Step 3 - Create a line chart total sales by year for each market (25 points)

D3<-aggregate.data.frame(global$Sales,by=list(global$Market,global$Order.Year),sum)
names(D3)= c("Market","Year","Total_sales")
D3.AsiaPacific<- D3[D3$Market == "Asia Pacific",]
D3.Europe<- D3[D3$Market == "Europe",]
D3.USCA<- D3[D3$Market == "USCA",]
D3.LATAM<- D3[D3$Market == "LATAM",]
D3.Africa<- D3[D3$Market == "Africa",]


plot(D3$Year, 
     D3$Total_sales, 
     xlab="year",
     ylab="sales",
     xaxt="n",
     las= 1,
     cex.axis= 0.70,
     main = "Total Sales by year for each market")
# Draw one tick per distinct year instead of repeating each year once per market
axis(1, at = unique(D3$Year), labels = unique(D3$Year))

lines(D3.AsiaPacific$Year,D3.AsiaPacific$Total_sales,col = "pink")
lines(D3.Europe$Year,D3.Europe$Total_sales,col = "light yellow")
lines(D3.USCA$Year,D3.USCA$Total_sales,col = "light blue")
lines(D3.LATAM$Year,D3.LATAM$Total_sales,col = "light green")
lines(D3.Africa$Year,D3.Africa$Total_sales,col = "orange")


# The column is named Total_sales (lowercase s); Total_Sales does not exist in D3 and returns NULL
points(D3.AsiaPacific$Year,D3.AsiaPacific$Total_sales)
points(D3.Europe$Year, D3.Europe$Total_sales)
points(D3.USCA$Year,D3.USCA$Total_sales)
points(D3.LATAM$Year,D3.LATAM$Total_sales)
points(D3.Africa$Year,D3.Africa$Total_sales)


legend("topleft", legend=c("AsiaPacific","Europe","USCA","LATAM","Africa"), lwd=3,
       col=c("pink","light yellow","light blue","light green","orange"))
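
The five near-identical `lines()` calls in this step can also be written as a loop over the markets. The following is a minimal sketch on made-up numbers: the `toy3` data frame, its market names, and its values are hypothetical. It assumes the same long layout as `D3`, with one row per market and year.

```r
# Hypothetical long-format data: one row per market and year
toy3 <- data.frame(
  Market      = rep(c("A", "B", "C"), each = 3),
  Year        = rep(2013:2015, times = 3),
  Total_sales = c(10, 12, 15, 30, 28, 35, 50, 55, 70),
  stringsAsFactors = FALSE
)

markets <- unique(toy3$Market)
cols    <- c("pink", "lightblue", "orange")   # one color per market

# Empty frame sized to the full data range, so every market fits
plot(range(toy3$Year), range(toy3$Total_sales), type = "n",
     xlab = "year", ylab = "sales", xaxt = "n", las = 1,
     main = "Total sales by year (toy data)")
axis(1, at = unique(toy3$Year))

# Draw one line with point markers per market
for (i in seq_along(markets)) {
  one <- toy3[toy3$Market == markets[i], ]
  one <- one[order(one$Year), ]               # keep years in order along the line
  lines(one$Year, one$Total_sales, col = cols[i], lwd = 2, type = "b", pch = 16)
}

legend("topleft", legend = markets, col = cols, lwd = 2, pch = 16)
```

Because the legend reuses the same `markets` and `cols` vectors as the loop, the legend entries cannot drift out of step with the lines.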

Answer the following question: Does the same market appear to be different in your line graph as well?

Answer below: Africa still appears to be an outlier.

Can you visually indicate this market as a separate color from the other markets in your graph?

Can you tell your markets apart from one another in your graph? If not consider what you need to add to your graph so that you can tell them apart

plot(D3$Year, 
     D3$Total_sales, 
     xlab="year",
     ylab="sales",
     xaxt="n",
     las= 1,
     cex.axis= 0.70,
     main = "Total Sales by year for each market2")
# Draw one tick per distinct year instead of repeating each year once per market
axis(1, at = unique(D3$Year), labels = unique(D3$Year))
lines(D3.AsiaPacific$Year,D3.AsiaPacific$Total_sales,col = "pink")
lines(D3.Europe$Year,D3.Europe$Total_sales,col = "pink")
lines(D3.USCA$Year,D3.USCA$Total_sales,col = "pink")
lines(D3.LATAM$Year,D3.LATAM$Total_sales,col = "pink")
lines(D3.Africa$Year,D3.Africa$Total_sales,col = "orange")


# The column is named Total_sales (lowercase s); Total_Sales does not exist in D3 and returns NULL
points(D3.AsiaPacific$Year,D3.AsiaPacific$Total_sales)
points(D3.Europe$Year, D3.Europe$Total_sales)
points(D3.USCA$Year,D3.USCA$Total_sales)
points(D3.LATAM$Year,D3.LATAM$Total_sales)
points(D3.Africa$Year,D3.Africa$Total_sales)
legend("topleft", legend=c("AsiaPacific","Europe","USCA","LATAM","Africa"), lwd=3,
       col=c("pink","pink","pink","pink","orange"))

Step 4 - Create a box plot of total sales by market (25 points)

Hint: use the ylim parameter to restrict your graph so that you tell the boxes apart

boxplot(Sales~Market,
        data=global,
        ylim = c(0, 1000),
        col=  "pink")
title("Total Sales by Market")
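
One way to choose the `ylim` values suggested in the hint is to look at the whisker ends that `boxplot()` computes, rather than guessing. The sketch below uses simulated right-skewed sales; the `toy4` data frame and its values are hypothetical. It calls `boxplot()` with `plot = FALSE` to get the per-group summary statistics before drawing.

```r
set.seed(1)
# Hypothetical skewed data standing in for order-level sales
toy4 <- data.frame(
  Market = rep(c("A", "B", "C"), each = 200),
  Sales  = rexp(600, rate = 1 / 250)
)

# plot = FALSE returns the statistics without drawing anything;
# $stats has one column per group: lower whisker, Q1, median, Q3, upper whisker
b <- boxplot(Sales ~ Market, data = toy4, plot = FALSE)
upper <- max(b$stats[5, ])

# Restrict the y axis to the whisker range so the boxes are readable;
# points above `upper` still exist in the data but fall outside the view
boxplot(Sales ~ Market, data = toy4, ylim = c(0, upper), col = "pink")
title("Sales by market, y axis limited to the whiskers (toy data)")

# Number of points flagged as outliers in each group
table(factor(b$names[b$group], levels = b$names))
```

Limiting the axis this way hides the outliers from view, so the outlier counts are worth reporting alongside the plot.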

Does the general pattern you observe match that of the earlier steps 2 and 3?

Answer below: No. The box plot does not show Africa as significantly different from the other markets.

What other insights can you draw from your box plot above?

Answer below: All of the markets have outliers, and USCA has the outliers with the highest values.

Can you visually indicate this market as a separate color from the others in your boxplot graph?

boxplot(Sales~Market,
        data=global,
        ylim = c(0, 10000),
        col= c("pink","pink","pink","pink","yellow"))
title("Total Sales by Market2")

Step 5 - Generate a different kind of graph other than what was produced in steps 2, 3, & 4 (20 points)

D4<-global[c('Profit','Sales')]
plot(D4$Sales, D4$Profit, xlab="sales in $", ylab="profit in $")
title("total sales vs. profit for all markets and years")
# Overlay a lowess smoother to show the overall trend of profit against sales
lines(lowess(D4$Sales,D4$Profit),col="pink")
# col and lwd are arguments of legend(), not elements of the label vector
legend("topright", legend = "Lowess Line", col="pink", lwd=1)

This means not a bar plot, line graph, or box plot. You need to use R Base plots for this question, not GGPLOT or any other plotting functions, only R Base.

Please see Week 2 Supporting Materials for helpful websites that have different types of plots for you to use. The syntax is similar to what we’ve done in lab 2

Submit the following two files into the Assignment 2 drop box on Blackboard. However, because Blackboard cannot accept .html files, you must first place both files in a folder and then submit that folder to Blackboard.

Be sure to include your name in the filenames per the examples below.

File 1: This RMD document with your plots and answers completed above
Example: Yoni_Dvorkis_ALY6070_Assignment2.Rmd

File 2: HTML version of this file after you click “Knit” above
Example: Yoni_Dvorkis_ALY6070_Assignment2.html

Academic Integrity Reminder: Collaborating on this assignment with anyone is strictly prohibited and will be considered a violation of your Academic Integrity Agreement that you signed prior to joining the program. This includes contacting former students of the course, or collaborating with currently enrolled students. If you have any questions on the assignment I expect you to reach out to me during class or after class dismisses, or you can email me at yonidvorkis@northeastern.edu.

Otherwise be sure to work on the assignment by yourself so that I know you’ve developed the skills to create these R Base graphs and did not copy your work from that of other students or from anywhere else. These skills are critical to being successful in the workforce and I want to be sure you’ve developed them by the time the course concludes.

Good luck!