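1 Background and Research Questions

This analysis uses the global-superstore dataset provided by our instructor 吴俊 to examine the company's sales, its development to date and its likely future trends, in order to identify problems and propose improvements. It also gives the group members practice in data analysis with R.

Division of work:

胡智维 and 焦点 are responsible for Part 1, which analyses the company's sales in different regions;

王茂安 and 胡豪 are responsible for Part 2, which analyses the company's current market distribution and global composition;

史瑞 and 石承致 are responsible for Part 3, which analyses the profitability of the company's products;

宋诗凝 and 江明子 are responsible for Part 4, which analyses the company's overall performance and areas for improvement.

2 Sales Data Analysis

2.1 Data preprocessing

We first preprocess the data. After inspecting the dataset, we select eight columns ('id', 'purchasedate', 'country', 'market', 'type', 'subtype', 'sales', 'quantity') for the sales analysis and display the first five rows as a preview.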

# Clear the environment and set the working directory
rm(list = ls())
setwd('D:/Documents/dasanxia/R/')
begin<-Sys.time()
options(warn=-1)
library(tidyverse)
library(REmap)
library(readxl)
library(writexl)
library(devtools)
library(lubridate)
library(scales)
library(RColorBrewer)
library(ggplot2) 
library(ggthemes)
library(gridExtra)
library(ggrepel)
orders <- read_excel("global-superstore(English).xlsx","订单")
orders <- orders[rowSums(is.na(orders)) != ncol(orders),] # drop rows that are entirely NA
# Rename the two duplicated country columns
names(orders)[names(orders) == 'country...11'] <- 'country_ch'
names(orders)[names(orders) == 'country...12'] <- 'country_en'
data<-readxl::read_excel('global-superstore.xlsx')
data[1:5,]

## # A tibble: 5 x 24
##      id orderid purchasedate        shipdate            shipway custid custname
##   <dbl> <chr>   <dttm>              <dttm>              <chr>   <chr>  <chr>   
## 1     1 MX-201~ 2014-10-02 00:00:00 2014-10-06 00:00:00 标准级  SC-20~ 常松    
## 2     2 MX-201~ 2012-10-15 00:00:00 2012-10-20 00:00:00 标准级  KW-16~ 郝立勤  
## 3     3 MX-201~ 2012-10-15 00:00:00 2012-10-20 00:00:00 标准级  KW-16~ 郝立勤  
## 4     4 MX-201~ 2012-10-15 00:00:00 2012-10-20 00:00:00 标准级  KW-16~ 郝立勤  
## 5     5 MX-201~ 2012-10-15 00:00:00 2012-10-20 00:00:00 标准级  KW-16~ 郝立勤  
## # ... with 17 more variables: segment <chr>, city <chr>, state <chr>,
## #   country <chr>, zipcode <lgl>, market <chr>, area <chr>, productid <chr>,
## #   type <chr>, subtype <chr>, productname <chr>, sales <dbl>, quantity <dbl>,
## #   discount <dbl>, profit <dbl>, shipcost <dbl>, priority <chr>

# Keep the following 8 columns for the sales analysis
data_sales<-data[,c('id','purchasedate','country','market','type','subtype','sales','quantity')]
# Keep only the year of the purchase date
data_sales$purchasedate<-year(data_sales$purchasedate)
# Show the first five rows
data_sales[1:5,]

## # A tibble: 5 x 8
##      id purchasedate country  market   type     subtype sales quantity
##   <dbl>        <dbl> <chr>    <chr>    <chr>    <chr>   <dbl>    <dbl>
## 1     1         2014 墨西哥   拉丁美洲 办公用品 标签     13.1        3
## 2     2         2012 哥伦比亚 拉丁美洲 家具     用具    252.         8
## 3     3         2012 哥伦比亚 拉丁美洲 家具     书架    193.         2
## 4     4         2012 哥伦比亚 拉丁美洲 办公用品 装订机   35.4        4
## 5     5         2012 哥伦比亚 拉丁美洲 办公用品 美术     71.6        2

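2.2 Global sales map, 2011-2014

We start with an overview of total sales, using sales revenue as the indicator and drawing a world heat map in which a darker colour means higher sales in that country or region. The map shows that the United States is the company's most important market, with the highest sales at USD 2,297,201. East and South Asia and Australia come next, with high sales in China, Australia, Indonesia and India, and Europe is the third largest market. Sales in Central Asia and Africa are low, which points to markets that have not yet been developed.

# Heat map of sales by country, 2011-2014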
library("REmap")
data_country<-readxl::read_excel("global.xlsx")
mapdata<-data_country[,c('country_English','sales')]
mapdata=data.frame(mapdata)
map_out<-REmap::remapC(
  title="全球销售额",
  maptype='world',
  mapdata,
  color="firebrick",
  mindata=-3000000,
  maxdata=2297201
)
plot(map_out)


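2.3 Time-series analysis of global sales

Plotting monthly totals shows that the start of each year is the low season and the end of each year is the peak season, with peak-month sales roughly three to four times those of the weakest month. We recommend that the company investigate what drives this seasonality and adjust its sales strategy accordingly.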

data_sales_time<-data[,c("purchasedate","sales")]
data_sales_time$year=year(data_sales_time$purchasedate)
data_sales_time$month=month(data_sales_time$purchasedate)
data_sales_time$purchasedate<-make_datetime(data_sales_time$year,data_sales_time$month)
data_sales_time%>%
  group_by(purchasedate)%>%
  summarise(
    sales=sum(sales)
  )->data_sales_time
ggplot(data=data_sales_time,aes(x=purchasedate,y=sales))+
  geom_line(color="dimgrey",size=2)+geom_point(color='black',size=5,shape=20)

Trend in total monthly sales
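The "three to four times" gap between peak and low months can be checked directly from monthly totals. The sketch below uses made-up monthly figures (the `monthly` data frame and its numbers are illustrative assumptions, not values from the dataset) and computes the peak-to-trough ratio within each year.

```r
# Hypothetical monthly sales totals for two years (illustrative numbers only)
monthly <- data.frame(
  year  = rep(c(2011, 2012), each = 12),
  month = rep(1:12, times = 2),
  sales = c(100, 90, 140, 120, 150, 210, 115, 205, 290, 200, 300, 330,
            130, 110, 170, 150, 200, 260, 150, 250, 340, 260, 370, 400)
)

# Ratio of the best month to the worst month within each year
ratio <- tapply(monthly$sales, monthly$year, function(s) max(s) / min(s))
print(round(ratio, 2))
```

Applied to the real monthly totals, the same `tapply()` call would give one ratio per year, which makes the seasonal gap easy to compare across years.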

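2.4 Sales by market over time

2.4.1 Preparing the data

We analyse how sales in each market have changed in recent years. Sales are first summed by market and ordered by time. Because we are interested in the overall structure, we use full-year totals.

# Group by purchase year and market to compare how sales in each market change over time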
data_country<-data_sales%>%group_by(country)
data_country%>%summarise(
  sales=sum(sales),
  quantity=sum(quantity)
)->data_country
data_country<-data_country[order(data_country$sales),]
data_sales%>%group_by(purchasedate,market)%>%
  summarise(
    sales=sum(sales)
  )->data_market
data_market

## # A tibble: 28 x 3
## # Groups:   purchasedate [4]
##    purchasedate market     sales
##           <dbl> <chr>      <dbl>
##  1         2011 EMEA     136420.
##  2         2011 非洲     127187.
##  3         2011 加拿大     8509.
##  4         2011 拉丁美洲 385098.
##  5         2011 美国     484247.
##  6         2011 欧盟     478743.
##  7         2011 亚太地区 639245.
##  8         2012 EMEA     163414.
##  9         2012 非洲     144481.
## 10         2012 加拿大    16097.
## # ... with 18 more rows

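2.4.2 Line and bar charts of sales by market

The company's main markets are Asia Pacific, the EU, the United States and Latin America.
Asia Pacific is the largest market and has kept growing strongly in recent years. The EU is the second largest market and also the fastest growing, while the United States and Latin America are of similar size.
Sales in Canada are very low and have grown very little in absolute terms, so the company could consider developing the potential of the Canadian market as a next step.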
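When comparing growth across markets of very different size, it helps to look at absolute change and relative growth side by side. The sketch below uses the 2011 and 2012 totals for three markets as printed (rounded) in the summary table above; the variable names are our own.

```r
# 2011 and 2012 sales totals, rounded as printed in the summary table
sales_2011 <- c(EMEA = 136420, Africa = 127187, Canada = 8509)
sales_2012 <- c(EMEA = 163414, Africa = 144481, Canada = 16097)

abs_change <- sales_2012 - sales_2011      # change in currency units
rel_growth <- sales_2012 / sales_2011 - 1  # change as a fraction of the 2011 level

print(data.frame(abs_change, rel_growth = round(rel_growth, 3)))
```

A small market can show a high growth rate while adding very little revenue, so both columns are worth reporting when ranking markets by growth.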

# Line chart and bar chart

ggplot(data_market, aes(x=purchasedate, y=sales,colour=market,group=market)) + geom_line(size=1)+geom_point(size=3)

library(splines) # provides ns()
ggplot(data_market,aes(x=purchasedate,y=sales,fill=market))+
  geom_bar(stat = "identity",position=position_dodge()) +
  geom_smooth(aes(colour=market), se=F,
                method="glm",
                formula=y~ns(x,1),
                method.args=list(family=gaussian(link="log")), # model arguments are passed via method.args
                show.legend  = FALSE) +
    theme(legend.position="right")

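2.5 Average selling price by market

2.5.1 Average price of goods sold in each market

Sales revenue alone does not fully reflect how the company's goods are selling, so we also look at the average price per unit sold (sales divided by quantity) to understand the company's business mix in each market.

# Average unit price per market over 2011-2014 (no time-series breakdown, because the average price changes little over time)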
data_sales%>%group_by(market)%>%
  summarise(
    sales=sum(sales),
    quantity=sum(quantity),
    average_sales=sales/quantity
  )->average_sales
average_sales

## # A tibble: 7 x 4
##   market      sales quantity average_sales
##   <chr>       <dbl>    <dbl>         <dbl>
## 1 EMEA      806161.    11517          70.0
## 2 非洲      783773.    10564          74.2
## 3 加拿大     66928.      833          80.3
## 4 拉丁美洲 2164605.    38526          56.2
## 5 美国     2297201.    37873          60.7
## 6 欧盟     2938089.    37773          77.8
## 7 亚太地区 3585744.    41226          87.0

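2.5.2 Bar charts comparing average prices across markets

The charts show that the average price is highest in Asia Pacific, at about USD 87.0 per unit, and lowest in Latin America, at about USD 56.2 per unit, so the Latin American average is roughly 35% below the Asia Pacific one. To understand where this difference comes from, we analyse the types of goods sold in each market and the average price of each type.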
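The size of the price gap depends on which market is used as the base. The sketch below recomputes both versions from the sales and quantity totals printed in the `average_sales` table; the variable names are our own.

```r
# Average unit price = total sales / total quantity (totals taken from the table above)
avg_price <- c(AsiaPacific  = 3585744 / 41226,
               LatinAmerica = 2164605 / 38526)

gap <- avg_price[["AsiaPacific"]] - avg_price[["LatinAmerica"]]

pct_below_high <- gap / avg_price[["AsiaPacific"]]   # Latin America relative to Asia Pacific
pct_above_low  <- gap / avg_price[["LatinAmerica"]]  # Asia Pacific relative to Latin America

print(round(c(avg_price, below = pct_below_high, above = pct_above_low), 3))
```

Stating the base explicitly avoids ambiguity: the same gap of about USD 31 is about 35% of the higher price but about 55% of the lower one.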

library(grid)
options(scipen=200)
ggplot(average_sales,aes(x=market,y=sales,fill=market))+
  geom_bar(stat = "identity",position=position_dodge())+ theme(axis.text =element_text(size=5), axis.title.y=element_text(size=14)) ->p1
ggplot(average_sales,aes(x=market,y=quantity,fill=market))+
  geom_bar(stat = "identity",position=position_dodge())+ theme(axis.text =element_text(size=5), axis.title.y=element_text(size=14)) ->p2
ggplot(average_sales,aes(x=market,y=average_sales,fill=market))+
  geom_bar(stat = "identity",position=position_dodge())+ theme(axis.text =element_text(size=5), axis.title.y=element_text(size=14)) ->p3
# Combine p1-p3 into one figure on a two-row grid, labelled A, B and C (LETTERS[1:3] gives the first three capital letters)
p5 <- cowplot::plot_grid(p1, p2, p3, nrow = 2, labels = LETTERS[1:3])
p5

2.6 Product mix by market

# Average price by product type over the whole dataset
type_average<-data_sales%>%group_by(type)%>%summarise(
  sales=sum(sales),
  quantity=sum(quantity),
  average_sales=sales/quantity
)
type_average

## # A tibble: 3 x 4
##   type        sales quantity average_sales
##   <chr>       <dbl>    <dbl>         <dbl>
## 1 办公用品 3787070.   108182          35.0
## 2 技术     4744557.    35176         135. 
## 3 家具     4110874.    34954         118.

# Group by market and product type to see the composition of sales in each market
data_sales%>%group_by(market,type)%>%summarise(
  sales=sum(sales)
)->data_type
data_type

## # A tibble: 21 x 3
## # Groups:   market [7]
##    market   type       sales
##    <chr>    <chr>      <dbl>
##  1 EMEA     办公用品 276686.
##  2 EMEA     技术     300855.
##  3 EMEA     家具     228621.
##  4 非洲     办公用品 266756.
##  5 非洲     技术     322367.
##  6 非洲     家具     194651.
##  7 加拿大   办公用品  30034.
##  8 加拿大   技术      26299.
##  9 加拿大   家具      10595.
## 10 拉丁美洲 办公用品 563921.
## # ... with 11 more rows

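2.6.1 Comparing average prices across product types

Comparing the three product types, office supplies have by far the largest sales volume and the lowest average price (about USD 35 per unit), whereas technology products and furniture both have average prices above USD 100.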

options(scipen=200)
ggplot(type_average,aes(x=type,y=sales,fill=type))+
  geom_bar(stat = "identity",position=position_dodge())+ theme(axis.text =element_text(size=7), axis.title.y=element_text(size=14)) ->p1
ggplot(type_average,aes(x=type,y=quantity,fill=type))+
  geom_bar(stat = "identity",position=position_dodge())+ theme(axis.text =element_text(size=7), axis.title.y=element_text(size=14)) ->p2
ggplot(type_average,aes(x=type,y=average_sales,fill=type))+
  geom_bar(stat = "identity",position=position_dodge())+ theme(axis.text =element_text(size=7), axis.title.y=element_text(size=14)) ->p3
layout <- matrix(c(1, 1, 1, 2, 2, rep(3, 5)), nrow = 2, byrow = TRUE)
p5 <- cowplot::plot_grid(p1, p2, p3, nrow = 2, labels = LETTERS[1:3])
p5

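Looking at the product mix of each market, technology products account for the highest sales in the US market, which raises its overall average price. In Latin America, the market with the lowest average price, total spending on the three product types is similar but the quantities differ greatly, which pulls the average price down.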
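A market's average unit price is the quantity-weighted average of the prices of the product types it sells, so a different product mix alone can move the average. The sketch below uses the type-level totals from the `type_average` table together with two hypothetical quantity mixes (`mix_a` and `mix_b` are assumptions for illustration, not values from the data).

```r
# Average unit price per product type, from the type_average table
type_price <- c(office    = 3787070 / 108182,
                tech      = 4744557 / 35176,
                furniture = 4110874 / 34954)

# Hypothetical shares of units sold in two markets (each sums to 1)
mix_a <- c(office = 0.70, tech = 0.15, furniture = 0.15)
mix_b <- c(office = 0.55, tech = 0.25, furniture = 0.20)

# Market average price = sum over types of (quantity share * type price)
avg_a <- sum(mix_a * type_price)
avg_b <- sum(mix_b * type_price)

print(round(c(market_a = avg_a, market_b = avg_b), 2))
```

With identical type-level prices, the market that sells a larger share of technology and furniture units ends up with the higher average price, which is the mechanism the text appeals to.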

# x is categorical in both charts, so only grouped bars are drawn
ggplot(data_type,aes(x=type,y=sales,fill=market))+
  geom_bar(stat = "identity",position=position_dodge()) +
    theme(legend.position="right")

ggplot(data_type,aes(x=market,y=sales,fill=type))+
  geom_bar(stat = "identity",position=position_dodge()) +
    theme(legend.position="right")

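2.7 Main conclusions

The sales analysis shows that the company's revenue comes mainly from Asia Pacific, North America and Europe. Sales are seasonal: the start of each year is the low season and the end of each year is the peak season, and the gap is large, with peak-month sales about three to four times low-month sales. Comparing product types, office supplies sell in the largest volume at the lowest average price (low margin, high turnover), while technology products and furniture both average more than USD 100 per unit.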

3 Market Distribution Analysis

3.1 Sales by market

orders <- read_excel("global-superstore(English).xlsx","订单")
orders <- orders[rowSums(is.na(orders)) != ncol(orders),] # drop rows that are entirely NA
str(orders) # inspect the columns

## Classes 'tbl_df', 'tbl' and 'data.frame':    51290 obs. of  25 variables:
##  $ id          : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ orderid     : chr  "MX-2014-143658" "MX-2012-155047" "MX-2012-155047" "MX-2012-155047" ...
##  $ purchasedate: POSIXct, format: "2014-10-02" "2012-10-15" ...
##  $ shipdate    : POSIXct, format: "2014-10-06" "2012-10-20" ...
##  $ shipway     : chr  "标准级" "标准级" "标准级" "标准级" ...
##  $ custid      : chr  "SC-20575" "KW-16570" "KW-16570" "KW-16570" ...
##  $ custname    : chr  "常松" "郝立勤" "郝立勤" "郝立勤" ...
##  $ segment     : chr  "消费者" "消费者" "消费者" "消费者" ...
##  $ city        : chr  "Mexico City" "Dos Quebradas" "Dos Quebradas" "Dos Quebradas" ...
##  $ state       : chr  "Distrito Federal" "Risaralda" "Risaralda" "Risaralda" ...
##  $ country...11: chr  "墨西哥" "哥伦比亚" "哥伦比亚" "哥伦比亚" ...
##  $ country...12: chr  "Mexico" "Colombia" "Colombia" "Colombia" ...
##  $ zipcode     : logi  NA NA NA NA NA NA ...
##  $ market      : chr  "拉丁美洲" "拉丁美洲" "拉丁美洲" "拉丁美洲" ...
##  $ area        : chr  "北部" "南部" "南部" "南部" ...
##  $ productid   : chr  "OFF-LA-10002782" "FUR-FU-10004015" "FUR-BO-10002352" "OFF-BI-10004428" ...
##  $ type        : chr  "办公用品" "家具" "家具" "办公用品" ...
##  $ subtype     : chr  "标签" "用具" "书架" "装订机" ...
##  $ productname : chr  "Hon File Folder Labels, Adjustable" "Tenex Clock, Durable" "Ikea 3-Shelf Cabinet, Mobile" "Cardinal Binder, Clear" ...
##  $ sales       : num  13.1 252.2 193.3 35.4 71.6 ...
##  $ quantity    : num  3 8 2 4 2 2 2 3 4 2 ...
##  $ discount    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ profit      : num  4.56 90.72 54.08 4.96 11.44 ...
##  $ shipcost    : num  1.03 13.45 9.63 1.37 3.79 ...
##  $ priority    : chr  "中" "中" "中" "中" ...

# Rename the two duplicated country columns
names(orders)[names(orders) == 'country...11'] <- 'country_ch'
names(orders)[names(orders) == 'country...12'] <- 'country_en'
# Use the country name expected by the world map for the United States
orders$country_en[which(orders$country_ch == "美国")] <- "United States of America"

library(REmap)
data <- data.frame(orders %>% group_by(country_en) %>%  summarise(totalsales = sum(sales))) 

# Bar chart of total sales by market
orders %>% group_by(market) %>% 
  summarise(totalsales = sum(sales)) %>%
  ggplot(.,aes(x=market,y=totalsales,fill=market))+geom_bar(stat='identity')+scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+theme_fivethirtyeight(base_family = "STKaiti")+labs(title=("各地区销售整体情况"))

#  ggplot(., mapping=aes(x = market, y = totalsales)) +
#  geom_bar(stat='identity') +
#  xlab('市场') + ylab('总销售额')

#p1<-

3.2 Profit and profit margin by market

orders %>% group_by(market) %>% 
  summarise(totalprofit = sum(profit)) %>%
  ggplot(.,aes(x=market,y=totalprofit,fill=market))+geom_bar(stat='identity')+
  scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+theme_fivethirtyeight(base_family = "STKaiti")+labs(title=("各地区总利润"))+
  xlab('市场') + ylab('总利润')

Conclusion: the company's profit comes mainly from Asia Pacific, the EU, the United States and Latin America. At country level, the United States, China and India are the most important markets.

data1 <- data.frame(orders %>% group_by(country_en) %>%  summarise(totalprofit = sum(profit))) 
data2 <- data.frame('country_en'=data$country_en,
                    'Operating_margin'=data1$totalprofit/data$totalsales) 
plot(remapC(data2,maptype = "world", title="各国销售利润率情况", color = 'green'))


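3.3 Main conclusions

Comparing the two sets of figures shows that the distribution of sales across markets does not match the distribution of profit.
For example, sales in Russia, Canada and Africa are not high, but their profit margins are relatively high,
which suggests that returns in these countries and regions are good and that these markets are worth developing further.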
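The profit margin compared here is total profit divided by total sales within each country. A minimal sketch on toy data (the `orders_toy` data frame and its values are assumptions for illustration) computes both totals in a single aggregation, so that profit and sales are matched by country name rather than by row position.

```r
# Toy order lines (illustrative values only)
orders_toy <- data.frame(
  country = c("A", "A", "B", "B", "C"),
  sales   = c(100, 300, 50, 150, 200),
  profit  = c(10, 30, 20, 40, -20),
  stringsAsFactors = FALSE
)

# Sum sales and profit per country in one step, then divide
totals <- aggregate(cbind(sales, profit) ~ country, data = orders_toy, FUN = sum)
totals$margin <- totals$profit / totals$sales

print(totals)
```

Note that the margin is computed from the summed totals. Averaging the per-order margins instead would weight a small order as heavily as a large one and generally gives a different number.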

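Overall, the company's market distribution is highly concentrated and uneven: by sales, the United States is far ahead;
by profit, the main contributors are the United States, China, India and the EU.
Sales and profit are also out of step: the United States and the EU have high sales but relatively low profit margins,
whereas Russia, Canada and Africa have modest sales but relatively high profit margins.
We therefore recommend that the company further develop high-margin markets with potential such as Russia, Canada and Africa,
and consider withdrawing promptly from markets such as Turkey and Nigeria, which cause the company substantial losses.

Combined with the time-series analyses in the previous and following parts, both sales and profit are concentrated in four markets: Asia Pacific, the EU, the United States and Latin America. Sales and profit in all four have grown every year, so there is still room for expansion. In terms of growth rate, however, the US market is already well developed: its profit grows more slowly than in the other three markets and its profit margin is lower. The US strategy should therefore focus on holding its position, while the sales focus gradually shifts to mature markets that still have room to grow, such as Asia Pacific and the EU. In addition, the company should develop emerging markets with large potential, such as Latin America and Africa.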

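4 Detailed Profit Analysis

The aim of this part is to describe the composition of the company's profit and the patterns in it.

4.1 Characteristics of the company's profit

4.1.1 Profit by year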

orders %>% 
select(purchasedate,profit) %>%
   separate(purchasedate, into = c("year", "month", "day"), sep = "-") %>%
ggplot() +
  geom_point(mapping = aes(x = year, y = profit,color = year))

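profit-year

This scatter plot does not let us tell the four years 2011-2014 apart in terms of profit, so we compare the mean profit per order line in each year instead.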

orders %>% 
select(purchasedate, profit) %>%
  separate(purchasedate, into = c("year", "month", "day"), sep = "-") %>% 
  group_by(year) %>% 
  summarise(profit = mean(profit)) %>% 
  ggplot() +
  geom_point(mapping = aes(x = year, y = profit))

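mean profit-year

Mean profit is highest in 2013 and lowest in 2011. The previous chart also shows that both the largest and the smallest single profit values of the four years occur in 2013.

4.1.2 Profit by month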

orders %>%
select(purchasedate,profit) %>% 
  separate(purchasedate, into = c("year", "month", "day")) %>% 
  group_by(month) %>% 
  summarise(profit = sum(profit)) %>%
  ggplot(aes(x=month,y=profit,fill=month))+
  geom_bar(stat = 'identity')+xlab('month')+ylab('profit')

profit-month

  #ggplot()+
    #geom_col(mapping = aes(x = month,y=profit,fill = month))+
    #theme_bw()+
    #scale_fill_brewer()+
    #labs(x="month",y="profit")

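The chart shows clearly that most of the total profit is earned from August to December, while profit from January to March is low. Profit peaks in November and is lowest in February. We therefore regard autumn and the months around it as the profit peak, and the company can choose its sales strategy to take advantage of this pattern.

4.1.3 Profit by month for each year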

orders%>%mutate(year= lubridate::year(purchasedate), 
                month= lubridate::month(purchasedate) )%>% group_by(month,year)%>% summarise(totalprofit=sum(profit))%>% ggplot(.,aes(x=month,y=totalprofit,colour= factor(year)))+geom_point(color="#FF4500")+geom_line()

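profit-month, by year

The monthly profit curve for each individual year broadly follows the pattern of the four-year total, but in autumn 2013 profit kept rising, which is consistent with the finding above that 2013 has the highest mean profit.

4.1.4 Profit by customer segment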

orders %>% 
  select(segment,profit) %>% 
  group_by(segment) %>% 
  summarise(profit = sum(profit)) %>% 
  ggplot()+
    geom_col(mapping = aes(x = segment,y=profit,fill = segment))+
    theme_bw()+
    scale_fill_brewer()+
    labs(x="segment",y="profit")

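profit-segment

Among the customer segments, “消费者” (consumers) contribute the most profit, “家庭办公室” (home offices) the least, and “公司” (companies) lie in between. The company could shift its sales focus towards the consumer segment.

4.1.5 Profit by country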

orders %>% 
  select(country_ch, profit) %>% 
  group_by(country_ch) %>% 
  summarise(profit = sum(profit)) %>% 
  arrange(desc(profit)) %>% 
  mutate(percent = (profit-min(profit))/(max(profit)-min(profit)))

## # A tibble: 147 x 3
##    country_ch  profit percent
##    <chr>        <dbl>   <dbl>
##  1 美国       286397.   1    
##  2 中国       150683.   0.647
##  3 印度       129072.   0.591
##  4 英国       111900.   0.547
##  5 法国       109029.   0.539
##  6 德国       107323.   0.535
##  7 澳大利亚   103907.   0.526
##  8 墨西哥     102818.   0.523
##  9 西班牙      54390.   0.397
## 10 萨尔瓦多    42023.   0.365
## # ... with 137 more rows

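Looking at profit by country, the three most profitable countries are the United States, China and India. We take the United States as an example and look at how its profit has changed:

Profit in the United States

For the United States, profit has grown year by year and reached its maximum in 2014. We also draw a map of profit by country:

# Use the country name expected by the world map for the United States
orders$country_en[which(orders$country_ch == "美国")] <- "United States of America"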
data <- data.frame(orders %>% group_by(country_en) %>%  summarise(total_profit = sum(profit))) 

plot(remapC(data,maptype = "world", title="各国利润情况", color = 'red'))


4.1.6 Profit by state

orders %>% 
  select(state, profit) %>% 
  group_by(state) %>% 
  summarise(profit = sum(profit)) %>% 
  arrange(desc(profit)) %>% 
  mutate(percent = (profit-min(profit))/(max(profit)-min(profit)))

## # A tibble: 1,094 x 3
##    state                  profit percent
##    <chr>                   <dbl>   <dbl>
##  1 England                99908.   1    
##  2 California             76381.   0.818
##  3 New York               74039.   0.799
##  4 Ile-de-France          44056.   0.567
##  5 New South Wales        43696.   0.564
##  6 North Rhine-Westphalia 42348.   0.554
##  7 San Salvador           35883.   0.503
##  8 Washington             33403.   0.484
##  9 Michigan               24463.   0.415
## 10 São Paulo              21878.   0.395
## # ... with 1,084 more rows

Looking at profit by state or province, the top three are England, California and New York.

4.1.7 Profit by market

orders %>% 
  select(market, profit) %>% 
  group_by(market) %>% 
  summarise(profit = sum(profit)) %>% 
  arrange(desc(profit)) %>% 
  mutate(percent = (profit-min(profit))/(max(profit)-min(profit)))

## # A tibble: 7 x 3
##   market    profit percent
##   <chr>      <dbl>   <dbl>
## 1 亚太地区 436000.  1     
## 2 欧盟     372830.  0.849 
## 3 美国     286397.  0.642 
## 4 拉丁美洲 221643.  0.487 
## 5 非洲      88872.  0.170 
## 6 EMEA      43898.  0.0624
## 7 加拿大    17817.  0

Asia Pacific has the highest total profit, followed by the EU and the United States.
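The `percent` column in the tables above is a min-max rescaling, `(profit - min) / (max - min)`, not a share of total profit: the most profitable group gets 1 and the least profitable gets 0. The sketch below reproduces the column from the market profits printed above (rounded as shown; the English labels are our own).

```r
# Total profit by market, rounded as printed in the table above
profit <- c(AsiaPacific = 436000, EU = 372830, US = 286397, LatinAmerica = 221643,
            Africa = 88872, EMEA = 43898, Canada = 17817)

# Min-max rescaling to the interval [0, 1]
percent <- (profit - min(profit)) / (max(profit) - min(profit))

print(round(percent, 3))
```

Because of this rescaling, Canada's value of 0 means only that it is the least profitable market, not that it earns no profit.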

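4.1.8 Total profit by product type

Among the product types, technology products bring in the most profit, followed by office supplies, and furniture the least. Technology products clearly generate large returns for the business, and technology-driven demand for them is strong, so developing and producing technology products and office supplies can serve as the company's core business over the next few years.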

orders6<-orders%>%
group_by(type)%>%
summarise(totalprofit=sum(profit))
options(scipen=200)
ggplot(orders6,aes(x=type,y=totalprofit,fill=type))+
  geom_bar(stat = 'identity')+xlab('type')+ylab('totalprofit')

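type-totalprofit

4.1.9 Total profit by product subtype

Among the subtypes, “复印机” (copiers) bring in the most profit, followed by “电话” (phones). Both are technology products, which again shows that technology products generate large returns and that demand for them is strong, so developing and producing technology products can serve as the company's core business over the next few years. By contrast, “桌子” (tables) make a large loss, and the company should consider scaling back or closing this line of business.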

orders1<-orders%>%
  drop_na(profit)%>%
  group_by(subtype)%>%
  summarise(totalprofit=sum(profit))
options(scipen=200)
ggplot(orders1,aes(totalprofit, fct_reorder(subtype, totalprofit))) +geom_point(color="orange")

subtype-totalprofit

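4.1.10 Shipping cost and profit

For all three product types (office supplies, technology and furniture), profit is positively and roughly linearly related to shipping cost. Technology products incur large shipping costs because of their special nature and fragility. For the same increase in shipping cost, office supplies show the largest increase in profit and furniture the smallest, but overall the differences are small.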

#orders2<-orders%>%
#tidyr::unite(orders2, "vs_am", vs, am)

ggplot(orders,aes(x=shipcost,y=profit,color=type))+geom_point(alpha=0.1,size=1.0,shape=21,fill='white',stroke=2)+geom_smooth(method='glm')+xlab('shipcost')+ylab('profit')

shipcost-profit
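The comparison of "profit gained per unit of extra shipping cost" corresponds to the slope of a separate linear fit for each product type. The sketch below shows one way to extract those slopes with base R on simulated data; the data frame `toy`, the `true_slope` values and the noise level are assumptions chosen for illustration and do not come from the dataset.

```r
set.seed(1)

# Simulated order lines: three product types with different profit-per-shipcost slopes
toy <- data.frame(
  type     = rep(c("office", "tech", "furniture"), each = 50),
  shipcost = runif(150, 0, 100),
  stringsAsFactors = FALSE
)
true_slope <- c(office = 1.2, tech = 1.0, furniture = 0.8)
toy$profit <- unname(true_slope[toy$type]) * toy$shipcost + rnorm(150, sd = 5)

# Fit profit ~ shipcost separately for each type and keep the slope
slopes <- sapply(split(toy, toy$type), function(d) {
  coef(lm(profit ~ shipcost, data = d))[["shipcost"]]
})

print(round(slopes, 2))
```

On the real data the same `split()` and `lm()` pattern applied to `orders` would give one slope per type, which puts a number on the visual impression from the smoothed lines.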

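4.1.11 Order priority and profit

Orders with “中” (medium) and “高” (high) priority bring in higher profit, with medium priority the highest. Every priority level contains both profitable and loss-making order lines. Within medium and high priority, office supplies and technology products account for a large share of profit, so we suggest that the company focus on medium- and high-priority orders, especially for office supplies and technology products.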

options(scipen=200) 
ggplot(orders, aes(x=priority,y=profit,fill=type)) +           geom_bar(stat="identity") +     
  labs(title="priority-profit.", x="priority", y="profit", fill="type") + theme(plot.title = element_text(size=10, margin=margin(t=10, b=10)))

priority-profit

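4.2 Markets, regions and profit

4.2.1 Profit by market

Overall, every market is profitable, so every market is worth keeping. Asia Pacific is the most profitable market: from 2011 to 2014 its total profit was above that of every other market, with a large jump from 2012 to 2013, and the growth from 2013 to 2014 suggests that it still has room to grow despite its already high profit. It is a priority for future development. The EU and the United States also contribute large profits. The growth rate of EU profit rose sharply from 2013 to 2014, and EU profit may overtake Asia Pacific in future, so the EU is also a priority over the next few years. Canada contributes the least profit. Apart from a small rise from 2011 to 2012, its profit has stayed low and may already have reached its ceiling. The company should reassess Canada's profit potential and risk, and exit the market as soon as profit turns negative.

orders2<-orders%>%
  group_by(market)%>%
  summarise(totalprofit=sum(profit))
# Order the markets by total profit and plot total profit on the y axis
ggplot(orders2,aes(x=fct_reorder(market, totalprofit),y=totalprofit,fill=market))+
  geom_bar(stat = 'identity')+xlab('market')+ylab('totalprofit')+scale_fill_manual(values=c("#00008B","#0000CD","#4169E1","#1E90FF","#6495ED","#00BFFF","#87CEFA"))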

market-profit

orders%>%mutate(year= lubridate::year(purchasedate))%>% group_by(year,market)%>% summarise(totalprofit=sum(profit))%>% ggplot(.,aes(x=year,y=totalprofit,colour= factor(market)))+geom_point(color="#FF4500")+geom_line()+scale_colour_manual(values=c("#00008B","#0000CD","#4169E1","#1E90FF","#6495ED","#00BFFF","#87CEFA"))

year-market-totalprofit

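4.2.2 Profit by region

Overall, every region is profitable. The Central region (中部) is far ahead of the others: from 2011 to 2014 its total profit was above that of every other region, with a large jump from 2012 to 2013. From 2013 to 2014 its growth rate fell to almost zero, which suggests that profit there has reached its ceiling, so the company does not need further expansion in this region and should focus on maintaining its lead. Profit in Southeast Asia and Canada is low and has grown little, so the company should reassess the potential profit and risk in these two regions and exit if profit turns negative. Profit in the remaining regions lies between 70,000 and 200,000. For these mid-profit regions the company should grow profit in ways tailored to each region.

orders4<-orders%>%
  group_by(area)%>%
  summarise(totalprofit=sum(profit))

# Order the regions by total profit and plot total profit on the y axis
ggplot(orders4,aes(x=fct_reorder(area, totalprofit),y=totalprofit,fill=area))+
  geom_bar(stat = 'identity')+xlab('area')+ylab('totalprofit')+scale_fill_manual(values=c("#00008B","#0000CD","#4169E1","#1E90FF","#6495ED","#00BFFF","#87CEFA","#B0C4DE","#ADD8E6","#4682B4","#008B8B","#5F9EA0","#00CED1"))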

area-profit

orders%>%mutate(year= lubridate::year(purchasedate))%>% group_by(year,area)%>% summarise(totalprofit=sum(profit))%>% ggplot(.,aes(x=year,y=totalprofit,colour= factor(area)))+geom_point(color="#FFC0CB")+geom_line()+scale_colour_manual(values=c("#00008B","#0000CD","#4169E1","#1E90FF","#6495ED","#00BFFF","#87CEFA","#B0C4DE","#ADD8E6","#4682B4","#008B8B","#5F9EA0","#00CED1"))

year-area-totalprofit

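4.3 Main conclusions

1. Over time: mean profit is highest in 2013 and lowest in 2011, and both the largest and the smallest single profit values of the four years occur in 2013. By month, most profit is earned from August to December and profit is low from January to March. Profit peaks in November and is lowest in February, so we regard autumn and the months around it as the profit peak. The monthly pattern in each individual year broadly follows the pattern of the four-year total.

2. By customer segment: “消费者” (consumers) contribute the most profit, “家庭办公室” (home offices) the least, and “公司” (companies) lie in between. The company could put more of its focus on the consumer segment.

3. Geographically: by country, the three most profitable countries are the United States, China and India. US profit has grown year by year and reached its maximum in 2014. By state or province, the top three are England, California and New York. Within China, the three most profitable provinces are Guangdong, Shandong and Jiangsu. By market, Asia Pacific has the highest profit, followed by the EU.

4. By product: technology products and office supplies bring the company the most profit. The priority analysis shows that medium- and high-priority orders bring in higher profit, and that office supplies and technology products account for a large share of profit within those priorities. Given the growing role of technology and the large market for office supplies, we expect demand for both to keep rising, so developing, producing and selling medium- and high-priority technology products and office supplies should be the focus of the company's business over the next few years.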

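5 Overall Performance and Performance by Region

5.1 Overall performance

5.1.1 Time-series plots

For the overall analysis we use a time-series approach. We aggregate the data by year and month, compute total profit, total sales, profit margin and shipping cost for each month, plot these series from January 2011 to December 2014, and decompose each series into trend, seasonal and irregular components.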

global_superstore<-read_excel('global-superstore.xlsx')
global_superstore$year<- year(global_superstore$purchasedate) # year of the purchase date
global_superstore$month<- month(global_superstore$purchasedate) # month of the purchase date
global_superstore$shipcost=as.numeric(global_superstore$shipcost)
global_superstore$shipcost[is.na(global_superstore$shipcost)] <- 0
month_data<-global_superstore%>%group_by(year,month)%>%summarise(sum_sales=sum(sales),sum_pro=sum(profit),sum_shipcost=sum(shipcost)) # aggregate by year and month
month_data$profit_margin<-month_data$sum_pro/month_data$sum_sales # profit margin

# Build monthly time series starting in January 2011
timeserise_profit<-ts(month_data$sum_pro,frequency = 12,start=c(2011,1))
timeserise_sales<-ts(month_data$sum_sales,frequency = 12,start=c(2011,1))
timeserise_profit_margin<-ts(month_data$profit_margin,frequency = 12,start=c(2011,1))
timeserise_shipcost<-ts(month_data$sum_shipcost,frequency = 12,start=c(2011,1))
# Additive decomposition into trend, seasonal and random components
dc_profit<-decompose(timeserise_profit, type = c("additive"), filter = NULL)
dc_profit_margin<-decompose(timeserise_profit_margin, type = c("additive"), filter = NULL)
dc_sales<-decompose(timeserise_sales, type = c("additive"), filter = NULL)
dc_shipcost<-decompose(timeserise_shipcost, type = c("additive"), filter = NULL)
# Plot the decompositions
plot(dc_profit,col="orange2",lwd=2);title(sub="profit",cex.sub=1,col.sub='red')

plot(dc_profit_margin,col="darkblue",lwd=2);title(sub="profit_margin",cex.sub=1,col.sub='red')

plot(dc_sales,col="bisque4",lwd=2);title(sub="sales",cex.sub=1,col.sub='red')

plot(dc_shipcost,col="aquamarine4",lwd=2);title(sub="shipcost",cex.sub=1,col.sub='red')

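The plots show that sales, profit and shipping cost have all trended upward over the four years, but the profit margin peaked at the end of 2012 and then declined in 2013 and 2014.
The profit margin fell throughout 2013 and only began to recover in 2014, and even then it remained well below its 2012 level.
The random component suggests that an irregular event occurred in spring 2013: it had little effect on sales but caused profit to drop sharply to its lowest point in the sample, which may be one reason for the weak performance in 2013. One possible explanation is an increase in the cost of raw materials or supply channels.
Sales, profit, profit margin and shipping cost also all show a seasonal cycle.
Within each year, sales and profit have two peaks, in January and in December. The company mainly sells office supplies to businesses of all sizes, so we suspect that firms buy office supplies at the start and end of the year, which raises demand and sales. Shipping cost is low in spring, rises through the year and peaks in winter.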
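The additive decomposition used above splits each monthly series into trend, seasonal and random parts that add back up to the original values. The sketch below illustrates this on a simulated 48-month series (the trend, seasonal pattern and noise are assumptions for illustration); it also shows the `start = c(2011, 1)` form for a monthly series beginning in January 2011.

```r
set.seed(42)

n      <- 48
trend  <- seq(100, 200, length.out = n)
season <- rep(c(-20, -25, -10, -5, 0, 10, -5, 10, 25, 5, 30, 35), times = 4)

# Monthly series starting in January 2011: start = c(year, month)
y <- ts(trend + season + rnorm(n, sd = 5), frequency = 12, start = c(2011, 1))

dc <- decompose(y, type = "additive")

# The three components sum back to the series wherever the trend is defined
recon <- dc$trend + dc$seasonal + dc$random
ok    <- !is.na(recon)
print(max(abs(recon[ok] - y[ok])))
```

The moving-average trend is undefined for the first and last six months, so `trend` and `random` contain `NA` there. With only four years of data, the seasonal component is an average over four observations per calendar month and should be read with some caution.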

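5.1.2 Forecasting future trends

library(tseries)  # adf.test()
library(forecast) # auto.arima(), forecast()
# Stationarity test for the profit margin
adf.test(timeserise_profit_margin)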

## 
##  Augmented Dickey-Fuller Test
## 
## data:  timeserise_profit_margin
## Dickey-Fuller = -3.8277, Lag order = 3, p-value = 0.02459
## alternative hypothesis: stationary

# The profit margin series is stationary, so no differencing is needed
Box.test(timeserise_profit_margin,type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  timeserise_profit_margin
## X-squared = 0.45091, df = 1, p-value = 0.5019

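White-noise test: the p-value is greater than 0.05, so we cannot reject the hypothesis that the series is white noise, and modelling it has little value. We ran the same test on the other series and found that they also behave like white noise, so forecasting them from their own history is not meaningful.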
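To make the reading of the Ljung-Box test concrete, the sketch below applies `Box.test()` to two simulated series: pure white noise and a strongly autocorrelated AR(1) process. Both series are simulated assumptions for illustration. A small p-value rejects "no autocorrelation", while a large p-value means only that the test found no evidence of autocorrelation.

```r
set.seed(123)

noise <- rnorm(200)                                 # white noise
ar1   <- arima.sim(model = list(ar = 0.8), n = 200) # autocorrelated series

p_noise <- Box.test(noise, type = "Ljung-Box")$p.value
p_ar1   <- Box.test(ar1,   type = "Ljung-Box")$p.value

print(c(white_noise = p_noise, ar1 = p_ar1))
```

By default `Box.test()` checks lag 1 only (`lag = 1`), which matches the `df = 1` in the output above. With monthly data it can be worth repeating the test with a larger `lag`, for example 12, to look for seasonal autocorrelation.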

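5.2 Composite indicator and trend forecast

5.2.1 Building the composite indicator

We combine profit, sales and shipping cost into a composite indicator using the entropy weight method. The indicator summarises the company's overall performance and reflects its actual situation more directly.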
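For reference, the calculation carried out by the `Rescale` and `Entropy_Weight` functions in this section can be written as follows, for $n$ months (rows $i$) and $m$ indicators (columns $j$). The notation is our own; the constants 0.002 and 0.994 come from the rescaling range used in the code.

```latex
\begin{aligned}
x'_{ij} &= 0.002 + 0.994\,\frac{x_{ij}-\min_i x_{ij}}{\max_i x_{ij}-\min_i x_{ij}}
  && \text{(positive indicator)}\\
x'_{ij} &= 0.002 + 0.994\,\frac{\max_i x_{ij}-x_{ij}}{\max_i x_{ij}-\min_i x_{ij}}
  && \text{(negative indicator)}\\
p_{ij} &= \frac{x'_{ij}}{\sum_{i=1}^{n} x'_{ij}}, \qquad
e_j = -\frac{1}{\ln n}\sum_{i=1}^{n} p_{ij}\ln p_{ij}\\
d_j &= 1-e_j, \qquad
w_j = \frac{d_j}{\sum_{k=1}^{m} d_k}, \qquad
s_i = 100\sum_{j=1}^{m} w_j\,x'_{ij}
\end{aligned}
```

An indicator whose rescaled values vary more across months has lower entropy $e_j$ and therefore receives a larger weight $w_j$. Shifting the rescaled values away from zero keeps $\ln p_{ij}$ finite.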

# Functions for the entropy weight method
Rescale = function(x, type=1) {
  # type=1: positive indicator, type=2: negative indicator
  rng = range(x, na.rm = TRUE)
  if (type == 1) {
    (0.996 - 0.002) * (x - rng[1]) / (rng[2] - rng[1]) + 0.002
  } else {
    (0.996 - 0.002) *  (rng[2] - x) / (rng[2] - rng[1]) + 0.002
  }
}
Entropy_Weight = function(X, index) {
  # Compute the weight of each indicator (column) and the score of each row with the entropy weight method
  # X: indicator data, one row per sample and one column per indicator
  # index: vector marking each column as a positive (1) or negative (2) indicator
  # Returns s, the score of each row, and w, the weight of each column
  
  pos = which(index == 1)
  neg = which(index != 1)
  
  # Rescale the data
  X[,pos] = lapply(X[,pos], Rescale, type=1)
  X[,neg] = lapply(X[,neg], Rescale, type=2)
  
  # Share p(i,j) of sample i within indicator j
  P = data.frame(lapply(X, function(x) x / sum(x)))
  
  # Entropy e(j) of indicator j
  e = sapply(P, function(x) sum(x * log(x)) *(-1/log(nrow(P))))
  
  d = 1 - e         # redundancy of the information entropy
  w = d / sum(d)   # weight vector
  
  # Score of each sample
  s = as.vector(100 * as.matrix(X) %*% w)
  
  list(w=w, s=s)
}
# Input data
x=month_data[,c(3,4,5)]
x

## # A tibble: 48 x 3
##    sum_sales sum_pro sum_shipcost
##        <dbl>   <dbl>        <dbl>
##  1    98898.   8322.       10545.
##  2    91152.  12418.       10681.
##  3   145729.  15304.       15670.
##  4   116916.  12902.       12955.
##  5   146748.  12184.       16443.
##  6   215207.  23415.       23813.
##  7   115510.   5585.       12905.
##  8   207581.  23714.       22001.
##  9   290214.  35777.       31019.
## 10   199071.  25963.       21380.
## # ... with 38 more rows

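ind = c(1,1,2)
com_data=Entropy_Weight(x, ind)
timeserise_com<-ts(com_data$s,frequency = 12,start=c(2011,1))
dc_com<-decompose(timeserise_com, type = c("additive"), filter = NULL)
plot(dc_com,col="bisque4",lwd=2);title(sub="com_data",cex.sub=1,col.sub='red')

The time-series plot of the entropy-weighted composite indicator “com_data” shows that, although the profit margin fluctuates, the company's overall position is still trending upward and looks healthy, and the second half of each year is better than the first half.

Before forecasting the composite indicator, we first check whether it is suitable for a time-series model.

# Tests on the composite indicator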
adf.test(timeserise_com)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  timeserise_com
## Dickey-Fuller = -4.2468, Lag order = 3, p-value = 0.01
## alternative hypothesis: stationary

Box.test(timeserise_com,type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  timeserise_com
## X-squared = 19.46, df = 1, p-value = 0.00001027

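The results show that the series is stationary, and the p-value of the white-noise test is below 0.05, so the series is not white noise and can be modelled as a time series. The auto.arima function selects ARIMA(1,0,0)(1,1,0)[12] with drift. The model fitted and checked below is the simpler non-seasonal ARIMA(0,1,1).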

auto.arima(timeserise_com)

## Series: timeserise_com 
## ARIMA(1,0,0)(1,1,0)[12] with drift 
## 
## Coefficients:
##          ar1     sar1   drift
##       0.3280  -0.6135  0.6042
## s.e.  0.1623   0.1496  0.0791
## 
## sigma^2 estimated as 32.13:  log likelihood=-114.86
## AIC=237.73   AICc=239.02   BIC=244.06

model=arima(timeserise_com,order=c(0,1,1)) # fit the model
qqnorm(model$residuals);qqline(model$residuals)

Box.test(model$residuals,type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  model$residuals
## X-squared = 0.087419, df = 1, p-value = 0.7675

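The Q-Q plot and the white-noise test on the residuals show that the model passes the checks, which indicates that it fits reasonably well and can be used for forecasting. We plot the forecast of the composite indicator for the next year.

# Forecast 12 months ahead
model_forecast=forecast(model,h=12)
# Approximate 95% band around the fitted values
L1=model_forecast$fitted-1.96*sqrt(model$sigma2)
U1=model_forecast$fitted+1.96*sqrt(model$sigma2)
# 95% prediction interval, kept as time series so it is drawn over the forecast period
L2=model_forecast$lower[,2]
U2=model_forecast$upper[,2]
plot(timeserise_com,lwd=2,main='整体情况预测',xlim=c(2011,2016))
lines(model_forecast$fitted,col='red',lty=2,lwd=2)
lines(model_forecast$mean,col='darkblue',lwd=2)
lines(L1,col=4,lty=2)
lines(U1,col=4,lty=2)
lines(L2,col=4,lty=2)
lines(U2,col=4,lty=2)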

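The model gives the expected path of the company's overall performance. Although it passed the checks above, the forecast is almost a straight line. This is partly because the sample is short (48 months) and partly because a non-seasonal ARIMA(0,1,1) produces the same point forecast at every horizon. It illustrates a limitation of this kind of time-series model: it is not suited to long-term forecasting.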
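The flat forecast is a property of the model rather than of the data: for an ARIMA(0,1,1) without drift or seasonal terms, the point forecast is the same at every horizon. The sketch below illustrates this with `stats::arima()` on a simulated random-walk series (the series is an assumption for illustration, not the composite indicator).

```r
set.seed(7)

# Simulated 48-month series (random walk around 50)
y <- ts(50 + cumsum(rnorm(48)), frequency = 12, start = c(2011, 1))

fit <- arima(y, order = c(0, 1, 1))
fc  <- predict(fit, n.ahead = 12)

print(round(fc$pred, 3)) # twelve identical point forecasts
print(round(fc$se, 3))   # standard errors grow with the horizon
```

Only the uncertainty band changes with the horizon. A model with seasonal terms, such as the one suggested by `auto.arima()` above, would produce forecasts that repeat the seasonal pattern instead of a flat line.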

5.3 Performance by market

5.3.1 Overall picture by market

area_data<-global_superstore%>%group_by(market,year)%>%summarise(sum_sales=sum(sales),sum_pro=sum(profit),sum_shipcost=sum(shipcost)) # aggregate by market and year
area_data$profit_margin=area_data$sum_pro/area_data$sum_sales
# Overall picture by market
p1<-ggplot(area_data,aes(x=market,y=sum_sales,fill=market))+geom_bar(stat='identity')+scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+theme_fivethirtyeight(base_family = "STKaiti")+labs(title=("各地区销售整体情况"))
p2<-ggplot(area_data,aes(x=market,y=sum_pro,fill=market))+geom_bar(stat='identity')+scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+theme_fivethirtyeight(base_family = "STKaiti")+labs(title=("各地区利润整体情况"))
p3<-ggplot(area_data,aes(x=market,y=profit_margin,fill=market))+geom_bar(stat='identity')+scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+theme_fivethirtyeight(base_family = "STKaiti")+labs(title=("各地区利润率整体情况"))
p4<-ggplot(area_data,aes(x=market,y=sum_shipcost,fill=market))+geom_bar(stat='identity')+scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+theme_fivethirtyeight(base_family = "STKaiti")+labs(title=("各地区运输成本整体情况"))
library(gridExtra)
grid.arrange(p1,p2,p3,p4,ncol=2)

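Taking sales, profit, profit margin and shipping cost together, Canada has low sales and profit but also the lowest shipping cost, and its profit margin is the highest of all markets. By contrast, Asia Pacific, the EU and Latin America have high shipping costs, so although their sales and profit are high, their profit margins are unimpressive. Asia Pacific is affected most.

5.3.2 By market and year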

p5<-ggplot(area_data,aes(x=market,y=sum_sales,fill=factor(year)))+
  geom_bar(stat='identity',position = 'dodge')+
  scale_fill_manual(values=c('#a6611a','#dfc27d', '#80cdc1','#018571'))+
  theme_fivethirtyeight(base_family = "STKaiti")+
  labs(x='',y='sum_sales',title=("2011-2014销售情况"))+
  theme(legend.position="bottom", legend.direction="horizontal",
        legend.title=element_text(size=6),axis.text = element_text(size = 9),legend.text = element_text(size = 8), axis.title = element_text(size = 12)) 
p6<-ggplot(area_data,aes(x=market,y=sum_pro,fill=factor(year)))+
  geom_bar(stat='identity',position = 'dodge')+
  scale_fill_manual(values=c('#a6611a','#dfc27d', '#80cdc1','#018571'))+
  theme_fivethirtyeight(base_family = "STKaiti")+
  labs(x='',y='sum_profit',title=("2011-2014利润情况"))+
  theme(legend.position="bottom", legend.direction="horizontal", legend.title=element_text(size=6), axis.text = element_text(size = 9),legend.text = element_text(size = 8), axis.title = element_text(size = 8))

p7<-ggplot(area_data,aes(x=market,y=profit_margin,fill=factor(year)))+
  geom_bar(stat='identity',position = 'dodge')+
  scale_fill_manual(values=c('#a6611a','#dfc27d', '#80cdc1','#018571'))+
  theme_fivethirtyeight(base_family = "STKaiti")+
  labs(x='',y='profit_margin',title=("2011-2014利润率情况"))+
  theme(legend.position="bottom", legend.direction="horizontal", legend.title = element_blank(), axis.text = element_text(size = 9),legend.text = element_text(size = 8), axis.title = element_text(size = 8)) 
p8<-ggplot(area_data,aes(x=market,y=sum_shipcost,fill=factor(year)))+
  geom_bar(stat='identity',position = 'dodge')+
  scale_fill_manual(values=c('#a6611a','#dfc27d', '#80cdc1','#018571'))+
  theme_fivethirtyeight(base_family = "STKaiti")+
  labs(x='',y='shipcost',title=("2011-2014运输成本情况"))+
  theme(legend.position="bottom", legend.direction="horizontal", legend.title = element_blank(), axis.text = element_text(size = 9),legend.text = element_text(size = 8), axis.title = element_text(size = 8)) 
grid.arrange(p5,p6,ncol=2)

grid.arrange(p7,p8,ncol=2)

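Overall, sales and profit in all seven markets have trended upward over the four years, whereas profit margins, apart from fluctuations in Africa and Canada, have stayed roughly flat, which is broadly consistent with the company-wide trend. Note that shipping costs in the markets with the strongest purchasing power are also rising year by year, which limits the company's performance there.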

p9<-ggplot(area_data,aes(x=market,y=sum_sales,fill=market))+
geom_bar(stat='identity')+
scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+facet_wrap(~ year)+theme_fivethirtyeight(base_family = "STKaiti")+
labs(x='',y='sum_sales',title=("2011-2014各地区销售情况"))+
  theme(legend.position="bottom", legend.direction="horizontal", legend.title = element_blank(), axis.text = element_text(size = 5), 
        legend.text = element_text(size = 8), axis.title = element_text(size = 9))
p10<-ggplot(area_data,aes(x=market,y=sum_pro,fill=market))+
  geom_bar(stat='identity')+
  scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+facet_wrap(~ year)+theme_fivethirtyeight(base_family = "STKaiti")+
  labs(x='',y='sum_profit',title=("2011-2014各地区利润情况"))+
  theme(legend.position="bottom", legend.direction="horizontal", legend.title = element_blank(), axis.text = element_text(size = 5), 
        legend.text = element_text(size = 8), axis.title = element_text(size = 9))
ggplot(area_data,aes(x=market,y=profit_margin,fill=market))+
  geom_bar(stat='identity')+
  scale_fill_manual(values=c('#fbb4ae','#b3cde3', '#ccebc5','#decbe4','#fed9a6','#ffffcc','#e5d8bd'))+facet_wrap(~ year)+theme_fivethirtyeight(base_family = "STKaiti")+
  labs(x='',y='profit_margin',title=("2011-2014各地区利润率情况"))+
  theme(legend.position="bottom", legend.direction="horizontal", legend.title = element_blank(), axis.text = element_text(size = 10), 
        legend.text = element_text(size = 13), axis.title = element_text(size = 14))

grid.arrange(p9,p10,ncol=2)

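Year by year, Asia Pacific, the EU, the United States and Latin America lead in sales and profit in all four years, but their profit margins are lower than Canada's.

5.4 Main conclusions

1. The time-series analysis of the composite indicator shows that the company's overall performance is trending upward and that the second half of the year is better than the first. We suggest that the company step up promotion in the first half of the year to raise performance and offset high shipping costs.

2. By market, Asia Pacific is the company's largest market, with strong purchasing power and good sales, but its profit margin has stayed low and has even declined over the four years. The company should tighten cost management in Asia Pacific, reducing shipping, labour and other costs to raise the profit margin, while continuing to promote in the region to keep the market active. Canada has weaker purchasing power than the other markets, but its profit margin has been the highest throughout the four years, which suggests development potential at low cost and makes it a good candidate for a major push into a new market. The company should increase promotion and product supply in Canada to open up the market.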

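6 Implications and Recommendations

1. The company should make the most of the two sales peaks in January and December. Because shipping costs are low in spring, it can also exploit the seasonal cycle in sales by actively running marketing and promotional campaigns. Profit peaks around autumn and is low in spring, so the company could run different kinds of promotions from September each year to January of the following year.
2. Asia Pacific is the company's largest market, with strong purchasing power and good sales, but its profit margin has stayed low and has even declined over the four years. The company should tighten cost management in Asia Pacific, reducing shipping, labour and other costs to raise the profit margin, while continuing to promote in the region to keep the market active.
3. Canada has weaker purchasing power than the other markets, but its profit margin has been the highest throughout the four years, which suggests development potential at low cost and makes it a good candidate for a major push into a new market. The company should increase promotion and product supply in Canada to open up the market.

This analysis took 1.07 minutes to run.

global-superstore Data Visualization Analysis Report

Authors: WS group. 胡智维 and 焦点 (Part 1); 王茂安 and 胡豪 (Part 2); 史瑞 and 石承致 (Part 3, product profit analysis); 宋诗凝 and 江明子 (Part 4)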