Introduction:

The Consumer Price Index (CPI) is a weighted average of the prices of a basket of consumer goods and services, such as food, transportation and medical care. It is widely used to measure inflation. The International Monetary Fund (IMF) keeps CPI records for many countries.
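As a rough illustration of the weighted-average idea, the sketch below computes a simple fixed-basket index. The items, prices and weights are invented for the example and are not IMF figures; real CPI methodology is considerably more detailed.

```r
# Hypothetical basket: prices in a base period and a current period,
# with expenditure weights that sum to 1 (all values are invented).
basket <- data.frame(
  item       = c("food", "transportation", "medical care"),
  base_price = c(100, 50, 200),
  cur_price  = c(110, 55, 210),
  weight     = c(0.5, 0.3, 0.2)
)

# Price relative of each item: current price divided by base-period price.
basket$relative <- basket$cur_price / basket$base_price

# Index = 100 * weighted average of the price relatives.
index <- 100 * sum(basket$weight * basket$relative)
print(index)
```

With these made-up numbers the index is 109, meaning the basket costs 9% more than in the base period.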

Data Resource and Description:

In this study, we use data from the IMF database, available at: http://data.imf.org/regular.aspx?key=61015892.

We chose three countries, China, India and the United States, and used their monthly CPI values for roughly the most recent five years, from January 2013 to April 2018.

The raw data has two columns:

  1. The “cpi” value.

  2. The “yearm” value, which is the year and month of that CPI. For example, a value of “2013M01” means January 2013.

To make the data more organized, we added several columns to the raw data.

In part I, we used the data sets “china.csv”, “india.csv” and “us.csv”:

  1. The “cpi” column contains the CPI value.

  2. The “year” column contains the year of each CPI observation, derived from the raw “yearm” column.

In part II, we used the data set “cpi.csv”:

  1. The “cpi” column contains the CPI value.

  2. The “monthfactor” column contains the month of each CPI observation, also derived from the raw “yearm” column. For example, a value of “M06” means June.
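One way the two derived columns could be obtained from “yearm” is sketched below. This is an assumption about the preprocessing, not necessarily the code used in the study, and the sample values are illustrative.

```r
# Example "yearm" values in the format described above.
yearm <- c("2013M01", "2013M06", "2018M04")

# Characters 1-4 hold the year; characters 5-7 hold the month code.
year        <- as.integer(substr(yearm, 1, 4))
monthfactor <- factor(substr(yearm, 5, 7), levels = sprintf("M%02d", 1:12))

print(data.frame(yearm, year, monthfactor))
```

Fixing the factor levels to M01 through M12 keeps the months in calendar order in later tables and plots, even if some month is absent from the data.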

Statement of the Goal:

I. In the first part, we will draw a plot of the CPI of all three nations.

We will first clean the data and calculate the yearly average CPI. Because the U.S. values lie in a different range from those of China and India, a second y-axis will be used. All three nations’ CPI data will be shown in one plot, distinguished by color, with proper axis labels and a legend.

II. In the second part, we will focus on the U.S. CPI from January 2013 to April 2018.

We will test whether the CPI differs among the 12 months using Tukey’s HSD (honestly significant difference) method, then draw a box plot to show the results.

Study details:

1. Reading and cleaning the data
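
A minimal sketch of this step is given below, assuming the monthly data have been combined into one data frame with “country”, “year” and “cpi” columns. The simulated values stand in for the real files (which would be loaded with read.csv), and the object names monthly, yearly_mean and yearly_sd are hypothetical.

```r
# Stand-in for the monthly data; in the study this would come from
# read.csv("china.csv") etc. The values below are simulated, not IMF data.
set.seed(1)
monthly <- data.frame(
  country = rep(c("China", "India"), each = 24),
  year    = rep(rep(2013:2014, each = 12), times = 2),
  cpi     = c(rnorm(24, mean = 100, sd = 1), rnorm(24, mean = 110, sd = 2))
)

# Yearly mean and standard deviation of CPI for each country.
yearly_mean <- aggregate(cpi ~ country + year, data = monthly, FUN = mean)
yearly_sd   <- aggregate(cpi ~ country + year, data = monthly, FUN = sd)
names(yearly_sd)[names(yearly_sd) == "cpi"] <- "sd"

# One row per country and year, with the columns the plotting code expects.
data <- merge(yearly_mean, yearly_sd, by = c("country", "year"))
data$country <- factor(data$country)
print(data)
```

The result has one row per country and year, with the “year”, “cpi”, “sd” and “country” columns that the plotting code in the analysis refers to.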

2. Analysis

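I. The plot of the CPI of all three nations:

 # Save the current graphics settings and widen the right margin for a second y-axis.
 op <- par(mar = c(5, 5, 3, 5))
 # Plot China's and India's data (assumes country is a factor, so China is black and India is red).
 with(subset(data, country != "U.S."), plot(year, cpi, xlim = range(data$year), ylim = c(90, 140), col = country, ylab = "CPI of India or China"))
 with(subset(data, country != "U.S."), arrows(year, cpi + sd, year, cpi - sd, length = 0.05, angle = 90, code = 3, lwd = 1, col = country))
 # Overlay the U.S. data on the same region, using its own y-range.
 # Note: subset() needs "==" here; "country = ..." would silently keep every row.
 par(new = TRUE)
 with(subset(data, country == "U.S."), plot(year, cpi, xlim = range(data$year), ylim = c(220, 270), col = "blue", xaxt = "n", yaxt = "n", xlab = "", ylab = ""))
 with(subset(data, country == "U.S."), arrows(year, cpi + sd, year, cpi - sd, length = 0.05, angle = 90, code = 3, lwd = 1, col = "blue"))

 # Add the second y-axis on the right.
 axis(side = 4, at = seq(220, 270, by = 10))
 mtext("CPI of U.S.", side = 4, line = 3)

 # Add the legend (point symbols, matching the plotted points).
 legend("topleft", inset = 0.015, legend = c("China", "India", "U.S."), col = c("black", "red", "blue"), pch = 1)
 # Restore the original graphics settings.
 par(op)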

II. The HSD test results:

 # Compare the CPI across the 12 months with Tukey's HSD test.
 library(agricolae)  # provides HSD.test()
 model <- aov(cpi ~ monthfactor, data = cpi)
 out <- HSD.test(model, "monthfactor", group = TRUE)
 out$groups
##          cpi groups
## M04 240.0883      a
## M09 239.6760      a
## M03 239.4433      a
## M10 239.4420      a
## M06 239.2920      a
## M08 239.2840      a
## M07 239.1840      a
## M11 238.9160      a
## M05 238.7240      a
## M02 238.5617      a
## M12 238.4680      a
## M01 237.5900      a
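
All 12 months share the single group letter “a”, which means that no pair of months differs significantly at the chosen significance level. One way to inspect the underlying pairwise comparisons is base R's TukeyHSD(), used here in place of agricolae's HSD.test. The sketch runs on simulated data, so its numbers are unrelated to the study's results.

```r
# Simulated monthly series (not the IMF data): 5 years x 12 months,
# generated with no built-in month effect.
set.seed(42)
sim <- data.frame(
  monthfactor = factor(rep(sprintf("M%02d", 1:12), times = 5)),
  cpi         = rnorm(60, mean = 239, sd = 5)
)

# One-way ANOVA followed by Tukey's HSD from base R (stats package).
fit   <- aov(cpi ~ monthfactor, data = sim)
tukey <- TukeyHSD(fit, "monthfactor")

# One row per pair of months: difference, confidence bounds, adjusted p-value.
pairwise <- tukey$monthfactor
print(head(pairwise))

# Number of month pairs that differ at the 5% level.
print(sum(pairwise[, "p adj"] < 0.05))
```

With 12 months there are 66 pairwise comparisons, one per row of the result.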

III. The box plots of the 12 months for the recent 5 years:

 # Prepare the per-month mean and standard deviation (columns cpi and monthfactor, as used in the model above).
 cpimonthly <- aggregate(x = cpi$cpi, by = list(month = cpi$monthfactor), FUN = mean, na.rm = TRUE)
 cpisd <- aggregate(x = cpi$cpi, by = list(month = cpi$monthfactor), FUN = sd, na.rm = TRUE)
 cpi.bardata <- data.frame(month = cpimonthly$month, mean = cpimonthly$x, sd = cpisd$x)
 # Draw the box plots (plotting against a factor gives one box per month).
 par(mar = c(4.1, 4.1, 0.6, 0.6))
 with(cpi, plot(factor(monthfactor), cpi, ylab = "CPI", xlab = "month", xaxt = "n"))
 # Add the means and the +/- 1 sd bars.
 arrows(1:12, cpi.bardata$mean - cpi.bardata$sd, 1:12, cpi.bardata$mean + cpi.bardata$sd, code = 3, angle = 90, length = 0.1, lwd = 2)
 points(1:12, cpi.bardata$mean, pch = 23, cex = 1.5, lwd = 2)
 # Add the x-axis labels.
 axis(side = 1, at = seq(1, 12, by = 1), labels = c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"))

Discussion:

I. The plot of the CPI of all three nations shows that, over the period studied, India had the largest increase, followed by the U.S.

For the U.S. in particular, the CPI was very steady between 2014 and 2015.

It also shows that the U.S. has the highest CPI level and China the lowest. This may be caused by the different base periods of the CPI series rather than by real differences in prices: the base year for the U.S. is 1982, for India 2001, and for China 2015.
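
To illustrate why the base year matters, the sketch below rebases two series to a common year so that their growth can be compared directly. The countries “A” and “B” and all values are invented; this rebasing was not part of the analysis above.

```r
# Hypothetical yearly index values on different base periods (invented numbers).
idx <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2013:2015, times = 2),
  cpi     = c(230, 235, 240, 100, 105, 110)
)

# Look up each country's own 2013 value for every row.
base_rows <- idx$year == 2013
base <- idx$cpi[base_rows][match(idx$country, idx$country[base_rows])]

# Rebase so that every series equals 100 in 2013.
idx$rebased <- 100 * idx$cpi / base
print(idx)
```

After rebasing, country “B” reaches 110 in 2015 while country “A” reaches only about 104, even though the raw index of “A” is more than twice as high.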

II. The HSD comparison shows no significant difference among the 12 months’ CPI over the period studied, and the box plot illustrates this in an intuitive way.