Math 228/Hon 309 Assignment M12
Please answer the questions in this assignment in the form of an RMarkdown document in HTML. Please submit this document in the Dropbox for Assignment M12. All data files are in the Data Sets folder on the course Moodle site.
Name(s): Aiza K

Open the data set US_POP.

US_POP <- read.csv("C:/Users/aizax94/Downloads/US_POP.CSV")
View(US_POP)
(a) Plot population against year (without connecting the points). Describe the plot.
plot(US_POP$population ~ US_POP$year, main = "Plot of Population by Year", ylab = "population", xlab = "year")

The plot shows a positive, nonlinear trend: population increases as the year increases, and the increase accelerates over time, giving the curve an exponential-looking shape.

(b) Fit a quadratic model predicting population from year. Write down the model.
model <- lm(population ~ year + I(year^2), US_POP)
model
## 
## Call:
## lm(formula = population ~ year + I(year^2), data = US_POP)
## 
## Coefficients:
## (Intercept)         year    I(year^2)  
##   2.188e+10   -2.431e+07    6.755e+03

The model is Population = 2.188 × 10^10 - 2.431 × 10^7 * year + 6.755 × 10^3 * year^2.
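As a rough check on the written model, the sketch below evaluates the quadratic at a given year using the rounded coefficients from the printed output. The helper name quad_pop is hypothetical and not part of the assignment. Because the printed coefficients are rounded to four significant figures and the three terms nearly cancel, the result is only approximate; predict() on the fitted model uses the full-precision coefficients and should be preferred.

```r
# Rounded coefficients copied from the printed lm() output (4 significant figures)
b0 <- 2.188e10   # intercept
b1 <- -2.431e7   # coefficient on year
b2 <- 6.755e3    # coefficient on year^2

# Hypothetical helper: evaluate the quadratic model at one or more years
quad_pop <- function(year) {
  b0 + b1 * year + b2 * year^2
}

# Approximate value only: rounding error in the coefficients is magnified
# because the three terms are each around 10^10 and nearly cancel
approx_2020 <- quad_pop(2020)
approx_2020
```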

(c) Include the quadratic model on the plot in part (a).
range(US_POP$year)
## [1] 1790 2010
x <- seq(1790,2010)
k <- data.frame(year = x)
y <- predict(model, k)
plot(population ~ year, US_POP,
 main = "Plot of Population against Year")
lines(x, y, col = "red")

(d) Use your model to predict the value for Y2020, the population of the U.S. in the year 2020.

library(forecast)
## Warning: package 'forecast' was built under R version 3.2.5
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 3.2.5
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: timeDate
## This is forecast 7.1
pts <- ts(US_POP$population, frequency = 0.1, start = 1790)
mod <- ets(pts, model = "ZZZ")
fcst <- forecast(mod, 1)
fcst 
##      Point Forecast     Lo 80     Hi 80     Lo 95     Hi 95
## 2020      335221263 329146957 341295568 325931414 344511112

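The forecast above comes from an exponential smoothing (ETS) model fitted with ets, not from the quadratic model; it predicts a U.S. population of 335,221,263 in 2020. The quadratic model itself predicts 334,607,733, which is the fit value returned by predict in the next part.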

(e) Use the predict function to obtain a 90% confidence interval for Y2020.
k <- data.frame(year = 2020)
predict(model, k, interval = "confidence", level = 0.9)
##         fit       lwr       upr
## 1 334607733 331083989 338131477

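Based on the quadratic model, we can be 90% confident that the mean population of the U.S. in the year 2020 lies between 331,083,989 and 338,131,477. Because interval = "confidence" was used, this interval describes the fitted mean rather than a single future observation.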
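The difference between a confidence interval for the mean and a prediction interval for a single new value can be sketched as follows. The data here are synthetic (the toy data frame, its coefficients, and the seed are all assumptions made for illustration, not the US_POP data), so the numbers have no bearing on the answer above; only the relationship between the two intervals carries over.

```r
set.seed(228)  # arbitrary seed so the toy example is reproducible

# Synthetic decennial data (NOT the US_POP data): a quadratic trend plus noise
toy <- data.frame(year = seq(1790, 2010, by = 10))
toy$population <- 4e6 + 6500 * (toy$year - 1790)^2 + rnorm(nrow(toy), sd = 2e6)

toy_model <- lm(population ~ year + I(year^2), data = toy)
new_year <- data.frame(year = 2020)

# Interval for the mean response at year = 2020
ci90 <- predict(toy_model, new_year, interval = "confidence", level = 0.9)
# Interval for a single new observation at year = 2020 (always wider)
pi90 <- predict(toy_model, new_year, interval = "prediction", level = 0.9)

out <- rbind(ci90, pi90)
rownames(out) <- c("confidence", "prediction")
out
```

Both calls return the same fit; the prediction interval is wider because it also accounts for the scatter of an individual observation around the fitted curve.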

Open the data set souvenir.csv. (I have cleaned it up a little.)

souvenir <- read.table("C:/Users/aizax94/Downloads/souvenir.csv", quote="\"", comment.char="")
View(souvenir)

Plot sales. Do connect the points. Carefully describe the components of the plot.

plot(souvenir$V1, type = "l", main = "Plot of monthly souvenir sales ",
ylab = "Sales", xlab = "Month")

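The series shows an upward trend and a strong seasonal pattern that repeats every 12 months, with sales peaking at the end of each year. The seasonal swings get larger as the level of sales rises, so the components look multiplicative rather than additive.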

Create a new variable (lsales) by taking the natural logs of the sales. Use the ts function to create a time series object called lsts. Obtain a plot of lsts. Carefully describe the components of the plot.

lsales <- log(souvenir$V1)
lsts <- ts(lsales, start = 1987, frequency = 12)
plot(lsts)

After the log transformation the series still has an upward trend and a seasonal pattern, but the seasonal swings are now roughly constant in size, so the components combine additively. Overall, log sales increase over time, with a pattern that repeats every year.
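Why taking logs changes how the components combine can be sketched with a synthetic series. The trend and seasonal shapes below are made up for illustration and are not the souvenir data. If a series is trend times seasonal, its log is log(trend) plus log(seasonal), so seasonal swings that grow on the raw scale become constant on the log scale.

```r
# Synthetic monthly series (NOT the souvenir data): trend x seasonal
month <- 1:84
trend <- exp(0.02 * month)
seasonal <- 1 + 0.3 * sin(2 * pi * month / 12)
y <- trend * seasonal

log_y <- log(y)

# On the log scale the two components add (difference is floating-point noise)
max(abs(log_y - (log(trend) + log(seasonal))))

# Size of the within-year swing: grows on the raw scale,
# stays the same from year to year on the log scale
year_id <- rep(1:7, each = 12)
raw_swing <- tapply(y, year_id, function(v) diff(range(v)))
log_swing <- tapply(log_y, year_id, function(v) diff(range(v)))
raw_swing
log_swing
```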

Fit an appropriate exponential smoothing model to the series lsts.

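# Note: ma() computes a moving average of order 5, which smooths the series
# but is not an exponential smoothing model; the ets() fit is in the next part.
lsts.5 <- ma(lsts, 5)
d <- data.frame(lsts, lsts.5)  # original and smoothed series side by side
plot(lsts.5)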
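A moving average weights the nearest observations equally, whereas exponential smoothing gives geometrically decreasing weight to older observations. The recursion behind simple exponential smoothing can be sketched as below. The helper ses_level, the choice to initialise at the first observation, and the data values are all assumptions made for illustration; this is not what ets does internally, since ets also handles trend and seasonal components and estimates its parameters from the data.

```r
# Hypothetical helper: simple exponential smoothing with smoothing weight alpha
ses_level <- function(y, alpha) {
  level <- numeric(length(y))
  level[1] <- y[1]  # initialise at the first observation (an assumption)
  for (i in seq_along(y)[-1]) {
    # new level = weighted average of the new observation and the previous level
    level[i] <- alpha * y[i] + (1 - alpha) * level[i - 1]
  }
  level
}

y <- c(10, 12, 11, 15, 14)  # made-up values for illustration
ses_level(y, alpha = 0.5)
```

With alpha close to 1 the smoothed level follows the data closely; with alpha close to 0 it changes slowly.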

Use your model to predict log_sales and then sales for the first four months of 1995. Each of your four predictions should be accompanied by an 80% confidence interval. Write a brief summary.

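# The data end in Dec 1993, so h = 16 covers all of 1994 plus Jan-Apr 1995
mod <- ets(lsts, model = "ZZZ")
# forecast() on an ets model has no interval argument; level sets the coverage
fcst2 <- forecast(mod, h = 16, level = 80)
fcst2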
##          Point Forecast     Lo 80     Hi 80
## Jan 1994       9.680343  9.490263  9.870423
## Feb 1994       9.949966  9.740940 10.158991
## Mar 1994      10.446730 10.220256 10.673204
## Apr 1994      10.106603  9.863856 10.349351
## May 1994      10.144243  9.886174 10.402313
## Jun 1994      10.200476  9.927875 10.473076
## Jul 1994      10.381921 10.095460 10.668382
## Aug 1994      10.384274 10.084530 10.684018
## Sep 1994      10.512556 10.200032 10.825079
## Oct 1994      10.609431 10.284572 10.934290
## Nov 1994      11.106421 10.769621 11.443220
## Dec 1994      11.896833 11.548441 12.245225
## Jan 1995       9.993443  9.633785 10.353101
## Feb 1995      10.263066  9.892433 10.633699
## Mar 1995      10.759830 10.378487 11.141173
## Apr 1995      10.419704 10.027894 10.811513

We predict log sales for the first four months of 1995 to be 9.99, 10.26, 10.76, and 10.42 respectively. We can be 80% confident that log sales lie between 9.63 and 10.35 for January, between 9.89 and 10.63 for February, between 10.38 and 11.14 for March, and between 10.03 and 10.81 for April.
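One way to turn log-scale forecasts into sales forecasts is to back-transform them with exp(), since exp() undoes log(). The sketch below does this for the four 1995 rows of the output above; the data frame names log_fc and sales_fc are hypothetical. Note that the back-transformed point forecast is better read as a median than a mean on the sales scale, and that this is an alternative to refitting a model on the raw sales.

```r
# Log-scale forecasts for Jan-Apr 1995, copied from the forecast output above
log_fc <- data.frame(
  month = c("Jan 1995", "Feb 1995", "Mar 1995", "Apr 1995"),
  point = c(9.993443, 10.263066, 10.759830, 10.419704),
  lo80  = c(9.633785, 9.892433, 10.378487, 10.027894),
  hi80  = c(10.353101, 10.633699, 11.141173, 10.811513)
)

# exp() maps each value back to the sales scale; because exp() is increasing,
# the interval endpoints keep their order
sales_fc <- data.frame(
  month = log_fc$month,
  point = exp(log_fc$point),
  lo80  = exp(log_fc$lo80),
  hi80  = exp(log_fc$hi80)
)
sales_fc
```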

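# Refit on the raw (untransformed) sales series
sales <- ts(souvenir$V1, start = 1987, frequency = 12)
mod <- ets(sales, model = "ZZZ")
# forecast() on an ets model has no interval argument; level sets the coverage
fcst3 <- forecast(mod, h = 16, level = 80)
fcst3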
##          Point Forecast    Lo 80     Hi 80
## Jan 1994       15100.83 12182.58  18019.08
## Feb 1994       20270.17 15954.06  24586.29
## Mar 1994       31496.11 24225.70  38766.52
## Apr 1994       22761.86 17131.58  28392.14
## May 1994       23130.81 17053.07  29208.56
## Jun 1994       22968.07 16600.80  29335.34
## Jul 1994       27232.77 19310.78  35154.76
## Aug 1994       27483.04 19131.14  35834.95
## Sep 1994       29760.71 20347.68  39173.74
## Oct 1994       32722.20 21984.05  43460.34
## Nov 1994       51418.21 33958.60  68877.82
## Dec 1994      111921.09 72688.52 151153.66
## Jan 1995       16126.42 10302.58  21950.27
## Feb 1995       21639.10 13602.73  29675.47
## Mar 1995       33611.27 20795.05  46427.49
## Apr 1995       24281.95 14789.20  33774.70

We predict sales for the first four months of 1995 to be 16126.42, 21639.10, 33611.27, and 24281.95 respectively. We can be 80% confident that sales lie between 10302.58 and 21950.27 for January, between 13602.73 and 29675.47 for February, between 20795.05 and 46427.49 for March, and between 14789.20 and 33774.70 for April.