Introduction

For your first assignment, you can use this as a template. When you need to include an R code chunk, you can do so by clicking Insert and choosing R. The following is what you would get.

When you are done, or at any time in between, you can run the code by clicking the Knit button, but please remember to save your file first. For more information, please refer to the R Markdown video.

Example:

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots:

You can also embed plots, for example:

plot(cars$speed)

Data

For this assignment, you will be using a data set included in the R package fpp2, which must be loaded with library(fpp2) before the data are available. The data set is called arrivals. Please use R help, i.e., ?arrivals, to obtain more information about it. It is a quarterly time series of international arrivals to Australia from 4 countries, stored as the variables Japan, NZ (New Zealand), UK, and US. Each variable gives the arrivals from the corresponding country.

library(fpp2) # loads the arrivals data set
head(arrivals) # Initial observation of fpp2 arrivals data set
##          Japan     NZ     UK     US
## 1981 Q1 14.763 49.140 45.266 32.316
## 1981 Q2  9.321 87.467 19.886 23.721
## 1981 Q3 10.166 85.841 24.839 24.533
## 1981 Q4 19.509 61.882 52.264 33.438
## 1982 Q1 17.117 42.045 53.636 33.527
## 1982 Q2 10.617 63.081 34.802 28.366
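A quick structural check can complement head(). The sketch below assumes fpp2 is installed and uses the base R functions frequency(), start(), end(), and colnames(), which work on ts objects; the help page (?arrivals) states the units of measurement.

```r
library(fpp2)  # provides the arrivals data set (a multivariate quarterly ts)

# Basic structure of the time series
frequency(arrivals)   # observations per year (4 for quarterly data)
start(arrivals)       # first year and quarter
end(arrivals)         # last year and quarter
colnames(arrivals)    # variable names
nrow(arrivals)        # number of quarters observed
```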

You can type in your responses and code after each question in this document. You need to submit the html file R Markdown creates.

Questions

  1. (15 points) Compute the mean and the standard deviation for all four variables. Are the means similar? How about the standard deviations? Comment on this.
#Calculate mean arrivals for Japan, NZ, UK, and US
mean(arrivals[,"Japan"]) #average number of arrivals - Japan
## [1] 122.0802
mean(arrivals[,"NZ"]) #average number of arrivals - NZ
## [1] 170.5858
mean(arrivals[,"UK"]) #average number of arrivals - UK
## [1] 106.8575
mean(arrivals[,"US"]) #average number of arrivals - US
## [1] 84.84876
#Calculate SD for Japan, NZ, UK, and US arrivals in Australia
sd(arrivals[,"Japan"]) #SD number of arrivals - Japan
## [1] 64.21392
sd(arrivals[,"NZ"]) #SD number of arrivals - NZ
## [1] 84.6801
sd(arrivals[,"UK"]) #SD number of arrivals - UK
## [1] 63.98245
sd(arrivals[,"US"]) #SD number of arrivals - US
## [1] 29.74831
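The eight calls above can be condensed with colMeans() and apply(). Because the four means differ, comparing raw standard deviations can mislead, so the sketch also computes the coefficient of variation (SD divided by mean), one common way to compare spread on a relative scale; this ratio is a supplement, not one of the statistics the question asks for. From the means and SDs printed above, the ratios work out to roughly 0.53 (Japan), 0.50 (NZ), 0.60 (UK), and 0.35 (US), so the US series remains the least variable even relative to its mean.

```r
library(fpp2)

# Column-wise summaries in one call each (arrivals is a matrix-like mts object)
means <- colMeans(arrivals)
sds   <- apply(arrivals, 2, sd)

# Coefficient of variation: standard deviation relative to the mean
cv <- sds / means
round(rbind(mean = means, sd = sds, cv = cv), 3)
```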
  2. (20 points) Obtain the correlation between US and a) UK, b) Japan, c) NZ. Briefly comment on the correlations: do you see a significant correlation between arrivals from the US and any other country?

It appears that arrivals from each of the 4 countries (Japan, NZ, UK, US) are at least moderately correlated with arrivals from the other countries, with correlations ranging from 0.40 (Japan and NZ) to 0.84 (UK and US). This suggests that arrivals to Australia from Japan, NZ, UK, and US tend to rise and fall together from quarter to quarter, which may point to global economic, social, and political factors (rather than factors unique to one country) driving travel to Australia. Because the series also trend over time, part of this shared movement may reflect common trends rather than a direct link between countries. More research would be needed to understand why arrivals from the 4 countries are higher in certain periods than in others, as well as the influences behind the spikes visible in the time series plots later in this report. Understanding this would help Australia better prepare for increases in arrivals from these 4 countries by better predicting future surges.

It is notable that arrivals from the US are strongly correlated with arrivals from NZ and from the UK (correlation coefficients of 0.814 and 0.843, respectively). More research would be needed to understand why this pattern exists.

# Correlation matrix of arrivals to Australia from all 4 countries
cor(arrivals[, c("Japan","NZ","UK","US") ])
##           Japan        NZ        UK        US
## Japan 1.0000000 0.3999781 0.5190884 0.5734422
## NZ    0.3999781 1.0000000 0.6366589 0.8144083
## UK    0.5190884 0.6366589 1.0000000 0.8433243
## US    0.5734422 0.8144083 0.8433243 1.0000000
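The question asks whether the correlations are significant, which the correlation matrix alone does not answer. The sketch below uses cor.test() to add a p-value for each pairing with the US, and also squares r, since the coefficient of determination is r squared (for example, 0.843 squared is about 0.71 for US and UK). One caveat: cor.test() assumes independent observations, which trending, seasonal time series do not satisfy, so these p-values are likely too optimistic.

```r
library(fpp2)

others <- c("UK", "Japan", "NZ")

# For each country: Pearson r with US, r^2 (coefficient of determination),
# and the p-value from cor.test() (which assumes independent observations)
res <- t(sapply(others, function(v) {
  ct <- cor.test(arrivals[, "US"], arrivals[, v])
  c(r = unname(ct$estimate), r2 = unname(ct$estimate)^2, p.value = ct$p.value)
}))
round(res, 4)
```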
  3. (15 points) Obtain the histograms for all four variables. Briefly comment on the shapes. Do they look like they are normally distributed?

None of the four histograms of arrivals to Australia (from Japan, the US, the UK, and NZ) shows a clear bell-shaped normal distribution. At first glance, the four distributions appear platykurtic, meaning that they are relatively flat with a high level of variability. On closer inspection, however, each distribution appears multimodal: every histogram has multiple peaks, with many quarters of relatively low arrivals as well as many quarters of high arrivals. One possible explanation is that broader international restrictions on immigration or travel produce distinct periods of low and high arrivals; another is that each histogram pools observations from different years of a trending series, so low early values and high later values form separate clusters. More research would be needed. A time series plot to understand fluctuations in arrivals over time, followed by a qualitative analysis of Australian immigration policy, would be a useful next step.

# Create Hist. for Arrivals to AU from Japan
hist(arrivals[,"Japan"], ylab="", xlab="Number of Arrivals", freq=TRUE, col="lightblue",
     main="Empirical Distribution of Arrivals to Australia from Japan")

# Create Hist. for Arrivals to AU from NZ
hist(arrivals[,"NZ"], ylab="", xlab="Number of Arrivals", freq=TRUE, col="red",
     main="Empirical Distribution of Arrivals to Australia from New Zealand")

#Create Hist. for Arrivals to AU from UK
hist(arrivals[,"UK"], ylab="", xlab="Number of Arrivals", freq=TRUE, col="yellow",
     main="Empirical Distribution of Arrivals to Australia from the UK")

#Create Hist. for Arrivals to AU from US
hist(arrivals[,"US"], ylab="", xlab="Number of Arrivals", freq=TRUE, col="pink",
     main="Empirical Distribution of Arrivals to Australia from the US")
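Histograms depend on the choice of bins, so a normal Q-Q plot is another common visual check for normality, and the Shapiro-Wilk test gives a formal one. This is a supplementary sketch rather than required output; like the correlation test, shapiro.test() assumes independent observations, which a time series does not satisfy, so its p-values should be read as rough indications.

```r
library(fpp2)

# Normal Q-Q plots in a 2x2 grid: points close to the line suggest normality
op <- par(mfrow = c(2, 2))
for (v in colnames(arrivals)) {
  x <- as.numeric(arrivals[, v])
  qqnorm(x, main = paste("Normal Q-Q plot:", v))
  qqline(x, col = "red")
}
par(op)  # restore the previous plot layout

# Shapiro-Wilk test per series (small p-values indicate departure from normality)
sw_p <- sapply(colnames(arrivals), function(v) {
  shapiro.test(as.numeric(arrivals[, v]))$p.value
})
round(sw_p, 4)
```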

  4. (15 points) Obtain the scatter-plots between US and a) UK, b) Japan, c) NZ. Are these plots reasonable compared to the correlations you calculated in question 2? (hint: see how plot vs plot.default performs)

The scatter plots of arrivals from the US against arrivals from Japan, the UK, and NZ are consistent with the correlation matrix produced in question 2. Arrivals from Japan and from the US have a correlation coefficient of 0.573, only a moderate positive relationship, and the scatter plot reflects this: although the two series move together to some extent, the points are too widely scattered to suggest that arrivals from one country could reliably predict arrivals from the other. By contrast, the scatter plots of (1) US arrivals against NZ arrivals and (2) US arrivals against UK arrivals show a stronger positive relationship, matching the correlations of 0.814 and 0.843 in the correlation matrix from question 2.

# Scatter plot between US and UK
# plot() on two ts objects dispatches to plot.ts(), which by default (for up to 150
# observations) draws time labels joined by lines; plot.default() draws plain points
plot.default(arrivals[,"US"], arrivals[,"UK"], 
     xlab="Arrivals from US", ylab="Arrivals from UK", 
     main="Scatter Plot of Arrivals: US vs UK", 
     col="blue", pch=19, type="p")

# Scatter plot between US and Japan
plot.default(arrivals[,"US"], arrivals[,"Japan"], 
     xlab="Arrivals from US", ylab="Arrivals from Japan", 
     main="Scatter Plot of Arrivals: US vs Japan", 
     col="green", pch=19, type="p") 

# Scatter plot between US and NZ
plot.default(arrivals[,"US"], arrivals[,"NZ"], 
     xlab="Arrivals from US", ylab="Arrivals from NZ", 
     main="Scatter Plot of Arrivals: US vs NZ", 
     col="red", pch=19, type="p") 
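Placing the three panels side by side, with r in each title and a least-squares line for reference, makes the comparison with question 2 more direct. This is one possible layout, not a required format. Converting the series with as.numeric() drops the ts class, which avoids the plot.ts() behavior mentioned in the hint.

```r
library(fpp2)

# US arrivals against each other country, with Pearson r in the title
# and a fitted least-squares line
op <- par(mfrow = c(1, 3))
r_us <- numeric(0)
for (v in c("UK", "Japan", "NZ")) {
  x <- as.numeric(arrivals[, "US"])
  y <- as.numeric(arrivals[, v])
  r_us[v] <- cor(x, y)
  plot.default(x, y, pch = 19, col = "grey40",
               xlab = "Arrivals from US", ylab = paste("Arrivals from", v),
               main = sprintf("US vs %s (r = %.3f)", v, r_us[v]))
  abline(lm(y ~ x), col = "red")
}
par(op)  # restore the previous plot layout
```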

  5. (15 points) Plot a time series plot for each variable (hint: plot.ts or autoplot). Do you see any patterns over time for any one of the countries?

All 4 time series plots below show cyclical fluctuations in arrivals to Australia from each of the 4 countries. Despite these fluctuations, arrivals from the US and from NZ have generally increased over the period shown, while arrivals from the UK and particularly from Japan have decreased in recent years. Following a steady increase from 1981 to about 1995, arrivals from Japan have declined steadily. Arrivals from the UK increased until about 2005, after which they decreased slightly. More data would be needed to understand the cyclical pattern of arrivals from the 4 countries; likely explanations include seasonality (the data are quarterly), economic events, policy changes, cultural events, and global events. Although the arrivals data set does not cover this period, arrivals from each of the 4 countries probably dropped sharply in 2020 and 2021 during the pandemic before recovering toward pre-pandemic levels. More recent data would be needed to confirm this trend and to choose tools for predicting future arrivals to Australia from the 4 countries.

# Time series plot for Japan
plot.ts(arrivals[, "Japan"], main="Time Series Plot of Arrivals from Japan", 
        ylab="Number of Arrivals", xlab="Time", col="blue")

# Time series plot for New Zealand
plot.ts(arrivals[, "NZ"], main="Time Series Plot of Arrivals from New Zealand", 
        ylab="Number of Arrivals", xlab="Time", col="green")

# Time series plot for the UK
plot.ts(arrivals[, "UK"], main="Time Series Plot of Arrivals from the UK", 
        ylab="Number of Arrivals", xlab="Time", col="red")

# Time series plot for the US
plot.ts(arrivals[, "US"], main="Time Series Plot of Arrivals from the US", 
        ylab="Number of Arrivals", xlab="Time", col="purple")
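The paragraph above lists seasonality as one possible explanation for the repeated ups and downs. Since the data are quarterly, a quick check is to average each series by quarter: if the quarterly averages differ markedly, at least part of the short-term fluctuation is seasonal. The sketch uses base R (cycle(), tapply(), and monthplot() from the stats package); ggseasonplot() from the forecast package, which is attached when fpp2 is loaded, is an alternative.

```r
library(fpp2)

# Quarter (1-4) of each observation, taken from the ts time attributes
qtr <- as.integer(cycle(arrivals))

# Average arrivals by quarter for each country (rows = quarters, columns = countries)
quarter_means <- sapply(colnames(arrivals), function(v) {
  tapply(as.numeric(arrivals[, v]), qtr, mean)
})
round(quarter_means, 2)

# Seasonal subseries plot for Japan: one mini time series per quarter
monthplot(arrivals[, "Japan"], ylab = "Arrivals from Japan")
title(main = "Seasonal subseries plot: Japan")
```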

  6. (10 points) Based on the analysis you have done here, what have you learned about arrivals data?

Based on my analysis in R, I have learned that the arrivals data set contains 4 variables (arrivals to Australia from Japan, NZ, UK, and US), observed quarterly from 1981 Q1 onward. Arrivals from Japan, NZ, and the UK have a larger spread as measured by the standard deviation (64.21, 84.68, and 63.98), while arrivals from the US have a smaller spread (SD of 29.75), meaning its values are more tightly clustered around the mean. None of the variables has a bell-shaped normal distribution; each histogram shows a multimodal distribution (more than one peak), with many quarters of relatively few arrivals as well as many quarters of high arrivals. Arrivals from the 4 countries are moderately to strongly positively correlated with one another, which is consistent with their similar histograms, time series plots, and scatter plots. The US in particular is strongly associated with NZ and the UK, as the roughly linear pattern in the scatter plot of US and NZ arrivals shows. Finally, the time series plots show a general increase in arrivals from the US and NZ over the period, albeit with cyclical fluctuations. Arrivals from Japan increased from 1981 to about 1995 in a cyclical manner and at a rate similar to the US, UK, and NZ, but have declined steadily since then, which helps explain why US arrivals are more strongly correlated with NZ and UK arrivals than with arrivals from Japan. More information is needed to understand why fewer travelers from Japan are coming to Australia.
What happened in the years after 1995 that led arrivals from Japan to decrease? More research is also needed to understand what explains the cyclical nature of all 4 series of arrivals to Australia: are seasonal, cultural, global, or economic factors contributing?
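The observation that arrivals from Japan peaked in the mid-1990s and then declined can be checked by locating the quarter in which each series reaches its maximum. A minimal sketch: it reports only the single highest quarter, which seasonal spikes can shift, so treat it as a rough marker rather than an estimate of the turning point.

```r
library(fpp2)

# Time in decimal years for each quarter (e.g. 1995.25 means 1995 Q2)
qtr_time <- as.numeric(time(arrivals))

# Quarter at which each country's arrivals reach their maximum
peak_time <- sapply(colnames(arrivals), function(v) {
  qtr_time[which.max(arrivals[, v])]
})
peak_time
```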

  7. (10 points) Can you summarize your learning outcome for this assignment?

Overall, this assignment gave me a strong understanding of the arrivals data set through descriptive statistics (mean, standard deviation), exploratory data analysis (correlation, and the spread of the data via histograms), and visualizations (scatter plots and time series plots). Through this work I came to understand the distribution of the data (particularly the concept and implications of multimodal distributions), performed and interpreted basic statistical analyses, and learned to interpret basic visualizations such as time series plots to make sense of a data set while also considering its descriptive statistics and the correlations among its variables. Using R for this assignment gave me insight into performing descriptive statistics, EDA, and visualization tasks quickly and efficiently with base R functions and the fpp2 package. This will enable more advanced use of R for predictive analytics in subsequent modules. I look forward to learning more in the coming weeks.