This is my Homework 3 for DACSS 601.
The dataset, Chinese Real National Income Data, comes from GitHub and contains time series of real national income in China by sector, expressed as an index with 1952 = 100.
The variables in the dataset are as follows:
| Variable | Data type | Description |
|---|---|---|
| index | Number | Index of data. |
| agriculture | Number | Real national income in the agriculture sector. |
| commerce | Number | Real national income in the commerce sector. |
| construction | Number | Real national income in the construction sector. |
| industry | Number | Real national income in the industry sector. |
| transport | Number | Real national income in the transport sector. |
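
Because every series is indexed to 1952 = 100, an index value can be read directly as cumulative growth since the base year. The sketch below shows that arithmetic; the helper name `index_to_growth_pct` and the sample values are illustrative assumptions, not part of the dataset or the original homework.

```r
# Hypothetical helper: convert an index value (base year = 100)
# into cumulative percentage growth since the base year.
index_to_growth_pct <- function(index_value, base = 100) {
  (index_value / base - 1) * 100
}

# Illustrative index values only (not taken from the file).
example_index <- c(100, 134, 541)
growth <- index_to_growth_pct(example_index)
print(growth)
```

An index of 100 therefore means no change relative to 1952, and an index of 200 means the sector's real income has doubled.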
library(tidyverse)  # provides read_csv() and the %>% pipe
incoming_data <- read_csv("C:/Users/zhang/OneDrive - University of Massachusetts/_601/Sample Datasets/ChinaIncome.csv", show_col_types = FALSE)
incoming_data
# A tibble: 37 x 6
index agriculture commerce construction industry transport
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 100 100 100 100 100
2 2 102. 133 138. 134. 120
3 3 103. 136. 133. 159. 136
4 4 112. 138. 152. 169. 140
5 5 116. 147. 262. 219. 164
6 6 120. 147. 243. 244. 176
7 7 120. 156. 367 384. 271.
8 8 101. 170. 389. 502. 356.
9 9 83.6 164. 394 541. 384.
10 10 84.7 130. 130. 316. 221.
# ... with 27 more rows
[1] 1 3 4 4 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
[34] 4 4 4 4
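The vector above gives, for each of the 37 rows, the position of the sector with the largest index value. The 4th field, industry, is the largest in 34 of the 37 rows. The exceptions are row 1, the base year in which all sectors are tied at 100, and rows 2 and 5, where construction leads. Because every series starts from the same base of 100, the sector with the largest index is the one that has grown the most since 1952. Thus, industry grew the fastest.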
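
The code that produced the vector of positions is not shown above. One plausible way to obtain it is to apply `which.max()` to each row of the five sector columns. The sketch below assumes that approach; the names `sectors` and `leading_sector` are hypothetical, and the data are only the first five rows of the preview, rounded as printed.

```r
# Illustrative data: first five rows of the preview, rounded as printed.
sectors <- data.frame(
  agriculture  = c(100, 102, 103, 112, 116),
  commerce     = c(100, 133, 136, 138, 147),
  construction = c(100, 138, 133, 152, 262),
  industry     = c(100, 134, 159, 169, 219),
  transport    = c(100, 120, 136, 140, 164)
)

# For each row, find the position of the column with the largest value.
# which.max() returns the first position on ties, so the base year yields 1.
leading_sector <- apply(sectors, 1, which.max)
print(unname(leading_sector))

# Map the positions back to sector names for readability.
leading_names <- names(sectors)[leading_sector]
print(leading_names)
```

On these five rows the positions agree with the first five entries of the vector shown above. Translating positions into sector names avoids having to remember which field number corresponds to which sector.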
# Keep only the five sector columns (drop the index column).
field_data <- incoming_data[2:6]
# Print the Pearson correlation for every unordered pair of columns.
for (i in 1:4) {
  x <- field_data[[i]]
  for (j in (i + 1):5) {
    y <- field_data[[j]]
    cat("The correlation between the ", i, "th field and the ", j, "th field: ", sep = "")
    cor(x, y) %>% cat('\n')
  }
}
The correlation between the 1th field and the 2th field: 0.9641547
The correlation between the 1th field and the 3th field: 0.955028
The correlation between the 1th field and the 4th field: 0.9670784
The correlation between the 1th field and the 5th field: 0.9521588
The correlation between the 2th field and the 3th field: 0.9880994
The correlation between the 2th field and the 4th field: 0.9883226
The correlation between the 2th field and the 5th field: 0.9881633
The correlation between the 3th field and the 4th field: 0.9873144
The correlation between the 3th field and the 5th field: 0.9936743
The correlation between the 4th field and the 5th field: 0.9927794
As we can see, every pair of the five fields is strongly correlated (all coefficients are above 0.95), and the 3rd field, construction, and the 5th field, transport, have the strongest correlation (0.9937).
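
The pairwise loop above can also be written as a single call to `cor()` on the whole data frame, which returns the full correlation matrix at once. The following is a minimal sketch under that assumption; `sectors`, `corr`, and `best` are hypothetical names, and the data are only the first five rows of the preview, rounded as printed, so the coefficients will differ from the ones reported for the full dataset.

```r
# Illustrative data: first five rows of the preview, rounded as printed.
sectors <- data.frame(
  agriculture  = c(100, 102, 103, 112, 116),
  commerce     = c(100, 133, 136, 138, 147),
  construction = c(100, 138, 133, 152, 262),
  industry     = c(100, 134, 159, 169, 219),
  transport    = c(100, 120, 136, 140, 164)
)

# 5 x 5 matrix of Pearson correlations between all columns.
corr <- cor(sectors)

# Blank out the diagonal and lower triangle so each pair appears once.
corr[lower.tri(corr, diag = TRUE)] <- NA

# Locate the pair with the largest correlation and report it by name.
best <- which(corr == max(corr, na.rm = TRUE), arr.ind = TRUE)
cat("Strongest pair:",
    rownames(corr)[best[1, "row"]], "and",
    colnames(corr)[best[1, "col"]], "\n")
```

Working from the matrix keeps the sector names attached to each coefficient, so the strongest pair can be reported by name rather than by field number.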