HW exercise 1.

Use the dataset of the monthly average number of visitors to New Zealand by country of residence to explore the seasonal patterns across the eight countries. Is there a hemisphere effect?

Load and check

'data.frame':   170 obs. of  10 variables:
 $ Month    : Factor w/ 170 levels "1998M09","1998M10",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ Australia: int  17244 18090 16750 25909 27228 19461 19200 19595 12564 10987 ...
 $ ChinaPRof: int  748 941 1054 1270 1375 1660 1456 1488 1449 1413 ...
 $ Japan    : int  6093 5039 6112 6670 6008 7478 7341 5030 4275 3758 ...
 $ Korea    : int  979 1083 1144 1836 2716 2245 1611 1416 1192 1255 ...
 $ Germany  : int  1320 2459 5195 5499 6430 7320 6094 3157 1450 765 ...
 $ UK       : int  5794 7876 13362 20238 22557 27477 20187 13448 8587 7271 ...
 $ Canada   : int  973 1418 2236 2935 3623 4394 3573 1795 1160 903 ...
 $ USA      : int  3837 6093 8468 7865 10007 12533 10519 6737 4932 4845 ...
 $ Total    : int  57930 68203 84370 113853 122130 124305 104106 83414 59800 52426 ...

Data transformation

     Month             variable       value         Year_Month       
 Min.   : 1.000   Australia:170   Min.   :   748   Length:1530       
 1st Qu.: 4.000   ChinaPRof:170   1st Qu.:  4304   Class :character  
 Median : 7.000   Japan    :170   Median :  7180   Mode  :character  
 Mean   : 6.535   Korea    :170   Mean   : 22928                     
 3rd Qu.:10.000   Germany  :170   3rd Qu.: 19420                     
 Max.   :12.000   UK       :170   Max.   :212693                     
                  (Other)  :510                                      
 Year_Month2         Year_Month3         hemisphere
 Length:1530        Min.   :1998-09-01   N:1360    
 Class :character   1st Qu.:2002-03-01   S: 170    
 Mode  :character   Median :2005-09-16             
                    Mean   :2005-09-15             
                    3rd Qu.:2009-04-01             
                    Max.   :2012-10-01             
                                                   
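The transformation step above reshapes the wide table (one column per country) into a long table with a parsed date and a hemisphere label. The following is a minimal Python sketch of that reshaping, not the R code that produced the summary. It assumes month labels of the form "1998M09" and that, of the eight countries, only Australia is in the Southern Hemisphere. The summary above suggests the Total column was kept as a ninth series and tagged N (1360 = 8 × 170); this sketch only reshapes the columns you pass in, so Total can be left out. The names `parse_month` and `to_long` are hypothetical.

```python
from datetime import date

# Assumption: among the eight countries, only Australia is in the Southern Hemisphere.
SOUTHERN = {"Australia"}

def parse_month(label):
    """Convert a label such as '1998M09' into the first day of that month."""
    year, month = label.split("M")
    return date(int(year), int(month), 1)

def to_long(rows, countries):
    """Reshape wide rows (one dict per month) into one record per country-month."""
    long_rows = []
    for row in rows:
        when = parse_month(row["Month"])
        for country in countries:
            long_rows.append({
                "date": when,
                "month": when.month,
                "country": country,
                "value": row[country],
                "hemisphere": "S" if country in SOUTHERN else "N",
            })
    return long_rows

# First row of the data as printed by str(); Total is deliberately not passed in.
first = [{"Month": "1998M09", "Australia": 17244, "UK": 5794, "Total": 57930}]
records = to_long(first, ["Australia", "UK"])
```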

Data visualization

Visitor counts comparison

Conclusion

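The hemisphere effect shows up in two ways:

  1. Visitor numbers peak during the Southern Hemisphere summer (roughly December to February), which is winter in the Northern Hemisphere; in the first ten months printed above, every country's count is highest in January or February.
  2. Australia, which is also in the Southern Hemisphere, sends more visitors to New Zealand than any other single country in most months.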
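As a quick check on the seasonal claim, here is a Python sketch that finds the peak month for each series using only the first ten values printed by str() above (1998M09 to 1999M06). Ten months is less than a full year, so this is a sanity check rather than a seasonal analysis; the helper name `peak_month` is hypothetical.

```python
# First ten monthly values (1998M09 to 1999M06), copied from the str() output above.
MONTHS = ["1998M09", "1998M10", "1998M11", "1998M12", "1999M01",
          "1999M02", "1999M03", "1999M04", "1999M05", "1999M06"]
FIRST_TEN = {
    "Australia": [17244, 18090, 16750, 25909, 27228, 19461, 19200, 19595, 12564, 10987],
    "ChinaPRof": [748, 941, 1054, 1270, 1375, 1660, 1456, 1488, 1449, 1413],
    "Japan":     [6093, 5039, 6112, 6670, 6008, 7478, 7341, 5030, 4275, 3758],
    "Korea":     [979, 1083, 1144, 1836, 2716, 2245, 1611, 1416, 1192, 1255],
    "Germany":   [1320, 2459, 5195, 5499, 6430, 7320, 6094, 3157, 1450, 765],
    "UK":        [5794, 7876, 13362, 20238, 22557, 27477, 20187, 13448, 8587, 7271],
    "Canada":    [973, 1418, 2236, 2935, 3623, 4394, 3573, 1795, 1160, 903],
    "USA":       [3837, 6093, 8468, 7865, 10007, 12533, 10519, 6737, 4932, 4845],
}

def peak_month(values, labels=MONTHS):
    """Return the month label at which the series reaches its maximum."""
    return labels[max(range(len(values)), key=values.__getitem__)]

peaks = {country: peak_month(v) for country, v in FIRST_TEN.items()}
for country, label in peaks.items():
    print(f"{country:10s} peaks in {label}")
```

Australia and Korea peak in January 1999 and the other six countries in February 1999.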


HW exercise 2.

Use the sample data set to estimate the mean life expectancy of Nobel Prize winners.

Load and check

'data.frame':   21 obs. of  3 variables:
 $ ID  : Factor w/ 21 levels "Bernard Katz",..: 3 8 17 1 13 15 7 14 2 16 ...
 $ Born: Factor w/ 21 levels "August 8, 1902",..: 5 9 2 12 1 14 6 10 16 3 ...
 $ Died: Factor w/ 21 levels "April 16, 1972",..: 21 8 20 2 19 6 5 16 7 4 ...

Discuss

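This syntax seems to cause problems when converting the date format: for example, the birth date in the first row should be February 27, 1926, but after conversion it became 1926-02-07, and there are quite a few other errors.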
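One way to sidestep the conversion problem is to parse the text with an explicit format that names the full month, the day, and the four-digit year, then compute each laureate's age at death and average it. The following Python sketch takes that approach; it is not the R code used above. `%B` matches full month names in the current locale (English in a default C locale), and ages are counted in completed years, which is a choice; dividing the day difference by 365.25 would give fractional years instead. The function names are hypothetical, and the date pair used below only exercises the code: the two strings appear among the factor levels above but are not known to belong to the same person.

```python
from datetime import date, datetime
from statistics import mean

def parse_long_date(text):
    """Parse dates written like 'February 27, 1926' (full month name, day, year)."""
    return datetime.strptime(text.strip(), "%B %d, %Y").date()

def age_at_death(born, died):
    """Age in completed years between two dates."""
    years = died.year - born.year
    if (died.month, died.day) < (born.month, born.day):
        years -= 1  # birthday not yet reached in the year of death
    return years

def mean_age_at_death(pairs):
    """Mean age at death over (born_text, died_text) pairs."""
    return mean(age_at_death(parse_long_date(b), parse_long_date(d)) for b, d in pairs)

print(parse_long_date("February 27, 1926"))  # 1926-02-27
```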


HW exercise 3.

Use the following sample of records of profit, arrival date, and departure date for group tours booked through a travel agency in Taiwan to estimate the mean profit per day of service.

Load and check

'data.frame':   96 obs. of  3 variables:
 $ Expense : int  15393 27616 8876 57378 32613 46998 10744 3269 16195 55842 ...
 $ Arrival : Factor w/ 83 levels "2014/10/10","2014/10/13",..: 79 83 78 74 75 31 31 26 29 77 ...
 $ Depature: Factor w/ 79 levels "2014/10/1","2014/10/13",..: 73 77 73 76 75 32 69 26 28 73 ...

Compute

[1] 5522.453

The estimated mean profit per day of service is 5522.453, in the same currency units as the Expense column.
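"Mean profit per day" can be computed in two ways: the ratio of means (total profit divided by total days) or the mean of the per-group ratios. The two generally differ, and the text does not say which one produced 5522.453. The following Python sketch computes both. It assumes dates written like "2014/10/1" (as in the factor levels above) and treats the number of service days as departure minus arrival, with an optional `inclusive` flag that counts both end dates; both the flag and the function names are hypothetical. Exclusive counting fails for same-day trips (zero days). The records below are toy values, not taken from the data set.

```python
from datetime import datetime

FMT = "%Y/%m/%d"  # strptime also accepts unpadded values such as '2014/10/1'

def service_days(arrival, departure, inclusive=False):
    """Days of service between two date strings; inclusive=True counts both ends."""
    days = (datetime.strptime(departure, FMT) - datetime.strptime(arrival, FMT)).days
    return days + 1 if inclusive else days

def profit_per_day(records, inclusive=False):
    """Return (ratio of means, mean of ratios) for (profit, arrival, departure) records."""
    days = [service_days(a, d, inclusive) for _, a, d in records]
    profits = [p for p, _, _ in records]
    ratio_of_means = sum(profits) / sum(days)
    mean_of_ratios = sum(p / n for p, n in zip(profits, days)) / len(records)
    return ratio_of_means, mean_of_ratios

# Toy records (not from the data set): profit, arrival, departure.
toy = [(1000, "2014/10/1", "2014/10/3"), (3000, "2014/10/10", "2014/10/13")]
print(profit_per_day(toy))  # (800.0, 750.0)
```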


HW exercise 4.

The following rather awful plot is shown on a web page hosted by the Taiwanese Ministry of Education.

The original plot

Revise it so that it is a proper time series plot. For your convenience, the data points have been extracted and saved in the file. What happened in the early 1990s, and how do we know whether the trend reversal is real? You may want to augment the data set with further data points from 2012 to 2018, available in the foreign students in the U.S. data file.

Load and check

       V1       
 Min.   : 2553  
 1st Qu.:14905  
 Median :28930  
 Mean   :24497  
 3rd Qu.:32286  
 Max.   :37580  
      Year      Country     Number      
 Min.   :2012   CN:7    Min.   : 18105  
 1st Qu.:2013   JP:7    1st Qu.: 19334  
 Median :2015   TW:7    Median : 21516  
 Mean   :2015           Mean   :119596  
 3rd Qu.:2017           3rd Qu.:274439  
 Max.   :2018           Max.   :369548  
'data.frame':   21 obs. of  3 variables:
 $ Year   : int  2012 2013 2014 2015 2016 2017 2018 2012 2013 2014 ...
 $ Country: Factor w/ 3 levels "CN","JP","TW": 3 3 3 3 3 3 3 2 2 2 ...
 $ Number : int  21867 21266 20993 21127 21516 22454 23369 19568 19334 19064 ...
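For a first numeric look at the appended data, here is a Python sketch that computes year-over-year differences for the seven Taiwan (TW) values printed by str() above. The differences are negative for 2013 and 2014 and positive from 2015 onward. Seven points are not enough on their own to say whether a change in direction is real, so this is only a descriptive starting point. The helper name `year_over_year` is hypothetical.

```python
# Taiwan (TW) counts for 2012-2018, copied from the str() output above.
TW = {2012: 21867, 2013: 21266, 2014: 20993, 2015: 21127,
      2016: 21516, 2017: 22454, 2018: 23369}

def year_over_year(series):
    """Difference between each year's value and the previous year's value."""
    years = sorted(series)
    return {y: series[y] - series[prev] for prev, y in zip(years, years[1:])}

changes = year_over_year(TW)
print(changes)  # {2013: -601, 2014: -273, 2015: 134, 2016: 389, 2017: 938, 2018: 915}
```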

Transform
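The transform step needs the extracted series (the single column V1 summarized above, which carries no year column) and the 2012 to 2018 TW figures on one year axis. The following Python sketch does that under a loud assumption: the V1 values are consecutive annual points starting at a year you supply, because the start year is not recorded in the summary. It refuses to silently overwrite years that appear in both sources. The name `combine_series` is hypothetical, and the inputs below are toy values.

```python
def combine_series(extracted, first_year, appended):
    """Attach years to extracted values (assumed consecutive and annual, starting at
    first_year) and append later points; raise if the two sources share a year."""
    combined = {first_year + i: v for i, v in enumerate(extracted)}
    overlap = combined.keys() & appended.keys()
    if overlap:
        raise ValueError(f"years present in both sources: {sorted(overlap)}")
    combined.update(appended)
    return dict(sorted(combined.items()))

# Toy inputs, not from the data files.
print(combine_series([10, 20], 2010, {2012: 30}))  # {2010: 10, 2011: 20, 2012: 30}
```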

Visualize

HW exercise 5.

"How Different Groups Spend Their Day" is an article published in The New York Times based on data from the American Time Use Survey. Discuss what we would need in order to replicate this piece of graphical journalism in Taiwan.

[ANS]

We need to draw a representative sample with a sound sampling design (e.g., sampling stratified by city and county) and obtain the following measurements from each respondent:

  • Basic demographic variables
    1. gender
    2. age
    3. ethnicity
    4. education level
    5. employment status
    6. number of children
  • Daily-activity-related measurements
    1. the activities performed during the day
    2. the start and end times of each activity

For the analysis, we need to convert the activity records into time series data so that we can calculate the percentage of respondents engaged in each activity during each time period.
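The calculation described above can be sketched in Python: for each time slot, find what every respondent is doing at the start of that slot and report the (optionally weighted) percentage per activity. Weights matter because a stratified sample generally needs sampling weights to represent the population. This is an illustration under stated assumptions, not the method used by The New York Times: the episode layout (respondent, activity, start minute, end minute from midnight), the 60-minute slots, the non-overlapping episodes, and the names `EPISODES` and `activity_shares` are all hypothetical, and the diaries are toy data.

```python
from collections import defaultdict

# Hypothetical toy diaries: (respondent_id, activity, start_minute, end_minute),
# minutes counted from midnight; episodes assumed not to overlap within a respondent.
EPISODES = [
    ("r1", "sleep", 0, 420), ("r1", "work", 420, 1440),
    ("r2", "sleep", 0, 480), ("r2", "leisure", 480, 1440),
]

def activity_shares(episodes, weights=None, slot_minutes=60, day_minutes=24 * 60):
    """Percentage of (weighted) respondents doing each activity at the start of each slot."""
    respondents = {rid for rid, _, _, _ in episodes}
    if weights is None:
        weights = {rid: 1.0 for rid in respondents}  # unweighted: everyone counts once
    total = sum(weights[rid] for rid in respondents)
    table = {}
    for t in range(0, day_minutes, slot_minutes):
        share = defaultdict(float)
        for rid, activity, start, end in episodes:
            if start <= t < end:  # this respondent is doing `activity` at minute t
                share[activity] += weights[rid]
        table[t] = {act: 100.0 * w / total for act, w in share.items()}
    return table

shares = activity_shares(EPISODES)
print(shares[420])  # {'work': 50.0, 'sleep': 50.0}
```

Stacking these percentages over the slots gives the kind of time-of-day area chart used in the article.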