MATH2405 TP5, 2021

# This is a chunk where the necessary packages required are downloaded to produce the report 


library(readr)
library(knitr)
library(openxlsx)
library(tidyverse)

## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.3.1
## ✓ ggplot2 3.3.5     ✓ forcats 0.3.0

## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(magrittr)

## 
## Attaching package: 'magrittr'

## The following object is masked from 'package:purrr':
## 
##     set_names

## The following object is masked from 'package:tidyr':
## 
##     extract

library(glue)

## 
## Attaching package: 'glue'

## The following object is masked from 'package:dplyr':
## 
##     collapse

Step1 Locate Data

Data: The Daily Adjusted Closing Price data of the top 10 companies on ASX (marketindex) as of 3-9-2021 for the period 1-1-2020 to 3-9-2021.

Variables description are shown below under Data Description.

Source: Yahoo Finance viewed on 5-9-2021 (https://finance.yahoo.com/)

Step 2 Read/Import Data

# This is an R chunk for importing the data. 

# The sheet "A1Ret" from the file "DWA1X.xlsx" is read with R Code below.  A1Ret contains the returns of the top 10 companies in ASX, calculated from the prices.


DWA1 <- read.xlsx("data/DWA1X.xlsx", sheet = "A1Ret")


head(DWA1)

str(DWA1)

## 'data.frame':    425 obs. of  10 variables:
##  $ CBARet: num  0.00537 -0.00675 0.01777 -0.0037 0.00727 ...
##  $ CSLRet: num  0.00818 -0.0018 0.02363 0.00854 0.0182 ...
##  $ BHPRet: num  0.005122 0.006365 0.0038 0.000506 0.011558 ...
##  $ WBCRet: num  0.007003 -0.000821 0.011438 -0.003255 -0.002857 ...
##  $ NABRet: num  0.006086 -0.006901 0.014156 -0.003218 0.000403 ...
##  $ ANZRet: num  0.00731 -0.00405 0.01291 -0.00361 0.00121 ...
##  $ WESRet: num  0.00961 0.00857 0.02041 -0.00349 0.00858 ...
##  $ FMGRet: num  -0.00279 0.00186 -0.00746 -0.00469 0.01678 ...
##  $ MQGRet: num  0.00609 -0.00536 0.01357 -0.00727 0.00454 ...
##  $ WOWRet: num  0.001928 -0.001101 0.017202 0.008357 -0.000806 ...

Step 3 Data description

The Daily Adjusted Closing Price data of the top 10 companies on ASX (marketindex) on 3-9-2021 have been downloaded from Yahoo Finance

(https://finance.yahoo.com/quote/)

Names of companies (ranking by market capitalization) are:

CBA = Commonwealth Bank of Australia
CSL = Commonwealth Serum Laboratories
BHP = Broken Hill Proprietary Billiton
WBC = Westpac Banking Corporation
NAB = National Australia Bank
ANZ = Australia New Zealand Banking Group
WES = Wesfarmers
FMG = Fortescue Metals Group
MQG = Macquarie Group
WOW = Woolworths Group

The prices cover the period from 1-1-2020 to 3-9-2021.
On 4-1-2020, WHO reported on social media that there was a cluster of pneumonia cases in Wuhan, Hubei province (WHO).
The selection of the price data is to evaluate the performance of these top 10 companies since the announcement of the Covid by WHO.
Instead of ranking by market capitalization, this report will rank these 10 companies in terms of the return per unit of risk, where risk is measured by the standard deviation.
The daily return is defined as ln [(Price at t+1)/Price at t], and expressed in decimal points. e.g. 0.01 = 1%
The daily return of a company is designated as company name Ret. e.g. CBARet
Risk is the standard deviation for the daily returns, expressed in decimal points.
RetperRisk measures the Return per unit of Standard Deviation, or Return per unit of Total Risk.
In finance, security risk is measured by the standard deviation of the return. It represents two types of risks-unsystematic risk and systematic risk.
Unsystematic risk is dependent on the type of business of the security, and it can be diversified away if sufficient number of less than perfectly correlated securities are included in the portfolio.
Systematic risk is dependent on the market factors that affect all businesses, and it cannot be diversified away in theory.

Step 4 Inspect dataset and variables

# This is a chunk where the types of variables, data structures, and the attributes in the data are inspected.

str(DWA1)

## 'data.frame':    425 obs. of  10 variables:
##  $ CBARet: num  0.00537 -0.00675 0.01777 -0.0037 0.00727 ...
##  $ CSLRet: num  0.00818 -0.0018 0.02363 0.00854 0.0182 ...
##  $ BHPRet: num  0.005122 0.006365 0.0038 0.000506 0.011558 ...
##  $ WBCRet: num  0.007003 -0.000821 0.011438 -0.003255 -0.002857 ...
##  $ NABRet: num  0.006086 -0.006901 0.014156 -0.003218 0.000403 ...
##  $ ANZRet: num  0.00731 -0.00405 0.01291 -0.00361 0.00121 ...
##  $ WESRet: num  0.00961 0.00857 0.02041 -0.00349 0.00858 ...
##  $ FMGRet: num  -0.00279 0.00186 -0.00746 -0.00469 0.01678 ...
##  $ MQGRet: num  0.00609 -0.00536 0.01357 -0.00727 0.00454 ...
##  $ WOWRet: num  0.001928 -0.001101 0.017202 0.008357 -0.000806 ...

attributes(DWA1)

## $names
##  [1] "CBARet" "CSLRet" "BHPRet" "WBCRet" "NABRet" "ANZRet" "WESRet"
##  [8] "FMGRet" "MQGRet" "WOWRet"
## 
## $row.names
##   [1]   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
##  [18]  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34
##  [35]  35  36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51
##  [52]  52  53  54  55  56  57  58  59  60  61  62  63  64  65  66  67  68
##  [69]  69  70  71  72  73  74  75  76  77  78  79  80  81  82  83  84  85
##  [86]  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 101 102
## [103] 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119
## [120] 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136
## [137] 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153
## [154] 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170
## [171] 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187
## [188] 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204
## [205] 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221
## [222] 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238
## [239] 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255
## [256] 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272
## [273] 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289
## [290] 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306
## [307] 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323
## [324] 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340
## [341] 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357
## [358] 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374
## [375] 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391
## [392] 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408
## [409] 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425
## 
## $class
## [1] "data.frame"

# The dimension is 425 by 10.
# Data types are 10 character variables (qualitative and nominal) and 425x10 numeric variables (quantitative and continuous). 
# Levels of factor variables do not apply in this case as it is set by the pre-determined ranking by market capitalization.
# Column names are the same as the source data

Step 5 Tidy data

# This is a chunk to check if the data conforms to the tidy data principles.  

# The data file is tidy as it meets the principle of a tidy file:

# Each observation has its own row.
# Each variable has its own column.
# Each value has its own cell.

# For Assessment 1, I am interested to analyse the behaviour of the return per unit risk by using the the steps in the assessment.

# The original price data were taken from 10 different companies, downloaded from Yahoo Finance.

# The price data have been used to calculate the Return, the Standard Deviation, the Return per Standard Deviation (risk) and other summary statistics.

# Due to the property of ln (Pt+1/Pt) = ln Pt+1 - ln Pt. i.e. ln(P2/P1) + ln (P3/P2)...._ln(Pn/Pn-1) = ln(Pn/P1), the sum of the return series is the return for the period from 1-1-2020 to 3-9-2021 for each security. Pt+1 represents the price at t+1.

# Return per Standard Deviation is a measure of the Return per unit risk, this permits the performance of the security to be measured on a comparable basis, rather than a comparison based on return alone, as high risk security usually requires a high return for compensation.

# From the Summary Statistics below, the ranking based on Return per Standard Deviation becomes FMG=1, WBC=10, as opposed to the ranking by market capitalization: FMG=8, and WBC=4.  Higher ranking has lower number.

Step 6 Summary statistics

# This chunk shows the summary statistics for each of the 10 companies

# The R codes below read the data from the sheet "A1Ret" in the file "DWA1".  For the purpose of easier reading by the reader, the summary statistics for each and every of the 10 companies are calculated separately.  A more compact R code using sapply(Sumstat, mean, na.rm=TRUE) is also possible.


Sumstat <- read.xlsx("data/DWA1X.xlsx", sheet = "A1Ret")


#CBA
CBA<- Sumstat %>%
  
summarise(Mean_CBA = mean(CBARet, na.rm = TRUE),
          
Median_CBA = median(CBARet, na.rm = TRUE),

Min_CBA = min(CBARet, na.rm = TRUE),

Max_CBA = max(CBARet, na.rm = TRUE),

Sum_CBA = sum(CBARet, na.rm = TRUE),

SD_CBA = sd(CBARet, na.rm = TRUE),

RetperRisk_CBA = sum(CBARet, na.rm = TRUE)/sd(CBARet, na.rm = TRUE),

n = n())


#CSL

CSL<- Sumstat %>%
  
summarise(Mean_CSL = mean(CSLRet, na.rm = TRUE),
          
Median_CSL = median(CSLRet, na.rm = TRUE),

Min_CSL = min(CSLRet, na.rm = TRUE),

Max_CSL = max(CSLRet, na.rm = TRUE),

Sum_CSL = sum(CSLRet, na.rm = TRUE),

SD_CSL = sd(CSLRet, na.rm = TRUE),

RetperRisk_CSL = sum(CSLRet, na.rm = TRUE)/sd(CBARet, na.rm = TRUE),

n = n())

# BHP

BHP<- Sumstat %>%
  
summarise(Mean_BHP = mean(BHPRet, na.rm = TRUE),
          
Median_BHP = median(BHPRet, na.rm = TRUE),

Min_BHP = min(BHPRet, na.rm = TRUE),

Max_BHP = max(BHPRet, na.rm = TRUE),

Sum_BHP = sum(BHPRet, na.rm = TRUE),

SD_BHP = sd(BHPRet, na.rm = TRUE),

RetperRisk_BHP = sum(BHPRet, na.rm = TRUE)/sd(BHPRet, na.rm = TRUE),

n = n())

# WBC

WBC<- Sumstat %>%
  
summarise(Mean_WBC = mean(WBCRet, na.rm = TRUE),
          
Median_WBC = median(WBCRet, na.rm = TRUE),

Min_WBC = min(WBCRet, na.rm = TRUE),

Max_WBC = max(WBCRet, na.rm = TRUE),

Sum_WBC = sum(WBCRet, na.rm = TRUE),

SD_WBC = sd(WBCRet, na.rm = TRUE),

RetperRisk_WBC = sum(WBCRet, na.rm = TRUE)/sd(WBCRet, na.rm = TRUE),

n = n())

# NAB

NAB<- Sumstat %>%
  
summarise(Mean_NAB = mean(NABRet, na.rm = TRUE),
          
Median_NAB = median(NABRet, na.rm = TRUE),

Min_NAB = min(NABRet, na.rm = TRUE),

Max_NAB = max(NABRet, na.rm = TRUE),

Sum_NAB = sum(NABRet, na.rm = TRUE),

SD_NAB = sd(NABRet, na.rm = TRUE),

RetperRisk_NAB = sum(NABRet, na.rm = TRUE)/sd(NABRet, na.rm = TRUE),

n = n())

# ANZ

ANZ<- Sumstat %>%
  
summarise(Mean_ANZ = mean(ANZRet, na.rm = TRUE),
          
Median_ANZ = median(ANZRet, na.rm = TRUE),

Min_ANZ = min(ANZRet, na.rm = TRUE),

Max_ANZ = max(ANZRet, na.rm = TRUE),

Sum_ANZ = sum(ANZRet, na.rm = TRUE),

SD_ANZ = sd(ANZRet, na.rm = TRUE),

RetperRisk_ANZ = sum(ANZRet, na.rm = TRUE)/sd(ANZRet, na.rm = TRUE),

n = n())

# WES

WES<- Sumstat %>%
  
summarise(Mean_WES = mean(WESRet, na.rm = TRUE),
          
Median_WES = median(WESRet, na.rm = TRUE),

Min_WES = min(WESRet, na.rm = TRUE),

Max_WES = max(WESRet, na.rm = TRUE),

Sum_WES = sum(WESRet, na.rm = TRUE),

SD_WES = sd(WESRet, na.rm = TRUE),

RetperRisk_WES = sum(WESRet, na.rm = TRUE)/sd(WESRet, na.rm = TRUE),

n = n())

# FMG

FMG<- Sumstat %>%
  
summarise(Mean_FMG = mean(FMGRet, na.rm = TRUE),
          
Median_FMG = median(FMGRet, na.rm = TRUE),

Min_FMG = min(FMGRet, na.rm = TRUE),

Max_FMG = max(FMGRet, na.rm = TRUE),

Sum_FMG = sum(FMGRet, na.rm = TRUE),

SD_FMG = sd(FMGRet, na.rm = TRUE),

RetperRisk_FMG = sum(FMGRet, na.rm = TRUE)/sd(FMGRet, na.rm = TRUE),

n = n())

# MQG

MQG<- Sumstat %>%
  
summarise(Mean_MQG = mean(MQGRet, na.rm = TRUE),
          
Median_MQG = median(MQGRet, na.rm = TRUE),

Min_MQG = min(MQGRet, na.rm = TRUE),

Max_MQG = max(MQGRet, na.rm = TRUE),

Sum_MQG = sum(MQGRet, na.rm = TRUE),

SD_MQG = sd(MQGRet, na.rm = TRUE),

RetperRisk_MQG = sum(MQGRet, na.rm = TRUE)/sd(MQGRet, na.rm = TRUE),

n = n())

# WOW

WOW<- Sumstat %>%
  
summarise(Mean_WOW = mean(WOWRet, na.rm = TRUE),
          
Median_WOW = median(WOWRet, na.rm = TRUE),

Min_WOW = min(WOWRet, na.rm = TRUE),

Max_WOW = max(WOWRet, na.rm = TRUE),

Sum_WOW = sum(WOWRet, na.rm = TRUE),

SD_WOW = sd(WOWRet, na.rm = TRUE),

RetperRisk_WOW = sum(WOWRet, na.rm = TRUE)/sd(WOWRet, na.rm = TRUE),

n = n())

# CBA

Sumstat %>% 
  
summarise(mean_CBARet = mean(CBARet, na.rm = TRUE), 
    median_CBARet = median(CBARet, na.rm = TRUE), 
     min_CBARet = min(CBARet, na.rm = TRUE),  
    max_CBARet = median(CBARet, na.rm = TRUE),
    sd_CBA = sd(CBARet, na.rm = TRUE),
    RetperRisk_CBA = sum(CBARet, na.rm = TRUE)/sd(CBARet, na.rm = TRUE))

# CSL

Sumstat %>% 
summarise(mean_CSLRet = mean(CSLRet, na.rm = TRUE), 
    median_CSLRet = median(CSLRet, na.rm = TRUE), 
     min_CSLRet = min(CSLRet, na.rm = TRUE),  
    max_CSLRet = median(CSLRet, na.rm = TRUE),
    sd_CSL = sd(CSLRet, na.rm = TRUE),
   RetperRisk_CSL = sum(CSLRet, na.rm = TRUE)/sd(CSLRet, na.rm = TRUE))

# BHP

Sumstat %>% 
summarise(mean_BHPRet = mean(BHPRet, na.rm = TRUE), 
    median_BHPRet = median(BHPRet, na.rm = TRUE), 
     min_BHPRet = min(BHPRet, na.rm = TRUE),  
    max_BHPRet = median(BHPRet, na.rm = TRUE),
    sd_BHP = sd(BHPRet, na.rm = TRUE),
    RetperRisk_BHP = sum(BHPRet, na.rm = TRUE)/sd(BHPRet, na.rm = TRUE))

# WBC

Sumstat %>% 
summarise(mean_WBCRet = mean(WBCRet, na.rm = TRUE), 
    median_WBCRet = median(WBCRet, na.rm = TRUE), 
     min_WBCRet = min(WBCRet, na.rm = TRUE),  
    max_WBCRet = median(WBCRet, na.rm = TRUE),
    sd_WBC = sd(WBCRet, na.rm = TRUE),
     RetperRisk_WBC = sum(WBCRet, na.rm = TRUE)/sd(WBCRet, na.rm = TRUE))

# NAB

Sumstat %>% 
summarise(mean_NABRet = mean(NABRet, na.rm = TRUE), 
    median_NABRet = median(NABRet, na.rm = TRUE), 
     min_NABRet = min(NABRet, na.rm = TRUE),  
    max_NABRet = median(NABRet, na.rm = TRUE),
    sd_NAB = sd(NABRet, na.rm = TRUE),
    RetperRisk_NAB = sum(NABRet, na.rm = TRUE)/sd(NABRet, na.rm = TRUE))

#ANZ

Sumstat %>% 
summarise(mean_ANZRet = mean(ANZRet, na.rm = TRUE), 
    median_ANZRet = median(ANZRet, na.rm = TRUE), 
     min_ANZRet = min(ANZRet, na.rm = TRUE),  
    max_ANZRet = median(ANZRet, na.rm = TRUE),
    sd_ANZ = sd(ANZRet, na.rm = TRUE),
    RetperRisk_ANZ = sum(ANZRet, na.rm = TRUE)/sd(ANZRet, na.rm = TRUE))

#WES

Sumstat %>% 
summarise(mean_WESRet = mean(WESRet, na.rm = TRUE), 
    median_WESRet = median(WESRet, na.rm = TRUE), 
     min_WESRet = min(WESRet, na.rm = TRUE),  
    max_WESRet = median(WESRet, na.rm = TRUE),
    sd_WES = sd(WESRet, na.rm = TRUE),
     RetperRisk_WES = sum(WESRet, na.rm = TRUE)/sd(WESRet, na.rm = TRUE))

#FMG

Sumstat %>% 
summarise(mean_FMGRet = mean(FMGRet, na.rm = TRUE), 
    median_FMGRet = median(FMGRet, na.rm = TRUE), 
     min_FMGRet = min(FMGRet, na.rm = TRUE),  
    max_FMGRet = median(FMGRet, na.rm = TRUE),
    sd_FMG = sd(FMGRet, na.rm = TRUE),
     RetperRisk_FMG = sum(FMGRet, na.rm = TRUE)/sd(FMGRet, na.rm = TRUE))

#MQG

Sumstat %>% 
summarise(mean_MQGRet = mean(MQGRet, na.rm = TRUE), 
    median_MQGRet = median(MQGRet, na.rm = TRUE), 
     min_MQGRet = min(MQGRet, na.rm = TRUE),  
    max_MQGRet = median(MQGRet, na.rm = TRUE),
    sd_MQG = sd(MQGRet, na.rm = TRUE),
    RetperRisk_MQG = sum(MQGRet, na.rm = TRUE)/sd(MQGRet, na.rm = TRUE))

#WOW

Sumstat %>% 
summarise(mean_WOWRet = mean(WOWRet, na.rm = TRUE), 
    median_WOWRet = median(WOWRet, na.rm = TRUE), 
     min_WOWRet = min(WOWRet, na.rm = TRUE),  
    max_WOWRet = median(WOWRet, na.rm = TRUE),
    sd_WOWFRet = sd(WOWRet, na.rm = TRUE),
      RetperRisk_WOW = sum(WOWRet, na.rm = TRUE)/sd(WOWRet, na.rm = TRUE))

Step 7 Create a list

# This is a chunk where a list that contains a numeric value for each response to the categorical variable (company) is created

# list1 creates the ranking of the companies, where the lowest number represents the highest rank.

list1<- tbl_df(data.frame(Ranking=c(1:10)))

# list2 creates the names of the companies in the order of the market capitalization

list2<- tbl_df(data.frame(By_Market_Cap = c("CBA", "CSL", "BHP", "WBC", "NAB", "ANZ", "WES", "FMG", "MQG", "WOW")))

# list3 creates the names of the companies in the order of the Return per unit risk (or standard deviation)

list3<- tbl_df(data.frame(By_ReturnperRisk = c("FMG", "WES", "CBA", "MQG", "BHP", "WOW", "NAB", "ANZ", "CSL", "WBC")))

           
list1

list2

list3

Step 8 Join the list

# This is a chunk where the lists are joined

# clistM shows the Ranking by Market Capitalization

clistM<- cbind (list1, list2) 

clistM

# clistR shows the Ranking by Return per unit risk (or standard deviation)

clistR<- cbind (list1, list3) 

clistR

Step 9 Subsetting I

# This is a chunk to subset the data frame using first 10 observations (including all 10 variables), and converted to a matrix.

SI <- read.xlsx("data/DWA1X.xlsx", sheet = "A1Ret", cols = 1:10, rows= 1: 11)

str(SI)

## 'data.frame':    10 obs. of  10 variables:
##  $ CBARet: num  0.00537 -0.00675 0.01777 -0.0037 0.00727 ...
##  $ CSLRet: num  0.00818 -0.0018 0.02363 0.00854 0.0182 ...
##  $ BHPRet: num  0.005122 0.006365 0.0038 0.000506 0.011558 ...
##  $ WBCRet: num  0.007003 -0.000821 0.011438 -0.003255 -0.002857 ...
##  $ NABRet: num  0.006086 -0.006901 0.014156 -0.003218 0.000403 ...
##  $ ANZRet: num  0.00731 -0.00405 0.01291 -0.00361 0.00121 ...
##  $ WESRet: num  0.00961 0.00857 0.02041 -0.00349 0.00858 ...
##  $ FMGRet: num  -0.00279 0.00186 -0.00746 -0.00469 0.01678 ...
##  $ MQGRet: num  0.00609 -0.00536 0.01357 -0.00727 0.00454 ...
##  $ WOWRet: num  0.001928 -0.001101 0.017202 0.008357 -0.000806 ...

SI_matrix <- as.matrix(SI)  

str(SI_matrix)

##  num [1:10, 1:10] 0.00537 -0.00675 0.01777 -0.0037 0.00727 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:10] "1" "2" "3" "4" ...
##   ..$ : chr [1:10] "CBARet" "CSLRet" "BHPRet" "WBCRet" ...

SI_matrix

##          CBARet       CSLRet        BHPRet        WBCRet        NABRet
## 1   0.005368665  0.008183389  0.0051215304  0.0070031118  0.0060864241
## 2  -0.006746547 -0.001804790  0.0063654646 -0.0008214224 -0.0069007535
## 3   0.017767699  0.023634762  0.0037997747  0.0114381180  0.0141559833
## 4  -0.003701408  0.008537224  0.0005056133 -0.0032546934 -0.0032180119
## 5   0.007266540  0.018199769  0.0115578676 -0.0028566892  0.0004027757
## 6   0.012318030  0.027779426 -0.0032528395  0.0073290296  0.0040193682
## 7  -0.000242517 -0.016167423 -0.0093165105 -0.0024369477  0.0000000000
## 8   0.008330764  0.006768229  0.0130687872 -0.0008136939  0.0052010328
## 9   0.005874383  0.004173394  0.0032407623  0.0048720927  0.0043798017
## 10  0.009517037  0.010524320 -0.0012452463  0.0084695003  0.0075203767
##          ANZRet       WESRet       FMGRet       MQGRet        WOWRet
## 1   0.007305234  0.009606373 -0.002786896  0.006089294  0.0019275015
## 2  -0.004051945  0.008567390  0.001858835 -0.005362303 -0.0011009330
## 3   0.012908764  0.020406461 -0.007455910  0.013567292  0.0172018171
## 4  -0.003613760 -0.003488836 -0.004688046 -0.007266055  0.0083569304
## 5   0.001205925  0.008583815  0.016775796  0.004538653 -0.0008057017
## 6   0.009198311  0.013992665 -0.011152621  0.008730540  0.0159918594
## 7  -0.004388593 -0.006857156  0.008376167 -0.006074761 -0.0103655325
## 8   0.007568263 -0.003906750  0.018365845  0.011616888  0.0250638949
## 9   0.001982114  0.005510785 -0.012820626  0.002688922 -0.0018255685
## 10  0.006710109  0.007301140  0.013730240  0.013755526  0.0096117718

# The matrix contains 10 character variables and 10x10 numeric variables.  

# The structure follows a tidy data format as discussed in Step 5. 

# The matrix is shown in a wide format (preferred under the 3 tidy data principles) where a single company's observation has its own row, each company has its own column, and each value has its own cell.  The matrix has a wide layout containing all the measured information in different columns.

Step 10 Subsetting II

# To extract the first (CBARet) and last variables (WOWRet) in the data set

SII <- DWA1[ , c(1, 10)]

str(SII)

## 'data.frame':    425 obs. of  2 variables:
##  $ CBARet: num  0.00537 -0.00675 0.01777 -0.0037 0.00727 ...
##  $ WOWRet: num  0.001928 -0.001101 0.017202 0.008357 -0.000806 ...

# To save the subset as an R object file (.RData).

save(SII, file = "output/SII.RData")

Step 11 Create a new Data Frame

# This is a chunk to create a new data frame with the given specifications.

# This dataframe (df3) is designed to report the survey results to the question:  Australia did a great job in managing the Covid crisis.

# The integer variable, where score 1 represents "Strongly_Disagree", score 7 "Strongly_Agree"

Int_vector<- tbl_df(data.frame(Score=c(1:7)))

# The ordinal variable.   It is a factor and ordered properly, and the names of the variables are given.

Ord_vector<- c("Agree", "Disagree", "Strongly_Agree","Strongly_Disagree", "Neutral", "Agree", "Disagree")

factor_Ord_vector<- tbl_df(data.frame(Decision = factor(Ord_vector, order = TRUE, levels = c("Strongly_Disagree","Disagree", "Slightly_Disagree", "Neutral", "Slightly_Agree", "Agree", "Strongly_Agree"))))

# A numeric variable. These are the survey results from Australia expressed in %.

Austnum_vector <- tbl_df(data.frame(Aust_Percentage =c(21, 14, 6, 8, 14, 15, 22)))

# Creating a new dataframe from the 3 variables above

df3<- cbind (Int_vector, factor_Ord_vector, Austnum_vector)

# The interger variable contains 7 observations (integers) and 1 variable named as Score.

# The factor_Ord_vector contains 7 observations (ordinals) and 1 variable named as Decision.

# The factor has 7 levels: strongly_Disagree.....to Strongly_Agree, assigned as 6, 2, 7, 1, 4, 6 and 2 by R.

# A new numeric vector called Austnum_vector is created with the Australian results expressed in %.  cbind() is used to combine this vector to the data frame created by the interger and the ordinal variables.

# The Austnum_vector contains 7 observations (numeric) and 1 variable named as Australian_Percentage.

# The new dataframe is called df3.  It contains 1 integer variable (quantitative discrete), 1 ordinal variable (qualitative ordinal) and a nuumeric variable (quantitative continuous).  The dimension is 7 x 3.

str(Int_vector)

## Classes 'tbl_df', 'tbl' and 'data.frame':    7 obs. of  1 variable:
##  $ Score: int  1 2 3 4 5 6 7

str(factor_Ord_vector)

## Classes 'tbl_df', 'tbl' and 'data.frame':    7 obs. of  1 variable:
##  $ Decision: Ord.factor w/ 7 levels "Strongly_Disagree"<..: 6 2 7 1 4 6 2

str(Austnum_vector)

## Classes 'tbl_df', 'tbl' and 'data.frame':    7 obs. of  1 variable:
##  $ Aust_Percentage: num  21 14 6 8 14 15 22

str(df3)

## 'data.frame':    7 obs. of  3 variables:
##  $ Score          : int  1 2 3 4 5 6 7
##  $ Decision       : Ord.factor w/ 7 levels "Strongly_Disagree"<..: 6 2 7 1 4 6 2
##  $ Aust_Percentage: num  21 14 6 8 14 15 22

attributes(df3)

## $names
## [1] "Score"           "Decision"        "Aust_Percentage"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3 4 5 6 7

Step 12 Create another Data Frame

# Another data frame is created with a common variable to the dataset created in step 11. The common variable is the ordinal variable named "factor_ord_vector."

factor_Ord_vector<- tbl_df(data.frame(Decision = factor(Ord_vector, order = TRUE, levels = c("Strongly_Disagree","Disagree", "Slightly_Disagree", "Neutral", "Slightly_Agree", "Agree", "Strongly_Agree"))))

# A new variable called USnum_vector is created with the US results expressed in %.

USnum_vector <- tbl_df(data.frame(US_Percentage =c(3, 24, 18, 12, 18, 20, 5)))


# The new dataframe is called df4.  It contains 1 integer variable (quantitative discrete), 1 ordinal variable (qualitative ordinal) and a nuumeric variable (quantitative continuous).  The dimension is 7 x 3.

df4<- cbind (Int_vector, factor_Ord_vector, USnum_vector)

str(Int_vector)

## Classes 'tbl_df', 'tbl' and 'data.frame':    7 obs. of  1 variable:
##  $ Score: int  1 2 3 4 5 6 7

str(factor_Ord_vector)

## Classes 'tbl_df', 'tbl' and 'data.frame':    7 obs. of  1 variable:
##  $ Decision: Ord.factor w/ 7 levels "Strongly_Disagree"<..: 6 2 7 1 4 6 2

str(USnum_vector)

## Classes 'tbl_df', 'tbl' and 'data.frame':    7 obs. of  1 variable:
##  $ US_Percentage: num  3 24 18 12 18 20 5

str(df4)

## 'data.frame':    7 obs. of  3 variables:
##  $ Score        : int  1 2 3 4 5 6 7
##  $ Decision     : Ord.factor w/ 7 levels "Strongly_Disagree"<..: 6 2 7 1 4 6 2
##  $ US_Percentage: num  3 24 18 12 18 20 5

# Join the data frame df4 to df3 from Step 11.

Step12df <- cbind (Int_vector, factor_Ord_vector, Austnum_vector, USnum_vector)

# The new dataframe is called Step12df.  It contains 1 integer variable (quantitative discrete), 1 ordinal variable (qualitative ordinal) and 2 nuumeric variables (quantitative continuous).  The dimension is 7 x 4.

## This dataframe (Step12df) is designed to provide a comparison of the survey results to the questions. 1. Australia did a great job in managing the Covid crisis. 2. USA did a great job in managing the Covid crisis.

str(Step12df)

## 'data.frame':    7 obs. of  4 variables:
##  $ Score          : int  1 2 3 4 5 6 7
##  $ Decision       : Ord.factor w/ 7 levels "Strongly_Disagree"<..: 6 2 7 1 4 6 2
##  $ Aust_Percentage: num  21 14 6 8 14 15 22
##  $ US_Percentage  : num  3 24 18 12 18 20 5

** Data Reference **

List of ASX Companies viewed 4 Sep 2021 https://www.marketindex.com.au/asx-listed-companies
Price of top 10 companies listed on ASX viewed on 5 Sep 2021 https://finance.yahoo.com/

** Reference **

RMIT Online Jul-Dec 2021 (2149), Data Wrangling Course Notes, Math 2405, RMIT University.
WHO reported Covid-19 on 4 Jan 2020, viewed 5 Sep 2021 https://www.who.int/news/item/27-04-2020-who-timeline---covid-19