Looking at the potential of Australian Stocks using data visualization and data wrangling in R

In this project, I investigate the future potential of Australian stocks based on historical data and trends, using data visualization in R. Link to GitHub

Data is taken from ASX Historical Data at ASX Link

Company names for the data merge are taken from the Stock Market List of company names: Link

Analyse Data

## [1] "20192020_1.csv"
##     ticker              name               date                open         
##  Length:1021320     Length:1021320     Length:1021320     Min.   :    0.00  
##  Class :character   Class :character   Class :character   1st Qu.:    0.06  
##  Mode  :character   Mode  :character   Mode  :character   Median :    0.34  
##                                                           Mean   :  119.96  
##                                                           3rd Qu.:    2.51  
##                                                           Max.   :63972.90  
##       high               low               close              volume         
##  Min.   :    0.00   Min.   :    0.00   Min.   :    0.00   Min.   :        0  
##  1st Qu.:    0.06   1st Qu.:    0.06   1st Qu.:    0.06   1st Qu.:    38323  
##  Median :    0.34   Median :    0.33   Median :    0.34   Median :   206667  
##  Mean   :  120.88   Mean   :  119.10   Mean   :  120.03   Mean   :  1376658  
##  3rd Qu.:    2.55   3rd Qu.:    2.47   3rd Qu.:    2.51   3rd Qu.:   949410  
##  Max.   :64495.20   Max.   :63361.70   Max.   :63972.90   Max.   :718225752

The data summary shows a large gap between the median and mean share prices (close: median 0.34, mean 120.03). Judging by the total volume of shares traded (volume) during this period, which included the start of COVID-19 in Australia, the share market remained a place of high activity, catering to the needs of shareholders across the board. Further investigation will now break down shares by:

1. volume of shares traded (popularity), as a top 20 investigation;
2. top earning shares by volume traded multiplied by price (top 20 again);
3. shares with the highest increase in value over the period assessed, regardless of popularity but with a minimum volume requirement to qualify.

Formatting Dates
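The raw `date` column is read in as character text in day/month/year form (for example `2/01/2019`), while later outputs show it as a `<date>` column. A minimal sketch of that conversion in base R is below; the sample values are hypothetical stand-ins for the real column, and the original code may well have used a different function.

```r
# Hypothetical sample mirroring the raw `date` column, which arrives as text
raw_dates <- c("2/01/2019", "19/06/2020", "17/07/2020")

# Parse day/month/year text into Date objects
parsed <- as.Date(raw_dates, format = "%d/%m/%Y")

print(parsed)
print(class(parsed))
```

Stating the format explicitly avoids day and month being swapped for ambiguous values such as `2/01/2019`.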

Highest Volume of shares traded (top 20)

Code and name change: Novita Healthcare Limited (ASX Code: NHL) became Tali Digital Limited (ASX Code: TD1). NHL appears as #N/A in the chart legend because of this name change (ref ASX Link).

Look Up for ASX Codes at: ASX Code Look Up and marketwatch.com

Analysis - The volume data show that Telstra (TLS) is the standout stock in terms of volume traded. The mining shares also stand out, showing that the Australian economy is heavily dependent on mining. The next chart will delve into the top 20 stocks by total dollar amount traded.
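The volume ranking discussed above can be sketched in base R as follows. The data frame and the helper name `top_by_volume` are hypothetical, used only for illustration; the tibble output elsewhere in this report suggests the original analysis used dplyr instead.

```r
# Hypothetical toy data: one row per ticker per trading day
prices <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  volume = c(100, 150, 900, 50, 10, 20),
  stringsAsFactors = FALSE
)

# Sum volume per ticker and keep the n most heavily traded tickers
top_by_volume <- function(df, n = 20) {
  totals <- aggregate(volume ~ ticker, data = df, FUN = sum)
  names(totals)[2] <- "total_volume"
  totals <- totals[order(-totals$total_volume), ]
  rownames(totals) <- NULL
  head(totals, n)
}

top2 <- top_by_volume(prices, n = 2)
print(top2)
```

With the real data, `n = 20` would give the top 20 used for the chart.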

Second Analysis

Share $ Value by highest Volume of shares traded (top 20)

Closing price of top 20 stocks (by volume * close price) over period Jan ’19 - Jul ’20
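A rough sketch of the "volume * close price" ranking, assuming the dollar value traded is the daily closing price multiplied by the daily volume, summed per ticker. The toy data and the column name `traded_value` are hypothetical.

```r
# Hypothetical toy data: daily close and volume per ticker
prices <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  close  = c(10, 11, 0.5, 0.6, 100, 101),
  volume = c(100, 150, 9000, 5000, 10, 20),
  stringsAsFactors = FALSE
)

# Dollar value traded on each day (assumed definition: close * volume)
prices$traded_value <- prices$close * prices$volume

# Total per ticker, largest first
value_by_ticker <- aggregate(traded_value ~ ticker, data = prices, FUN = sum)
value_by_ticker <- value_by_ticker[order(-value_by_ticker$traded_value), ]
rownames(value_by_ticker) <- NULL

top_value <- head(value_by_ticker, 2)
print(top_value)
```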

The plot shows little variance among the top-priced shares over the 18-month period, so to establish which shares to look at, the shares with the largest rise in price will be examined instead.

This plot will look at the top n shares (no fixed number) that have risen in price since the start of Jan ’19. The rise is calculated as the amount from a share's opening price at its first appearance in the dataset (most likely Jan ’19) to its closing price at the end of the dataset (around Jul ’20).
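The calculation described above can be sketched as follows, assuming "rise" means the last closing price minus the first opening price for each ticker, with a minimum total volume to qualify. The toy data, the helper `price_rise`, and the threshold value are hypothetical.

```r
# Hypothetical toy data: a few trading days per ticker
prices <- data.frame(
  ticker = c("AAA", "AAA", "AAA", "BBB", "BBB", "CCC", "CCC"),
  date   = as.Date(c("2019-01-02", "2019-06-03", "2020-07-17",
                     "2019-03-01", "2020-07-17",
                     "2019-01-02", "2020-07-17")),
  open   = c(1.00, 1.50, 2.40, 0.20, 0.15, 5.00, 9.00),
  close  = c(1.10, 1.60, 2.50, 0.21, 0.10, 5.10, 9.50),
  volume = c(1000, 2000, 1500, 500, 700, 5, 5),
  stringsAsFactors = FALSE
)

# Rise = last close minus first open, per ticker, with a volume floor
price_rise <- function(df, min_total_volume = 0) {
  df <- df[order(df$ticker, df$date), ]
  per_ticker <- lapply(split(df, df$ticker), function(g) {
    data.frame(
      ticker       = g$ticker[1],
      first_open   = g$open[1],
      last_close   = g$close[nrow(g)],
      total_volume = sum(g$volume),
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, per_ticker)
  out$rise <- out$last_close - out$first_open
  # Keep only shares that rose and were traded enough to qualify
  out <- out[out$total_volume >= min_total_volume & out$rise > 0, ]
  out <- out[order(-out$rise), ]
  rownames(out) <- NULL
  out
}

risers <- price_rise(prices, min_total_volume = 100)
print(risers)
```

In this toy example BBB is dropped because it fell in price, and CCC is dropped because its total volume is below the threshold.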

## # A tibble: 6 x 5
## # Groups:   ticker, name [6]
##   ticker name                     date        open close
##   <chr>  <chr>                    <date>     <dbl> <dbl>
## 1 14D    1414 Degrees Ltd         2020-06-19 0.069 0.077
## 2 14DO   #N/A                     2020-07-16 0.001 0.008
## 3 1AD    Adalta Ltd               2020-03-25 0.041 0.041
## 4 1ADO   #N/A                     2020-06-09 0.006 0.006
## 5 1AG    Alterra Ltd              2019-02-04 0.026 0.026
## 6 1AL    Oneall International Ltd 2020-07-09 0.21  0.27
##     ticker              name                date                 open         
##  Length:2679        Length:2679        Min.   :2019-01-02   Min.   :    0.00  
##  Class :character   Class :character   1st Qu.:2019-08-20   1st Qu.:    0.01  
##  Mode  :character   Mode  :character   Median :2020-03-24   Median :    0.09  
##                                        Mean   :2020-01-04   Mean   :   77.21  
##                                        3rd Qu.:2020-04-24   3rd Qu.:    1.18  
##                                        Max.   :2020-07-17   Max.   :59513.50  
##      close         
##  Min.   :    0.00  
##  1st Qu.:    0.01  
##  Median :    0.09  
##  Mean   :   80.40  
##  3rd Qu.:    1.23  
##  Max.   :61828.30

N/A Shares

14DO - option expiring 21-AUG-2020; 1ADO - option expiring 30-JUN-2021 (LINK)

A1C - status: delisted (LINK)

AASF - Airlie Australian Share Fund (Managed Fund), ISIN change; AAU - Adcorp Australia Limited, cessation of an approval; AB1 - Animoca Brands Corporation Limited, cessation of an approval (LINK)
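Tickers without a matched company name can be separated from the rest before plotting. A minimal sketch, assuming the unmatched names are stored as the literal text `#N/A` (as the printed output suggests); the small data frame is a hypothetical excerpt.

```r
# Hypothetical excerpt: tickers with and without a matched company name
prices <- data.frame(
  ticker = c("14D", "14DO", "1AD", "1ADO", "XNT"),
  name   = c("1414 Degrees Ltd", "#N/A", "Adalta Ltd", "#N/A", "#N/A"),
  stringsAsFactors = FALSE
)

# Treat both real NA values and the literal "#N/A" text as unmatched
is_unnamed <- is.na(prices$name) | prices$name == "#N/A"

named   <- prices[!is_unnamed, ]
unnamed <- prices[is_unnamed, ]

print(unnamed$ticker)
```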

Plot of shares by maximum closing value over closing values

## # A tibble: 6 x 8
##   ticker name                  date       open  high   low close  volume
##   <chr>  <chr>                 <chr>     <dbl> <dbl> <dbl> <dbl>   <dbl>
## 1 14D    1414 Degrees Ltd      2/01/2019 0.31  0.31  0.3   0.3     26036
## 2 3DP    Pointerra Ltd         2/01/2019 0.044 0.045 0.043 0.043  854110
## 3 3PL    3P Learning Ltd       2/01/2019 1.2   1.21  1.20  1.20   101947
## 4 4CE    Force Commodities Ltd 2/01/2019 0.015 0.015 0.014 0.014  483448
## 5 4DS    4DS Memory Ltd        2/01/2019 0.056 0.059 0.054 0.057 9090917
## 6 5GN    5G Networks Ltd       2/01/2019 0.42  0.42  0.42  0.42     3580
## $data
## # A tibble: 2,190 x 13
## # Groups:   ticker, name [2,190]
##    ticker name  date        open close volume min_close min_open max_close
##    <chr>  <chr> <date>     <dbl> <dbl>  <dbl>     <dbl>    <dbl>     <dbl>
##  1 IVV    Isha… 2019-01-02  359.  358.   6940     352.     352.       510.
##  2 SPY    SPDR… 2019-01-02  355.  353.    211     348.     349.       509.
##  3 IHVV   Isha… 2020-06-09  393.  393.  13747     360.     366.       393.
##  4 CSL    CSL … 2019-01-02  185.  185. 487741     184      185.       341 
##  5 IJH    Isha… 2019-01-02  235   235.    232     210.     150.       318.
##  6 VTS    Vang… 2019-01-02  182.  180.   3173     179.     179        259.
##  7 COH    Coch… 2019-01-02  174.  175. 200503     159.     155.       252.
##  8 GOLD   ETFs… 2020-06-09  227.  228.  44846     228.     227.       243.
##  9 MQG    Macq… 2019-01-02  108.  107. 497113      72.0     74.5      152.
## 10 ILB    Isha… 2019-01-02  118.  119.    240     114      112        132.
## # … with 2,180 more rows, and 4 more variables: total_volume <dbl>,
## #   sum_close <dbl>, variation <dbl>, .group <int>
## [Plot object: geom_line with x = date, y = variation, colour = name; x-axis text rotated 90 degrees]

Plot of shares for #N/A

## $data
## # A tibble: 488 x 13
## # Groups:   ticker, name [488]
##    ticker name  date         open  close volume min_close min_open max_close
##    <chr>  <chr> <date>      <dbl>  <dbl>  <dbl>     <dbl>    <dbl>     <dbl>
##  1 XNT    #N/A  2020-06-09 62415. 63937.      0    59514.   59514.    63973.
##  2 XHJ    #N/A  2019-01-02 28772. 28768.      0    28768.   28768.    48747.
##  3 XMJ    #N/A  2019-01-02 11320. 11138.      0     9695.    9695.    14764.
##  4 XSJ    #N/A  2019-01-02 10346. 10276.      0    10207.   10207.    13635 
##  5 XEJ    #N/A  2019-01-02  9728.  9477.      0     5023.    5023.    12139.
##  6 XGD    #N/A  2019-01-02  5465.  5448       0     5141.    5141.     9031.
##  7 XUJ    #N/A  2019-01-02  7409.  7341.      0     6363.    6363.     8623.
##  8 XMD    #N/A  2019-01-02  6119.  6036.      0     4511.    4511.     7662.
##  9 XNJ    #N/A  2019-01-02  5672.  5599.      0     4321.    4321.     7321.
## 10 XXJ    #N/A  2019-01-02  6206.  6081.      0     4081.    4081.     7286.
## # … with 478 more rows, and 4 more variables: total_volume <dbl>,
## #   sum_close <dbl>, variation <dbl>, .group <int>
## [Plot object: geom_line with x = date, y = variation, colour = ticker; x-axis text rotated 90 degrees]
##
## $data
## # A tibble: 22 x 14
## # Groups:   ticker, name [22]
##    ticker name  date        open close volume min_close min_open max_close
##    <chr>  <chr> <date>     <dbl> <dbl>  <dbl>     <dbl>    <dbl>     <dbl>
##  1 HCD    Hydr… 2020-01-02 0.085 0.086 9.08e4     0.013    0.014     0.086
##  2 TSO    Teso… 2020-02-07 0.028 0.028 9.56e6     0.017    0.02      0.09 
##  3 BHD    Benj… 2020-06-09 0.71  0.71  0.         0.185    0.185     0.71 
##  4 EMD    Emer… 2020-02-12 0.145 0.145 4.81e6     0.04     0.04      0.145
##  5 LCL    Los … 2020-01-21 0.048 0.055 6.97e4     0.016    0.016     0.056
##  6 DCX    Disc… 2020-03-26 0.003 0.003 8.60e5     0.003    0.003     0.01 
##  7 CBE    Cobr… 2020-01-31 0.23  0.25  6.10e6     0.085    0.09      0.28 
##  8 TDY    Thed… 2020-02-14 0.25  0.245 6.50e5     0.08     0.08      0.26 
##  9 HTG    Harv… 2020-04-03 0.087 0.09  3.07e5     0.09     0.087     0.285
## 10 EMUCA  EMU … 2020-06-09 0.047 0.047 0.         0.015    0.015     0.047
## # … with 12 more rows, and 5 more variables: total_volume <dbl>,
## #   sum_close <dbl>, variation <dbl>, max_variation <dbl>, .group <int>
## [Plot object: geom_bar (stat_identity, position_stack) with x = date, y = max_variation, colour = name; x-axis text rotated 90 degrees]

Create Dataset for predictive part - called “shares”

## # A tibble: 6 x 13
## # Groups:   ticker, name [6]
##   ticker name  date        open close volume min_close min_open max_close
##   <chr>  <chr> <date>     <dbl> <dbl>  <dbl>     <dbl>    <dbl>     <dbl>
## 1 HCD    Hydr… 2020-01-02 0.085 0.086 9.08e4     0.013    0.014     0.086
## 2 TSO    Teso… 2020-02-07 0.028 0.028 9.56e6     0.017    0.02      0.09 
## 3 BHD    Benj… 2020-06-09 0.71  0.71  0.         0.185    0.185     0.71 
## 4 EMD    Emer… 2020-02-12 0.145 0.145 4.81e6     0.04     0.04      0.145
## 5 LCL    Los … 2020-01-21 0.048 0.055 6.97e4     0.016    0.016     0.056
## 6 DCX    Disc… 2020-03-26 0.003 0.003 8.60e5     0.003    0.003     0.01 
## # … with 4 more variables: total_volume <dbl>, sum_close <dbl>,
## #   variation <dbl>, max_variation <dbl>

Bar chart of the top 100 shares with the maximum variance between their top closing price and their minimum opening price
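One way to compute this ranking is sketched below, assuming `max_variation` is the highest closing price minus the lowest opening price per ticker. That definition is inferred from the heading and the column names in the output, not confirmed by the original code, and the toy data are hypothetical.

```r
# Hypothetical toy data: daily open and close per ticker
prices <- data.frame(
  ticker = c("AAA", "AAA", "AAA", "BBB", "BBB", "BBB"),
  open   = c(0.014, 0.050, 0.085, 0.20, 0.18, 0.19),
  close  = c(0.013, 0.060, 0.086, 0.21, 0.19, 0.20),
  stringsAsFactors = FALSE
)

# Highest close and lowest open per ticker
max_close <- tapply(prices$close, prices$ticker, max)
min_open  <- tapply(prices$open, prices$ticker, min)

spread <- data.frame(
  ticker    = names(max_close),
  max_close = as.numeric(max_close),
  min_open  = as.numeric(min_open[names(max_close)]),
  stringsAsFactors = FALSE
)

# Assumed definition: max_variation = max close - min open
spread$max_variation <- spread$max_close - spread$min_open
spread <- spread[order(-spread$max_variation), ]
rownames(spread) <- NULL

# Keep at most the top 100 tickers
top_n <- head(spread, 100)
print(top_n)
```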

Plot of the most productive shares from 01/01/2020 in terms of greatest variance in price rise

Start of random forest datasets - did not work (020121): “all models failed”