Looking at the potential of Australian stocks using data visualization and data wrangling in R
In this project, I investigate the future potential of Australian stocks based on historical data and trends, using data visualization in R. Link to GitHub
The data is taken from ASX Historical Data at ASX Link
Company names, used to merge a name onto each ticker, come from Stock Market List Company Names at Link
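The merge of company names onto the price data can be sketched as a left join on the ticker. This is only a minimal base-R illustration: the data frames `prices` and `company_names` and their toy rows are hypothetical stand-ins, and labelling unmatched tickers as "#N/A" is an assumption chosen to mirror how missing names appear in the output further down.

```r
# Toy price rows and a toy name lookup (hypothetical stand-ins for the real files).
prices <- data.frame(
  ticker = c("14D", "14DO", "1AD"),
  close  = c(0.077, 0.008, 0.041),
  stringsAsFactors = FALSE
)
company_names <- data.frame(
  ticker = c("14D", "1AD"),
  name   = c("1414 Degrees Ltd", "Adalta Ltd"),
  stringsAsFactors = FALSE
)

# Left join: keep every price row, even when the lookup has no matching name.
merged <- merge(prices, company_names, by = "ticker", all.x = TRUE)

# Assumption: label unmatched tickers "#N/A" so they are easy to find later.
merged$name[is.na(merged$name)] <- "#N/A"

print(merged)
```

Using `all.x = TRUE` keeps tickers such as options or delisted codes in the data instead of silently dropping them, so they can be investigated separately.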
Analyse Data
## [1] "20192020_1.csv"
## ticker name date open
## Length:1021320 Length:1021320 Length:1021320 Min. : 0.00
## Class :character Class :character Class :character 1st Qu.: 0.06
## Mode :character Mode :character Mode :character Median : 0.34
## Mean : 119.96
## 3rd Qu.: 2.51
## Max. :63972.90
## high low close volume
## Min. : 0.00 Min. : 0.00 Min. : 0.00 Min. : 0
## 1st Qu.: 0.06 1st Qu.: 0.06 1st Qu.: 0.06 1st Qu.: 38323
## Median : 0.34 Median : 0.33 Median : 0.34 Median : 206667
## Mean : 120.88 Mean : 119.10 Mean : 120.03 Mean : 1376658
## 3rd Qu.: 2.55 3rd Qu.: 2.47 3rd Qu.: 2.51 3rd Qu.: 949410
## Max. :64495.20 Max. :63361.70 Max. :63972.90 Max. :718225752
The data summary shows a wide disparity in share prices: the median opening price is 0.34, while the mean is 119.96. Judging by the total volume of shares traded (volume) during this period, which included the start of COVID-19 in Australia, the share market remained a place of high activity, catering to the needs of shareholders across the board. Further investigation will now break shares down by:
1. volume of shares traded (popularity), as a top 20;
2. top-earning shares by volume traded multiplied by price (again a top 20);
3. shares with the highest increase in value over the period assessed, regardless of popularity, but with a minimum volume requirement to qualify.
Formatting Dates
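The raw file stores dates as day-first text such as 2/01/2019, while the later output shows proper dates such as 2019-01-02. A minimal sketch of that conversion in base R follows; the vector name `raw_dates` and its sample values are illustrative only.

```r
# Dates arrive as day/month/year character strings.
raw_dates <- c("2/01/2019", "19/06/2020", "17/07/2020")

# %d/%m/%Y tells as.Date() that the day comes first.
parsed <- as.Date(raw_dates, format = "%d/%m/%Y")

print(parsed)         # Date objects sort and plot chronologically
print(range(parsed))  # earliest and latest date
```

Without an explicit `format`, a string like 2/01/2019 may fail to parse or be misread, so stating the day-first layout is the safer choice.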
Highest Volume of shares traded (top 20)
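A ranking like this can be produced by summing volume per ticker and keeping the 20 largest totals. The sketch below uses base R and made-up tickers and volumes; it is not the code behind the chart.

```r
# Toy daily rows: hypothetical tickers and volumes.
prices <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "CCC", "CCC"),
  volume = c(100, 250, 100, 300, 200),
  stringsAsFactors = FALSE
)

# Total volume traded per ticker over the whole period.
total_volume <- aggregate(volume ~ ticker, data = prices, FUN = sum)

# Sort descending and keep at most the top 20.
total_volume <- total_volume[order(-total_volume$volume), ]
top_volume <- head(total_volume, 20)

print(top_volume)
```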
Code and name change: Novita Healthcare Limited (ASX Code: NHL) became Tali Digital Limited (ASX Code: TD1). NHL appears as #N/A on the chart legend because its name has changed (ref ASX Link)
ASX codes can be looked up at: ASX Code Look Up and marketwatch.com
Analysis - The volume data shows that Telstra (TLS) is the standout stock in terms of volume traded. After that, the mining shares stand out, suggesting that the Australian economy is heavily dependent on mining. The next chart delves into the top 20 total dollar amounts traded
Second Analysis
Share $ Value by highest Volume of shares traded (top 20)
Closing price of top 20 stocks (by volume * close price) over period Jan ’19 - Jul ’20
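The "volume * close price" measure in the caption can be sketched as a per-row traded value that is then summed per ticker. This is a base-R illustration with invented numbers; the column name `traded_value` is an assumption, not a name taken from the original code.

```r
# Toy daily rows: hypothetical tickers, closing prices and volumes.
prices <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB", "CCC"),
  close  = c(2.00, 2.10, 0.05, 0.06, 10.00),
  volume = c(1000, 2000, 50000, 40000, 100),
  stringsAsFactors = FALSE
)

# Approximate dollar value traded each day.
prices$traded_value <- prices$volume * prices$close

# Sum per ticker, sort descending, keep at most the top 20.
value_by_ticker <- aggregate(traded_value ~ ticker, data = prices, FUN = sum)
value_by_ticker <- value_by_ticker[order(-value_by_ticker$traded_value), ]
top_value <- head(value_by_ticker, 20)

print(top_value)
```

Note how the cheap but heavily traded ticker ranks below the mid-priced one here: ranking by value rather than raw volume changes which shares come out on top.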
The plot shows little variation in the highest-priced shares over the 18-month period. To decide which shares to examine further, the next step looks at the shares whose price has risen the most
This plot looks at the top n shares (no fixed number) that have risen in price since the start of Jan ’19. The rise is calculated as the difference between a share's opening price at its first appearance in the dataset (most likely Jan ’19) and its closing price at the end of the dataset (Jul ’20 or thereabouts)
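The calculation described above (first opening price against last closing price, per ticker) can be sketched as follows. This is a base-R illustration on invented rows; the column names `first_open`, `last_close` and `rise` are assumptions made for the example.

```r
# Toy rows: hypothetical tickers with a few dated prices each.
prices <- data.frame(
  ticker = c("AAA", "AAA", "AAA", "BBB", "BBB"),
  date   = as.Date(c("2019-01-02", "2019-06-03", "2020-07-17",
                     "2019-03-01", "2020-07-17")),
  open   = c(1.00, 1.20, 1.50, 0.50, 0.40),
  close  = c(1.05, 1.25, 1.60, 0.48, 0.35),
  stringsAsFactors = FALSE
)

# Order by date within ticker so "first" and "last" are well defined.
prices <- prices[order(prices$ticker, prices$date), ]

# One summary row per ticker: first open, last close, and the difference.
rise <- do.call(rbind, lapply(split(prices, prices$ticker), function(d) {
  data.frame(
    ticker     = d$ticker[1],
    first_open = d$open[1],
    last_close = d$close[nrow(d)],
    rise       = d$close[nrow(d)] - d$open[1]
  )
}))

# Keep only shares that rose, largest rise first.
risers <- rise[rise$rise > 0, ]
risers <- risers[order(-risers$rise), ]

print(risers)
```

An absolute difference favours higher-priced shares; dividing `rise` by `first_open` would give a relative rise instead, which may be the fairer comparison across penny stocks and large caps.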
## # A tibble: 6 x 5
## # Groups: ticker, name [6]
## ticker name date open close
## <chr> <chr> <date> <dbl> <dbl>
## 1 14D 1414 Degrees Ltd 2020-06-19 0.069 0.077
## 2 14DO #N/A 2020-07-16 0.001 0.008
## 3 1AD Adalta Ltd 2020-03-25 0.041 0.041
## 4 1ADO #N/A 2020-06-09 0.006 0.006
## 5 1AG Alterra Ltd 2019-02-04 0.026 0.026
## 6 1AL Oneall International Ltd 2020-07-09 0.21 0.27
## ticker name date open
## Length:2679 Length:2679 Min. :2019-01-02 Min. : 0.00
## Class :character Class :character 1st Qu.:2019-08-20 1st Qu.: 0.01
## Mode :character Mode :character Median :2020-03-24 Median : 0.09
## Mean :2020-01-04 Mean : 77.21
## 3rd Qu.:2020-04-24 3rd Qu.: 1.18
## Max. :2020-07-17 Max. :59513.50
## close
## Min. : 0.00
## 1st Qu.: 0.01
## Median : 0.09
## Mean : 80.40
## 3rd Qu.: 1.23
## Max. :61828.30
N/A Shares
14DO: option expiring 21-AUG-2020
1ADO: option expiring 30-JUN-2021 (LINK)
A1C: status: delisted (LINK)
AASF: Airlie Australian Share Fund (Managed Fund), ISIN change (ASX Code: AASF)
AAU: Adcorp Australia Limited, cessation of an approval (ASX Code: AAU)
AB1: Animoca Brands Corporation Limited, cessation of an approval (ASX Code: AB1) (LINK)
Plot of shares by maximum closing value over closing values
## # A tibble: 6 x 8
## ticker name date open high low close volume
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 14D 1414 Degrees Ltd 2/01/2019 0.31 0.31 0.3 0.3 26036
## 2 3DP Pointerra Ltd 2/01/2019 0.044 0.045 0.043 0.043 854110
## 3 3PL 3P Learning Ltd 2/01/2019 1.2 1.21 1.20 1.20 101947
## 4 4CE Force Commodities Ltd 2/01/2019 0.015 0.015 0.014 0.014 483448
## 5 4DS 4DS Memory Ltd 2/01/2019 0.056 0.059 0.054 0.057 9090917
## 6 5GN 5G Networks Ltd 2/01/2019 0.42 0.42 0.42 0.42 3580
## $data
## # A tibble: 2,190 x 13
## # Groups: ticker, name [2,190]
## ticker name date open close volume min_close min_open max_close
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 IVV Isha… 2019-01-02 359. 358. 6940 352. 352. 510.
## 2 SPY SPDR… 2019-01-02 355. 353. 211 348. 349. 509.
## 3 IHVV Isha… 2020-06-09 393. 393. 13747 360. 366. 393.
## 4 CSL CSL … 2019-01-02 185. 185. 487741 184 185. 341
## 5 IJH Isha… 2019-01-02 235 235. 232 210. 150. 318.
## 6 VTS Vang… 2019-01-02 182. 180. 3173 179. 179 259.
## 7 COH Coch… 2019-01-02 174. 175. 200503 159. 155. 252.
## 8 GOLD ETFs… 2020-06-09 227. 228. 44846 228. 227. 243.
## 9 MQG Macq… 2019-01-02 108. 107. 497113 72.0 74.5 152.
## 10 ILB Isha… 2019-01-02 118. 119. 240 114 112 132.
## # … with 2,180 more rows, and 4 more variables: total_volume <dbl>,
## # sum_close <dbl>, variation <dbl>, .group <int>
[Plot: line chart (geom_line) of variation against date, coloured by name, with x-axis labels rotated 90 degrees]
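The plot settings recorded in the output (a line geometry, date on the x-axis, variation on the y-axis, colour by name, x-axis text rotated 90 degrees, title shifted to hjust 0.8) can be reassembled into a ggplot2 call roughly like the one below. This is a sketch on invented data, assuming the ggplot2 package is installed; the title text and the `plot_data` frame are hypothetical.

```r
library(ggplot2)

# Toy data: two hypothetical companies with a variation value per date.
plot_data <- data.frame(
  name      = rep(c("Company A", "Company B"), each = 3),
  date      = rep(as.Date(c("2019-01-02", "2019-10-01", "2020-07-17")), times = 2),
  variation = c(0.0, 0.4, 0.9, 0.0, 0.1, 0.3)
)

# Same aesthetics and theme tweaks as listed in the printed plot object.
p <- ggplot(plot_data, aes(x = date, y = variation, colour = name)) +
  geom_line() +
  labs(title = "Variation over time (toy data)") +
  theme(
    axis.text.x = element_text(angle = 90, hjust = 1),
    plot.title  = element_text(hjust = 0.8)
  )

# Printing the ggplot object itself draws the chart.
print(p)
```

With thousands of tickers mapped to colour, the legend can swamp the chart, so filtering to a short list before plotting is usually worthwhile.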
Plot of shares for #N/A
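Separating the unnamed tickers from the rest is a simple filter on the name column. The sketch below assumes, as the output suggests, that missing names are stored as the literal text "#N/A"; the rows themselves are invented.

```r
# Toy merged rows: hypothetical tickers, some without a company name.
shares_all <- data.frame(
  ticker = c("AAA", "XAA", "BBB", "XBB"),
  name   = c("Company A", "#N/A", "Company B", "#N/A"),
  volume = c(5000, 0, 1200, 0),
  stringsAsFactors = FALSE
)

# Rows whose name lookup failed, and the rows that matched.
unnamed <- shares_all[shares_all$name == "#N/A", ]
named   <- shares_all[shares_all$name != "#N/A", ]

print(unnamed)
```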
## $data
## # A tibble: 488 x 13
## # Groups: ticker, name [488]
## ticker name date open close volume min_close min_open max_close
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 XNT #N/A 2020-06-09 62415. 63937. 0 59514. 59514. 63973.
## 2 XHJ #N/A 2019-01-02 28772. 28768. 0 28768. 28768. 48747.
## 3 XMJ #N/A 2019-01-02 11320. 11138. 0 9695. 9695. 14764.
## 4 XSJ #N/A 2019-01-02 10346. 10276. 0 10207. 10207. 13635
## 5 XEJ #N/A 2019-01-02 9728. 9477. 0 5023. 5023. 12139.
## 6 XGD #N/A 2019-01-02 5465. 5448 0 5141. 5141. 9031.
## 7 XUJ #N/A 2019-01-02 7409. 7341. 0 6363. 6363. 8623.
## 8 XMD #N/A 2019-01-02 6119. 6036. 0 4511. 4511. 7662.
## 9 XNJ #N/A 2019-01-02 5672. 5599. 0 4321. 4321. 7321.
## 10 XXJ #N/A 2019-01-02 6206. 6081. 0 4081. 4081. 7286.
## # … with 478 more rows, and 4 more variables: total_volume <dbl>,
## # sum_close <dbl>, variation <dbl>, .group <int>
[Plot: line chart (geom_line) of variation against date, coloured by ticker, with x-axis labels rotated 90 degrees]
## $data
## # A tibble: 22 x 14
## # Groups: ticker, name [22]
## ticker name date open close volume min_close min_open max_close
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 HCD Hydr… 2020-01-02 0.085 0.086 9.08e4 0.013 0.014 0.086
## 2 TSO Teso… 2020-02-07 0.028 0.028 9.56e6 0.017 0.02 0.09
## 3 BHD Benj… 2020-06-09 0.71 0.71 0. 0.185 0.185 0.71
## 4 EMD Emer… 2020-02-12 0.145 0.145 4.81e6 0.04 0.04 0.145
## 5 LCL Los … 2020-01-21 0.048 0.055 6.97e4 0.016 0.016 0.056
## 6 DCX Disc… 2020-03-26 0.003 0.003 8.60e5 0.003 0.003 0.01
## 7 CBE Cobr… 2020-01-31 0.23 0.25 6.10e6 0.085 0.09 0.28
## 8 TDY Thed… 2020-02-14 0.25 0.245 6.50e5 0.08 0.08 0.26
## 9 HTG Harv… 2020-04-03 0.087 0.09 3.07e5 0.09 0.087 0.285
## 10 EMUCA EMU … 2020-06-09 0.047 0.047 0. 0.015 0.015 0.047
## # … with 12 more rows, and 5 more variables: total_volume <dbl>,
## # sum_close <dbl>, variation <dbl>, max_variation <dbl>, .group <int>
[Plot: bar chart (geom_bar, stat_identity) of max_variation against date, coloured by name, with x-axis labels rotated 90 degrees]
Create Dataset for predictive part - called “shares”
## # A tibble: 6 x 13
## # Groups: ticker, name [6]
## ticker name date open close volume min_close min_open max_close
## <chr> <chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 HCD Hydr… 2020-01-02 0.085 0.086 9.08e4 0.013 0.014 0.086
## 2 TSO Teso… 2020-02-07 0.028 0.028 9.56e6 0.017 0.02 0.09
## 3 BHD Benj… 2020-06-09 0.71 0.71 0. 0.185 0.185 0.71
## 4 EMD Emer… 2020-02-12 0.145 0.145 4.81e6 0.04 0.04 0.145
## 5 LCL Los … 2020-01-21 0.048 0.055 6.97e4 0.016 0.016 0.056
## 6 DCX Disc… 2020-03-26 0.003 0.003 8.60e5 0.003 0.003 0.01
## # … with 4 more variables: total_volume <dbl>, sum_close <dbl>,
## # variation <dbl>, max_variation <dbl>
Bar chart of the top 100 shares with the largest difference between their maximum closing price and their minimum opening price
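One way to compute such a ranking is to take each ticker's minimum open and maximum close and subtract. The sketch below is base R on invented rows; treating max_variation as max_close minus min_open is an assumption inferred from the column names and the heading, not confirmed by the original code.

```r
# Toy rows: hypothetical tickers with a few opens and closes each.
prices <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB"),
  open   = c(0.10, 0.30, 1.00, 1.10),
  close  = c(0.12, 0.35, 1.05, 1.08),
  stringsAsFactors = FALSE
)

# Per-ticker extremes.
min_open  <- tapply(prices$open,  prices$ticker, min)
max_close <- tapply(prices$close, prices$ticker, max)

spread <- data.frame(
  ticker    = names(min_open),
  min_open  = as.numeric(min_open),
  max_close = as.numeric(max_close[names(min_open)]),
  stringsAsFactors = FALSE
)

# Assumed definition: highest close minus lowest open.
spread$max_variation <- spread$max_close - spread$min_open

# Largest difference first, keep at most the top 100.
spread <- spread[order(-spread$max_variation), ]
top_spread <- head(spread, 100)

print(top_spread)
```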
Plot of the most productive shares from 01/01/2020 in terms of greatest variance in price rise
Start of random forest datasets: did not work (020121), “all models failed”