COVID in Oceania

Reuters

Introduction

COVID-19 is a disease that has affected all of our lives in recent years. Much of the world went into lockdown, and people were rarely seen outside. We kept up with the cases and deaths in our own countries, but what about a region that is located far away from much of the rest of the globe? Oceania, as covered by my dataset, consists of 17 countries and territories: Australia, New Zealand, Papua New Guinea, Solomon Islands, Marshall Islands, Cook Islands, Fiji, Vanuatu, New Caledonia, French Polynesia, Samoa, Kiribati, Micronesia, Tonga, Palau, Wallis and Futuna, and Nauru.

The COVID-19 pandemic was officially confirmed to have reached Oceania on January 25th, 2020, with the first confirmed case reported in Melbourne, Australia. The virus later spread to all sovereign states and territories in the continent. In the early years of the pandemic, Australia and New Zealand were widely praised for their strategies in handling the pandemic in comparison to the West. Both countries eliminated community transmission of the virus several times, even as the virus persisted elsewhere (Wikipedia).

I chose this topic because I wanted to know how Oceania has fared in comparison to the rest of the world when it comes to COVID. A region that is so isolated from the rest of the world deserves a closer look, especially since almost everyone has heard how Australia and New Zealand locked down by closing their borders to nearly all incoming and outgoing flights.

The dataset comes from a site called Worldometer, an organization that collects statistics on worldwide concerns such as water usage, world hunger, world population, government and politics, and energy. Their COVID-19 dataset is a global collection, with statistics for 231 countries and territories. The data for just Oceania was manually pulled from this dataset by a Kaggle user named Anandhu H. For my final project, I will be working with this dataset covering Oceania’s 17 countries and territories. I will be using the following variables: the country, its population, the total cases, total recovered, and the total deaths. I plan to explore the relationship between population and total cases, between total cases and total deaths, and among all three of total cases, total recovered, and total deaths.

Cleaning & Working the Dataset

I’m going to begin by calling all of the libraries I plan to use. Tidyverse and Highcharter are the ones we know. Magick, Raster, and Showtext are 3 libraries I discovered recently. Magick and Raster help with adding a background image to a visualization, and Showtext’s purpose is to bring any desired Google font into R. I follow that by setting my working directory and bringing in my dataset.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(highcharter)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo 
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use

library(magick)

## Linking to ImageMagick 6.9.12.93
## Enabled features: cairo, fontconfig, freetype, heic, lcms, pango, raw, rsvg, webp
## Disabled features: fftw, ghostscript, x11

library(raster)

## Loading required package: sp
## 
## Attaching package: 'raster'
## 
## The following object is masked from 'package:dplyr':
## 
##     select

library(showtext)

## Loading required package: sysfonts
## Loading required package: showtextdb

setwd("/Users/aashkanavale/Desktop/Montgomery College/MC Spring '24/DATA110/PROJECTS/Project 3 - COVID in Oceania")
oceania <- read_csv("oceania_covid.csv")

## Rows: 17 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Country/Other
## dbl (9): Total Cases, Total Deaths, Total Recovered, Active Cases, Tot Cases...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Thankfully, this dataset is already really well put together so there wasn’t much I needed to clean. However, I wanted to make some small changes.

First, I’m renaming all of the variables, since most of them have a space between words, which might cause problems later. I also got rid of the slashes and made each name into one word so it’s easy to use later down the road.

oceania <- oceania |>
  rename(Country = `Country/Other`,
         TotalCases = `Total Cases`,
         TotalDeaths = `Total Deaths`,
         TotalRecovered = `Total Recovered`,
         ActiveCases = `Active Cases`,
         TotCasesPerMilPop = `Tot Cases/ 1M pop`,
         DeathsPerMilPop = `Deaths/ 1M pop`,
         TotalTests = `Total Tests`,
         TestsPerMilPop = `Tests/ 1M pop`)
oceania

## # A tibble: 17 × 10
##    Country   TotalCases TotalDeaths TotalRecovered ActiveCases TotCasesPerMilPop
##    <chr>          <dbl>       <dbl>          <dbl>       <dbl>             <dbl>
##  1 Australia   11211305       17380       11052693      141232            430066
##  2 Cook Isl…       6860           1           6782          77            390416
##  3 Fiji           68771         883          66730        1158             75617
##  4 French P…      77957         649             NA          NA            274338
##  5 Kiribati        3430          13           2703         714             27792
##  6 Marshall…      15554          17          15528           9            258987
##  7 Micrones…      22247          58             NA          NA            189354
##  8 Nauru           4621           1           4609          11            423828
##  9 New Cale…      79724         314          79234         176            274046
## 10 New Zeal…    2138754        3621        2114718       20415            436641
## 11 Palau           5976           9           5965           2            327757
## 12 Papua Ne…      46663         669          43982        2012              5022
## 13 Samoa          15991          29           1605       14357             79070
## 14 Solomon …      24575         153             NA          NA             34077
## 15 Tonga          16487          12          15638         837            153013
## 16 Vanuatu        12014          14          11976          24             37330
## 17 Wallis a…       3427           7            438        2982            312056
## # ℹ 4 more variables: DeathsPerMilPop <dbl>, TotalTests <dbl>,
## #   TestsPerMilPop <dbl>, Population <dbl>
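The renaming above is done by hand, one column at a time. As a hedged alternative, the names could also be cleaned programmatically. The sketch below uses base R on a small toy data frame built from values shown in the output above. The helper name strip_names is hypothetical, and the names it produces (for example TotCases1Mpop) differ slightly from the hand-picked ones.

```r
# Toy data frame with the same style of column names as the raw file
toy <- data.frame(
  `Country/Other` = c("Fiji", "Tonga"),
  `Total Cases` = c(68771, 16487),
  `Total Deaths` = c(883, 12),
  `Tot Cases/ 1M pop` = c(75617, 153013),
  check.names = FALSE
)

# Hypothetical helper: drop every character that is not a letter or digit
strip_names <- function(x) gsub("[^A-Za-z0-9]", "", x)

names(toy) <- strip_names(names(toy))
names(toy)
```

Renaming by hand gives nicer names such as TotCasesPerMilPop, so this is mainly useful when a file has many columns.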

Next, I rearranged the countries from largest population to smallest, just so the data was easier for me to work with and read.

oceania <- oceania |>
  arrange(desc(Population))
oceania

## # A tibble: 17 × 10
##    Country   TotalCases TotalDeaths TotalRecovered ActiveCases TotCasesPerMilPop
##    <chr>          <dbl>       <dbl>          <dbl>       <dbl>             <dbl>
##  1 Australia   11211305       17380       11052693      141232            430066
##  2 Papua Ne…      46663         669          43982        2012              5022
##  3 New Zeal…    2138754        3621        2114718       20415            436641
##  4 Fiji           68771         883          66730        1158             75617
##  5 Solomon …      24575         153             NA          NA             34077
##  6 Vanuatu        12014          14          11976          24             37330
##  7 New Cale…      79724         314          79234         176            274046
##  8 French P…      77957         649             NA          NA            274338
##  9 Samoa          15991          29           1605       14357             79070
## 10 Kiribati        3430          13           2703         714             27792
## 11 Micrones…      22247          58             NA          NA            189354
## 12 Tonga          16487          12          15638         837            153013
## 13 Marshall…      15554          17          15528           9            258987
## 14 Palau           5976           9           5965           2            327757
## 15 Cook Isl…       6860           1           6782          77            390416
## 16 Wallis a…       3427           7            438        2982            312056
## 17 Nauru           4621           1           4609          11            423828
## # ℹ 4 more variables: DeathsPerMilPop <dbl>, TotalTests <dbl>,
## #   TestsPerMilPop <dbl>, Population <dbl>

The last change I made was creating a new variable that categorizes the population of each country as “Low”, “Medium”, or “High”.

The “Low” category is defined as a population less than or equal to 100,000. The “Medium” category is defined as a population greater than 100,000 but less than or equal to 1,000,000. And the “High” category is defined as a population greater than 1,000,000.

oceania2 <- oceania |>
  mutate(PopulationLevel = case_when(Population <= 100000 ~ "Low",
                                     Population <= 1000000 ~ "Medium",
                                     Population >= 1000001 ~ "High"))
oceania2

## # A tibble: 17 × 11
##    Country   TotalCases TotalDeaths TotalRecovered ActiveCases TotCasesPerMilPop
##    <chr>          <dbl>       <dbl>          <dbl>       <dbl>             <dbl>
##  1 Australia   11211305       17380       11052693      141232            430066
##  2 Papua Ne…      46663         669          43982        2012              5022
##  3 New Zeal…    2138754        3621        2114718       20415            436641
##  4 Fiji           68771         883          66730        1158             75617
##  5 Solomon …      24575         153             NA          NA             34077
##  6 Vanuatu        12014          14          11976          24             37330
##  7 New Cale…      79724         314          79234         176            274046
##  8 French P…      77957         649             NA          NA            274338
##  9 Samoa          15991          29           1605       14357             79070
## 10 Kiribati        3430          13           2703         714             27792
## 11 Micrones…      22247          58             NA          NA            189354
## 12 Tonga          16487          12          15638         837            153013
## 13 Marshall…      15554          17          15528           9            258987
## 14 Palau           5976           9           5965           2            327757
## 15 Cook Isl…       6860           1           6782          77            390416
## 16 Wallis a…       3427           7            438        2982            312056
## 17 Nauru           4621           1           4609          11            423828
## # ℹ 5 more variables: DeathsPerMilPop <dbl>, TotalTests <dbl>,
## #   TestsPerMilPop <dbl>, Population <dbl>, PopulationLevel <chr>
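The same three bins can also be expressed with base R’s cut(), which returns an ordered factor so that the levels sort as Low, Medium, High rather than alphabetically. This is only a sketch of an alternative, not the code used in this project. The populations are the ones from the dataset for Australia, Fiji, and Nauru.

```r
# Populations taken from the dataset (Australia, Fiji, Nauru)
population <- c(26068792, 909466, 10903)

# right = TRUE (the default) closes each interval on the right, so
# 100,000 falls in "Low" and 1,000,000 falls in "Medium"
level <- cut(population,
             breaks = c(-Inf, 100000, 1000000, Inf),
             labels = c("Low", "Medium", "High"),
             ordered_result = TRUE)
level
```

Because the result is an ordered factor, comparisons such as level > "Low" also work.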

Now that my dataset is clean, let’s perform a linear regression statistical analysis.

Statistical Analysis: Linear Regression

I first wanted to see a rough plot of what my linear regression would look like before I added to it. I used my most recent dataset, the one with the mutated variable, and set my x-axis to the total number of cases and my y-axis to the total number of deaths. I used the geom_point() function to plot the points and called my first graph.

regression <- oceania2 |>
  ggplot(aes(x = TotalCases, y = TotalDeaths)) +
  geom_point()
regression

Not terrible. However, the x-axis is in scientific notation, there’s no title, and it’s just overall boring. Let’s change that.

First, I wanted to add an image to the background, using the following functions. The image_read() function reads the image from whatever folder it’s in on my computer. The image_fill() function flood-fills a region of the image with a color, starting from a given pixel (by default the top-left corner); passing “none” makes that region transparent. The as.raster() function converts the image to a raster object.

image <- image_read("/Users/aashkanavale/Desktop/Montgomery College/MC Spring '24/DATA110/PROJECTS/Project 3 - COVID in Oceania/transmap.png")
image <- image_fill(image, 'none')
image <- as.raster(image)

# Source: RDocumentation

I also wanted to change the font of the text to one of my favorite fonts: Spectral. However, since Spectral isn’t built into R as a font, I needed to bring it in from Google. I used the font_add_google() function to pull the Spectral font from the internet and gave it the family name I would refer to it by in R. Then I used the showtext_auto() function so that R’s graphics devices render text with showtext, which makes the font available in my plots.

font_add_google(name = "Spectral", family = "serif-face") 
showtext_auto()

# Source: Daniel Oehm, Gradient Descending

Because my numbers are difficult to process in scientific notation, I created a function that would allow me to change any scientific notation back to its original form.

comma_format <- function(x) {
  format(x, scientific = FALSE)}

# Source: ChatGPT
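Despite its name, comma_format() above only switches off scientific notation and does not insert thousands separators. If separators are wanted, a minimal variation is sketched below. The function name comma_labels is hypothetical; big.mark and trim are standard arguments of base R’s format().

```r
# Hypothetical variant of the helper above that also adds thousands separators
comma_labels <- function(x) {
  # trim = TRUE stops format() from padding the labels to a common width
  format(x, scientific = FALSE, big.mark = ",", trim = TRUE)
}

comma_labels(c(0, 5000000, 11211305))
```

It could be passed to the axis in the same way, for example scale_x_continuous(labels = comma_labels).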

I created a color palette that would suit each of the 3 population levels: “Low”, “Medium”, and “High”.

desiredcolors <- c("#26547c", "#8a2a23", "#3c4846")
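One thing to be aware of: when scale_color_manual() receives an unnamed vector, the colors are assigned in the sorted order of the levels (High, Low, Medium), not in the order Low, Medium, High. The sketch below shows which color ends up on which level, and how a named vector pins the pairing explicitly. The pairing chosen in named_colors is an assumption for illustration only.

```r
# The three levels as they appear in PopulationLevel
levels_in_data <- c("Low", "Medium", "High")
desiredcolors <- c("#26547c", "#8a2a23", "#3c4846")

# An unnamed palette is assigned in the sorted order of the levels
implicit <- setNames(desiredcolors, sort(levels_in_data))
implicit

# Naming the vector pins each color to a level (assumed pairing, for illustration)
named_colors <- c(Low = "#26547c", Medium = "#8a2a23", High = "#3c4846")
named_colors[["High"]]
```

A named vector can be passed in the same way, as scale_color_manual(values = named_colors).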

To begin replotting my linear regression, I set my axes to the same variables and this time colored the points by the 3 population levels. Then I used the annotation_raster() function to bring my image into the background. I plotted my line of best fit using the linear model method and changed its linetype and color. I also set an alpha value, although in geom_smooth() that setting applies to the confidence band rather than the line, so with se = FALSE it has no visible effect. After that, I plotted my 17 points and changed their size and opacity, since they were too small and bold. I set the colors to the ones from my color palette above, then used my function from above to change the scientific notation to regular numbers.

I also changed my labels by adding a title, adjusting the axes labels so they weren’t one word, adjusted the legend title, and added the source of my data as a caption. I changed the theme away from the default and finally changed the font to my desired font.

Then I just called my second regression graph.

regression2 <- oceania2 |>
  ggplot(aes(x = TotalCases, y = TotalDeaths, color = PopulationLevel)) +
  annotation_raster(image, -Inf, Inf, -Inf, Inf, interpolate = FALSE) + # Source: RDocumentation
  geom_smooth(method = 'lm', formula = y~x, se = FALSE, linetype = "dotdash", color = "#FF000D", alpha = 0.6) +
  geom_point(size = 3, alpha = 0.8) +
  scale_color_manual(values = desiredcolors) +
  scale_x_continuous(labels = comma_format) + # Source: ChatGPT
  labs(title = "Total COVID Cases vs. Total COVID Deaths in Oceania",
       x = "Total Cases per Country",
       y = "Total Deaths per Country",
       color = "Population Level",
       caption = "Source: Worldometer") +
  theme_classic() +
  theme(text = element_text(family = "serif-face"))
regression2

As we can see, there is a strong positive trend in the graph, meaning that the more cases a country has, the more total deaths it tends to have.

We can’t really make out the other 15 countries because of that massive outlier (I believe we can all make an educated guess that it is Australia) and the other highly populated country towards the bottom left of the graph (New Zealand). So I filtered for cases less than 500,000, just so we can get a good look at the rest of the countries and territories.

oceania3 <- oceania2 |>
  filter(TotalCases < 500000)

For this one, I simply copied and pasted the above code and changed the dataset so it would be the one with the filtered cases.

regression3 <- oceania3 |>
  ggplot(aes(x = TotalCases, y = TotalDeaths, color = PopulationLevel)) +
  annotation_raster(image, -Inf, Inf, -Inf, Inf, interpolate = FALSE) + 
  geom_smooth(method = 'lm', formula = y~x, se = FALSE, linetype = "dotdash", color = "#FF000D", alpha = 0.6) +
  geom_point(size = 3, alpha = 0.8) +
  scale_color_manual(values = desiredcolors) +
  scale_x_continuous(labels = comma_format) +
  labs(title = "Total COVID Cases vs. Total COVID Deaths in Oceania",
       x = "Total Cases per Country",
       y = "Total Deaths per Country",
       color = "Population Level",
       caption = "Source: Worldometer") +
  theme_classic() +
  theme(text = element_text(family = "serif-face"))
regression3

That looks a lot clearer. We can see that there’s a high-population country in this mix that doesn’t have nearly as many cases or deaths as New Zealand and Australia. This country is, in fact, Papua New Guinea. As expected, the low-population countries are all in the lower left corner of the graph. The medium-population countries are scattered around the middle, but compared to the scale of our regression2 graph, their numbers are also very low.

To actually figure out what everything means, let’s do a summary.

cor(oceania2$TotalCases, oceania2$TotalDeaths)

## [1] 0.9982372

fit <- lm(TotalDeaths ~ TotalCases, data = oceania2)
summary(fit)

## 
## Call:
## lm(formula = TotalDeaths ~ TotalCases, data = oceania2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -169.29 -160.40 -148.18   35.34  621.20 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.559e+02  6.544e+01   2.382   0.0309 *  
## TotalCases  1.540e-03  2.364e-05  65.141   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 258 on 15 degrees of freedom
## Multiple R-squared:  0.9965, Adjusted R-squared:  0.9962 
## F-statistic:  4243 on 1 and 15 DF,  p-value: < 2.2e-16

cor() stands for correlation. This value is always between -1 and 1. The correlation coefficient lets us know how strong or weak the linear correlation is. Values close to positive or negative one indicate a strong correlation, values around positive or negative 0.5 indicate a moderate to weak correlation, and values close to zero indicate little to no linear correlation.

My value came out to be 0.9982372, which is very nearly 1. Since my coefficient is so close to positive 1, my data has a strong positive correlation.
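To make the coefficient less of a black box, here is a small sketch that computes Pearson’s r by hand and compares it with cor(). It uses only five rows from the table above (Fiji, Papua New Guinea, Samoa, Tonga, Vanuatu), so its value will not equal the 0.998 reported for the full dataset. The variable names are chosen for illustration.

```r
# Total cases and total deaths for Fiji, Papua New Guinea, Samoa, Tonga, Vanuatu
cases  <- c(68771, 46663, 15991, 16487, 12014)
deaths <- c(883, 669, 29, 12, 14)

# Deviations of each value from its mean
dx <- cases - mean(cases)
dy <- deaths - mean(deaths)

# Pearson's r: shared variation divided by the product of the individual spreads
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Should agree with the built-in function
all.equal(r_manual, cor(cases, deaths))
```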

For a linear regression, we use the equation y = mx + b.
The equation for my model is: TotalDeaths = 0.00154(TotalCases) + 155.9.

Now, how do we interpret this equation?
0.00154 is the TotalCases estimate (the slope), written as a regular number rather than in scientific notation. 155.9 is the intercept estimate, also written as a regular number. For each additional case, the model predicts an increase of 0.00154 in total deaths, or roughly 1.5 additional deaths per 1,000 additional cases.
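As a worked example of the equation, the sketch below plugs Fiji’s total cases (68,771) into the fitted line using the rounded coefficients from the summary, then compares the prediction with Fiji’s actual total deaths (883). The variable names are chosen for illustration.

```r
# Rounded coefficients from the model summary
slope <- 0.00154
intercept <- 155.9

# Fiji's values from the dataset
fiji_cases <- 68771
fiji_deaths <- 883

predicted <- intercept + slope * fiji_cases
residual <- fiji_deaths - predicted

round(predicted, 1)  # about 261.8 predicted deaths
round(residual, 1)   # about 621.2 more deaths than predicted
```

That residual of about 621 lines up with the Max value in the Residuals row of the summary, which suggests Fiji is the point that sits furthest above the line.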

To check if the results are significant, we must look at the p-value. The significance level is typically 0.05, or 5%. In this case, my p-value is less than 2.2 x 10^-16, which is an extremely small number. Because it is far below 0.05, the relationship is statistically significant: if there were truly no linear relationship between total cases and total deaths, there would be an extremely small probability of observing a test statistic as extreme as or more extreme than the one actually observed. With only 17 observations and two very large countries driving the fit, though, this result should be read with some caution.
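Since Australia and New Zealand are so much larger than everything else, one reasonable follow-up is to check how much the correlation depends on them. The sketch below rebuilds the two columns from the table above and computes the correlation with and without the two outliers. It is a sensitivity check under the same 500,000-case cutoff used for oceania3, not part of the original analysis, and the object names are chosen for illustration.

```r
# Total cases and total deaths for all 17 countries and territories
covid <- data.frame(
  Country = c("Australia", "Papua New Guinea", "New Zealand", "Fiji",
              "Solomon Islands", "Vanuatu", "New Caledonia", "French Polynesia",
              "Samoa", "Kiribati", "Micronesia", "Tonga", "Marshall Islands",
              "Palau", "Cook Islands", "Wallis and Futuna", "Nauru"),
  TotalCases = c(11211305, 46663, 2138754, 68771, 24575, 12014, 79724, 77957,
                 15991, 3430, 22247, 16487, 15554, 5976, 6860, 3427, 4621),
  TotalDeaths = c(17380, 669, 3621, 883, 153, 14, 314, 649,
                  29, 13, 58, 12, 17, 9, 1, 7, 1)
)

# Correlation on the full data (matches the cor() call above)
cor_all <- cor(covid$TotalCases, covid$TotalDeaths)

# Same cutoff as oceania3: drops Australia and New Zealand
small <- covid[covid$TotalCases < 500000, ]
cor_small <- cor(small$TotalCases, small$TotalDeaths)

c(all = cor_all, without_outliers = cor_small)
```

If the second value is noticeably lower than the first, that would suggest the near-perfect correlation is driven largely by the two biggest countries.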

Final Visualization 1

For my first final visualization, I wanted to go off of my linear regression model to explore exactly which countries were which without making any educated guesses. To do so, I’m using the highcharter package.

I’m using my dataset without the outliers filtered, so we can see the two outliers as well. The axes and grouping are the same, but this time I also mapped the bubble size to population, divided by 100 so the bubbles wouldn’t be too big.
I set the colors to 3 new colors for the population level and adjusted the size and symbol of my points.
Then I adjusted all of the labels to the same ones as above and adjusted the position of my legend to be at the top.
After that, I used the hc_tooltip() function to add in my desired tooltips, such as the country, the total number of cases, and the total number of deaths.
Finally, I changed the color of my background and changed the font to my beloved “Spectral” and called my first final visualization.

viz1 <- highchart() |>
  
  hc_add_series(data = oceania2,
               type = "scatter",
               hcaes(x = TotalCases, y = TotalDeaths, group = PopulationLevel, size = Population/100)) |>
  
  hc_colors(colors = c("#B6594C", "#37514D", "#DD8E75")) |>
  hc_plotOptions(scatter = list(marker = list(radius = 6, symbol = "circle"))) |>
  
  hc_title(text = "Total COVID-19 Cases vs. Total COVID-19 Deaths in Oceania") |>
  hc_xAxis(title = list(text = "Total Cases per Country")) |>
  hc_yAxis(title = list(text = "Total Deaths per Country")) |>
  hc_caption(text = "Source: Worldometer") |>
  hc_legend(align = "left", verticalAlign = "top", title = list(text = "Population Level"),
            categories = c("Low", "Medium", "High")) |>
  
  hc_tooltip(shared = F, pointFormat = "Country/Territory: {point.Country}<br> Total Cases: {point.x}<br> Total Deaths: {point.y}") |>
  
  hc_chart(backgroundColor = "#FAF6F2") |>
  hc_chart(style = list(fontFamily = "Spectral",
                        fontWeight = "bold"))
viz1

Analysis

As confirmed, Australia has by far the most cases and deaths, followed by New Zealand. Papua New Guinea, the third “High” population country, sits far below both, even though it has a larger population than New Zealand. Interesting. If you click on “High” in the legend to hide that group, you’ll get a much clearer view of the rest of the countries and territories. Unsurprisingly, the countries with smaller populations are on the lower end of this scale. Nauru and the Cook Islands have the fewest deaths, with only 1 each, and Wallis and Futuna has the fewest cases at 3,427.
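Raw totals mostly reflect how big a country is. To compare countries on a relative basis, deaths per 1,000 cases can be computed directly from the two columns. The sketch below does this for five countries using values from the dataset. The column name DeathsPer1000Cases is made up for this example.

```r
# Total cases and deaths for five countries, taken from the dataset
rates <- data.frame(
  Country = c("Australia", "New Zealand", "Papua New Guinea", "Fiji", "Nauru"),
  TotalCases = c(11211305, 2138754, 46663, 68771, 4621),
  TotalDeaths = c(17380, 3621, 669, 883, 1)
)

# Deaths per 1,000 reported cases
rates$DeathsPer1000Cases <- rates$TotalDeaths / rates$TotalCases * 1000

# Sort from highest to lowest rate
rates <- rates[order(-rates$DeathsPer1000Cases), ]
rates[, c("Country", "DeathsPer1000Cases")]
```

On this measure the ranking looks quite different from the raw totals: Papua New Guinea and Fiji come out well above Australia and New Zealand. Differences in testing and reporting could explain part of that gap.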

Final Visualization 2

For my second visualization, I wanted a stacked bar chart that focused on the relationship between the total cases, total deaths, and total recovered. So I did just that. This time, however, I used my dataset with the outliers filtered, meaning Australia and New Zealand are not in the following visualization. Their numbers are so high that we aren’t able to see the rest of the countries.

I set my chart type to “column”, adjusted all of my labels, changed the default colors, and set the plotOptions() to stacking = “normal”. Then I added each individual series: “TotalCases”, “TotalDeaths”, and “TotalRecovered”. Once again, I changed the background color and font and called my second visualization.

viz2 <- highchart() |>
  
  hc_chart(type = "column") |>
  
  hc_title(text = "COVID-19 Cases in Oceania (Cases v. Deaths v. Recovered)") |>
  hc_xAxis(categories = oceania3$Country) |>
  hc_yAxis(title = list(text = "Total Cases per Country")) |>
  hc_caption(text = "Source: Worldometer") |>
  
  hc_colors(colors = c("#1A4146", "#D3A446", "#B66D2F")) |>
  hc_plotOptions(column = list(stacking = "normal")) |>
  
  hc_add_series(name = "Total Cases", 
                data = oceania3$TotalCases) |>
  hc_add_series(name = "Total Deaths",
                data = oceania3$TotalDeaths) |>
  hc_add_series(name = "Total Recovered", 
                data = oceania3$TotalRecovered) |>
  
  hc_chart(backgroundColor = "#F2EDE0") |>
  hc_chart(style = list(fontFamily = "Spectral",
                        fontWeight = "bold"))
  
viz2

Analysis

One thing I immediately noticed is that, most of the time, the total recovered is almost equal to the total cases, which is an excellent thing: most people who were diagnosed with COVID were able to recover. However, that does not seem to be the case for Solomon Islands, Micronesia, French Polynesia, and Samoa. Their total cases are overwhelmingly larger than their deaths (which is good) and their recovered (which is not so good). If we look at just total recovered, it looks like Solomon Islands, French Polynesia, and Micronesia have no recovered cases at all. It also looks like Wallis and Futuna has zero, but no, their number of total recovered (438) is just really small compared to the others. From this, we can infer that either people are still suffering from COVID, or have sadly died, or the information is simply not available in the dataset. Since I have explored this dataset, I know that for those three it is the third option: their total recovered values are missing (NA).
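The missing values can be confirmed in code rather than by eye. The sketch below uses six rows from the dataset to list the countries whose TotalRecovered is NA and to compute the share of cases recorded as recovered for the rest. The column name RecoveredShare is made up for this example, and my understanding that Highcharts simply draws no bar segment for a missing value is an assumption rather than something I have verified.

```r
# Six rows from the dataset; NA marks a missing recovered count
recovered <- data.frame(
  Country = c("Fiji", "Solomon Islands", "French Polynesia",
              "Samoa", "Micronesia", "Wallis and Futuna"),
  TotalCases = c(68771, 24575, 77957, 15991, 22247, 3427),
  TotalRecovered = c(66730, NA, NA, 1605, NA, 438)
)

# Countries with no recovered count at all
missing_recovered <- recovered$Country[is.na(recovered$TotalRecovered)]
missing_recovered

# Share of cases recorded as recovered (stays NA where the count is missing)
recovered$RecoveredShare <- round(recovered$TotalRecovered / recovered$TotalCases, 2)
recovered[, c("Country", "RecoveredShare")]
```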

Final Visualization 3

For my last final visualization, I wanted to challenge myself a little and create a bubble chart that shows the intensity of the total number of cases. This particular chart took a very, very long time.

I began by setting the type of the chart to “packedbubble” and the height to around 80%. Anything else messed up the graph too much. Then I added in a background image by using code for a highcharter version. I changed the title, added my tooltips, and added my caption as well. For my plotOptions(), I had to figure out percentages that would make the bubbles look proportionate but not completely go out of the screen. To be honest, I’ve looked at multiple sources to help me understand what that chunk of code means, but I haven’t found one yet. I’ll definitely do some more external research on this code at a more available time. Then I added each and every single country and territory under hc_add_series() and grouped them by their population level. I manually inputted their population and total case count. And for the last time, I changed the position of my legend, changed the font, and called my final visualization.

viz3 <- highchart() |>
  
  hc_chart(type = "packedbubble", height = "80%") |>
  
  hc_chart(backgroundColor = "transparent", 
           divBackgroundImage = "https://i.pinimg.com/736x/ff/6e/dd/ff6eddce9d277c3f34809c3724eef3bf.jpg") |>
  
  hc_title(text = "Total COVID-19 Cases by Country and Population Level in Oceania", align = "left") |>
  hc_tooltip(useHTML = TRUE, pointFormat = "<b>{point.name}</b><br>Population: {point.x}<br>Total Cases: {point.value}") |> # Source: Highcharts Demos
  hc_caption(text = "Source: Worldometer") |>
  
  hc_plotOptions(packedbubble = list(minSize = "30%",
                                     maxSize = "280%",
                                     zMin = 0,
                                     layoutAlgorithm = list(splitSeries = FALSE, gravitationalConstant = 0.02),
                                     dataLabels = list(enabled = TRUE,
                                                       format = "{point.name}",
                                                       filter = list(property = "y", operator = ">", value = 0),
                                                       style = list(color = "black", textOutline = "none", fontWeight = "bold")))) |> # Source: ChatGPT
  
  hc_add_series(name = "High", data = list(
    list(name = "Australia", x = 26068792, value = 11211305),
    list(name = "New Zealand", x = 4898203, value = 2138754),
    list(name = "Papua New Guinea", x = 9292169, value = 46663)),
    color = "#D2793B", showInLegend = TRUE) |>
  
  hc_add_series(name = "Medium", data = list(
    list(name = "Fiji", x = 909466, value = 68771),
    list(name = "Solomon Islands", x = 721159, value = 24575),
    list(name = "Vanuatu", x = 321832, value = 12014),
    list(name = "New Caledonia", x = 290915, value = 79724),
    list(name = "French Polynesia", x = 284164, value = 77957),
    list(name = "Samoa", x = 202239, value = 15991),
    list(name = "Kiribati", x = 123419, value = 3430),
    list(name = "Micronesia", x = 117489, value = 22247),
    list(name = "Tonga", x = 107749, value = 16487)),
    color = "#F3BA83", showInLegend = TRUE) |>
  
  hc_add_series(name = "Low", data = list(
    list(name = "Marshall Islands", x = 60057, value = 15554),
    list(name = "Palau", x = 18233, value = 5976),
    list(name = "Cook Islands", x = 17571, value = 6860),
    list(name = "Wallis and Futuna", x = 10982, value = 3427),
    list(name = "Nauru", x = 10903, value = 4621)),
    color = "#628485", showInLegend = TRUE) |> # Source: Highcharts Demos & ChatGPT
  
  hc_legend(align = "left", verticalAlign = "top", title = list(text = "Population Level"),
            categories = c("Low", "Medium", "High")) |>
  
  hc_chart(style = list(fontFamily = "Spectral",
                        fontWeight = "bold"))
viz3
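Typing every country into hc_add_series() by hand works, but it is easy to mistype a number. As a sketch of an alternative, the per-group point lists can be built from the data frame itself. The example below uses four rows from the dataset. The helper name make_points and the object series_data are hypothetical.

```r
# Four rows from the dataset, with the population level already assigned
bubbles <- data.frame(
  Country = c("Australia", "New Zealand", "Palau", "Nauru"),
  Population = c(26068792, 4898203, 18233, 10903),
  TotalCases = c(11211305, 2138754, 5976, 4621),
  PopulationLevel = c("High", "High", "Low", "Low")
)

# Hypothetical helper: turn each row into the list(name, x, value) shape used above
make_points <- function(df) {
  lapply(seq_len(nrow(df)), function(i) {
    list(name = df$Country[i], x = df$Population[i], value = df$TotalCases[i])
  })
}

# One list of points per population level
series_data <- lapply(split(bubbles, bubbles$PopulationLevel), make_points)
str(series_data$High[[1]])
```

Each element could then be passed to the chart in place of the hand-typed lists, for example hc_add_series(name = "High", data = series_data$High).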

Analysis

The size of the bubbles is determined by the total number of cases. Each country is placed into the “High”, “Medium”, or “Low” population category. There’s not very much to analyze or infer about this one; it’s pretty much self-explanatory.

However, there were a couple things I couldn’t figure out that I would love to fix. For one, the bubbles are not centered. No matter what I do, they just won’t go to the middle of the graph. It bothers me a lot and I’m sure it bothers some of you as well. Second, the caption is nowhere to be seen. It’s in my code, but it won’t appear in my graph. Third, I would’ve loved to add a little bit more to this. While it is pleasing to the eyes, it doesn’t show much and if I had more time, I would definitely devote my attention to bettering this particular visualization.

Conclusion & Reflection

This dataset had a very limited number of observations and variables, so it was a little difficult to work with. Finding new things to explore was time-consuming, but I was able to complete it, albeit not on time. My visualizations are similar to one another, so I would love to hear some feedback on different paths I could’ve explored. After all, staring at a computer screen and hundreds of lines of code for hours doesn’t help the brain refresh and think of something new. A newer set of eyes could possibly help me find what I’d never even think to consider.

Overall, I’m proud of my visualizations. I think I’ve done well but I could’ve done better. I originally had plans to do a Tableau dashboard as my third final visualization but because of personal issues at home, my falling sick with a fever, and the time frame, I wasn’t able to achieve that. Regarding the class itself, I’m proud I was able to keep such a high grade and even surprised myself at how much I enjoyed doing the homework and assignments. Next semester, I’m going to start a little bit earlier on my assignments and projects so I’m able to execute the original plans I make.

References

Dataset: https://www.worldometers.info/coronavirus/#countries

Background Information: https://en.wikipedia.org/wiki/COVID-19_pandemic_in_Oceania#:~:text=The%20COVID%2D19%20pandemic%20was,and%20territories%20in%20the%20region.

Coding:
https://www.rdocumentation.org/packages/magick/versions/2.8.3/topics/image_ggplot
https://gradientdescending.com/adding-custom-fonts-to-ggplot-in-r/
https://www.highcharts.com/demo/highcharts/packed-bubble
ChatGPT

DATA110 Final Project

Aashka Navale

2024-04-30
