The working directory in R is the default location on your computer where R reads and writes files.
Setting the working directory ensures that your files are loaded from and saved to the intended location, which streamlines your workflow.
setwd("~/TMMS2024")
library(tidyverse) ##for data manipulation
library(psych) ##for description of data
library(summarytools) ##for summarizing data
library(sjmisc) ##for data manipulation
malaria_data <- read.csv("mockdata_cases.csv")
mosquito_data <- read.csv("mosq_mock.csv")
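Before reading the files, it can help to confirm where R is actually looking. The sketch below is illustrative only: it relies on base R's getwd(), file.path(), and file.exists(), and the variable names current_dir, csv_path, and file_found are made up for this example.

```r
# Report the directory R currently reads from and writes to
current_dir <- getwd()
print(current_dir)

# Build the full path to one of the tutorial's input files
csv_path <- file.path(current_dir, "mockdata_cases.csv")

# TRUE if the file is where read.csv() will look for it, FALSE otherwise
file_found <- file.exists(csv_path)
print(file_found)
```

If file_found is FALSE, the working directory probably does not point at the folder that holds the data.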
Before we start visualizing our data, we need to understand the characteristics of our data. The goal is to get an idea of the data structure and to understand the relationships between variables.
dim(malaria_data) ##Dimension of the data set
[1] 514 10
str(malaria_data) ##structure of the data set
'data.frame': 514 obs. of 10 variables:
$ location : chr "mordor" "mordor" "mordor" "mordor" ...
$ month : int 1 9 1 6 4 7 2 5 12 5 ...
$ year : int 2020 2019 2019 2018 2019 2020 2019 2020 2019 2020 ...
$ ages : chr "15_above" "5_to_14" "15_above" "5_to_14" ...
$ total : int 22 34 36 32 36 34 38 31 53 57 ...
$ positive : int 4 6 7 7 8 8 8 8 9 9 ...
$ xcoord : num -20.3 -20.5 -19.7 -19.8 -20.1 ...
$ ycoord : num 29.4 29.4 30.5 30.1 30.6 ...
$ prev : num 0.182 0.176 0.194 0.219 0.222 ...
$ time_order_loc: int 25 21 13 6 16 31 14 29 24 29 ...
head(malaria_data) ##View the first few rows
location month year ages total positive xcoord ycoord prev
1 mordor 1 2020 15_above 22 4 -20.25495 29.38997 0.1818182
2 mordor 9 2019 5_to_14 34 6 -20.47055 29.43265 0.1764706
3 mordor 1 2019 15_above 36 7 -19.70804 30.49507 0.1944444
4 mordor 6 2018 5_to_14 32 7 -19.84918 30.14912 0.2187500
5 mordor 4 2019 15_above 36 8 -20.14803 30.60721 0.2222222
6 mordor 7 2020 15_above 34 8 -19.31510 29.71290 0.2352941
time_order_loc
1 25
2 21
3 13
4 6
5 16
6 31
summary(malaria_data) ##summary of descriptive statistics
location month year ages
Length:514 Min. : 1.000 Min. :2018 Length:514
Class :character 1st Qu.: 4.000 1st Qu.:2018 Class :character
Mode :character Median : 7.000 Median :2019 Mode :character
Mean : 6.486 Mean :2019
3rd Qu.: 9.000 3rd Qu.:2020
Max. :12.000 Max. :2020
total positive xcoord ycoord
Min. : 20.0 Min. : 0.00 Min. :-21.84 Min. :28.52
1st Qu.: 46.0 1st Qu.: 14.00 1st Qu.:-20.39 1st Qu.:29.64
Median :103.0 Median : 33.00 Median :-20.06 Median :29.99
Mean :141.5 Mean : 47.81 Mean :-20.04 Mean :30.00
3rd Qu.:206.0 3rd Qu.: 67.00 3rd Qu.:-19.71 3rd Qu.:30.32
Max. :611.0 Max. :264.00 Max. :-18.79 Max. :31.81
prev time_order_loc
Min. :-0.04545 Min. : 1.00
1st Qu.: 0.24615 1st Qu.: 9.00
Median : 0.33016 Median :18.00
Mean : 0.31518 Mean :17.65
3rd Qu.: 0.39024 3rd Qu.:26.00
Max. : 0.53488 Max. :35.00
describe(malaria_data) ##summary of descriptive statistics
vars n mean sd median trimmed mad min max
location* 1 514 3.00 1.43 3.00 3.00 1.48 1.00 5.00
month 2 514 6.49 3.37 7.00 6.50 4.45 1.00 12.00
year 3 514 2018.91 0.78 2019.00 2018.89 1.48 2018.00 2020.00
ages* 4 514 2.00 0.82 2.00 2.00 1.48 1.00 3.00
total 5 514 141.46 118.83 103.00 122.87 96.37 20.00 611.00
positive 6 514 47.81 46.52 33.00 39.75 34.10 0.00 264.00
xcoord 7 514 -20.04 0.49 -20.06 -20.05 0.50 -21.84 -18.79
ycoord 8 514 30.00 0.51 29.99 29.99 0.51 28.52 31.81
prev 9 514 0.32 0.10 0.33 0.32 0.11 -0.05 0.53
time_order_loc 10 514 17.65 9.93 18.00 17.63 13.34 1.00 35.00
range skew kurtosis se
location* 4.00 0.00 -1.34 0.06
month 11.00 -0.03 -1.18 0.15
year 2.00 0.15 -1.33 0.03
ages* 2.00 0.01 -1.50 0.04
total 591.00 1.26 1.11 5.24
positive 264.00 1.75 3.48 2.05
xcoord 3.05 0.04 -0.26 0.02
ycoord 3.29 0.11 0.09 0.02
prev 0.58 -0.51 -0.15 0.00
time_order_loc 34.00 0.01 -1.20 0.44
#malaria_data$location ##values for a single column (locations)
#malaria_data$month ##values for a single column (month)
#malaria_data$year ##values for a single column (year)
unique(malaria_data$location) ## unique values for a single column
[1] "mordor" "narnia" "neverwhere" "oz" "wonderland"
table(malaria_data$location) ## frequencies for a single column (location)
mordor narnia neverwhere oz wonderland
105 104 96 104 105
frq(malaria_data$location)
x <character>
# total N=514 valid N=514 mean=3.00 sd=1.43
Value | N | Raw % | Valid % | Cum. %
-------------------------------------------
mordor | 105 | 20.43 | 20.43 | 20.43
narnia | 104 | 20.23 | 20.23 | 40.66
neverwhere | 96 | 18.68 | 18.68 | 59.34
oz | 104 | 20.23 | 20.23 | 79.57
wonderland | 105 | 20.43 | 20.43 | 100.00
<NA> | 0 | 0.00 | <NA> | <NA>
#prop.table(table(malaria_data$location)) # percentages for a single column
table(malaria_data$location, malaria_data$year) # frequencies for multiple columns
2018 2019 2020
mordor 36 36 33
narnia 36 36 32
neverwhere 36 57 3
oz 35 36 33
wonderland 36 36 33
sum(is.na(malaria_data)) ##checking for missing values in the data set
[1] 0
library(kableExtra)
missing_values <- colSums(is.na(malaria_data))
kable(missing_values)
|  | x |
|---|---|
| location | 0 |
| month | 0 |
| year | 0 |
| ages | 0 |
| total | 0 |
| positive | 0 |
| xcoord | 0 |
| ycoord | 0 |
| prev | 0 |
| time_order_loc | 0 |
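Missing values are not the only thing worth checking. The summary() output above reports a minimum prev of -0.04545, which cannot be a valid proportion. A minimal sketch of a range check follows. It uses a small hypothetical data frame (toy) rather than the real file, and it assumes that prev is meant to equal positive / total, which matches the first rows printed by head().

```r
# Hypothetical toy data; the third row is deliberately invalid
toy <- data.frame(
  total    = c(22, 34, 44),
  positive = c(4, 6, -2),
  prev     = c(4 / 22, 6 / 34, -2 / 44)
)

# Flag proportions that fall outside the valid range [0, 1]
out_of_range <- toy$prev < 0 | toy$prev > 1
n_bad <- sum(out_of_range)
print(toy[out_of_range, ])

# Check the assumed relationship prev = positive / total
recomputed <- toy$positive / toy$total
matches <- isTRUE(all.equal(recomputed, toy$prev))
print(matches)
```

The same two checks can be run on malaria_data by replacing toy with malaria_data.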
What are the dimensions of the dataset?
What are the column names?
What are the column types?
#sapply(mosquito_data, class)
What are some key variables or relationships that we can explore?
First, we will look at some exploratory data visualization techniques using base R functions. The purpose of these plots is to help us understand the relationships between variables and characteristics of our data. They are useful for quickly exploring the data and understanding the relationships, but they are not great for sharing in scientific publications/presentations.
R includes powerful graphics packages for data visualization. Graphics can be viewed on screen, saved in formats such as .pdf, .png, and .jpg, and customized to suit different needs.
Common chart types in R include bar charts, pie charts, histograms, line charts, box plots, kernel density plots, heat maps, and word clouds.
#1. Histogram
For a single variable, we can use the hist() function to create a histogram. A histogram displays the distribution of a continuous variable: the range of values is divided into bins along the x-axis, and the frequency of observations in each bin is shown on the y-axis.
hist(malaria_data$prev)
hist(malaria_data$prev, col = "tomato")
hist(malaria_data$prev,
breaks = 5,
col = "tomato",
border = "darkblue",
main = "Distribution of Malaria Prevalence",
xlab = "Malaria Prevalence",
ylab = "Frequency")
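The breaks argument deserves a closer look: when given a single number, hist() treats it as a suggestion and picks "pretty" break points near that count. The sketch below shows how to inspect the bins without drawing anything. The data are simulated (x is a made-up vector, not the malaria data).

```r
# Simulated values between 0 and 0.5, standing in for a prevalence column
set.seed(1)
x <- runif(200, min = 0, max = 0.5)

# plot = FALSE returns the histogram object instead of drawing it
h <- hist(x, breaks = 5, plot = FALSE)

# Bin edges actually chosen, and the number of observations in each bin
print(h$breaks)
print(h$counts)
```

Every observation lands in exactly one bin, so the counts add up to the number of values supplied.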
#2. Kernel Density Plots
Kernel density plots display the distribution of a continuous variable as a smooth curve, which avoids the dependence on bin choice that affects histograms.
The density plot is useful for visualizing changes in the distribution of a continuous variable.
density <- density(malaria_data$prev)
plot(density)
polygon(density, col="red",border="black")
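To see what density() returns, the following sketch inspects the object directly. It uses simulated data (x is hypothetical) and approximates the area under the curve with the trapezoid rule, which should come out close to 1 for a density estimate.

```r
# Simulated proportions standing in for a prevalence column
set.seed(1)
x <- rbeta(300, 4, 8)

# density() returns evaluation points (x), heights (y), and the bandwidth (bw)
d <- density(x)
print(d$bw)

# Trapezoid-rule approximation of the area under the estimated curve
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
```

The bandwidth controls how smooth the curve is; it can be changed with the bw or adjust arguments of density().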
#Save as PDF in Rstudio
pdf("Density Plots.pdf", width = 8, height = 6) # Open PDF device
plot(density) # Create the plot
polygon(density, col="red",border="black")
dev.off() # Close device & finalize the file
png
2
#3. Line chart
Line charts represent a series of data points connected by straight lines and are generally used to visualize data that changes over time.
plot(malaria_data$total, malaria_data$positive, type ="l")
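Because a line chart joins points in the order they appear, it usually reads best when the x variable is a time index sorted in increasing order. The sketch below is one way to do that, using a hypothetical toy data frame; time_order is an invented column that plays the role of a time index such as time_order_loc.

```r
# Hypothetical toy data with an unsorted time index
toy <- data.frame(
  time_order = c(3, 1, 2, 1, 3, 2),
  positive   = c(9, 4, 6, 5, 8, 7)
)

# Sum positives per time point; aggregate() returns rows sorted by time_order
by_time <- aggregate(positive ~ time_order, data = toy, FUN = sum)
print(by_time)

# Draw the line chart on the sorted, aggregated data
plot(by_time$time_order, by_time$positive,
     type = "l",
     xlab = "Time order",
     ylab = "Positive cases")
```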
#4. Barplot
Another useful function for single-variable comparisons is barplot(). In this case, we will use the table() function to count the number of observations in each category, then use barplot() to draw the bars.
Bar charts use horizontal or vertical bars to compare categorical values. The bar lengths represent counts, frequencies, or proportions for each category.
counts <- table(malaria_data$year)
barplot(counts)
counts <- table(malaria_data$year)
counts
2018 2019 2020
179 201 134
barplot(counts,
col=c("blue", "red", "green"),
main= "Simple Bar chart",
xlab= "Year",
ylab="Frequency")
#Save as PNG in Rstudio
png("Simple Bar chart.png") # Open PNG device
barplot(counts, # Create the plot
col=c("red", "blue", "green"),
main= "Simple Bar chart",
xlab= "Year",
ylab="Frequency")
dev.off() # Close device
png
2
#Save as PDF in Rstudio
pdf("Simple Bar chart.pdf") # Open PDF device
barplot(counts, # Create the plot
col=c("red", "blue", "green"),
main= "Simple Bar chart",
xlab= "Year",
ylab="Frequency")
dev.off() # Close device
png
2
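All of the saving examples follow the same three steps: open a graphics device, draw the plot, close the device. The sketch below repeats that pattern with a temporary file so it can be run without touching the working directory. The name out_file is made up for this example, and the yearly counts are the ones printed by table() above.

```r
# Write to a temporary file instead of the working directory
out_file <- tempfile(fileext = ".pdf")

# Step 1: open the device
pdf(out_file, width = 8, height = 6)

# Step 2: draw the plot (yearly record counts from the table() output)
barplot(c("2018" = 179, "2019" = 201, "2020" = 134),
        col = c("red", "blue", "green"),
        main = "Simple Bar chart",
        xlab = "Year",
        ylab = "Frequency")

# Step 3: close the device; the file is only complete after this call
invisible(dev.off())

# Confirm the file was written
print(file.exists(out_file))
```

Forgetting dev.off() is a common reason for empty or unreadable output files.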
#5. Pie chart
A pie chart is a type of graph in which a circle is divided into sectors, each representing a proportion of the whole.
df <- prop.table(table(malaria_data$location))
p1 <- pie(df,
labels=paste(names(df),"(", round(df*100,1),"%)"),col=c("green","tomato","blue","purple","black"),
main="Malaria cases Distribution by Location")
library(plotrix)
df <- prop.table(table(malaria_data$location))
p2 <- pie3D(df,
labels=paste(names(df),"(", round(df*100,1),"%)"),col=c("green","tomato","blue","purple","lightblue"),
main="Malaria cases Distribution by Location")
library(plotly)
df <- as.data.frame(table(malaria_data$location))
colnames(df) <- c("location", "freq")
fig <- plot_ly(df,
labels = ~location, # formulas are evaluated inside df, so the df$ prefix is not needed
values = ~freq,
type = 'pie',
hole = 0.3, # Makes a doughnut chart; set to 0 for a full pie chart
textinfo = 'label+percent', # Shows both labels and percentages
marker = list(colors = colorRampPalette(c('blue', 'green', 'tomato', 'skyblue'))(nrow(df))))
fig
For multiple variables, we can use plot() function to create a scatterplot, multiple bar chart, boxplot etc.
In this case, we will use plot() to create a scatterplot. The first argument in plot() is the x variable, and the second argument is the y variable.
plot(malaria_data$total, malaria_data$positive)
plot(malaria_data$total, malaria_data$positive, col="red")
#Save as JPEG in Rstudio
jpeg("Scatter plot.jpg") # Open JPEG device
plot(malaria_data$total, malaria_data$positive, col="red") # Create the plot
dev.off() # Close device
png
2
counts <- table(malaria_data$year,malaria_data$ages)
barplot(counts,
col=c("red", "blue", "green"),
main= "Multiple Bar chart",
xlab= "Age Group",
ylab="Frequency",
legend=rownames(counts), beside=TRUE)
# Boxplot
A boxplot, also known as a box-and-whisker plot, is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
Boxplots provide a clear and concise visual summary of the data distribution, showing central tendency, variability, and symmetry or skewness.
They are excellent tools for comparing distributions across different groups or categories, allowing quick visual comparisons.
Identifying outliers: Boxplots help identify outliers in the data, which can be crucial for understanding data quality and distribution.
Understanding spread: They reveal the spread and range of the data, indicating the variability within the dataset.
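The numbers behind a boxplot can be inspected with base R's boxplot.stats(). One detail worth knowing is that boxplot() draws each whisker to the most extreme point within 1.5 times the box length of the box, so the whisker ends equal the minimum and maximum only when there are no outliers. The values below are hypothetical.

```r
# Hypothetical prevalence values with one unusually high observation
x <- c(0.10, 0.20, 0.25, 0.30, 0.35, 0.40, 0.95)

bs <- boxplot.stats(x)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(bs$stats)

# Points beyond the whiskers, drawn individually as outliers
print(bs$out)
```

Here the largest value is reported as an outlier, so the upper whisker stops at the next largest value rather than at the maximum.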
boxplot(malaria_data$prev ~ malaria_data$location)
boxplot(malaria_data$prev ~ malaria_data$location, col="tomato")
#Save as TIFF in Rstudio
tiff("Box-and-Whisker plot.tiff", width = 800, height = 600) # Open TIFF device
boxplot(malaria_data$prev ~ malaria_data$location, col="tomato") # create the plot
dev.off() # Close device
png
2
Are there any interesting patterns in individual variables/columns? Are there any relationships between variables/columns?
# --------------------------------------------------------------------------
# Data Visualization with ggplot2
# --------------------------------------------------------------------------
Base R functions like hist() and barplot() are great for quickly exploring our data, but we may want to use more powerful visualization techniques when preparing outputs for scientific reports, presentations, and publications.
The ggplot2 package is a popular visualization package for R. It provides an easy-to-use interface for creating data visualizations. The ggplot2 package is based on the “grammar of graphics” and is a powerful way to create complex visualizations that are useful for creating scientific and publication-quality figures.
The “grammar of graphics” used in ggplot2 is a set of rules that are used to develop data visualizations using a layering approach. Layers are added using the ‘+’ operator.
There are three main components of a ggplot:
1. The data: the dataset we want to visualize
2. The aesthetics: the visual properties from the data used in the plot
3. The geometries: the visual representations of the data (e.g., points, lines, bars)
All ggplot2 plots require a data frame as input. Just running this line will produce a blank plot because we have not yet stated which elements from the data we want to visualize or how we want to visualize them.
ggplot(data = malaria_data)
Next, we need to specify the visual properties of the plot that are determined by the data. The aesthetics are specified using the aes() function. The output should now produce a blank plot but with determined visual properties (e.g., axes labels).
ggplot(data = malaria_data, aes(x = total, y = positive))
Finally, we need to specify the visual representation of the data. The geometries are specified using the geom_* function. There are many different types of geometries that can be used in ggplot2. We will use geom_point() in this example and we will append it to the previous plot using the + operator. The output should now produce a plot with the specified visual representation of the data.
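The layering idea can be made concrete by storing the plot in a variable and counting its layers. This is a minimal sketch, assuming ggplot2 is installed; toy is a small data frame built from the first four rows printed by head() above, and p_base and p_points are names invented for the example.

```r
library(ggplot2)

# First four rows of total and positive from the head() output
toy <- data.frame(total = c(22, 34, 36, 32), positive = c(4, 6, 7, 7))

# Data and aesthetics only: axes are drawn, but nothing is plotted yet
p_base <- ggplot(data = toy, aes(x = total, y = positive))

# Adding a geometry with + creates a new plot object with one layer
p_points <- p_base + geom_point()

n_layers_base <- length(p_base$layers)
n_layers_points <- length(p_points$layers)
print(c(n_layers_base, n_layers_points))
```

Because + returns a new object, p_base is left unchanged and can be reused with different geometries.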
library(ggthemes)
ggplot(data = malaria_data, aes(x = total, y = positive)) +
geom_point()
ggplot(data = malaria_data, aes(x = total, y = positive)) +
geom_point(colour = "tomato")
library(ggpubr)
ggplot(data = malaria_data, aes(x = total, y = positive)) +
geom_point(colour = "tomato") +
theme_classic()
ggplot(data = malaria_data, aes(x = total, y = positive)) +
geom_point(colour = "tomato") +
theme_economist()
ggplot(data = malaria_data, aes(x = total, y = positive)) +
geom_point(colour = "tomato") +
theme_pubclean()
ggplot(data = malaria_data, aes(x = total, y = positive, color = location)) +
geom_point() +
theme_economist()
ggplot(data = malaria_data, aes(x = total, y = positive)) +
geom_point() +
geom_smooth(method = "lm") # geom_smooth() with method = "lm" adds a fitted linear regression line
ggplot(data = malaria_data, aes(x = total, y = positive, color = year)) +
geom_point() +
facet_wrap(~year) +
theme_bw()
ggplot(data = malaria_data, aes(x = total, y = positive, color = location)) +
geom_point() +
facet_wrap(~year) +
theme_bw()
ggplot(data = malaria_data, aes(x = total, y = positive, color = location)) +
geom_point() +
geom_smooth(method = "lm") + # adds a fitted linear regression line for each location
theme_classic()
ggplot(data = malaria_data, aes(x = total, y = positive, color = location)) +
geom_point() +
stat_ellipse() +
theme_classic()
ggplot(data = malaria_data, aes(x= prev))+
geom_histogram(bins = 5, fill = "tomato", color = "blue") +
theme_economist()
ggplot(data = malaria_data, aes(x= prev))+
geom_histogram(bins = 5, fill = "tomato", color = "blue") +
theme_classic()
#Here are some examples of different geom functions:
data <- data.frame(Category = c("A", "B", "C", "A", "B", "C", "A", "A", "B", "C"))
ggplot(data = malaria_data, aes(x = year)) +
geom_bar(fill = "blue") + # the "fill" argument specifies the color of the bars
theme_pubclean()
ggplot(data = malaria_data, aes(x = year)) +
geom_bar(fill = "tomato") +
labs(title="Simple Bar Plot",
x="Year",
y="Frequency",
caption = "Source: Malaria data") +
theme_classic()
When using the aes() function, the visual properties will be determined by a variable in the dataset. This allows us to visualize relationships between multiple variables at the same time.
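The difference between mapping a property inside aes() and setting it to a fixed value outside aes() can be sketched as follows. This assumes ggplot2 is installed; the toy data frame and the object names are hypothetical.

```r
library(ggplot2)

# Hypothetical toy data with one categorical column
toy <- data.frame(
  total    = c(22, 34, 36, 32),
  positive = c(4, 6, 7, 7),
  location = c("mordor", "mordor", "narnia", "narnia")
)

# Mapped: the colour of each point depends on the location column
p_mapped <- ggplot(toy, aes(x = total, y = positive, color = location)) +
  geom_point()

# Set: every point gets the same fixed colour, independent of the data
p_fixed <- ggplot(toy, aes(x = total, y = positive)) +
  geom_point(colour = "tomato")

# aes() stores the mapping under the name "colour"
mapped_aes <- names(p_mapped$mapping)
fixed_aes <- names(p_fixed$mapping)
print(mapped_aes)
print(fixed_aes)
```

A mapped aesthetic also produces a legend automatically, while a fixed value does not.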
# Compute the frequency
df <- malaria_data %>%
group_by(year) %>%
summarise(counts = n())
# Create the bar plot
ggplot(df, aes(x = year, y = counts)) +
geom_bar(fill = "tomato", stat = "identity") +
geom_text(aes(label = counts), vjust = -0.3) +
labs(title="Simple Bar Plot",
x="Year",
y="Frequency",
caption = "Source: Malaria data") +
theme_classic()
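A more compact way to write the same steps is dplyr's count() for the frequencies and geom_col() for the bars, since geom_col() uses the y values as given, like geom_bar(stat = "identity"). This is a sketch under the assumption that dplyr and ggplot2 are installed; the toy data frame is hypothetical.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per record
toy <- data.frame(year = c(2018, 2018, 2019, 2020, 2020, 2020))

# count() is shorthand for group_by() followed by summarise(n())
year_counts <- toy %>%
  count(year, name = "counts")
print(year_counts)

# geom_col() plots the precomputed counts directly
p <- ggplot(year_counts, aes(x = factor(year), y = counts)) +
  geom_col(fill = "tomato") +
  geom_text(aes(label = counts), vjust = -0.3) +
  labs(x = "Year", y = "Frequency") +
  theme_classic()
p
```

Wrapping year in factor() makes the x-axis discrete, so only the years present in the data get a tick.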
#frq(malaria_data, ages)
ggplot(data = malaria_data, aes(x = ages, y=positive, fill = ages)) +
geom_bar(stat = "identity") +
labs(title="Multiple Bar Plot",
x="Age Group",
y="Malaria Positive Cases Reported",
caption = "Source: Malaria data") +
theme_classic()
# Create multiple bar plots using facet_wrap()
ggplot(data = malaria_data, aes(x = ages, y=positive, fill = ages)) +
geom_bar(stat = "identity") +
facet_wrap(~year) +
labs(title="Multiple Bar Plot by year",
x="Age Group",
y="Malaria Positive Cases Reported",
caption = "Source: Malaria data") +
theme_classic()
ggplot(data = malaria_data, aes(x= prev, fill = ages))+
geom_histogram(bins = 10) +
theme_classic()
ggplot(data = malaria_data, aes(x= prev, fill = ages))+
geom_histogram(bins = 10, color ="black") +
labs(title = "Multiple Histogram by Age group",
x="Malaria Prevalence Rate",
y="Number of Malaria Cases",
caption = "Source: Malaria data") +
theme_classic()
ggplot(data = malaria_data, aes(x = prev, fill = ages)) +
geom_histogram(color = "black") +
theme_classic()
# Density ridgeline plot
The density ridgeline plot is useful for visualizing changes in the distribution of a continuous variable over time or space. Ridgeline plots are partially overlapping line plots that create the impression of a mountain range.
library(ggridges)
ggplot(malaria_data, aes(x = prev, y = ages, fill = ages)) +
geom_density_ridges()
Are there any interesting patterns in individual variables/columns? How can we use the aes() function to view multiple variables in a single plot? Are there any additional geometries that may be useful for visualizing this dataset?
#The examples above show how to use colors for categorical variables, but we can also use custom color palettes for continuous variables.
ggplot(data = malaria_data, aes(x = total, y = positive, color = prev)) +
geom_point() +
scale_color_gradient(low = "blue", high = "red")
ggplot(data = malaria_data, aes(x = total, y = positive, color = prev)) +
geom_point() +
# use the viridis color scale that ships with ggplot2 for a continuous palette
scale_color_viridis_c(option = "magma")
ggplot(data = malaria_data, aes(x = location, y = prev)) +
geom_boxplot(fill = "lightblue") +
geom_jitter(alpha = 0.2) +
theme_classic()
ggplot(data = malaria_data, aes(x = location, y = prev, fill = location)) +
geom_boxplot() +
geom_jitter(alpha = 0.2, aes(color = location)) +
theme_classic()
library(ggpubr)
p <- ggplot(data = malaria_data, aes(x = location, y = prev, fill = location)) +
geom_boxplot() +
geom_jitter(alpha = 0.2, aes(color = location))
# theme_classic()
# Add p-value
p + stat_compare_means()
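With more than two groups on the x-axis, the p-value added by stat_compare_means() should, assuming its default settings, come from a Kruskal-Wallis test. The sketch below runs the equivalent test in base R on hypothetical toy data so the printed p-value can be cross-checked against the plot annotation.

```r
# Hypothetical toy data: prevalence for three locations
toy <- data.frame(
  location = rep(c("mordor", "narnia", "oz"), each = 5),
  prev = c(0.18, 0.19, 0.22, 0.22, 0.24,
           0.30, 0.31, 0.33, 0.35, 0.36,
           0.40, 0.42, 0.45, 0.47, 0.50)
)

# Kruskal-Wallis rank sum test: do the groups share the same distribution?
kw <- kruskal.test(prev ~ location, data = toy)
print(kw)
```

A small p-value indicates that at least one location differs from the others; it does not say which one.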
# Violin plots
Violin plots are similar to box plots, except that they also show the kernel probability density of the data at different values. Typically, violin plots will include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots.
Key function:
geom_violin(): creates violin plots. Key argument: fill, the fill color of the violin area.
Create a basic violin plot with the individual data points overlaid:
ggplot(data = malaria_data, aes(x = location, y = prev)) +
geom_violin(fill = "tomato") +
geom_jitter(alpha = 0.2) +
theme_classic()
The sinaplot is inspired by the strip chart and the violin plot. By letting the normalized density of points restrict the jitter along the x-axis, the plot displays the same contour as a violin plot but resembles a simple strip chart when the number of data points is small (Sidiropoulos et al. 2015).
In this way the plot conveys the number of data points, the density distribution, outliers, and spread in a simple, comprehensible, and condensed format.
Key function: geom_sina() [ggforce]:
library(ggforce)
# Create some data
d1 <- data.frame(
MalariaPositiveCases = c(rnorm(200, 4, 1), rnorm(200, 5, 2), rnorm(400, 6, 1.5)),
ageGroup = rep(c("under 5 years", "5 to 15 years", "Above 15 years"), c(200, 200, 400)))
# Sinaplot
ggplot(d1, aes(ageGroup, MalariaPositiveCases)) +
geom_sina(aes(color = ageGroup), size = 0.7)+
scale_color_manual(values = c("red", "blue", "green"))
#Correlation matrix with ggally package
library(GGally)
# correlation analysis (malaria_data is already loaded with read.csv(), so no data() call is needed)
ggpairs(malaria_data, columns = 5:6, ggplot2::aes(colour=location))
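The correlation shown in the ggpairs() panel can also be computed directly with base R. This is a small sketch on hypothetical toy data; the same calls work on malaria_data$total and malaria_data$positive.

```r
# Hypothetical toy data: number tested and number positive
toy <- data.frame(
  total    = c(20, 40, 60, 80, 100),
  positive = c(4, 10, 20, 22, 35)
)

# Pearson correlation coefficient (the default method)
r <- cor(toy$total, toy$positive)
print(r)

# cor.test() adds a confidence interval and a p-value
ct <- cor.test(toy$total, toy$positive)
print(ct$conf.int)
print(ct$p.value)
```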
library(ggplot2)
library(ggExtra)
g <- ggplot(malaria_data, aes(total, positive)) +
geom_count(color = "red") +
geom_smooth(method="lm", se=FALSE)
ggMarginal(g, type = "histogram", fill="blue")
ggMarginal(g, type = "boxplot", fill="transparent")
ggMarginal(g, type = "density", fill="blue")
library(leaflet)
# malaria_data is passed to leaflet() directly, so attach() is not needed
# Create an interactive map with leaflet
leaflet(malaria_data) %>%
addTiles() %>%
addCircleMarkers(
~xcoord, ~ycoord,
radius = ~prev *10,
color = "red",
stroke = FALSE,
fillOpacity = 0.5,
label = ~paste0(location, ": ", prev)) %>%
addLegend("bottomright",
colors = "red",
labels = "Malaria Prevalence",
title = "Prevalence")
data <- read.csv("malaria_survey_data1.csv")
### Code the Counties and Give their Appropriate Name
data$County <- factor(data$County, levels = c(101, 201, 202, 203, 204, 205, 301, 302, 303, 304, 305, 306,
401, 402, 403, 404, 405, 406, 407, 408, 501, 502, 503, 601, 602,
603, 604, 605, 606, 701, 702, 703, 704, 705, 706, 707, 708, 709,
710, 711, 712, 713, 714, 801, 802, 803, 804),
labels = c("nairobi", "nyandarua", "nyeri", "kirinyaga", "muranga", "kiambu",
"mombasa", "kwale", "kilifi", "tana river", "lamu", "taita taveta",
"marsabit", "isiolo", "meru", "tharaka", "embu", "kitui", "machakos",
"makueni", "garissa", "wajir", "mandera", "siaya", "kisumu", "migori",
"homa bay", "kisii", "nyamira", "turkana", "west pokot", "samburu",
"trans-nzoia", "baringo", "uasin gishu", "elgeyo marakwet", "nandi",
"laikipia", "nakuru", "narok", "kajiado", "kericho", "bomet", "kakamega",
"vihiga", "bungoma", "busia"))
gps <- read.csv("longitude_latitude.csv")
## Merge the data set
data <- merge(gps, data, by = "County", all.x = TRUE)
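The recode-then-merge step can be tried on a small scale first. The sketch below uses hypothetical toy frames: the county codes and names are taken from the list above, but the survey rows and the coordinates are invented for illustration.

```r
# Hypothetical survey rows with numeric county codes
survey_toy <- data.frame(
  County = c(101, 201, 101),
  Final_Malaria_Test_Results = c(1, 0, 1)
)

# Recode the numeric codes into county names
survey_toy$County <- factor(survey_toy$County,
                            levels = c(101, 201, 202),
                            labels = c("nairobi", "nyandarua", "nyeri"))

# Hypothetical coordinates (made up, not real locations)
gps_toy <- data.frame(
  County    = c("nairobi", "nyandarua", "nyeri"),
  Longitude = c(36.8, 36.5, 36.9),
  Latitude  = c(-1.3, -0.4, -0.4)
)

# all.x = TRUE keeps every county in gps_toy, even without survey rows
merged <- merge(gps_toy, survey_toy, by = "County", all.x = TRUE)
print(merged)
```

Counties with no survey rows appear once with NA in the result column, which is worth checking before passing the merged data to leaflet().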
library(leaflet)
# Create an interactive map with leaflet
leaflet(data) %>%
addTiles() %>%
addCircleMarkers(
~Longitude, ~Latitude,
radius = ~Final_Malaria_Test_Results*3,
color = "red",
stroke = FALSE,
fillOpacity = 0.5,
label = ~paste0(County, ": ", Final_Malaria_Test_Results)) %>%
addLegend("bottomright",
colors = "red",
labels = "Malaria Results",
title = "Positive Malaria Test Results")
library(paletteer)
library(leaflet)
# Sample data
africa <- read.csv("DatasetAfricaMalaria.csv", header = TRUE)
# Create an interactive map with leaflet
leaflet(africa) %>%
addTiles() %>%
addCircleMarkers(
~longitude, ~latitude,
radius = ~Incidence.of.malaria..per.1.000.population.at.risk.*0.01,
color = "red",
stroke = FALSE,
fillOpacity = 0.5,
label = ~paste0(Country.Name, ": ", Incidence.of.malaria..per.1.000.population.at.risk.)) %>%
addLegend("bottomright",
colors = "red",
labels = "Malaria Prevalence",
title = "Prevalence in Africa")
dat<-data.frame(t=seq(0,2*pi,by=0.1))
xhrt<-function(t)16*sin(t)^3
yhrt<-function(t)13*cos(t)-5*cos(2*t)-2*cos(3*t)-cos(4*t)
dat$y=yhrt(dat$t)
dat$x=xhrt(dat$t)
with(dat,plot(x,y,type="l"))
with(dat,polygon(x,y,col="red"))
# Load necessary libraries
library(ggplot2)
library(paletteer)
# Load the data
df <- read.csv("https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/simple-scatterplot.csv")
# Create the ggplot
ggplot(df, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point(size = 3) + # fixed point size for every observation
scale_color_paletteer_d("nationalparkcolors::Acadia") + # discrete palette applied to the color aesthetic
theme(legend.position = "none")