Thanks for uploading the dataset. Here is my exploratory data analysis (EDA) of it.
For visualization I have used the rworldmap and ggplot2 packages in R. Please suggest what else I could explore in this data for learning purposes. Thanks again!
Importing the libraries and the data:
## Loading required package: sp
## ### Welcome to rworldmap ###
## For a short introduction type : vignette('rworldmap')
summary(data)
##    Country          Doses.administered.per.100.people Total.doses.administered
##  Length:182         Min.   :  0.000                   Length:182
##  Class :character   1st Qu.:  3.125                   Class :character
##  Mode  :character   Median : 16.000                   Mode  :character
##                     Mean   : 29.056
##                     3rd Qu.: 48.750
##                     Max.   :140.000
##
##  X..of.population.vaccinated X..of.population.fully.vaccinated
##  Min.   : 0.00               Min.   : 0.0
##  1st Qu.: 2.80               1st Qu.: 0.8
##  Median :11.00               Median : 5.2
##  Mean   :18.57               Mean   :11.7
##  3rd Qu.:30.00               3rd Qu.:20.0
##  Max.   :67.00               Max.   :57.0
##  NA's   :9                   NA's   :31
The summary shows that Total.doses.administered is of type character. This is because the numbers contain commas. We can fix this by removing the commas with the gsub function and converting the result to numeric.
data$Total.doses.administered <- as.numeric(gsub(",", "", data$Total.doses.administered))
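To illustrate why the column is read as character and what the conversion does, here is a minimal sketch on a toy CSV. The values and the `toy` data frame are made up for illustration and are not taken from the dataset.

```r
# Toy CSV (made-up values): totals are quoted and contain commas
csv_text <- 'Country,Total.doses.administered
A,"1,234,567"
B,"89,000"
C,"512"'

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The commas prevent read.csv from parsing the column as a number
class(toy$Total.doses.administered)

# Remove the commas, then convert the cleaned strings to numeric
toy$Total.doses.administered <- as.numeric(gsub(",", "", toy$Total.doses.administered))
class(toy$Total.doses.administered)
```

If any value contained something other than digits and commas, as.numeric would return NA for it with a warning, so it is worth re-running summary() after the conversion.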
I also rename the last two columns:
colnames(data)[4] <- "percentage_population_vaccinated"
colnames(data)[5] <- "percentage_population_fully_vaccinated"
Before using mapCountryData() we need to join the data to the map by country name, as we don't have ISO3 codes in the data. rworldmap makes this easy with the joinCountryData2Map() function.
First, I plot each of the available variables on a world map.
A legend is attached to each plot. Note: countries for which no map join was found are shown in white.
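The join call itself is not shown in this post. A minimal sketch of a name-based join, run on a toy data frame, could look like the following. The toy values and the `joined_toy` name are made up for illustration; for the real data the equivalent call would presumably be made on `data` with the result stored in `joined_map`.

```r
library(rworldmap)

# Toy stand-in for the vaccination data (made-up values; "Narnia" will not match)
toy <- data.frame(
  Country = c("India", "Brazil", "Narnia"),
  Doses.administered.per.100.people = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# joinCode = "NAME" joins on country names instead of ISO codes;
# verbose = TRUE prints the names that could not be matched to the map
joined_toy <- joinCountryData2Map(
  toy,
  joinCode = "NAME",
  nameJoinColumn = "Country",
  verbose = TRUE
)
```

The list of failed names printed by verbose = TRUE is a quick way to see which countries will be missing from the maps.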
par(mai=c(0,0,0.2,0),xaxs="i",yaxs="i")
mapParams <- mapCountryData(joined_map, nameColumnToPlot="Doses.administered.per.100.people",addLegend = FALSE, oceanCol = "lightblue", missingCountryCol = "white", colourPalette = c("red", "yellow", "green"), borderCol = "black")
do.call( addMapLegend, c( mapParams, legendLabels="all",legendWidth=0.5))
par(mai=c(0,0,0.2,0),xaxs="i",yaxs="i")
mapParams <- mapCountryData( joined_map, nameColumnToPlot="Total.doses.administered", addLegend = FALSE, oceanCol = "lightblue", missingCountryCol = "white", colourPalette = c("red", "yellow", "green"), borderCol = "black")
do.call( addMapLegend, c( mapParams, legendLabels="all", legendWidth=0.5))
par(mai=c(0,0,0.2,0),xaxs="i",yaxs="i")
mapParams <- mapCountryData( joined_map, nameColumnToPlot="percentage_population_vaccinated", addLegend = FALSE, oceanCol = "lightblue", missingCountryCol = "white", colourPalette = c("red", "yellow", "green"), borderCol = "black")
do.call( addMapLegend, c( mapParams, legendLabels="all", legendWidth=0.5))
par(mai=c(0,0,0.2,0),xaxs="i",yaxs="i")
mapParams <- mapCountryData( joined_map, nameColumnToPlot="percentage_population_fully_vaccinated", addLegend = FALSE, oceanCol = "lightblue", missingCountryCol = "white", colourPalette = c("red", "yellow", "green"), borderCol = "black")
do.call( addMapLegend, c( mapParams, legendLabels="all", legendWidth=0.5))
As map joins are not available for some countries (because their names in the data differ from those on the map), we can visualise the top 10 countries for each variable to check that we are not missing valuable information about a country.
Visualising the top 10 in each category:
order_data_100 <- data[order(-data$Doses.administered.per.100.people),]
order_data_100_top10 <- order_data_100[1:10, ]
graph_a <- ggplot(data = order_data_100_top10, aes(reorder(Country, Doses.administered.per.100.people), Doses.administered.per.100.people))
graph_a + geom_bar(stat = "Identity", fill = "steelblue") + coord_flip()+ labs(y = "Doses per 100 people", x = "Country") + ggtitle("Top 10 Countries in doses/100 people")
order_data_total <- data[order(-data$Total.doses.administered),]
order_data_total_top10 <- order_data_total[1:10, ]
graph_b <- ggplot(data = order_data_total_top10, aes(reorder(Country, Total.doses.administered), Total.doses.administered))
graph_b + geom_bar(stat = "Identity", fill = "steelblue") + coord_flip()+ labs(y = "Total Doses", x = "Country") + ggtitle("Top 10 Countries in Total Doses")
order_data_per <- data[order(-data$percentage_population_vaccinated),]
order_data_per_top10 <- order_data_per[1:10, ]
graph_c <- ggplot(data = order_data_per_top10, aes(reorder(Country, percentage_population_vaccinated), percentage_population_vaccinated))
graph_c + geom_bar(stat = "Identity", fill = "steelblue") + coord_flip()+ labs(y = "Percentage vaccinated", x = "Country") + ggtitle("Top 10 Countries in % vaccinated")
order_data_per_fully <- data[order(-data$percentage_population_fully_vaccinated),]
order_data_per_fully_top10 <- order_data_per_fully[1:10, ]
graph_d <- ggplot(data = order_data_per_fully_top10, aes(reorder(Country, percentage_population_fully_vaccinated), percentage_population_fully_vaccinated))
graph_d + geom_bar(stat = "Identity", fill = "steelblue") + coord_flip() + labs(y = "Percentage fully vaccinated", x = "Country") + ggtitle("Top 10 Countries in % fully vaccinated")
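The order-then-take-the-first-10-rows step is repeated for each of the four columns above. As a sketch, it could be wrapped in a small helper. The name `top_n_by` and the toy data are hypothetical, not part of the original analysis, and unlike the code above this version drops rows with NA in the ranking column.

```r
# Hypothetical helper: return the n rows with the largest values in `column`.
# na.last = NA drops rows where the ranking column is missing.
top_n_by <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE, na.last = NA), ]
  head(ranked, n)
}

# Toy data (made-up values) with one missing percentage
toy <- data.frame(
  Country = c("A", "B", "C", "D"),
  pct = c(12.5, NA, 40, 3),
  stringsAsFactors = FALSE
)

top2 <- top_n_by(toy, "pct", n = 2)
top2$Country
```

The returned data frame can be passed to ggplot() in the same way as the `order_data_*_top10` objects above.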
From the above graphs we can see that the UK is in the top 10 in every category in the available data.
Now, to see underlying relationships in the data, some scatter plots:
graph_e <- ggplot(data = data, aes(x = percentage_population_vaccinated, y = percentage_population_fully_vaccinated))
graph_e + geom_point(color = "darkgreen") + labs(x = "% Population Vaccinated", y = "% Population Fully Vaccinated") + ggtitle("Relationship between % Population Vaccinated & Fully Vaccinated")
graph_f <- ggplot(data = data, aes(x = Doses.administered.per.100.people, y = percentage_population_fully_vaccinated))
graph_f + geom_point(color = "darkgreen") + labs(x = "Doses per 100 people", y = "% Population Fully Vaccinated") + ggtitle("Relationship between % Population Fully Vaccinated & doses per 100 people")
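To put a number on the relationships seen in the scatter plots, one option is a correlation coefficient. The sketch below uses made-up values in a toy data frame, with hypothetical column names. Because both percentage columns contain NA's in the real data, it passes use = "complete.obs" so that only rows with both values present are used.

```r
# Toy data (made-up values) with missing entries in both columns
toy <- data.frame(
  pct_vaccinated = c(5, 12, 30, NA, 60),
  pct_fully      = c(1, 4, 15, 20, NA)
)

# Pearson correlation computed on the rows where both values are present
r <- cor(toy$pct_vaccinated, toy$pct_fully, use = "complete.obs")
r
```

Without the use argument, cor() returns NA as soon as either column has a missing value.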
That's all for my EDA on the worldwide vaccine data. Please comment if you have suggestions for improving the visualisations or for other methods to try.
Thank you!