In this project, we were assigned to create a multivariable graph. We used the “data on infant mortality” from the following dataset: http://vincentarelbundock.github.io/Rdatasets/datasets.html.
With 105 observations, this data frame contains the following variables:
1- country: Country name (stored in column X of the CSV file).
2- income: Per-capita income in U.S. dollars.
3- infant: Infant-mortality rate per 1000 live births.
4- region: Africa; Americas; Asia, Asia and Oceania; Europe.
5- oil: Oil-exporting country. A factor with levels: no, yes.
The observations are the nations of the world around 1970. We will visualize possible factors behind the infant-mortality rate. In other words, we will examine how the variables income and region relate to the infant-mortality rate.
library(readr)
## Warning: package 'readr' was built under R version 3.6.1
library(ggplot2)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.1
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(plotly)
## Warning: package 'plotly' was built under R version 3.6.1
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(RColorBrewer)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.1
## -- Attaching packages ------------------------------------------- tidyverse 1.2.1 --
## v tibble 2.1.3 v purrr 0.3.2
## v tidyr 0.8.3 v stringr 1.4.0
## v tibble 2.1.3 v forcats 0.4.0
## Warning: package 'tibble' was built under R version 3.6.1
## Warning: package 'tidyr' was built under R version 3.6.1
## Warning: package 'purrr' was built under R version 3.6.1
## Warning: package 'stringr' was built under R version 3.6.1
## Warning: package 'forcats' was built under R version 3.6.1
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x plotly::filter() masks dplyr::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
library(highcharter)
## Warning: package 'highcharter' was built under R version 3.6.1
## Registered S3 method overwritten by 'xts':
## method from
## as.zoo.xts zoo
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## Highcharts (www.highcharts.com) is a Highsoft software product which is
## not free for commercial and Governmental use
infant_mortality <- read.csv("infant_mortality.csv")
dim(infant_mortality)
## [1] 105 5
str(infant_mortality)
## 'data.frame': 105 obs. of 5 variables:
## $ X : Factor w/ 105 levels "Afganistan","Algeria",..: 4 5 7 15 23 29 30 101 41 43 ...
## $ income: int 3426 3350 3346 4751 5029 3312 3403 5040 2009 2298 ...
## $ infant: num 26.7 23.7 17 16.8 13.5 10.1 12.9 20.4 17.8 25.7 ...
## $ region: Factor w/ 4 levels "Africa","Americas",..: 3 4 4 2 4 4 4 4 4 4 ...
## $ oil : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
At this step, we remove all rows with missing values using the complete.cases() function. We call the new data frame “infant_mortality1”.
infant_mortality1 <- infant_mortality[complete.cases(infant_mortality),]
dim(infant_mortality1)
## [1] 101 5
Now we have 101 observations and 5 variables: four rows with missing values were dropped, and the number of columns is unchanged.
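Before dropping rows, it can help to see which columns hold the missing values. The following is a minimal sketch in base R, not part of the original analysis. The data frame `toy` is hypothetical: its values are the first six rows shown in the str() output above, and the single NA is introduced here only for illustration.

```r
# Hypothetical toy data: first six income/infant values from the str() output.
# The NA is added on purpose for this example; it is not from the real data.
toy <- data.frame(
  income = c(3426, 3350, 3346, 4751, 5029, 3312),
  infant = c(26.7, 23.7, NA, 16.8, 13.5, 10.1)
)

# Count missing values in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Keep only the rows with no missing values, as done for infant_mortality1.
toy_complete <- toy[complete.cases(toy), ]

# Number of rows removed by the filter.
rows_dropped <- nrow(toy) - nrow(toy_complete)
print(rows_dropped)
```

Running the same two checks on `infant_mortality` would show which variable accounts for the rows that were removed.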
g <- ggplot(data = infant_mortality1, mapping = aes(x = income, y = infant, color = region))
class(g)
## [1] "gg" "ggplot"
g + geom_point() +
  labs(title = "Infant Mortality Rate vs Income Per Capita by Region in 1970",
       x = "Per-capita Income in U.S. dollars", y = "Infant Mortality Rate", color = "Region")
### Analysis
The plot shows that countries with a low per-capita income tended to have a high infant-mortality rate. For example, the visualization indicates that the infant-mortality rate is highest in Africa, followed by Asia and the Americas. I also found that the dataset has a limitation: it does not state whether the per-capita income values are annual, or whether they are medians or means. However, per-capita income in U.S. dollars is generally reported on an annual basis. According to Current U.S. Statistics and Trends, the 2017 nominal median income per capita in the U.S. was $31,786 and the mean income per capita was $48,150. I think these values are much higher than in 1970 because, nowadays, there are more wealthy individuals in the world.
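The regional comparison can also be checked numerically with a per-region summary. This is a minimal sketch in base R, not part of the original analysis. The data frame `sample_rows` holds only the first ten rows visible in the str() output, and its region labels are inferred from the factor codes and the level list given earlier, which is an assumption. Eight of the ten rows are European, so the output illustrates the technique and does not reproduce the finding for the full data.

```r
# First ten rows from the str() output; region labels are inferred from the
# factor codes (2 = Americas, 3 = Asia, 4 = Europe), which is an assumption.
sample_rows <- data.frame(
  income = c(3426, 3350, 3346, 4751, 5029, 3312, 3403, 5040, 2009, 2298),
  infant = c(26.7, 23.7, 17.0, 16.8, 13.5, 10.1, 12.9, 20.4, 17.8, 25.7),
  region = c("Asia", "Europe", "Europe", "Americas", "Europe",
             "Europe", "Europe", "Europe", "Europe", "Europe")
)

# Median income and median infant-mortality rate for each region.
region_summary <- aggregate(cbind(income, infant) ~ region,
                            data = sample_rows, FUN = median)

# Sort so the region with the highest median infant mortality comes first.
region_summary <- region_summary[order(-region_summary$infant), ]
print(region_summary)
```

Replacing `sample_rows` with `infant_mortality1` would give the medians for all four regions.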
The visualization suggests that the infant-mortality rate tends to be higher in countries where the per-capita income in U.S. dollars is low. That was the situation in 1970. In my opinion, we could repeat the same visualization with a current data frame to find out whether the trend has changed in the world.
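To repeat the visualization on newer data, the plotting code could be wrapped in a small function. This is a sketch only: `plot_mortality` and its `year_label` argument are hypothetical names, and the function assumes that any data frame passed in has columns named income, infant and region, as the 1970 data does. The four sample rows come from the str() output, with region labels inferred from the factor codes.

```r
library(ggplot2)

# Hypothetical helper (not in the original report): draws the same scatter
# plot for any data frame that has income, infant and region columns.
plot_mortality <- function(df, year_label) {
  ggplot(df, aes(x = income, y = infant, color = region)) +
    geom_point() +
    labs(
      title = paste("Infant Mortality Rate vs Income Per Capita by Region in",
                    year_label),
      x = "Per-capita Income in U.S. dollars",
      y = "Infant Mortality Rate",
      color = "Region"
    )
}

# First four rows from the str() output, used only to exercise the helper.
sample_rows <- data.frame(
  income = c(3426, 3350, 3346, 4751),
  infant = c(26.7, 23.7, 17.0, 16.8),
  region = c("Asia", "Europe", "Europe", "Americas")
)

p <- plot_mortality(sample_rows, "1970")
```

Calling `plot_mortality(infant_mortality1, "1970")` should give the plot shown earlier, and a current data frame with the same column names could be passed in the same way.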