This dataset was collected on Slack from the student FOMBA KASSOH.
df3 = read.csv("https://raw.githubusercontent.com/Kossi-Akplaka/Data607-data_acquisition_and_management/main/Project%202/Data3-USA-Development-Indicators.csv")
kable(head(df3))

| Country.Name | Country.Code | Series.Name | Series.Code | X1990..YR1990. | X2000..YR2000. | X2013..YR2013. | X2014..YR2014. | X2015..YR2015. | X2016..YR2016. | X2017..YR2017. | X2018..YR2018. | X2019..YR2019. | X2020..YR2020. | X2021..YR2021. | X2022..YR2022. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| United States | USA | Population, total | SP.POP.TOTL | 249623000 | 282162411 | 316059947 | 318386329 | 320738994 | 323071755 | 325122128 | 326838199 | 328329953 | 331511512 | 332031554 | 333287557 |
| United States | USA | Population growth (annual %) | SP.POP.GROW | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| United States | USA | Surface area (sq. km) | AG.SRF.TOTL.K2 | 9629090 | 9632030 | 9831510 | 9831510 | 9831510 | 9831510 | 9831510 | 9831510 | 9831510 | 9831510 | 9831510 | .. |
| United States | USA | Population density (people per sq. km of land area) | EN.POP.DNST | 27 | 31 | 35 | 35 | 35 | 35 | 36 | 36 | 36 | 36 | 36 | .. |
| United States | USA | Poverty headcount ratio at national poverty lines (% of population) | SI.POV.NAHC | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
| United States | USA | Poverty headcount ratio at $2.15 a day (2017 PPP) (% of population) | SI.POV.DDAY | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | .. |
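The `..` entries in the table appear to be placeholders for missing values. A minimal sketch of one way to handle them at read time is shown here, assuming `..` always means "not available"; the inline CSV and the name `toy` are illustrative stand-ins, not part of the original analysis.

```r
# Tiny inline CSV standing in for the real file (values copied from the table).
csv_text <- "Series.Code,X1990..YR1990.,X2022..YR2022.
SP.POP.TOTL,249623000,333287557
AG.SRF.TOTL.K2,9629090,..
SI.POV.NAHC,..,.."

# na.strings = ".." makes read.csv treat the placeholder as NA,
# so the year columns are parsed as numbers instead of text.
toy <- read.csv(text = csv_text, na.strings = "..")

str(toy)
colSums(is.na(toy))  # number of missing values per column
```

The same `na.strings` argument could be passed to the `read.csv()` call that loads `df3`, which would avoid a separate numeric conversion later.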
Remove rows 56 to 60
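The code for this step is not shown. Since the data frame has only 16 columns, the indices 56 to 60 presumably refer to rows. A small sketch of dropping rows by position follows, using a hypothetical 5-row stand-in (`toy`) whose last two rows play the role of the unwanted ones.

```r
# Hypothetical stand-in: 5 rows, the last two are filler rows to be dropped.
toy <- data.frame(
  Series.Code = c("SP.POP.TOTL", "SP.POP.GROW", "AG.SRF.TOTL.K2", "", ""),
  X1990..YR1990. = c("249623000", "1", "9629090", "", ""),
  stringsAsFactors = FALSE
)

# Negative indices drop rows; the empty column slot keeps every column.
# For the real data the equivalent call would be df3[-(56:60), ].
toy_clean <- toy[-(4:5), ]

nrow(toy_clean)
```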
We can transform the data from wide to long format, so that each row holds one indicator value for one year.
| Country.Name | Country.Code | Series.Name | Series.Code | Year | Count |
|---|---|---|---|---|---|
| United States | USA | Population, total | SP.POP.TOTL | X1990..YR1990. | 249623000 |
| United States | USA | Population growth (annual %) | SP.POP.GROW | X1990..YR1990. | 1 |
| United States | USA | Surface area (sq. km) | AG.SRF.TOTL.K2 | X1990..YR1990. | 9629090 |
| United States | USA | Population density (people per sq. km of land area) | EN.POP.DNST | X1990..YR1990. | 27 |
| United States | USA | Poverty headcount ratio at national poverty lines (% of population) | SI.POV.NAHC | X1990..YR1990. | .. |
| United States | USA | Poverty headcount ratio at $2.15 a day (2017 PPP) (% of population) | SI.POV.DDAY | X1990..YR1990. | 1 |
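The reshaping code itself is not shown, so the following is only a sketch of the idea in base R on a two-row stand-in. The names `wide`, `long`, and `year_cols` are illustrative; the output column names `Year` and `Count` follow the table above.

```r
# Small wide-format stand-in with two indicators and two year columns.
wide <- data.frame(
  Series.Code = c("SP.POP.TOTL", "EN.POP.DNST"),
  X1990..YR1990. = c("249623000", "27"),
  X2000..YR2000. = c("282162411", "31"),
  stringsAsFactors = FALSE
)

# Year columns are those whose name starts with "X" followed by four digits.
year_cols <- grep("^X[0-9]{4}", names(wide), value = TRUE)

# Stack the year columns: the identifier column repeats once per year column,
# Year holds the old column name and Count holds the cell value.
long <- data.frame(
  Series.Code = rep(wide$Series.Code, times = length(year_cols)),
  Year = rep(year_cols, each = nrow(wide)),
  Count = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

long
```

This produces all 1990 rows first, then all 2000 rows, which matches the row order in the table. A package function such as `tidyr::pivot_longer()` is a common alternative; base R is used here only to keep the sketch free of dependencies.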
Now we can tidy the Year column by extracting the four-digit year from each label.
df3_long$Year <- as.integer(sub("X(\\d+)\\.\\.YR\\d+\\.", "\\1", df3_long$Year))
head(df3_long$Year)
## [1] 1990 1990 1990 1990 1990 1990
Let’s create another data frame with the total population of the USA.
| Country.Name | Country.Code | Series.Name | Series.Code | Year | Count |
|---|---|---|---|---|---|
| United States | USA | Population, total | SP.POP.TOTL | 1990 | 249623000 |
| United States | USA | Population, total | SP.POP.TOTL | 2000 | 282162411 |
| United States | USA | Population, total | SP.POP.TOTL | 2013 | 316059947 |
| United States | USA | Population, total | SP.POP.TOTL | 2014 | 318386329 |
| United States | USA | Population, total | SP.POP.TOTL | 2015 | 320738994 |
| United States | USA | Population, total | SP.POP.TOTL | 2016 | 323071755 |
| United States | USA | Population, total | SP.POP.TOTL | 2017 | 325122128 |
| United States | USA | Population, total | SP.POP.TOTL | 2018 | 326838199 |
| United States | USA | Population, total | SP.POP.TOTL | 2019 | 328329953 |
| United States | USA | Population, total | SP.POP.TOTL | 2020 | 331511512 |
| United States | USA | Population, total | SP.POP.TOTL | 2021 | 332031554 |
| United States | USA | Population, total | SP.POP.TOTL | 2022 | 333287557 |
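The filtering step that produces this table is not shown. One plausible sketch is given here on a small stand-in, assuming the rows are selected by `Series.Code` and that `Count` is still stored as text at this point; the name `pop` is hypothetical.

```r
# Long-format stand-in with two indicators for two years.
long <- data.frame(
  Series.Code = c("SP.POP.TOTL", "EN.POP.DNST", "SP.POP.TOTL", "EN.POP.DNST"),
  Year = c(1990L, 1990L, 2000L, 2000L),
  Count = c("249623000", "27", "282162411", "31"),
  stringsAsFactors = FALSE
)

# Keep only the rows for the total-population series.
pop <- long[long$Series.Code == "SP.POP.TOTL", ]

# Assumption: Count is text because the raw file mixes numbers with "..",
# so it is converted to numeric before any plotting or arithmetic.
pop$Count <- as.numeric(pop$Count)

pop
```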
Now we can plot the total population of the USA from 1990 to 2022.
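The plotting code is not included, so this is only an illustrative base R version. The values are copied from the total-population table, and the data frame name `pop` is an assumption.

```r
# Year/population pairs copied from the total-population table.
pop <- data.frame(
  Year = c(1990, 2000, 2013:2022),
  Count = c(249623000, 282162411, 316059947, 318386329, 320738994, 323071755,
            325122128, 326838199, 328329953, 331511512, 332031554, 333287557)
)

# Line plot with points; population is shown in millions for readable axis labels.
plot(pop$Year, pop$Count / 1e6,
     type = "b",
     xlab = "Year",
     ylab = "Population (millions)",
     main = "Total population of the USA, 1990 to 2022")
```

Note that the years are not evenly spaced (1990, 2000, then annually from 2013), so the first two line segments each cover a decade or more while the rest cover a single year.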
The population of the USA grew from 249,623,000 in 1990 to 333,287,557 in 2022.
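To put that change in perspective, the absolute and relative growth can be computed directly from the two endpoint values. The variable names below are illustrative.

```r
pop_1990 <- 249623000
pop_2022 <- 333287557

abs_growth <- pop_2022 - pop_1990           # absolute increase in people
pct_growth <- 100 * abs_growth / pop_1990   # increase relative to 1990, in percent

abs_growth            # 83664557
round(pct_growth, 1)  # 33.5
```

In other words, the population increased by roughly 83.7 million people, or about a third, over the 32 years.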