This is a simple template to create a reproducible, csv document that can be uploaded to GitHub Gist without using GistR and create.gist. I was inspired to wite this because I was taking a d3.js class from FreeCodeCamp. In the Data Visualization course the data is uploaded to Gist. While I understand why Carran did not use any specific language to augment the data. I wanted to be able to manipulate the data from the language I knew best, so that it was a reproducible method. I did find that there were some tricks to getting this to upload correctly to Gist. This code can be easily modified to suit any R user’s needs. The audience for this document is meant to those with only elementary knowledge of R needing this upload for d3.js. As I am sure there is probably a more intuitive way.
(The assumption right now is that the working directory is already set to the desired location and that the file does not already exist.) We are interested in the Countries and their populations from 1950-2020; convert the population to billions, and sort by most populous. First, we will begin by downloading data from the United Nations Population Prospectives website.
require(tidyverse)
## Loading required package: tidyverse
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
require(readxl)
## Loading required package: readxl
UNUrl <-"https://population.un.org/wpp/Download/Files/1_Indicators%20(Standard)/EXCEL_FILES/1_Population/WPP2019_POP_F01_1_TOTAL_POPULATION_BOTH_SEXES.xlsx"
temp <-tempfile()
download.file(UNUrl, destfile = temp, mode="wb")
World_population <- read_excel(temp)
## New names:
## * `` -> ...2
## * `` -> ...3
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * ...
At this point I usually inspect the data to ensure that it uploaded in a workable form.
View(World_population)
I see that the first seven rows are basically useless to my needs. So, I will use tidyverse’s call slice to remove these.
World_population_2 <-World_population%>%
slice(-c(1:11))
head(World_population_2)
## # A tibble: 6 x 78
## `United Nations` ...2 ...3 ...4 ...5 ...6 ...7 ...8 ...9 ...10 ...11
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Index Varia~ Regio~ Notes Coun~ Type Pare~ 1950 1951 1952 1953
## 2 1 Estim~ WORLD <NA> 900 World 0 2536~ 2584~ 2630~ 2677~
## 3 2 Estim~ UN de~ a 1803 Labe~ 900 ... ... ... ...
## 4 3 Estim~ More ~ b 901 Deve~ 1803 8148~ 8240~ 8337~ 8437~
## 5 4 Estim~ Less ~ c 902 Deve~ 1803 1721~ 1760~ 1797~ 1833~
## 6 5 Estim~ Least~ d 941 Deve~ 902 1954~ 1991~ 2030~ 2069~
## # ... with 67 more variables: ...12 <chr>, ...13 <chr>, ...14 <chr>,
## # ...15 <chr>, ...16 <chr>, ...17 <chr>, ...18 <chr>, ...19 <chr>,
## # ...20 <chr>, ...21 <chr>, ...22 <chr>, ...23 <chr>, ...24 <chr>,
## # ...25 <chr>, ...26 <chr>, ...27 <chr>, ...28 <chr>, ...29 <chr>,
## # ...30 <chr>, ...31 <chr>, ...32 <chr>, ...33 <chr>, ...34 <chr>,
## # ...35 <chr>, ...36 <chr>, ...37 <chr>, ...38 <chr>, ...39 <chr>,
## # ...40 <chr>, ...41 <chr>, ...42 <chr>, ...43 <chr>, ...44 <chr>,
## # ...45 <chr>, ...46 <chr>, ...47 <chr>, ...48 <chr>, ...49 <chr>,
## # ...50 <chr>, ...51 <chr>, ...52 <chr>, ...53 <chr>, ...54 <chr>,
## # ...55 <chr>, ...56 <chr>, ...57 <chr>, ...58 <chr>, ...59 <chr>,
## # ...60 <chr>, ...61 <chr>, ...62 <chr>, ...63 <chr>, ...64 <chr>,
## # ...65 <chr>, ...66 <chr>, ...67 <chr>, ...68 <chr>, ...69 <chr>,
## # ...70 <chr>, ...71 <chr>, ...72 <chr>, ...73 <chr>, ...74 <chr>,
## # ...75 <chr>, ...76 <chr>, ...77 <chr>, ...78 <chr>
As we can see row 1 contains all the column names so lets make that happen; then, we will drop that row.
names(World_population_2) <- World_population_2 %>% slice(1) %>% unlist()
World_population_3 <- World_population_2 %>% slice(-1)
head(World_population_2)
## # A tibble: 6 x 78
## Index Variant `Region, subreg~ Notes `Country code` Type `Parent code` `1950`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Index Variant Region, subregi~ Notes Country code Type Parent code 1950
## 2 1 Estima~ WORLD <NA> 900 World 0 25364~
## 3 2 Estima~ UN development ~ a 1803 Labe~ 900 ...
## 4 3 Estima~ More developed ~ b 901 Deve~ 1803 81481~
## 5 4 Estima~ Less developed ~ c 902 Deve~ 1803 17216~
## 6 5 Estima~ Least developed~ d 941 Deve~ 902 19542~
## # ... with 70 more variables: 1951 <chr>, 1952 <chr>, 1953 <chr>, 1954 <chr>,
## # 1955 <chr>, 1956 <chr>, 1957 <chr>, 1958 <chr>, 1959 <chr>, 1960 <chr>,
## # 1961 <chr>, 1962 <chr>, 1963 <chr>, 1964 <chr>, 1965 <chr>, 1966 <chr>,
## # 1967 <chr>, 1968 <chr>, 1969 <chr>, 1970 <chr>, 1971 <chr>, 1972 <chr>,
## # 1973 <chr>, 1974 <chr>, 1975 <chr>, 1976 <chr>, 1977 <chr>, 1978 <chr>,
## # 1979 <chr>, 1980 <chr>, 1981 <chr>, 1982 <chr>, 1983 <chr>, 1984 <chr>,
## # 1985 <chr>, 1986 <chr>, 1987 <chr>, 1988 <chr>, 1989 <chr>, 1990 <chr>,
## # 1991 <chr>, 1992 <chr>, 1993 <chr>, 1994 <chr>, 1995 <chr>, 1996 <chr>,
## # 1997 <chr>, 1998 <chr>, 1999 <chr>, 2000 <chr>, 2001 <chr>, 2002 <chr>,
## # 2003 <chr>, 2004 <chr>, 2005 <chr>, 2006 <chr>, 2007 <chr>, 2008 <chr>,
## # 2009 <chr>, 2010 <chr>, 2011 <chr>, 2012 <chr>, 2013 <chr>, 2014 <chr>,
## # 2015 <chr>, 2016 <chr>, 2017 <chr>, 2018 <chr>, 2019 <chr>, 2020 <chr>
Well, this is looking better, but there is alot of superfulous information and renaming of columns that need to be done.
From these lines we will deselect any information we don't want and filter by just Country using `filter`.
Country_population <- World_population_3%>%
filter (Type=="Country/Area")%>%
select (-Index, -Variant, -Notes, -`Country code`, -`Parent code`)
and we will change the name of the ’ Region, subregion, country or Area *’, to a name that is more meaningful:
colnames(Country_population)[1] <-"Country"
During the import R assign as.character to the populations; we will need to change all 76 of them to as.numeric. Rather than calling all of those variables individually we will use mutate_at. This is another function in tidyverse.
Country_population_2<-Country_population %>%
mutate_at(vars(-"Country", -"Type"), as.numeric)
We will write our own function to multiple all the populations by 1000.
convert_to_billions <-function(x){
y=x*1000
return(y)
}
Country_population_3 <-Country_population_2%>%
mutate_at(vars(-"Country", -"Type"), convert_to_billions)
Now we will sort from greatest to least from the year 2020:
Country_population_3<-Country_population_3 %>%
#group_by(Country)%>%
arrange(desc(Country_population_3$'2020'))
Most people would at this point would think that “Oh, now we just need to put it into a CSV and viola copy paste to Gist.” But isn’t QUITE that simple because Gist does not like quotes. Or atleast I have found in my case, it does not like quotes.
So without further delay, the money shots.
Country_population_3<-Country_population_3%>%
mutate_all( as.character)
You will notice the addition of quote=FALSE in the write file. This is very important and cannot be left out when make a Gist-friendly file in R.
write.csv(Country_population_3,file= "World_Population_3.csv", row.names=FALSE, quote=FALSE)
Now, all that is left is uploading it your Gist account, which I will leave to the user to do.