You will have an idea of what R can do.
You will learn how to install R and RStudio onto your computer.
You will learn about R packages and libraries.
You will be exposed to simple and complex code.
You will see two different examples of data visualization.
You will see how you can publish your document here: RPubs
SPSS
SAS
Stata
Minitab
ArcGIS
Working mainly with R since 2021
Great data visualizations:
Capability of creating reports and presentations in a single document
Ability to scrape data directly from the Web
Reproducibility and replicability
Parallels to Python
Look at documentation produced by R Developers.
Collect Posit Cheat Sheets.
Identify errors in your code in a sequential manner with code chunks.
Copy the error message into Google.
Be sure you have all the appropriate packages and libraries.
Check the variable types and case of letters.
Enjoy the challenges!
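The tip about variable types and letter case can be checked directly at the console. This is a minimal sketch in base R; the `toy` data frame and its values are made up for illustration and are not part of the baby names data.

```r
# Hypothetical toy data: 'year' was accidentally stored as text
toy <- data.frame(
  name = c("Taylor", "taylor", "Nicholas"),
  year = c("2000", "2000", "2001"),
  stringsAsFactors = FALSE
)

# Check the variable types
str(toy)
year_type <- class(toy$year)

# R is case sensitive: only the exactly matching spelling is counted
exact_matches <- sum(toy$name == "Taylor")

# Standardizing the case before comparing catches both spellings
loose_matches <- sum(tolower(toy$name) == "taylor")

# Convert the text column to numbers once the problem is spotted
toy$year <- as.numeric(toy$year)
```

Running `str()` on a data set right after reading it in is often the quickest way to spot a column with an unexpected type.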
Uninstall R and RStudio any time you want to download new versions.
See the instructions here: Installing R and RStudio
Download R.
Download RStudio.
Download the Quarto document “Using R to Study Baby Names” from Canvas to your computer.
Open the Quarto document in RStudio.
Look for sample code like this: Analyzing US Baby Names
Save the links for materials that look useful.
Make the necessary changes to the code.
Debug the code.
Ask yourself if the results make any sense.
Add comments soon after you have cracked the code.
The RStudio user interface has 4 primary panes: source, console, environment, and output
My Quarto document appears in the upper left pane.
Click on Install in the lower right pane to install packages that you need.
List the libraries that you need.
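One way to list the libraries at the top of a document is sketched below. The package names in `needed` are an assumption based on the functions used later in these slides (for example `read_csv`, `group_by`, `ggplot`, and `scales::percent`); adjust the list to your own code.

```r
# Packages that the code in these slides appears to rely on (an assumption)
needed <- c("tidyverse", "scales", "sf", "janitor")

# requireNamespace() returns TRUE when a package is installed
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  # Attach the packages once they are all installed
  for (pkg in needed) library(pkg, character.only = TRUE)
}
```

Installing happens once per computer, while `library()` has to be run in every new R session.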
1) Visualizing Trends in the Rate at Which Babies are Named Taylor
2) Mapping the Distribution of Boys Named Nicholas across States
WARNING: Today’s examples require data wrangling at a very advanced level.
You can easily switch the name, sex, and year.
I have downloaded the CSV file (under Canvas) for this example.
Read the data into RStudio.
# A tibble: 6 × 4
  name      sex   count  year
  <chr>     <chr> <dbl> <dbl>
1 Mary      F      7065  1880
2 Anna      F      2604  1880
3 Emma      F      2003  1880
4 Elizabeth F      1939  1880
5 Minnie    F      1746  1880
6 Margaret  F      1578  1880
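The code that read the national file is not shown above, so the following is only a sketch of the general pattern. It uses base R `read.csv()` as a stand-in and a temporary file (`csv_path`, hypothetical) holding three rows copied from the output above, so that it runs without the Canvas download.

```r
# Write a few rows (copied from the output above) to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "name,sex,count,year",
  "Mary,F,7065,1880",
  "Anna,F,2604,1880",
  "Emma,F,2003,1880"
), csv_path)

# Read the file; replace csv_path with the location of your own download.
# Column types are stated explicitly because a column that contains only
# "F" would otherwise be read as the logical value FALSE.
baby_names_demo <- read.csv(
  csv_path,
  colClasses = c(name = "character", sex = "character",
                 count = "numeric", year = "numeric")
)

# Look at the first rows and the column types
head(baby_names_demo)
str(baby_names_demo)
```

Checking the output of `str()` here ties back to the earlier tip about variable types.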
# Create an indicator for whether the baby is named Taylor
baby_names_national$Taylor <- ifelse(baby_names_national$name == 'Taylor', 1, 0)
# Count the babies named Taylor in each row (0 for every other name)
baby_names_national$Taylorcount <- baby_names_national$Taylor * baby_names_national$count
# Create a new data set with the number of girls and boys named Taylor each year, as well as the total numbers of girls and boys each year
Taylorprobs <- baby_names_national %>%
  group_by(year, sex) %>%
  summarize(numTaylor = sum(Taylorcount), n = sum(count))
# Calculate the proportion of boys and girls named Taylor each year
Taylorprobs$probTaylor <- Taylorprobs$numTaylor / Taylorprobs$n
# Copy sex into a capitalized variable so the legend in the figure reads "Sex"
Taylorprobs$Sex <- Taylorprobs$sex
# A tibble: 282 × 6
# Groups:   year [141]
    year sex   numTaylor      n probTaylor Sex
   <dbl> <chr>     <dbl>  <dbl>      <dbl> <chr>
 1  1880 F             0  90994 0          F
 2  1880 M            37 110490 0.000335   M
 3  1881 F             0  91953 0          F
 4  1881 M            39 100738 0.000387   M
 5  1882 F             0 107847 0          F
 6  1882 M            27 113686 0.000237   M
 7  1883 F             0 112319 0          F
 8  1883 M            27 104625 0.000258   M
 9  1884 F             0 129019 0          F
10  1884 M            21 114442 0.000183   M
# ℹ 272 more rows
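Each value of probTaylor is the number of babies named Taylor divided by all babies of that sex and year; for boys in 1880 that is 37 / 110490, or about 0.000335. Because the name is the only thing that changes when you switch to another name, the same steps can be wrapped in a function. The sketch below uses base R and a made-up `toy` data frame; `name_share()` is a hypothetical helper, not code from the original analysis.

```r
# Hypothetical toy rows in the same layout as baby_names_national
toy <- data.frame(
  name  = c("Taylor", "John", "Mary", "Taylor", "Anna"),
  sex   = c("M", "M", "F", "F", "F"),
  count = c(30, 70, 60, 10, 30),
  year  = c(2000, 2000, 2000, 2000, 2000)
)

# Share of babies with a given name, by year and sex
name_share <- function(data, target) {
  # Keep the count on rows where the name matches, and use 0 elsewhere
  data$target_count <- ifelse(data$name == target, data$count, 0)
  # Sum the matching counts and the overall counts within each year and sex
  totals <- aggregate(cbind(target_count, count) ~ year + sex,
                      data = data, FUN = sum)
  totals$share <- totals$target_count / totals$count
  totals
}

taylor_share <- name_share(toy, "Taylor")
print(taylor_share)
```

Calling `name_share(toy, "Mary")` would repeat the calculation for a different name without editing any other line.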
ggplot(Taylorprobs, aes(x = year, y = probTaylor, col = Sex)) +
  geom_line() +
  geom_point() +
  # probTaylor is a proportion, so label the axis in percent units
  scale_y_continuous(labels = scales::percent) +
  labs(x = "Year of Birth", y = "Percent",
       title = "Percent of Babies Named Taylor by Calendar Year",
       caption = "Source: Analyses in R by Joyner of data from https://data.world/nkrishnaswami/us-ssa-baby-names-national")
baby_names_state <- read_csv("C:/Users/Ameria/Downloads/R1/baby_names_state.csv")
head(baby_names_state)
# A tibble: 6 × 5
  state_abb sex    year name     count
  <chr>     <chr> <dbl> <chr>    <dbl>
1 AK        F      1910 Mary        14
2 AK        F      1910 Annie       12
3 AK        F      1910 Anna        10
4 AK        F      1910 Margaret     8
5 AK        F      1910 Helen        7
6 AK        F      1910 Elsie        6
# Keep only male babies born in 2000
baby_names_state <- filter(baby_names_state, sex == 'M')
baby_names_state <- filter(baby_names_state, year == 2000)
# Copy state_abb to state_abbv so the data can be joined to another file later
baby_names_state$state_abbv <- baby_names_state$state_abb
# Create an indicator for whether the baby is named Nicholas
baby_names_state$Nicholas <- ifelse(baby_names_state$name == 'Nicholas', 1, 0)
# Filter the data so that it consists only of boys named Nicholas
baby_names_state <- filter(baby_names_state, Nicholas == 1)
# Now group the data by state to get the number of boys named Nicholas in each state in 2000 and each state's percent of the national total
baby_names_state_transformed <- baby_names_state %>%
  group_by(state_abbv) %>%
  # Variable to be transformed
  mutate(num = count) %>%
  ungroup() %>%
  mutate(perc = num / sum(num)) %>%
  arrange(perc) %>%
  mutate(labels = scales::percent(perc))
# A tibble: 51 × 10
   state_abb sex    year name     count state_abbv Nicholas   num    perc labels
   <chr>     <chr> <dbl> <chr>    <dbl> <chr>         <dbl> <dbl>   <dbl> <chr>
 1 WY        M      2000 Nicholas    30 WY                1    30 0.00122 0.121…
 2 AK        M      2000 Nicholas    36 AK                1    36 0.00146 0.146…
 3 MT        M      2000 Nicholas    43 MT                1    43 0.00174 0.174…
 4 VT        M      2000 Nicholas    47 VT                1    47 0.00191 0.190…
 5 ND        M      2000 Nicholas    51 ND                1    51 0.00207 0.206…
 6 HI        M      2000 Nicholas    64 HI                1    64 0.00260 0.259…
 7 SD        M      2000 Nicholas    65 SD                1    65 0.00264 0.263…
 8 DC        M      2000 Nicholas    69 DC                1    69 0.00280 0.279…
 9 ID        M      2000 Nicholas    89 ID                1    89 0.00361 0.361…
10 DE        M      2000 Nicholas    94 DE                1    94 0.00381 0.381…
# ℹ 41 more rows
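The perc column is each state's share of all boys named Nicholas in 2000 (the state's count divided by the national total), so the shares add up to 1. It is not the rate among boys born within the state. A minimal base R sketch of the same arithmetic follows; the `toy` counts and state codes are made up, and `sprintf()` stands in for `scales::percent()`.

```r
# Hypothetical counts for three made-up states (not the real 2000 figures)
toy <- data.frame(
  state_abbv = c("AA", "BB", "CC"),
  count      = c(30, 50, 120)
)

# Each state's share of the total across all states
toy$perc <- toy$count / sum(toy$count)

# Sort from smallest to largest share and format the share as a percent label
toy <- toy[order(toy$perc), ]
toy$labels <- sprintf("%.1f%%", 100 * toy$perc)
print(toy)
```

Because the shares depend on state population, large states will tend to show the largest values on a map of perc.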
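# Install the Urban Institute packages from GitHub
install.packages("remotes")
# urbnmapr is loaded below, so it needs to be installed as well
remotes::install_github("UrbanInstitute/urbnmapr")
remotes::install_github("UrbanInstitute/urbnthemes")
library(urbnmapr)
library(urbnthemes)
library(janitor)
library(sf)
library(devtools)
# Force a reinstall of urbnthemes; this produces the build log below
install_github("UrbanInstitute/urbnthemes", force = TRUE)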
── R CMD build ─────────────────────────────────────────────────────────────────
* checking for file 'C:\Users\Ameria\AppData\Local\Temp\RtmpK4KRpC\remotes435c28aa268c\UrbanInstitute-urbnthemes-f6a368d/DESCRIPTION' ... OK
* preparing 'urbnthemes':
* checking DESCRIPTION meta-information ...Warning in person1(given = given[[i]], family = family[[i]], middle = middle[[i]], :
It is recommended to use 'given' instead of 'middle'.
OK
Warning in person1(given = given[[i]], family = family[[i]], middle = middle[[i]], :
It is recommended to use 'given' instead of 'middle'.
* checking for LF line-endings in source and make files and shell scripts
* checking for empty or unneeded directories
* building 'urbnthemes_0.0.2.tar.gz'
# A tibble: 51 × 13
   state_abb sex    year name     count state_abbv Nicholas   num    perc labels
   <chr>     <chr> <dbl> <chr>    <dbl> <chr>         <dbl> <dbl>   <dbl> <chr>
 1 WY        M      2000 Nicholas    30 WY                1    30 0.00122 0.121…
 2 AK        M      2000 Nicholas    36 AK                1    36 0.00146 0.146…
 3 MT        M      2000 Nicholas    43 MT                1    43 0.00174 0.174…
 4 VT        M      2000 Nicholas    47 VT                1    47 0.00191 0.190…
 5 ND        M      2000 Nicholas    51 ND                1    51 0.00207 0.206…
 6 HI        M      2000 Nicholas    64 HI                1    64 0.00260 0.259…
 7 SD        M      2000 Nicholas    65 SD                1    65 0.00264 0.263…
 8 DC        M      2000 Nicholas    69 DC                1    69 0.00280 0.279…
 9 ID        M      2000 Nicholas    89 ID                1    89 0.00361 0.361…
10 DE        M      2000 Nicholas    94 DE                1    94 0.00381 0.381…
# ℹ 41 more rows
# ℹ 3 more variables: state_fips <chr>, state_name <chr>,
#   geometry <MULTIPOLYGON [m]>
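The table above has three more columns than the earlier one (state_fips, state_name, and geometry), which suggests the state shares were joined to a state map file on state_abbv. The slides do not show that step, so the following is only a sketch of a key-based join using base R `merge()`; both data frames and their values are made up, and `lookup` merely stands in for a map file.

```r
# Hypothetical table of shares, keyed by state abbreviation
shares <- data.frame(
  state_abbv = c("AK", "WY", "VT"),
  perc       = c(0.2, 0.3, 0.5)
)

# Hypothetical lookup table standing in for the map file
lookup <- data.frame(
  state_abbv = c("AK", "VT", "WY"),
  state_name = c("Alaska", "Vermont", "Wyoming")
)

# Keep every row of 'shares' and add the matching columns from 'lookup'
joined <- merge(shares, lookup, by = "state_abbv", all.x = TRUE)
print(joined)
```

The join only works when the key column has the same name and the same spelling of values in both tables, which is why the state variable was copied to state_abbv earlier.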
ggplot() +
  geom_sf(spatial_data, mapping = aes(fill = perc, geometry = geometry),
          color = "#ffffff", size = 0.25) +
  scale_fill_gradientn(labels = scales::percent) +
  labs(fill = "Percent") +
  coord_sf(datum = NA) +
  labs(title = "Distribution across States of Boys Named Nicholas in 2000",
       caption = "Source: Analyses in R by Joyner of data from https://data.world/nkrishnaswami/us-ssa-baby-names-state")
Render your document to find any problems.
Go to this website: RPubs
Register in RPubs if you don’t already have an account.
After you register, you can publish your document by clicking the light blue symbol in the upper right of the upper left pane.