Agenda

1. Download datasets
2. Create your own dataset
- Demo for finding your data
3. RStudio for dataset creation
- Start your Mini-Exercise #1 in RStudio (due before Lab 3)

Part I

Download Datasets (csv files)

CCLE Datasets Section (link)
- Westwood
- Belair

Create your own dataset

Download “Lab 2 worksheet” (link)
Complete the worksheet by going to: https://data.census.gov/cedsci/
See the demo on the next few slides and complete your worksheet after class

Demo for finding your data

Go to Advanced Search
Under Surveys, select American Community Survey, then 5-year estimates
Don’t click “Search” yet

Demo for finding your data

Set the Year filter to 2019

Demo for finding your data

Set the Geography filter to whatever ZIP code you want to explore (preferably the location of your high school)

Demo for finding your data

In Advanced Search, type the names of the variables required in the worksheet.

Demo for finding your data

Example: total population (found in a table on age and sex)

Finding your data

If a table gives only raw counts, you may need to calculate the percentage yourself
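A minimal sketch of that calculation in R, assuming the table gives a count for the group of interest and a count for the relevant total. The counts below are hypothetical placeholders, not census values; if your worksheet asks for a proportion instead of a percentage, drop the `* 100`.

```r
# Hypothetical counts read off a census table (replace with your own numbers)
ba_or_higher <- 5400   # hypothetical: people 25+ with a bachelor's degree or higher
pop_25_plus  <- 18000  # hypothetical: total population 25 years and over

# Percentage = part / total * 100, rounded to one decimal place
pctbaplus <- round(ba_or_higher / pop_25_plus * 100, 1)
pctbaplus   # 30
```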

Part II

RStudio for Dataset Creation (Mini-Exercise #1)

Carefully read and type the variable names in RStudio
- Double- and triple-check that you have spelled all the variable names EXACTLY as they are spelled in the worksheet.
- Do not use commas when entering numbers, and enter proportions as decimal numbers, not as percentages with a % sign.
- Write clear code and comments, and save them in an R script file.
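One way to double-check spelling is to compare the names in your data frame against the worksheet's list. A minimal sketch using base R's `setdiff()`; it assumes the eleven names used in this lab's examples are the full worksheet list (copy the exact list from your worksheet), and the one-row data frame with a misspelled name is hypothetical.

```r
# Variable names as spelled in the worksheet (assumed here to match the
# names used in this lab's examples; copy the exact list from your worksheet)
expected <- c("zip", "totpop", "mdageyrs", "pctbaplus", "pctunemployed",
              "mdfaminc", "pctfampov", "pctwhite", "pctblack",
              "pctasian", "pctlatino")

# A hypothetical one-row data frame with one misspelled name ("pctblak")
mydata <- data.frame(zip = 90064, totpop = 27000, mdageyrs = NA,
                     pctbaplus = NA, pctunemployed = NA, mdfaminc = NA,
                     pctfampov = 6.2, pctwhite = NA, pctblak = NA,
                     pctasian = NA, pctlatino = NA)

# Names the worksheet expects but the data frame lacks, and vice versa
missing_names <- setdiff(expected, names(mydata))
extra_names   <- setdiff(names(mydata), expected)
missing_names  # "pctblack"
extra_names    # "pctblak"
```

Both results should be `character(0)` once every name is spelled correctly.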

Creating an R script file

Open a new R Script file.
Name it “miniexercise1_UID” and save it into your code folder.
Describe the file in a comment header at the top:

# title: Mini_Exercise_#1
# author: Your_Name
# date: 10/9/21
# purpose: Create a dataset about the zip code of my high school in my senior year (and complete Mini-Exercise #1 for PA 60)
# The zip code I am collecting data about is: [list your zip code here]

Setting up the working directory and loading the packages

# Set the working directory where I want my files to live (replace this path with the folder on your own computer)
setwd("/Users/linlizhou/Documents/PA60/datawork/data_source")

# Load packages
library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.4     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   2.0.1     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(readxl)
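Before reading or saving files, it can help to confirm where R is looking. A minimal sketch using base R's `getwd()` and `file.exists()`; the file names are the ones read later in this lab, and whether they show up depends on where you saved your downloads.

```r
# Print the current working directory (where relative paths are resolved)
getwd()

# Check whether the downloaded files are in that folder before reading them
needed <- c("westwood.csv", "belair.xlsx")
file.exists(needed)  # FALSE means the file is not in the working directory
```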

Creating Variables (examples)

# assign the value of your zip code to the variable “zip”
zip <- 90064

# assign the value of your total population to the variable “totpop” 
totpop <- 27000 # no thousands separator

# assign the value of poverty status of families in the past 12 months to the variable “pctfampov”
pctfampov <- 6.2 # enter the percentage as a number out of 100, without the percent sign

# If a value is not available, enter NA.
# Values for most of the variables should be available for all the zip codes.
mdfaminc <- NA
mdageyrs <- NA
pctbaplus <- NA
pctunemployed <- NA
pctwhite <- NA
pctblack <- NA
pctasian <- NA
pctlatino <- NA
# finish creating all the variables in the worksheet
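Because some values may be NA, it can be worth listing which ones are still missing and seeing how NA behaves in arithmetic. A minimal sketch using `is.na()` on a named vector; the values are hypothetical, and a separate vector is used so your own variables are not overwritten.

```r
# Hypothetical values; NA marks a value that was not available
values <- c(pctwhite = 60.5, pctblack = NA, pctasian = 20.1, pctlatino = 12.3)

# List which variables are still missing
names(values)[is.na(values)]     # "pctblack"

# NA propagates through arithmetic unless you ask R to drop it
sum(values)                      # NA
sum(values, na.rm = TRUE)        # 92.9
```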

Creating a data frame

# Make a dataset out of my variables
mycommunity90064 <- data.frame(zip, totpop, mdageyrs, pctbaplus, pctunemployed, mdfaminc, pctfampov, pctwhite, pctblack, pctasian, pctlatino)

# Look at my dataset to make sure each variable looks correct
View(mycommunity90064)
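Besides `View()`, a quick check in code can confirm the data frame has one row, the expected number of columns, and the column types you intended. A minimal sketch using base R's `nrow()`, `ncol()`, and `sapply()`; the small data frame is a hypothetical stand-in for yours.

```r
# Hypothetical stand-in for your community data frame
example_df <- data.frame(zip = 90064, totpop = 27000, pctfampov = 6.2,
                         mdfaminc = NA)

nrow(example_df)   # 1 row: one community
ncol(example_df)   # 4 here; yours should have one column per worksheet variable

# Column types: a column that is entirely NA is stored as "logical",
# and a number typed in quotes (e.g. "27,000") would be "character"
sapply(example_df, class)
```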

Save the data frame as an R data file

# Save it as an R dataset in my data folder
save(mycommunity90064,file="mycommunity90064.Rdata")
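`save()` writes the named objects into the file, and `load()` restores them into the workspace under their original names. A minimal sketch of that round trip, using a temporary file from `tempfile()` so it does not touch your data folder; `demo_df` is hypothetical.

```r
# Hypothetical object to save
demo_df <- data.frame(zip = 90064, totpop = 27000)

# Write it to a temporary .Rdata file
path <- tempfile(fileext = ".Rdata")
save(demo_df, file = path)

# Remove it from the workspace, then restore it from the file
rm(demo_df)
exists("demo_df")      # FALSE
loaded <- load(path)   # load() returns the names of the restored objects
loaded                 # "demo_df"
exists("demo_df")      # TRUE
```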

Part III

Read other data files

# Read in a comma separated text file from my source folder that contains data on the westwood zip code
westwood <- read.csv("westwood.csv")

# Read in an excel file on the belair zip code
belair <- readxl::read_excel("belair.xlsx") # alternatively, since readxl is loaded: belair <- read_excel("belair.xlsx")

# Examine the westwood and belair datasets to make sure the variables look reasonable
View(westwood)
View(belair)
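A common reason a variable looks wrong after import is a number stored with a thousands separator, which `read.csv()` reads as text (as character in R 4.0 and later). A minimal sketch with a hypothetical two-column CSV written to a temporary file; `gsub()` strips the commas before `as.numeric()` converts the column.

```r
# Write a tiny hypothetical CSV in which totpop was typed with a comma
path <- tempfile(fileext = ".csv")
writeLines(c("zip,totpop", '90064,"47,000"'), path)

imported <- read.csv(path)
class(imported$totpop)   # "character": the comma made it text

# Remove the commas, then convert to a number
imported$totpop <- as.numeric(gsub(",", "", imported$totpop))
class(imported$totpop)   # "numeric"
```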

Transform data files into R data files

# Save them as R datasets in my data folder
save(westwood,file="westwood.Rdata")
save(belair,file="belair.Rdata")

Append datasets

# Append the three files together (run ?rbind to see what this function does)
threecommunities <- rbind(mycommunity90064, westwood, belair)

#Look at the data and make sure everything looks reasonable
View(threecommunities)
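`rbind()` on data frames matches columns by name rather than by position, which is why the variable names must be identical in all three datasets but need not be in the same order. A minimal sketch with hypothetical one-row data frames; `tryCatch()` is used only so the example keeps running after the expected error.

```r
# Same names, different column order
a <- data.frame(zip = 90064, totpop = 27000)
b <- data.frame(totpop = 47000, zip = 90024)

combined <- rbind(a, b)
combined   # columns follow a's order; b's values line up by name

# With a misspelled name, rbind() stops with an error instead of guessing
bad <- data.frame(zip = 90077, totpop_ = 8000)
result <- tryCatch(rbind(a, bad), error = function(e) "names do not match")
result     # "names do not match"
```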

# Save the appended file to the working directory, alongside the other .Rdata files
save(threecommunities, file = "threecommunities.Rdata")
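If you want the appended file in a `data` subfolder instead, that folder must exist first, because `save()` does not create folders. A minimal sketch using `dir.create()` and `file.path()`; it uses `tempdir()` as the parent so the example is safe to run, so substitute your own project folder, and `threecommunities_demo` is hypothetical.

```r
# Parent folder for the example (replace with your own project folder)
project_dir <- tempdir()

# Create the data subfolder if it does not exist yet
data_dir <- file.path(project_dir, "data")
dir.create(data_dir, showWarnings = FALSE)

# Hypothetical data frame to save
threecommunities_demo <- data.frame(zip = c(90064, 90024), totpop = c(27000, 47000))
out <- file.path(data_dir, "threecommunities_demo.Rdata")
save(threecommunities_demo, file = out)
file.exists(out)   # TRUE
```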

Best practices: clean up

#Remove the original variables I made for mycommunity
rm(zip, totpop, mdageyrs, pctbaplus, pctunemployed, mdfaminc, pctfampov, pctwhite, pctblack, pctasian, pctlatino)

#Remove each of the separate datasets
rm(mycommunity90064, westwood, belair)
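To confirm the clean-up worked, `ls()` lists what remains in the workspace and `exists()` tests for a single object. A minimal sketch with hypothetical objects named so they do not clash with your own.

```r
# Hypothetical objects standing in for the variables made earlier
zip_demo <- 90064
totpop_demo <- 27000
combined_demo <- data.frame(zip = zip_demo, totpop = totpop_demo)

# Remove the individual variables but keep the combined data frame
rm(zip_demo, totpop_demo)

ls()                       # lists the objects still in the workspace
exists("zip_demo")         # FALSE
exists("combined_demo")    # TRUE
```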

Find a learning buddy

Work with a learning buddy to:
- Trade your filled-out worksheet with a partner and have them double-check some of your values to make sure you’ve entered everything correctly. (You can share it as a Google Doc, by email, etc.)
- Trade your mycommunity dataset with your partner. View it, and check that the values look reasonable and the variable names are spelled correctly.
- Check that each of your datasets contains the same number of variables and that the variable names are the same in all the datasets (the variables don’t need to be in the same order within the datasets).
- Check your R script file to make sure it makes sense and is well commented, and edit it as needed.
- Seek help if the code doesn’t work or doesn’t run all the way through.
Once you’ve done that, add this comment at the beginning of your R script: # My partner looked at my dataset and confirmed the values for each of the variables make sense and correspond to my spreadsheet. My partner also confirmed that the variable names are spelled correctly.

Submit before Lab 3

1. Your R script file, titled “miniexercise1_[yourUID].R”
2. The R data file, titled “mycommunity[yourzipcode].Rdata”
- Important: submit the mycommunity data file, NOT the appended data file

PA60Lab2

Linli Zhou

10/8/2021
