Part II
RStudio for Dataset Creation (Mini-Exercise #1)
- Carefully read and type the variable names in RStudio
- Double and triple check that you have spelled all the variable names EXACTLY as they are spelled in the worksheet.
- Enter numbers without commas, and enter percentages as plain decimal numbers without a % sign (for example, 6.2 rather than 6.2%)
- Write clear code and comments that you save in an R script file
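The number-entry rule above can be illustrated with a minimal sketch. The values are the illustrative ones used elsewhere in this handout, and the text-conversion lines are included only to show what goes wrong when a comma or % sign sneaks in.

```r
# Correct: plain numbers, no thousands separator, no percent sign
totpop <- 27000
pctfampov <- 6.2

# Both values are stored as numbers
is.numeric(totpop)
is.numeric(pctfampov)

# Typing totpop <- 27,000 directly in a script is a syntax error.
# Text that contains a comma or a % sign cannot be converted to a number:
# as.numeric() returns NA (and a warning) for such strings.
bad_comma <- suppressWarnings(as.numeric("27,000"))
bad_percent <- suppressWarnings(as.numeric("6.2%"))
is.na(bad_comma)
is.na(bad_percent)
```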
Creating an R Script file
- Open a new R Script file.
- Name it “miniexercise1_UID” and save it into your code folder.
- Describe the file
# title: Mini_Exercise_#1
# author: Your_Name
# date: 10/9/21
# purpose: Create a dataset about my senior year of high school zip code (and complete my miniexercise #1 for PA 60)
# The zip code I am collecting data about is: [list your zip code here]
Setting the working directory and loading the packages
# Set the working directory where I want my files to live
setwd("/Users/linlizhou/Documents/PA60/datawork/data_source")
# Load packages
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.4 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.1 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
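Before reading or saving files, it may help to confirm which folder R is actually using. The sketch below uses only base R; the folder name in `my_folder` is a hypothetical placeholder, not a path you are required to have.

```r
# Show the folder R is currently using as the working directory
current_dir <- getwd()
print(current_dir)

# List the files R can see there; the data files you plan to read should appear
print(list.files(current_dir))

# Hypothetical folder used only for illustration; replace it with your own path
my_folder <- file.path(current_dir, "data_source")
if (dir.exists(my_folder)) {
  message("Folder found: ", my_folder)
} else {
  message("Folder not found, check the path before calling setwd(): ", my_folder)
}
```

If a later read or save step fails with "No such file or directory", checking `getwd()` and `list.files()` is usually the quickest first step.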
Creating Variables (examples)
# assign the value of your zip code to the variable “zip”
zip <- 90064
# assign the value of your total population to the variable “totpop”
totpop <- 27000 # no thousands separator
# assign the percentage of families with poverty status in the past 12 months to the variable “pctfampov”
pctfampov <- 6.2 # enter percentage as a number out of 100 without the percentage sign
# If a value is not available, enter NA (values for most of the variables should be available for all the zip codes).
mdfaminc <- NA
# Placeholders for the remaining variables; replace each NA with the value from your worksheet
mdageyrs <- NA
pctbaplus <- NA
pctunemployed <- NA
pctwhite <- NA
pctblack <- NA
pctasian <- NA
pctlatino <- NA
# Finish creating all the variables in the worksheet
Creating a data frame
# Make a dataset out of my variables
mycommunity90064 <- data.frame(zip, totpop, mdageyrs, pctbaplus, pctunemployed, mdfaminc, pctfampov, pctwhite, pctblack, pctasian, pctlatino)
# Look at my dataset to make sure each variable looks correct
View(mycommunity90064)
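`View()` opens a spreadsheet-style window in RStudio. As a complement, the base R functions below print the same information to the console, which can make spelling and type problems easier to spot. This is a minimal sketch on a small illustrative data frame (`example_community` is a hypothetical name), not the full worksheet.

```r
# Small illustrative data frame; replace the values with your own
zip <- 90064
totpop <- 27000
pctfampov <- 6.2
mdfaminc <- NA
example_community <- data.frame(zip, totpop, pctfampov, mdfaminc)

names(example_community)  # variable names, in column order
str(example_community)    # type and first values of each column
dim(example_community)    # number of rows, then number of columns
```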
Save the data frame as an R data file
# Save it as an R dataset in my data folder
save(mycommunity90064,file="mycommunity90064.Rdata")
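One way to confirm that a save worked is to remove the object, load the file back, and compare. The sketch below assumes a small illustrative data frame and writes to a temporary folder so it does not touch your real files; `example_community` and `tmp_file` are hypothetical names.

```r
# Illustrative data frame (not the full worksheet)
example_community <- data.frame(zip = 90064, totpop = 27000)

# Save to a temporary location so real files are left alone
tmp_file <- file.path(tempdir(), "example_community.Rdata")
save(example_community, file = tmp_file)

# Keep a copy for comparison, then remove the original object
original <- example_community
rm(example_community)

# load() restores the object under its original name
# and returns the names of everything it restored
loaded_names <- load(tmp_file)
print(loaded_names)

# The restored data frame should match the one that was saved
same_after_reload <- identical(example_community, original)
print(same_after_reload)
```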
Part III
Reading other data files
# Read in a comma-separated text file from my source folder that contains data on the Westwood zip code
westwood <- read.csv("westwood.csv")
# Read in an Excel file on the Bel Air zip code
belair <- readxl::read_excel("belair.xlsx") # alternatively, belair <- read_excel("source/belair.xlsx")
# Examine both data sets to make sure the variables look reasonable
View(westwood)
View(belair)
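To see what `read.csv()` does with a file, the sketch below writes a tiny comma-separated file to a temporary folder and reads it back. The values are made up purely for illustration and do not describe any real zip code; `tmp_csv` and `example_zip` are hypothetical names.

```r
# Made-up values written to a temporary file, purely for illustration
tmp_csv <- file.path(tempdir(), "example_zip.csv")
writeLines(c("zip,totpop,pctfampov",
             "99999,1000,12.5"), tmp_csv)

# The first line of the file becomes the variable names
example_zip <- read.csv(tmp_csv)

names(example_zip)          # check the spelling of each variable name
sapply(example_zip, class)  # check that each column was read as a number
```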
Appending datasets
# Append the three files together
threecommunities <- rbind(mycommunity90064, westwood, belair) # run ?rbind if you are wondering what this function does
# Look at the data and make sure everything looks reasonable
View(threecommunities)
# Save the appended file to my data folder
# (this assumes a "data" folder inside the working directory; create it first if it does not exist,
# otherwise save() stops with "cannot open the connection")
dir.create("data", showWarnings = FALSE)
save(threecommunities, file = "data/threecommunities.Rdata")
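For data frames, `rbind()` lines columns up by name, so the append step works when the variables appear in a different order but fails when a name is misspelled. The sketch below uses made-up toy data frames (`a`, `b`, and `c_bad` are hypothetical) to show both cases.

```r
# Toy data frames with made-up values
a <- data.frame(zip = 11111, totpop = 100, pctfampov = 5)
b <- data.frame(pctfampov = 7, zip = 22222, totpop = 200)  # same names, different order

# Columns are matched by name, so the order in b does not matter
combined <- rbind(a, b)
print(combined)

# A misspelled variable name (totpopp) makes rbind() stop with an error;
# tryCatch() captures the error message instead of halting the script
c_bad <- data.frame(zip = 33333, totpopp = 300, pctfampov = 9)
result <- tryCatch(rbind(a, c_bad), error = function(e) conditionMessage(e))
print(result)
```

This is why the variable names must be spelled exactly as in the worksheet before the files are appended.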
Best practices: clean up
# Remove the original variables I made for mycommunity
rm(zip, totpop, mdageyrs, pctbaplus, pctunemployed, mdfaminc, pctfampov, pctwhite, pctblack, pctasian, pctlatino)
# Remove each of the separate datasets
rm(mycommunity90064, westwood, belair)
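To confirm that the clean-up worked, `exists()` reports whether an object is still defined and `ls()` lists what remains in the workspace. A minimal sketch with hypothetical object names:

```r
# Create two throwaway objects
example_value <- 1
example_data <- data.frame(example_value)

# TRUE while the object is still in the workspace
before_cleanup <- exists("example_value")

# Remove both objects
rm(example_value, example_data)

# FALSE once the object has been removed
after_cleanup <- exists("example_value")

print(before_cleanup)
print(after_cleanup)
print(ls())  # everything still in the workspace
```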
Find a learning buddy
- Work with a learning buddy to:
- Trade your filled-out worksheet with a partner and have them double-check some of your values, to make sure you’ve entered everything correctly. (You can share as a google doc, by emailing to each other, etc.)
- Trade your mycommunity dataset with a partner. View it, and check to make sure your values look reasonable and variable names are spelled correctly.
- Check to make sure each of your datasets contains the same number of variables and all the variable names are the same in all the datasets (the variable names don’t need to be in the same order within the datasets).
- Check your R Script file to make sure it makes sense and is well commented; edit it as needed.
- Seek help if the code doesn’t run all the way through.
- Once you’ve done that, add this comment at the beginning of your R Script: # My partner looked at my dataset and confirmed the values for each of the variables make sense and correspond to my spreadsheet. My partner also confirmed that the variable names are spelled correctly.
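The name check in the list above can also be done in code. The sketch below compares the variable names of two made-up data frames, ignoring order; `mine` and `partner` are hypothetical stand-ins for your dataset and your partner's.

```r
# Toy datasets with made-up values; partner contains a deliberate typo
mine <- data.frame(zip = 11111, totpop = 100, pctfampov = 5)
partner <- data.frame(totpop = 200, zip = 22222, pctfampovv = 7)

# TRUE only if both datasets have exactly the same names (order ignored)
same_names <- setequal(names(mine), names(partner))

# Names that appear in one dataset but not the other
only_in_mine <- setdiff(names(mine), names(partner))
only_in_partner <- setdiff(names(partner), names(mine))

print(same_names)
print(only_in_mine)
print(only_in_partner)
```

Any name listed by `setdiff()` is a candidate for a spelling mistake in one of the two datasets.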
Submit before Lab 3
- your R script file
- titled “miniexercise1_[yourUID].R”
- the R data file
- titled “mycommunity[yourzipcode].Rdata”. Important: submit the mycommunity data file, NOT the appended data file.