Overview

This file is provided as a preliminary resource until official data is added to the critstats package. You may also use this code to gather data for your class project, thesis, or other academic tasks beyond what is provided below. Content in this file comes from several different sources, which you should be familiar with before accessing and analyzing any data.

Set up your work environment

Open up a new .Rmd file.

Use {r setup, include=FALSE} as the header of your first code chunk, so the setup code runs but is not shown in the knitted output.

knitr::opts_chunk$set(echo = TRUE)

# Load necessary libraries
library(knitr)
library(kableExtra)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::group_rows() masks kableExtra::group_rows()
## ✖ dplyr::lag()        masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readr) 
library(dplyr)
library(tidyr)

Data

We start with Pew to get an idea of the publicly available files.

We then load the data from the Annual Business Survey (ABS) Program.

We extract data from the 2007 Survey of Business Owners (SBO) Public Use Microdata Sample (PUMS).

# Load necessary library
# install.packages("readr")  # Uncomment if 'readr' is not installed
library(readr)

# Define the URL of the ZIP file
url <- "https://www2.census.gov/programs-surveys/sbo/datasets/2007/pums_csv.zip"

# Download the ZIP file (mode = "wb" keeps the binary file intact on Windows)
download.file(url, destfile = "pums_csv.zip", mode = "wb")

# Unzip the file
unzip("pums_csv.zip", exdir = "pums_data")

# List the contents of the unzipped directory
files <- list.files("pums_data", full.names = TRUE)
print(files)

# Read the extracted CSV file (check the printed file list above and adjust the path if the filename differs)
data <- read_csv("pums_data/pums.csv")

# Sample 10% of the rows and store the result so it can be reused
data_sample <- data %>% sample_frac(0.1)
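
Each time the file is knit, the steps above download and unzip the archive again. One way to avoid that is sketched below, assuming a hypothetical helper named fetch_and_unzip (it is not part of the original code or of any package) that skips work already done on disk.

```r
# Hypothetical helper (an illustration, not part of the original code):
# download and unzip only when the files are not already on disk.
fetch_and_unzip <- function(url, destfile, exdir) {
  if (!file.exists(destfile)) {
    # mode = "wb" writes the ZIP as a binary file, which matters on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  if (!dir.exists(exdir)) {
    unzip(destfile, exdir = exdir)
  }
  # Return the full paths of the extracted files
  list.files(exdir, full.names = TRUE)
}
```

With the URL and paths used in this section, the call would be files <- fetch_and_unzip(url, "pums_csv.zip", "pums_data"). Delete the ZIP file and the pums_data folder if you ever need to force a fresh download.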

Plot

Try to work with the data and generate some summary statistics.
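
As a starting point, here is a minimal sketch of a grouped summary and a bar plot. It runs on a small made-up data frame, not on the PUMS file: the column names state, owner_race, and weight and all of the values are assumptions for illustration only. Check the PUMS documentation for the actual variable names and for guidance on whether and how to apply weights.

```r
library(dplyr)
library(ggplot2)

# Toy data standing in for the PUMS sample.
# Column names and values are made up for illustration.
toy <- data.frame(
  state      = c("DC", "DC", "DC", "MD", "MD", "MD", "MD"),
  owner_race = c("Black", "White", "Black", "White", "Black", "White", "White"),
  weight     = c(2, 1, 1, 3, 1, 2, 2)
)

# Weighted percent of Black-owned businesses within each state
pct_black <- toy %>%
  group_by(state) %>%
  summarise(
    n_records = n(),
    pct_black = 100 * sum(weight[owner_race == "Black"]) / sum(weight),
    .groups = "drop"
  )

# Bar plot of the summary table
p <- ggplot(pct_black, aes(x = state, y = pct_black)) +
  geom_col() +
  labs(x = "State", y = "Percent Black-owned (weighted)")
print(p)
```

To apply the same pattern to the sampled PUMS data, replace toy with your sampled data frame and swap in the real column names.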

Percent Black or African American-owned businesses (2021)

Survey of Business Owners Public Use Microdata Sample

Nathan Alexander

Center for Applied Data Science and Analytics (CADSA)

Howard University

2024-11-16
