Setup

Load in data, review regional distribution, filter for NYC only.

Demographics

Age

Race

Income

Gender Identity/Sexuality

note: No good data for this in census

Discipline

rank 1 (V1_1)

Income Sources Analysis (Multiple Response)

Additional Income Sources Analysis (V11)

## [1] No                   Yes                  Skipped Question    
## [4] Prefer not to answer
## 4 Levels: No                   Prefer not to answer ... Yes
##              V11_clean    n   percent
## 1                  Yes 1719 16.997923
## 2                   No 7704 76.179175
## 3 Prefer not to answer  470  4.647483
## 4              Skipped  220  2.175418
## 5                Other    0  0.000000

Comparing to ACS data

Taking five year 2025 ACS data for NYC

Age

Parse

## Getting data from the 2019-2023 5-year ACS

combine with survey data

Race

Harmonize race categories

## Getting data from the 2019-2023 5-year ACS

compare %

viz

## Warning: Removed 3 rows containing missing values or values outside the scale range
## (`geom_col()`).

Notes:

Category Alignment:

Combined ACS “Asian” and “Native Hawaiian/Pacific Islander” to match survey

Kept “Arab or Middle Eastern” separate (not in ACS standard categories)

Included “Multiracial” from ACS (not in your survey)

Hispanic Handling:

ACS treats Hispanic origin separately from race

Your survey combines them (respondents can select both)

Special Notes:

The “Other” category in ACS may include some Middle Eastern respondents

ACS doesn’t have “Prefer not to answer” category

Income

## Getting data from the 2019-2023 5-year ACS

IPUMS

extract

2016–2020 ACS 5-year

clean

filter

#filter for hpd sampling 
nyc_data <- data %>%
  filter(PUMA %in% nyc_puma_codes, GQ==1) %>%  # GQ==1 means household, not group quarters
  mutate(
    is_renter = OWNERSHP == 2,
    household_size = PERNUM,
    ami_100_limit = case_when(
      household_size == 1 ~ 82400,
      household_size == 2 ~ 94250,
      household_size == 3 ~ 106050,
      household_size == 4 ~ 117900,
      household_size == 5 ~ 127400,
      TRUE ~ 127400 + (household_size - 5) * 8000
    ),
    under_100_ami = HHINCOME < ami_100_limit
  ) %>%
  filter(is_renter, under_100_ami) %>% 
  filter(household_size < 9)

# weighted totals
nyc_data %>%
  group_by(OCC) %>%
  summarise(weighted_total = sum(PERWT, na.rm = TRUE)) %>%
  arrange(desc(weighted_total))
## # A tibble: 507 × 2
##    OCC       weighted_total
##    <int+lbl>          <dbl>
##  1    0 []           498433
##  2 4720               23264
##  3 4220               17652
##  4 9130               15201
##  5 6260               15101
##  6 4760               15084
##  7 5240               14353
##  8 4110               13564
##  9 4230               12871
## 10 4020               12330
## # ℹ 497 more rows
# HPD Artist definition

# Create a named vector of occupation codes and labels
artist_occ_labels <- c(
  "2600" = "ENT-Artists And Related Workers",
  "2640" = "ENT-Other Designers",
  "2700" = "ENT-Actors",
  "2740" = "ENT-Dancers And Choreographers",
  "2752" = "ENT-Musicians and Singers",
  "2770" = "ENT-Entertainers And Performers, Sports and Related Workers, All Other",
  "2850" = "ENT-Writers And Authors",
  "2910" = "ENT-Photographers"
)

# Filter existing NYC PUMS data
nyc_artists <- nyc_data %>%
  filter(OCC %in% as.numeric(names(artist_occ_labels))) %>%
  mutate(artist_label = artist_occ_labels[as.character(OCC)])

PUMS breakdown

race breakdown

race viz

unweighted

Race/Ethnicity: NYC Artists vs All NYC Residents
Race/Ethnicity Count (All NYC) % (All NYC) Count (Narrow Artists) % (Narrow Artists)
American Indian or Alaska Native 2,395 6.4 18 9.1
Asian (Non-Hispanic) 4,383 11.6 25 12.6
Black or African American (Non-Hispanic) 6,671 17.7 17 8.6
Hispanic or Latino 9,111 24.2 25 12.6
Some Other Race 1,124 3.0 6 3.0
Two or More Races 121 0.3 2 1.0
White (Non-Hispanic) 13,859 36.8 105 53.0

weighted

# Weighted race breakdown for all NYC residents
race_all_wt <- nyc_data %>%
  group_by(race_group) %>%
  summarise(count_all = sum(PERWT, na.rm = TRUE)) %>%
  mutate(percent_all = count_all / sum(count_all))

# Weighted race breakdown for NYC artists
race_artists_wt <- nyc_artists %>%
  group_by(race_group) %>%
  summarise(count_artists = sum(PERWT, na.rm = TRUE)) %>%
  mutate(percent_artists = count_artists / sum(count_artists))

# Join weighted data
race_compare_wt <- full_join(race_all_wt, race_artists_wt, by = "race_group") %>%
  mutate(
    percent_all = percent_all * 100,
    percent_artists = percent_artists * 100,
  ) %>%
  select(race_group, count_all, percent_all, 
         count_artists, percent_artists)

race_compare_wt %>%
  gt() %>%
  cols_label(
    race_group = "Race/Ethnicity",
    count_all = "Weighted Count (All NYC)",
    percent_all = "% (All NYC)",
    count_artists = "Weighted Count (Artists)",
    percent_artists = "% (Artists)"
  ) %>%
  fmt_number(
    columns = c(count_all, count_artists),
    decimals = 0,
    use_seps = TRUE
  ) %>%
  fmt_number(
    columns = c(percent_all, percent_artists),
    decimals = 1
  ) %>%
  tab_header(
    title = "Weighted Race/Ethnicity: NYC Artists vs All NYC Residents"
  )
Weighted Race/Ethnicity: NYC Artists vs All NYC Residents
Race/Ethnicity Weighted Count (All NYC) % (All NYC) Weighted Count (Artists) % (Artists)
American Indian or Alaska Native 69,605 6.4 509 10.6
Asian (Non-Hispanic) 108,879 10.0 580 12.0
Black or African American (Non-Hispanic) 217,290 20.0 515 10.7
Hispanic or Latino 283,698 26.2 705 14.6
Some Other Race 33,181 3.1 98 2.0
Two or More Races 2,799 0.3 25 0.5
White (Non-Hispanic) 368,501 34.0 2,390 49.6

Cross Variable Analysis (Survey)

race / income

reshape

viz

by income quartile