Course Exercise #3

Load the Data

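library(tidyverse) # provides read_csv(), the dplyr verbs, and ggplot() used below
options(scipen=999) # avoid scientific notation when printing large loan amounts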
ppp_ma <- read_csv("https://zamith.umasscreate.net/data/ppp_loans_ma.csv", col_types=cols(LoanAmount="d"))
naics_codes <- read_csv("https://zamith.umasscreate.net/data/naics_2017.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   id = col_double(),
##   name = col_character()
## )

city_standardized <- read_csv("https://zamith.umasscreate.net/data/ppp_cities_standardized_names.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   City = col_character(),
##   CityStandardized = col_character()
## )

Let’s confirm that the data were imported correctly:

head(ppp_ma)

head(naics_codes)

head(city_standardized)

1. Combining Data

Briefly explain what you are trying to do here.

Join the ppp_ma and naics_codes datasets using the inner_join function. The NAICS code column in each dataset serves as the key, so the column from naics_codes that describes each NAICS code is added to the loan data.
Join the resulting dataset (ppp_ma_clean) with city_standardized, again using inner_join. The City column in each dataset serves as the key, so the column of cleaned city names from city_standardized is added.
Remove the messy city name column from the final dataset (ppp_ma_cleaner) by using the select function with a - sign in front of the column name, which drops that variable from the dataset.
Rename the NAICS code description column by using the rename function in the same pipe.

#Step 1:
#return only the rows of ppp_ma that have a matching NAICS code in naics_codes, keeping all columns from both
ppp_ma_clean <- inner_join(ppp_ma, naics_codes, by = c("NAICSCode" = "id"))
#general form: new_df <- inner_join(primary_df, secondary_df, by = c("key column in primary_df" = "key column in secondary_df"))

#Step 2:
#return only the rows of ppp_ma_clean that have a matching city in city_standardized, keeping all columns from both
ppp_ma_cleaner <- inner_join(ppp_ma_clean, city_standardized, by = c("City" = "City"))


#Steps 3&4:
ppp_ma_cleaner <- ppp_ma_cleaner %>% #start from ppp_ma_cleaner and overwrite it with the result
  select(-City) %>% #drop the messy City column; CityStandardized stays
  rename(NAICSdesc = name) #rename the column name to NAICSdesc
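
Because inner_join keeps only rows that have a match in both tables, any loan whose NAICS code or city is missing from the lookup table is dropped without a warning. Below is a minimal sketch of one way to check for that. The tibbles `loans` and `codes` are small made-up stand-ins for ppp_ma and naics_codes, not the real data.

```r
library(dplyr)

# Hypothetical stand-ins for ppp_ma and naics_codes (illustrative values only)
loans <- tibble(
  BusinessName = c("A", "B", "C"),
  NAICSCode = c(722511, 541110, NA)
)
codes <- tibble(
  id = c(722511, 541110),
  name = c("Full-Service Restaurants", "Offices of Lawyers")
)

# Same join pattern as in Step 1
joined <- inner_join(loans, codes, by = c("NAICSCode" = "id"))

# Rows of loans with no match in codes: exactly the rows inner_join() dropped
unmatched <- anti_join(loans, codes, by = c("NAICSCode" = "id"))

nrow(loans) - nrow(joined) # number of rows lost in the join
unmatched
```

If `unmatched` has rows in the real data, a left_join would keep those loans with an NA description instead of discarding them.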

2. Data Analysis

Question #1

How many businesses received loans in Northampton?

noho <- ppp_ma_cleaner %>% #create a Northampton-only dataset
  filter(CityStandardized == "Northampton") #keep only the rows where the city is Northampton

unique(noho$LoanRange) #select just one variable (column) and return all unique values

## [1] "$5-10 million"      "$2-5 million"       "$1-2 million"      
## [4] "$350,000-1 million" "$150,000-350,000"   NA

#unique(noho$LoanAmount)

nohocount <- noho %>% #start from the Northampton-only data
  filter(!is.na(LoanAmount)) %>% #keep only loans that report an exact amount
  summarise(`BusinessName` = n()) #count the loans in this subset
#387
nohocount2 <- noho %>% #start from the Northampton-only data
  filter(!is.na(LoanRange)) %>% #keep only loans that report a loan range
  summarise(`BusinessName` = n()) #count the loans in this subset
#78

What is the answer to your question?

There are 387 businesses in Northampton that received loans under $150,000, and 78 that received loans over $150,000. I believe this makes a total of 465 businesses in Northampton that received a loan of any size, but I’m going to check the codebook for these data before I say that for sure.
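
Adding the two counts is only safe if every loan has exactly one of LoanAmount and LoanRange filled in. Here is a minimal sketch of how one might verify that before adding them. `noho_toy` is a hypothetical five-row stand-in for the noho data frame.

```r
library(dplyr)

# Hypothetical rows: some loans report an exact amount, others only a range
noho_toy <- tibble(
  LoanAmount = c(20000, 75000, NA, NA, 12000),
  LoanRange = c(NA, NA, "$150,000-350,000", "$1-2 million", NA)
)

# Cross-tabulate which of the two fields is filled in for each row
coverage <- noho_toy %>%
  count(has_amount = !is.na(LoanAmount), has_range = !is.na(LoanRange))

coverage
```

If the real data produce only the two rows where one flag is TRUE and the other is FALSE, the counts do not overlap and can be summed. A row with both flags TRUE or both FALSE would mean the total needs adjusting.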

Question #2

How many businesses received loans in each Massachusetts municipality?

loanamtbytown <- ppp_ma_cleaner %>% #start from the cleaned MA data
  group_by(CityStandardized) %>% #group by town
  filter(!is.na(LoanAmount)) %>% #keep only loans that report an exact amount
  summarise(`BusinessName` = n()) %>% #count the loans in each town
  arrange(desc(`BusinessName`)) #sort in descending order so the highest counts show up first

head(loanamtbytown) #view data

What is the answer to your question?

This was evidently an open-ended question! I looked at the number of businesses in each town that received a loan. Boston is the municipality with the highest number of loans given to its businesses, which I probably could have assumed. Still, this is useful context to have going forward.
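
The group-and-count pattern above can also be written with dplyr's count function. This is a small sketch on made-up data showing that the two forms give the same result. `toy` is a hypothetical table with one row per loan.

```r
library(dplyr)

# Hypothetical loans, one row per loan
toy <- tibble(
  CityStandardized = c("Boston", "Boston", "Northampton", "Amherst", "Boston")
)

# Long form, following the pattern used in the exercise
by_town_long <- toy %>%
  group_by(CityStandardized) %>%
  summarise(n = n()) %>%
  arrange(desc(n))

# Shorter equivalent: count() groups, tallies, and sorts in one call
by_town_short <- toy %>%
  count(CityStandardized, sort = TRUE)

by_town_short
```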

Question #3

How does the distribution of loans across loan range brackets in the state compare with that of Northampton?

#using mutate below, create a new variable called LoanBracket by recoding each level of LoanRange to a numeric code (1 = smallest bracket, 5 = largest) and storing the result as a factor

unique(noho$LoanRange) #select just one variable (column) and return all unique values

## [1] "$5-10 million"      "$2-5 million"       "$1-2 million"      
## [4] "$350,000-1 million" "$150,000-350,000"   NA

ppp_ma_cleaner <- ppp_ma_cleaner %>% #overwrite the cleaned data with the new column added
  mutate(LoanBracket = recode_factor(LoanRange, "$150,000-350,000" = 1, 
                                     "$350,000-1 million" = 2, 
                                     "$1-2 million" = 3, 
                                     "$2-5 million" = 4, 
                                     "$5-10 million" = 5 ))

#Create a northampton subset 
NohoBracket <- ppp_ma_cleaner %>% #pipe data into new northampton data frame 
  filter( `CityStandardized` == "Northampton") %>% #filter out all observations that aren't Noho
  group_by(LoanBracket) %>% #group by loan bracket
  summarise(`BusinessName` = n())#count the number of businesses in this subset 

Noho <- ppp_ma_cleaner %>% #pipe data into new northampton data frame 
  filter( `CityStandardized` == "Northampton") %>% #filter out all observations that aren't Noho
  group_by(LoanBracket) #group by loan bracket
head(Noho)

#Number of Loans Given by Loan Range Bracket - Northampton
NohoBracket %>%
  filter(!is.na(`LoanBracket`)) %>%
  ggplot(., aes( x = `LoanBracket`, y = `BusinessName`)) +
    geom_bar(stat = "identity") +
    ggtitle("Number of Loans Given by Loan Range Bracket - Northampton")+
    ylab("Number of Loans Given")

#Count loans by bracket for all of Massachusetts
MABracket <- ppp_ma_cleaner %>% #start from the full cleaned MA data
  group_by(LoanBracket) %>% #group by loan bracket
  summarise(`BusinessName` = n()) #count the number of loans in each bracket

#Number of Loans Given by Loan Range Bracket - Massachusetts
MABracket %>%
  filter(!is.na(`LoanBracket`)) %>%
  ggplot(., aes( x = `LoanBracket`, y = `BusinessName`)) +
    geom_bar(stat = "identity") +
    ggtitle("Number of Loans Given by Loan Range Bracket - Massachusetts")+
    ylab("Number of Loans Given")

What is the answer to your question?

To get a more aggregated view, I wanted to see how many loans were given out in each loan bracket overall. Loans in the lowest charted bracket, $150,000-350,000, were given out most often, and the number of loans in each successive bracket is substantially smaller. As a Northampton reporter I wanted more context, so I compared the statewide chart with Northampton’s, and Northampton has a very similar distribution of loans across loan ranges. Note that the scales of the two graphs are VERY different!
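
Because the two charts have such different scales, one way to compare them directly is to convert the counts in each bracket to shares of that area's total. Below is a minimal sketch under that assumption. The counts in `bracket_counts` are invented for illustration and are not results from the PPP data.

```r
library(dplyr)

# Hypothetical bracket counts for the state and for one city
bracket_counts <- tibble(
  area = c("Massachusetts", "Massachusetts", "Massachusetts",
           "Northampton", "Northampton", "Northampton"),
  LoanBracket = c(1, 2, 3, 1, 2, 3),
  loans = c(600, 300, 100, 12, 6, 2)
)

# Convert counts to within-area shares so both areas sit on the same 0-1 scale
bracket_shares <- bracket_counts %>%
  group_by(area) %>%
  mutate(share = loans / sum(loans)) %>%
  ungroup()

bracket_shares
```

Plotting `share` instead of the raw count would put both areas on one comparable axis.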

Question #4

Which industries are getting loans?

#create a new industry subset
Industry <- ppp_ma_cleaner %>% #pipe data into new dataframe
  group_by(NAICSCode, NAICSdesc) %>% #group by NAICS Code and Description for ease of viewing
  summarise(`BusinessName` = n())#count the number of businesses in this subset

## `summarise()` has grouped output by 'NAICSCode'. You can override using the `.groups` argument.

#Number of Loans Given by Industry - Massachusetts
Industry %>%
  filter(!is.na(`BusinessName`)) %>%
  filter(`BusinessName` > 2070) %>% #keep only industries with more than 2,070 loans (cutoff chosen by hand)
  ggplot(., aes( x = `NAICSdesc`, y = `BusinessName`)) +
    geom_bar(stat = "identity") +
    ggtitle("Number of Loans Given by Industry - Massachusetts")+
    ylab("Number of Loans Given") +
    theme(
     axis.text.x = element_text(angle = 5))

What is the answer to your question?

Restaurants are receiving more loans than any other industry across Massachusetts, with full-service and limited-service restaurants both in the top 5, along with dentists, lawyers and real estate agents.
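
The chart above picks the top industries with a hand-chosen count cutoff. An alternative is to ask dplyr for the top rows directly with slice_max. This is a small sketch on made-up counts. `industry_toy` is a hypothetical stand-in for the Industry data frame.

```r
library(dplyr)

# Hypothetical industry counts (illustrative values only)
industry_toy <- tibble(
  NAICSdesc = c("Full-Service Restaurants", "Offices of Lawyers",
                "Offices of Dentists", "Florists", "Bakeries"),
  loans = c(50, 30, 40, 5, 10)
)

# Keep the three largest industries without picking a cutoff by hand
top_industries <- industry_toy %>%
  slice_max(loans, n = 3)

top_industries
```

In the exercise, the Industry data frame is still grouped by NAICSCode after summarise, so it would likely need an ungroup() before slice_max. Otherwise the top rows are taken within each group.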

Question #5

Which industries are getting loans in Northampton?

#create a new industry data frame for northampton only
IndustryNoho <- ppp_ma_cleaner %>%#pipe data into new dataframe
  filter( `CityStandardized` == "Northampton") %>% #filter out all observations that aren't Noho 
  group_by(NAICSCode, NAICSdesc) %>% #group by NAICS Code and Description for ease of viewing
  summarise(`BusinessName` = n())#count the number of businesses in this subset

## `summarise()` has grouped output by 'NAICSCode'. You can override using the `.groups` argument.

#Number of Loans Given by Industry - Northampton
IndustryNoho %>%
  filter(!is.na(`BusinessName`)) %>%
  filter(`BusinessName` > 11) %>% #keep only industries with more than 11 loans (cutoff chosen by hand)
  ggplot(., aes( x = `NAICSdesc`, y = `BusinessName`)) +
    geom_bar(stat = "identity") +
    ggtitle("Number of Loans Given by Industry - Northampton")+
    ylab("Number of Loans Given") +
    theme(
     axis.text.x = element_text(angle = 5))

What is the answer to your question?

Restaurants are receiving more loans than any other industry in Northampton as well, with full-service and limited-service restaurants again in the top 5, accompanied by lawyers (again), mental health practitioners and physicians.

3. Short Article (400-500 words)

My groupmates worked on this portion!

4. The Interviewees

Interviewee #1

Interviewee Info

- Name, Title/Position, Organization - Jim McGovern, U.S. Congressman, Massachusetts 2nd Congressional District, U.S. House of Representatives

- Contact information (e.g., phone number or e-mail) - Northampton Office Phone Number: 413-341-8700

- URL to a page with details about them (e.g., biography page, personal website) - https://mcgovern.house.gov/about/

- Specifics about what you think they could add to your article (e.g., a specific finding you think they could help contextualize, and why you think they’d have the expertise/lived experience to do that) - Congressman Jim McGovern represents the Massachusetts 2nd Congressional District in the U.S. House of Representatives. The district encompasses Amherst, Northampton and surrounding communities whose small businesses benefit directly from the PPP legislation that Congressman McGovern endorsed. Talking to him could give a better idea of why it was important to push for this financial aid to small businesses in Washington, and of what the Congressman has personally heard and seen that propelled him to get the PPP funding enacted and distributed.

Interview Questions

- First interview question you’d ask them. (Be sure to phrase these in an appropriate way.) - Congressman, why are you in favor of this PPP funding? What is the direct benefit of giving small businesses this aid amid the global pandemic?

- Second interview question you’d ask them. (Be sure to phrase these in an appropriate way.) - Have you met with and seen firsthand small businesses in your district struggling to support themselves right now and in what ways? What does this funding mean to them?

- Third interview question you’d ask them. (Be sure to phrase these in an appropriate way.) - Do you think additional PPP funding will be necessary in the future and if so, would you be in favor of supporting another round of such funding and why?

Interviewee #2

Interviewee Info

- Name, Title/Position, Organization - Ben Levy, Co-Owner, Amherst Dog Wash (local small pet grooming business which received $1,159 in PPP funding according to ProPublica, source: https://projects.propublica.org/coronavirus/bailouts/loans/states/MA?page=1639)

- Contact information (e.g., phone number or e-mail) - Phone Number: 413-253-9274

- URL to a page with details about them (e.g., biography page, personal website) - https://www.amherstdogwash.com/team-2

- Specifics about what you think they could add to your article (e.g., a specific finding you think they could help contextualize, and why you think they’d have the expertise/lived experience to do that) - As the co-owner of a small business that caters to the Amherst community’s pet grooming needs, Ben Levy is likely among the many owners whose businesses have been affected by the COVID-19 pandemic. The business’s website states that, because of the pandemic, it is taking appointments only (no walk-ins) and is taking extra precautions. I think Ben would have the lived experience to explain just what this PPP funding means for him and his employees as they manage through these difficult times.

Interview Questions

- First interview question you’d ask them. (Be sure to phrase these in an appropriate way.) - You received $1,159 from the PPP to help support your small business during this time. Why is that significant and what did it mean to you and your employees?

- Second interview question you’d ask them. (Be sure to phrase these in an appropriate way.) - In what ways has business had to be conducted differently due to the pandemic, and have those changes created a hardship for your business model and the revenue stream that pays your employees?

- Third interview question you’d ask them. (Be sure to phrase these in an appropriate way.) - Do you think future PPP funding may be necessary to assist your business until we trend back to normalcy and come out of the pandemic and its restrictions and limitations?


Kazmiera Breest collaborating with Isha Mahajan and Christopher McLaughlin
