Reading USA Spending contracts

Download & load

Packages

You may have to install some if they give you an error.

require(tidyverse)
require(DT)
require(lubridate)
require(scales)

Sources

I had started with the idea we could use the API, but it was possible to just make a query that goes back to 2008 for all transactions. The only trick is that you have to choose the “sub-agency” FEMA, not the usual one, because before DHS it was an independent agency.

It says you can use the download link anytime, but if you don’t copy it (and the Copy Link button doesn’t work), I don’t see a way to get back to it.

Note that my search is on “Funding Agency”, not “Award agency”. The reason is that there are sometimes government-wide contracts that are managed by other agencies, and the agency just tags along. This happened during the Iraq war, when the Interior Department was hirng guards at Abu Ghraib.

I renamed the folder that was created from the zip file that starts with “all_prime_transactions_subawards…” to “fema_downlaods”. It was downloaded on June 4, 2019.

There were 4 files in there:

  • all_assistance_prime_transactions_1.csv (grants and loans)
  • all_assistance_subawards_1.csv (subawards from major awardees to sub-awardees. these are really hard to use.)
  • all_contracts_prime_transactions_1.csv (contracts)
  • all_contracts_subawards_1.csv (same idea - sub-contracts from major contractors like Boeing. Very hard to work with.)

Here’s an “analysts’ guide” to the USA spending that might answer some questions: https://datalab.usaspending.gov/assets/analyst-guide-1-2.pdf

Reading the data

The easiest thing to get is contracts. Let’s see how it guesses.

It seemed to guess pretty well except for a few fields that it guesses as logical that have some illegal values. I changed them so to override the default types.

This section shouldn’t have to be run again, and won’t be when you knit it.

  1. Created a subset of rows so that I could just look at them – ones that I know had issues from the “problems”.
  2. Fixed the obvious ones, by looking at problems(fema_contracts_orig)

Then I went back and fixed the import, this time taking care to set the problematic fields to “c” instead of logical or numeric. This code should never have to run again, just load the saved file.

fema_contracts_orig <- 
  read_csv(file="raw_fema_data/fema_downlaods/all_contracts_prime_transactions_1.csv", 
           col_types = cols (.default = col_guess(),
                             sam_exception = "c", 
                             other_statutory_authority = "c", 
                             research_code = "c", 
                             research = "c",
                             awarding_sub_agency_code = "c",
                             action_date=col_date())
           )

#just turning it into a simpler object so you don't accidentally try to print the whole thing. Also, this will show you the data types at the top of the view.  

fema_contracts_orig <- as_tibble(fema_contracts_orig)
#let's go ahead and save it so you don't have to do this again

save(fema_contracts_orig, file="usaspending_orig.RData")
rm(fema_contracts_orig)

This code is how I trouble-shooted the original reading. First, I looked at problems(fema_contracts_orig), which tells me which rows had issues. There, I could see that there were some fields that were being read as “logical” (T/F/1/0/etc) but weren’t, and a few others that should have been numbers but werent.

Second, I chose some of the rows (1650, 6664, and 16733) to just look at. Again, I just looked at it then removed the interim dataset. This code chouldn’t have to be run again, so it’s set in the r chunk options to {r eval=FALSE}

problems(fema_contracts_orig)

#look at a couple of the rows with problems
subset <- fema_contracts_orig [c(1650, 6664, 16733), ]

glimpse(subset)

rm(subset)

First look on contracts

Let’s see which states get the most by year:

First, we have to load the dataset I saved above. This takes a while because it has SO MANY columns (261 of them to be exact.) Note that the RData file name isn’t the same as the data frame / tibble name. That’s because you can save lots of data frames and objects into an RData file, which will then package them up for you and let you load them whenever you want.

load("fema_contracts.RData")

By state and year

The lubridate library lets us get the year, month, day from a date variable, so this way we could get the fiscal year. If the action_date is in months 10, 11, 12, then it’s the following fiscal year. (A fiscal year is noted by the year in which it ends, not begins. So we’re currently in FY 19 until Sept. 30)

This is a pretty straightforward set of code: figure out the fiscal year, group by and summarize, sort by descending amounts, then format the numbers into readable values in the DT (datatable) package. It will look better when it’s knitted.

fema_sumdata <-
  fema_contracts_orig %>%
  mutate( fiscal_year = ifelse(month(action_date) >= 10, year(action_date+1), year(action_date))) %>%
  select (fiscal_year, st_abbr = primary_place_of_performance_state_code, recipient_duns, total_dollars_obligated, 
          federal_action_obligation, award_id_piid) %>%
  group_by(fiscal_year, st_abbr) %>% 
  summarise ( num_of_actions = n(), num_of_contractors = n_distinct(recipient_duns), 
              obligations = sum(total_dollars_obligated, na.rm=TRUE), 
              num_of_contracts = n_distinct(award_id_piid),
              federal_dollars = sum(federal_action_obligation, na.rm=TRUE) 
             ) 



fema_sumdata %>%
  arrange(desc(federal_dollars)) %>%
  datatable() %>% 
  formatCurrency(c("obligations", "federal_dollars")) %>% 
  formatRound(c("num_of_actions", "num_of_contractors"), digits=0, mark=",")

A couple of notes on this:

  • DC (and state capitals) will always be high. It doesn’t have to do with disasters there – it has to do with, say, planning and logistics costs that are done in those offices. At state capitals, it involves money that will be used by state and local agencies that is federal money.
  • We have to find a way to get pre-2007 if we want to use anything related to Hurricane Katrina.

One thing I don’t understand - how, for example, the “federal action obligation” field sum can be negative, while the “total dollars obligated” can be positive.

Let’s look at the Lousisiana in FY 2008 to see how it works. It may be double-counting by summing.

USA Spending defines them this way:

Federal Action Obligation
In plain English: Amount of Federal Government’s obligation, de-obligation, or liability, in dollars, for an award transaction. (SC: transaction level item)
Total dollars obligated
SC: I can’t find a definition. I’m worried this is award level, not transaction level. This appears to be the amount of the action for a parent contract, and I’m not sure if it’s total for that “action”, including all vendors, or what. But no matter what, it’s not right to add it up!

Look at La. 2008 to determine money values

Look at all dollar fields for Lousiana FY 2008 values.

fema_contracts_orig %>%
  filter (primary_place_of_performance_state_code == 'LA' &
          between (action_date, as.Date('2007-10-01'), as.Date('2008-09-30'))
          ) %>%
  select (parent_award_id, parent_award_agency_id, parent_award_modification_number, award_id_piid, modification_number, transaction_number, recipient_name,
          federal_action_obligation, total_dollars_obligated, base_and_all_options_value, potential_total_value_of_award,
          action_date, period_of_performance_potential_end_date) %>%
  mutate (difference = total_dollars_obligated - federal_action_obligation) %>%
  arrange (award_id_piid, modification_number, transaction_number) 

Which money value to use

Use federal action obligated for actual spending. Use IDV values for the potential value, or look them up.

Here’s what I found based on this query. It’s very confusing.

The contract 0016 (Phillips & Jordan, and Enviornmental Chem Corp) are hired under the DOD contract shown in the parent award (W912P805D0022). The federal obligation is how much went to that contract, even though it’s really kind of a subcontract. The amount in “total obligation” is for the parent contract, which could include lots of different vendors in this case.

Take a look at this parent contract in the usaspending.gov site to see what I mean: https://www.usaspending.gov/#/award/69333261

(I’ll get rid of this field in the smaller sets so that you don’t get confused. It will end up double-counting anything that has these kinds of split-up contracts, which I think is common.)

Next steps

I suggest going through an example item that you already know about – say, one that you had to write a report on for class – and then you can choose the fields you want to save.

We really need to get the years before 2007 if we’re going to include Louisiana (Hurricane Katrina). The reason is that this is the year they had to pay back money for work that was canceled or never done. There are a lot of negative entries for it, because that is money they never used – either they didn’t need it, they got it back because they cracked down on fraud, or something else.

Some thoughts:

  • Get the dataset down to size, but don’t just pick out the most obvious fields. Really look at all of them, and see what might be interesting. It may be easier to look at one you know on the USA Spending site. Here’s an example:
fema_contracts_orig %>%
  select ( state = primary_place_of_performance_state_name,
           recipient_name, 
          award_description,
          fed_dollars = federal_action_obligation, 
          solicitation_identifier,
          total_value = base_and_exercised_options_value, 
          action_date,
          award = award_id_piid) %>%
  filter (fed_dollars > 1000000 ,
          state == "CALIFORNIA",
          action_date > "2018-01-01") %>%
  arrange(desc(total_value)) 

So let’s look up that one – the MLU Services Inc in 2019-05-07, which is $53 million!

A couple of other things to notice:

  • The description SOMETIMES includes the disaster declaration. In this case, DR-4407. But not always!

  • Of course, this isn’t everything from MLU Corp. Only things that were in California and more than $1 million in 2019

  • Look it up in USA Spending as a way to see what all of the other fields are for. Example:

fema_contracts_orig %>%
  filter (award_id_piid %in% c("70FB8019F00000043", "70FB8018D00000013")) %>%
  select (award_id_piid:solicitation_identifier) 

There should be a way to build a search for it into the table, but I can’t quite figure out what’s happening on the USA spending page. Instead, you can go to the Advanced Search page, and copy and paste the award ID into the Award ID box, then search. If I do that, I get one answer, and that leads to this page:

https://www.usaspending.gov/#/award/80471094

This gives you – in English, or to the extent contracting language is IN English – what each field means. Your data is at the transaction level, this is at the “award” level. In other words, this is an award with a total amount over time, but each transaction is a payment that’s made for pieces of the work.

I like a couple of things about this – you can see what office it’s coming from, and also what agency awarded the contract. For example, FEMA might be funding things that are done by HHS.

Once you find something of interest, you might look it up in FedBizOpps. For example, after fumbling around in FedBiz opps and working backwards from this, I found this:

https://www.fbo.gov/index.php?s=opportunity&mode=form&tab=core&id=93e40f780bf619e4bc2ce67e4ceeb128&_cview=0

Note that at the bottom, this company got a $50 million (?) contract with FEMA after a bid protest kicked out the other company! This is actually a great little tip: The previous company, Timberline Homes of LA, seemed to have had problems in past performance. (This is for a different contract.)

I can explain a little about how you get this. There usually won’t be solicitation numbers on “award” items, just on “IDV” items. I can explain. I actually took a little course in federal contracting rules when I was a reporter at the Post because we couldn’t figure this stuff out!

  • Don’t worry for now about the sub-contractors, but when you find something you want to dig into, you might check for them. They’ll be linked by the award numbers. (There may be none. Only the largest contractors have to report them, and that has been relatively recently)

  • Try finding a way to isolate events, like Sandy, Katrina, Maria, etc. – there may be something in here that does link them, and that would be good.

  • See about contractors that have a lot of de-obligations. This is money they never spent or had to give back to the government for some reason. Don’t worry about NET (sum) amounts – just look for substantial (say, over $100,000) negative numbers. If they’re all in a time or place, you might find some problems with the state or the federal government at that time.

  • Look into contractors that have high IDV’s (that means they contracted for a lot of money) but were actually awarded very little of it. That suggests that they didn’t do the job.

You might look for telltale signs in the data, using Tribute Contracting LLC , which this Times story said never did the work.

fema_contracts_orig %>%
  filter (str_detect(recipient_name, "TRIBUTE CONTRACTING")) %>%
  glimpse()
## Observations: 20
## Variables: 261
## $ award_id_piid                                              <chr> "HSFE…
## $ modification_number                                        <chr> "0", …
## $ transaction_number                                         <dbl> NA, N…
## $ parent_award_agency_id                                     <dbl> NA, N…
## $ parent_award_agency_name                                   <chr> NA, N…
## $ parent_award_id                                            <chr> NA, N…
## $ parent_award_modification_number                           <chr> "0", …
## $ federal_action_obligation                                  <dbl> 500, …
## $ total_dollars_obligated                                    <dbl> NA, N…
## $ base_and_exercised_options_value                           <dbl> 0, 0,…
## $ current_total_value_of_award                               <dbl> NA, N…
## $ base_and_all_options_value                                 <dbl> 22000…
## $ potential_total_value_of_award                             <dbl> NA, N…
## $ action_date                                                <date> 2013…
## $ period_of_performance_start_date                           <dttm> 2013…
## $ period_of_performance_current_end_date                     <dttm> NA, …
## $ period_of_performance_potential_end_date                   <dttm> NA, …
## $ ordering_period_end_date                                   <date> 2018…
## $ awarding_agency_code                                       <chr> "070"…
## $ awarding_agency_name                                       <chr> "DEPA…
## $ awarding_sub_agency_code                                   <chr> "7022…
## $ awarding_sub_agency_name                                   <chr> "FEDE…
## $ awarding_office_code                                       <chr> "LOG7…
## $ awarding_office_name                                       <chr> "LOGI…
## $ funding_agency_code                                        <chr> "070"…
## $ funding_agency_name                                        <chr> "DEPA…
## $ funding_sub_agency_code                                    <dbl> 7022,…
## $ funding_sub_agency_name                                    <chr> "FEDE…
## $ funding_office_code                                        <chr> "LOG7…
## $ funding_office_name                                        <chr> "LOGI…
## $ foreign_funding                                            <chr> NA, N…
## $ foreign_funding_description                                <chr> "NOT …
## $ sam_exception                                              <chr> NA, N…
## $ sam_exception_description                                  <chr> "NAN"…
## $ recipient_duns                                             <chr> "0790…
## $ recipient_name                                             <chr> "TRIB…
## $ recipient_doing_business_as_name                           <chr> NA, N…
## $ cage_code                                                  <chr> NA, N…
## $ recipient_parent_name                                      <chr> "TRIB…
## $ recipient_parent_duns                                      <chr> "0790…
## $ recipient_country_code                                     <chr> "UNIT…
## $ recipient_country_name                                     <chr> NA, N…
## $ recipient_address_line_1                                   <chr> "1097…
## $ recipient_address_line_2                                   <chr> NA, N…
## $ recipient_city_name                                        <chr> "ATLA…
## $ recipient_state_code                                       <chr> "GA",…
## $ recipient_state_name                                       <chr> NA, N…
## $ recipient_zip_4_code                                       <chr> "3030…
## $ recipient_congressional_district                           <chr> "GA05…
## $ recipient_phone_number                                     <chr> "4048…
## $ recipient_fax_number                                       <chr> NA, N…
## $ primary_place_of_performance_country_code                  <chr> NA, N…
## $ primary_place_of_performance_country_name                  <chr> NA, N…
## $ primary_place_of_performance_city_name                     <chr> NA, N…
## $ primary_place_of_performance_county_name                   <chr> NA, N…
## $ primary_place_of_performance_state_code                    <chr> NA, N…
## $ primary_place_of_performance_state_name                    <chr> NA, N…
## $ primary_place_of_performance_zip_4                         <chr> NA, N…
## $ primary_place_of_performance_congressional_district        <chr> NA, N…
## $ award_or_idv_flag                                          <chr> "IDV"…
## $ award_type_code                                            <chr> NA, N…
## $ award_type                                                 <chr> NA, N…
## $ idv_type_code                                              <chr> "B", …
## $ idv_type                                                   <chr> "IDC"…
## $ multiple_or_single_award_idv_code                          <chr> "M", …
## $ multiple_or_single_award_idv                               <chr> "MULT…
## $ type_of_idc_code                                           <chr> NA, N…
## $ type_of_idc                                                <chr> "INDE…
## $ type_of_contract_pricing_code                              <chr> "J", …
## $ type_of_contract_pricing                                   <chr> "FIRM…
## $ award_description                                          <chr> "CLEA…
## $ action_type_code                                           <chr> NA, N…
## $ action_type                                                <chr> NA, N…
## $ solicitation_identifier                                    <chr> "HSFE…
## $ number_of_actions                                          <dbl> NA, N…
## $ inherently_governmental_functions                          <chr> NA, N…
## $ inherently_governmental_functions_description              <chr> NA, N…
## $ product_or_service_code                                    <chr> "7920…
## $ product_or_service_code_description                        <chr> "BROO…
## $ contract_bundling_code                                     <chr> "D", …
## $ contract_bundling                                          <chr> "NOT …
## $ dod_claimant_program_code                                  <chr> NA, N…
## $ dod_claimant_program_description                           <chr> NA, N…
## $ naics_code                                                 <dbl> 42385…
## $ naics_description                                          <chr> "SERV…
## $ recovered_materials_sustainability_code                    <chr> "E", …
## $ recovered_materials_sustainability                         <chr> "BIO-…
## $ domestic_or_foreign_entity_code                            <chr> "A", …
## $ domestic_or_foreign_entity                                 <chr> "U.S.…
## $ dod_acquisition_program_code                               <chr> NA, N…
## $ dod_acquisition_program_description                        <chr> NA, N…
## $ information_technology_commercial_item_category_code       <chr> NA, N…
## $ information_technology_commercial_item_category            <chr> NA, N…
## $ epa_designated_product_code                                <chr> NA, N…
## $ epa_designated_product                                     <chr> NA, N…
## $ country_of_product_or_service_origin_code                  <chr> "NAN"…
## $ country_of_product_or_service_origin                       <chr> NA, N…
## $ place_of_manufacture_code                                  <chr> NA, N…
## $ place_of_manufacture                                       <chr> NA, N…
## $ subcontracting_plan_code                                   <chr> "B", …
## $ subcontracting_plan                                        <chr> "PLAN…
## $ extent_competed_code                                       <chr> "D", …
## $ extent_competed                                            <chr> "FULL…
## $ solicitation_procedures_code                               <chr> "NP",…
## $ solicitation_procedures                                    <chr> "NEGO…
## $ type_of_set_aside_code                                     <chr> "SBA"…
## $ type_of_set_aside                                          <chr> "SMAL…
## $ evaluated_preference_code                                  <chr> "NONE…
## $ evaluated_preference                                       <chr> "NO P…
## $ research_code                                              <chr> NA, N…
## $ research                                                   <chr> "NAN"…
## $ fair_opportunity_limited_sources_code                      <chr> NA, N…
## $ fair_opportunity_limited_sources                           <chr> "NAN"…
## $ other_than_full_and_open_competition_code                  <chr> NA, N…
## $ other_than_full_and_open_competition                       <chr> NA, N…
## $ number_of_offers_received                                  <dbl> 3, 4,…
## $ commercial_item_acquisition_procedures_code                <chr> "A", …
## $ commercial_item_acquisition_procedures                     <chr> NA, N…
## $ small_business_competitiveness_demonstration_program       <lgl> FALSE…
## $ commercial_item_test_program_code                          <chr> "N", …
## $ commercial_item_test_program                               <chr> "NO",…
## $ a76_fair_act_action_code                                   <chr> "NO",…
## $ a76_fair_act_action                                        <chr> NA, N…
## $ fed_biz_opps_code                                          <chr> "Y", …
## $ fed_biz_opps                                               <chr> "YES"…
## $ local_area_set_aside_code                                  <chr> "N", …
## $ local_area_set_aside                                       <chr> NA, N…
## $ price_evaluation_adjustment_preference_percent_difference  <dbl> NA, N…
## $ clinger_cohen_act_planning_code                            <chr> "Y: Y…
## $ clinger_cohen_act_planning                                 <chr> NA, N…
## $ materials_supplies_articles_equipment_code                 <chr> "X", …
## $ materials_supplies_articles_equipment                      <chr> "NOT …
## $ labor_standards_code                                       <chr> "X", …
## $ labor_standards                                            <chr> "NOT …
## $ construction_wage_rate_requirements_code                   <chr> "X", …
## $ construction_wage_rate_requirements                        <chr> "NOT …
## $ interagency_contracting_authority_code                     <chr> "X", …
## $ interagency_contracting_authority                          <chr> "NOT …
## $ other_statutory_authority                                  <chr> NA, N…
## $ program_acronym                                            <chr> NA, N…
## $ parent_award_type_code                                     <chr> NA, N…
## $ parent_award_type                                          <chr> NA, N…
## $ parent_award_single_or_multiple_code                       <chr> NA, N…
## $ parent_award_single_or_multiple                            <chr> NA, N…
## $ major_program                                              <chr> NA, N…
## $ national_interest_action_code                              <chr> "NONE…
## $ national_interest_action                                   <chr> "NONE…
## $ cost_or_pricing_data_code                                  <chr> "N", …
## $ cost_or_pricing_data                                       <chr> "NO",…
## $ cost_accounting_standards_clause_code                      <chr> "X", …
## $ cost_accounting_standards_clause                           <chr> "NOT …
## $ gfe_gfp_code                                               <chr> "N", …
## $ gfe_gfp                                                    <chr> "TRAN…
## $ sea_transportation_code                                    <chr> NA, N…
## $ sea_transportation                                         <chr> NA, N…
## $ undefinitized_action_code                                  <chr> "X", …
## $ undefinitized_action                                       <chr> "NO",…
## $ consolidated_contract_code                                 <chr> "N", …
## $ consolidated_contract                                      <chr> "NO",…
## $ performance_based_service_acquisition_code                 <chr> "X", …
## $ performance_based_service_acquisition                      <chr> "NOT …
## $ multi_year_contract_code                                   <chr> "N", …
## $ multi_year_contract                                        <chr> "NO",…
## $ contract_financing_code                                    <chr> NA, N…
## $ contract_financing                                         <chr> NA, N…
## $ purchase_card_as_payment_method_code                       <chr> NA, N…
## $ purchase_card_as_payment_method                            <chr> NA, N…
## $ contingency_humanitarian_or_peacekeeping_operation_code    <chr> NA, N…
## $ contingency_humanitarian_or_peacekeeping_operation         <chr> NA, N…
## $ alaskan_native_owned_corporation_or_firm                   <lgl> FALSE…
## $ american_indian_owned_business                             <lgl> FALSE…
## $ indian_tribe_federally_recognized                          <lgl> FALSE…
## $ native_hawaiian_owned_business                             <lgl> FALSE…
## $ tribally_owned_business                                    <lgl> FALSE…
## $ veteran_owned_business                                     <lgl> FALSE…
## $ service_disabled_veteran_owned_business                    <lgl> FALSE…
## $ woman_owned_business                                       <lgl> TRUE,…
## $ women_owned_small_business                                 <lgl> FALSE…
## $ economically_disadvantaged_women_owned_small_business      <lgl> FALSE…
## $ joint_venture_women_owned_small_business                   <lgl> FALSE…
## $ joint_venture_economic_disadvantaged_women_owned_small_bus <lgl> FALSE…
## $ minority_owned_business                                    <lgl> TRUE,…
## $ subcontinent_asian_asian_indian_american_owned_business    <lgl> FALSE…
## $ asian_pacific_american_owned_business                      <lgl> FALSE…
## $ black_american_owned_business                              <lgl> TRUE,…
## $ hispanic_american_owned_business                           <lgl> FALSE…
## $ native_american_owned_business                             <lgl> FALSE…
## $ other_minority_owned_business                              <lgl> FALSE…
## $ contracting_officers_determination_of_business_size        <chr> "SMAL…
## $ contracting_officers_determination_of_business_size_code   <chr> "S", …
## $ emerging_small_business                                    <lgl> FALSE…
## $ community_developed_corporation_owned_firm                 <lgl> FALSE…
## $ labor_surplus_area_firm                                    <lgl> FALSE…
## $ us_federal_government                                      <lgl> FALSE…
## $ federally_funded_research_and_development_corp             <lgl> FALSE…
## $ federal_agency                                             <lgl> FALSE…
## $ us_state_government                                        <lgl> FALSE…
## $ us_local_government                                        <lgl> FALSE…
## $ city_local_government                                      <lgl> FALSE…
## $ county_local_government                                    <lgl> FALSE…
## $ inter_municipal_local_government                           <lgl> FALSE…
## $ local_government_owned                                     <lgl> FALSE…
## $ municipality_local_government                              <lgl> FALSE…
## $ school_district_local_government                           <lgl> FALSE…
## $ township_local_government                                  <lgl> FALSE…
## $ us_tribal_government                                       <lgl> FALSE…
## $ foreign_government                                         <lgl> FALSE…
## $ organizational_type                                        <chr> NA, N…
## $ corporate_entity_not_tax_exempt                            <lgl> FALSE…
## $ corporate_entity_tax_exempt                                <lgl> FALSE…
## $ partnership_or_limited_liability_partnership               <lgl> TRUE,…
## $ sole_proprietorship                                        <lgl> FALSE…
## $ small_agricultural_cooperative                             <lgl> FALSE…
## $ international_organization                                 <lgl> FALSE…
## $ us_government_entity                                       <lgl> FALSE…
## $ community_development_corporation                          <lgl> FALSE…
## $ domestic_shelter                                           <lgl> FALSE…
## $ educational_institution                                    <lgl> FALSE…
## $ foundation                                                 <lgl> FALSE…
## $ hospital_flag                                              <lgl> FALSE…
## $ manufacturer_of_goods                                      <lgl> FALSE…
## $ veterinary_hospital                                        <lgl> FALSE…
## $ hispanic_servicing_institution                             <lgl> FALSE…
## $ receives_contracts                                         <lgl> TRUE,…
## $ receives_grants                                            <lgl> TRUE,…
## $ receives_contracts_and_grants                              <lgl> TRUE,…
## $ airport_authority                                          <lgl> FALSE…
## $ council_of_governments                                     <lgl> FALSE…
## $ housing_authorities_public_tribal                          <lgl> FALSE…
## $ interstate_entity                                          <lgl> FALSE…
## $ planning_commission                                        <lgl> FALSE…
## $ port_authority                                             <lgl> FALSE…
## $ transit_authority                                          <lgl> FALSE…
## $ subchapter_scorporation                                    <lgl> FALSE…
## $ limited_liability_corporation                              <lgl> FALSE…
## $ foreign_owned_and_located                                  <lgl> FALSE…
## $ for_profit_organization                                    <lgl> TRUE,…
## $ nonprofit_organization                                     <lgl> FALSE…
## $ other_not_for_profit_organization                          <lgl> FALSE…
## $ the_ability_one_program                                    <lgl> FALSE…
## $ number_of_employees                                        <dbl> 1, 1,…
## $ annual_revenue                                             <dbl> 1000,…
## $ private_university_or_college                              <lgl> FALSE…
## $ state_controlled_institution_of_higher_learning            <lgl> FALSE…
## $ `1862_land_grant_college`                                  <lgl> FALSE…
## $ `1890_land_grant_college`                                  <lgl> FALSE…
## $ `1994_land_grant_college`                                  <lgl> FALSE…
## $ minority_institution                                       <lgl> FALSE…
## $ historically_black_college                                 <lgl> FALSE…
## $ tribal_college                                             <lgl> FALSE…
## $ alaskan_native_servicing_institution                       <lgl> FALSE…
## $ native_hawaiian_servicing_institution                      <lgl> FALSE…
## $ school_of_forestry                                         <lgl> FALSE…
## $ veterinary_college                                         <lgl> FALSE…
## $ dot_certified_disadvantage                                 <lgl> FALSE…
## $ self_certified_small_disadvantaged_business                <lgl> FALSE…
## $ small_disadvantaged_business                               <lgl> TRUE,…
## $ c8a_program_participant                                    <lgl> FALSE…
## $ historically_underutilized_business_zone_hubzone_firm      <lgl> FALSE…
## $ sba_certified_8a_joint_venture                             <lgl> FALSE…
## $ last_modified_date                                         <dttm> 2013…

Selecting a subset of columns

I put the results of the previous step into Excel, then parsed it to get the names of the fields. I create a column YN for whether we wanted to keep it, filtered just for them, and the renamed them in a big, long select statemet. At least this gets it down to 70-some fields, which is still a lot, but they also have shorter names for typing. Look at the file here called rename_select_contract_variables.xlsx to see how it’s done.

Here is a quote on how to create a unique ID for each transaction:

The unique_transaction_ids were assigned by the legacy database and, therefore, couldn’t be transferred over to the new USAspending.gov database. However, you can create your own by concatenation. For contracts, you’d need these 6 fields: award_id_piid, modification_number, transaction_number, parent_award_id, parent_award_modification_number, and awarding_sub_agency_code.

Also, the analyst’s guide mentioned above has some really important information. For example:

Finally, it’s worth noting that if all initially obligated money isn’t used, the last modification will display a negative amount. For example, if an initial contract was made for $10,000 and an agency only used $9,000 of that initial obligation, the last record would display an amount of -$1,000.

Create the unique id using those 6 fields, with the paste() command, separating the pieces with “-”. (It will have “-NA-” if it doesn’t exist, but we don’t really care. It’s still unique.)

Add a fiscal year to the dataset so we don’t have to keep doing it over and over. I also suggest that you split out IDV from AWARD data – they kind of have different things in them.

Crap - there are no FIP codes anywhere in the original data! Do they keep this from us on purpose?? How do they do their maps?? It’s possible that they are consistent enough that they can be used. Otherwise, have to geocode the city.

See the spreadsheet in this folder called

fema_all <- 
  fema_contracts_orig %>%

  mutate( unique_id = 
             paste(award_id_piid, 
                   modification_number, 
                   transaction_number, 
                   parent_award_id, 
                   parent_award_modification_number, 
                   awarding_sub_agency_code, sep="-") , 
          fiscal_year = ifelse(month(action_date) >= 10, year(action_date+1), year(action_date))
          ) %>%
  select (fiscal_year, 
         piid=award_id_piid,
         mod=modification_number,
         trans_num=transaction_number,
         parent_agency_id=parent_award_agency_id,
         parent_agency=parent_award_agency_name,
         parent_award_id=parent_award_id,
         parent_mod=parent_award_modification_number,
         fed_obligation=federal_action_obligation,
         exercised_value=base_and_exercised_options_value,
         current_value=current_total_value_of_award,
         total_value=base_and_all_options_value,
         potential_value=potential_total_value_of_award,
         action_date=action_date,
         start_date=period_of_performance_start_date,
         current_end_date=period_of_performance_current_end_date,
         potential_end_date=period_of_performance_potential_end_date,
         order_end_date=ordering_period_end_date,
         award_agency_id=awarding_agency_code,
         award_agency=awarding_agency_name,
         award_subagency_id=awarding_sub_agency_code,
         award_subagency=awarding_sub_agency_name,
         award_office_id=awarding_office_code,
         award_office=awarding_office_name,
         funding_agency_id=funding_agency_code,
         funding_agency=funding_agency_name,
         funding_subagency_id=funding_sub_agency_code,
         funding_subagency=funding_sub_agency_name,
         funding_office_id=funding_office_code,
         funding_office=funding_office_name,
         recip_duns=recipient_duns,
         recip_name=recipient_name,
         recip_dba=recipient_doing_business_as_name,
         recip_parent=recipient_parent_name,
         recip_parent_duns=recipient_parent_duns,
         recip_address=recipient_address_line_1,
         recip_city=recipient_city_name,
         recip_stabbr=recipient_state_code,
         recip_state=recipient_state_name,
         recip_zip4=recipient_zip_4_code,
         recip_cd=recipient_congressional_district,
         recip_phone=recipient_phone_number,
         pop_city=primary_place_of_performance_city_name,
         pop_county=primary_place_of_performance_county_name,
         pop_stabbr=primary_place_of_performance_state_code,
         pop_state=primary_place_of_performance_state_name,
         pop_zip4=primary_place_of_performance_zip_4,
         pop_cd=primary_place_of_performance_congressional_district,
         award_idv_flag=award_or_idv_flag,
         award_type=award_type,
         idv_type=idv_type,
         mult_award_idv=multiple_or_single_award_idv,
         type_idc=type_of_idc,
         type_pricing=type_of_contract_pricing,
         description=award_description,
         type_action=action_type,
         solicitation_id=solicitation_identifier,
         action_ct=number_of_actions,
         gov_function=inherently_governmental_functions_description,
         prod_code=product_or_service_code,
         prod_name=product_or_service_code_description,
         bundled=contract_bundling,
         naics_code=naics_code,
         naics_name=naics_description,
         subcontracting=subcontracting_plan,
         competed=extent_competed,
         solicitation_procedure=solicitation_procedures,
         set_aside=type_of_set_aside,
         preference=evaluated_preference,
         not_competed=other_than_full_and_open_competition,
         offers_received=number_of_offers_received,
         fed_biz_opps_yn=fed_biz_opps,
         acronym=program_acronym,
         national_interest=national_interest_action,
         multi_year=multi_year_contract,
         last_modified_date=last_modified_date, 
         unique_id)


#check to see that all of them are in the two words. - it matches
#the number of records in the dataset.
fema_all %>%
  filter(award_idv_flag %in% c("IDV", "AWARD")) %>%
  summarise (n())
fema_idv <- 
  fema_all %>%
filter(award_idv_flag == "IDV")

fema_award <-
  fema_all %>%
  filter (award_idv_flag == "AWARD")


#Now create a new R data file that contains both smaller tables. 
save(fema_idv, fema_award , file="usaspending_split.RData")
rm(list=c('fema_idv', 'fema_award', 'fema_all'))

Now you can just load them when you need them. This time look at what’s in the National Interest field, to see if we can use it to isolate specific important events.

load("usaspending_split.RData")

#notice that you might get some of what you want this way, but I wouldn't count on them to fill it out.

fema_idv %>%
  filter (national_interest != "NONE") %>%
  group_by (national_interest) %>%
  summarise( num_idvs = n(), num_contracts = n_distinct(piid)) 
fema_idv %>%
  filter (national_interest== "HURRICANE MARIA 2017")

Now look at the awards that came from it:

fema_award %>%
  filter (national_interest == "HURRICANE MARIA 2017") %>%
  arrange (action_date)