This markdown gives you instructions for importing and understanding the PPP data for your mini-memo. There are other background documents and example stories in the Canvas module that you can also review.
Download the data
To load the data into your Global Environment add this line to a code chunk:
data_url <- "https://github.com/cronkitedata/rstudyguide/blob/master/data/az_ppp.Rda?raw=true"
load (url ( data_url) )
That will create a data fram called az_ppp in your RStudio session.
Source
The original data was published by the SBA here https://sba.app.box.com/s/5myd1nxutoq8wxecx2562baruz774si6 on Feb. 1, 2021. I downloaded and unzipped the PPP portion, and read it into R using a different program.
The SBA has never released a record layout or data dictionary for this data that I can find. I worked on a project over the Christmas break using an earlier version of the data, so I think I understand much of what it contains.
I selected about half of the variables that are included in the original data, and simplified a few as shown below. I also converted all character fields to upper case to ease filtering, and converted the Zip Codes to 5-digits .
This dataset includes all loans that were made to companies in Arizona. A handful (6) of these are going toward projects in other states.
Here are the columns I kept, and what I understand them to mean. Any changes other than those above are indicated in the record layout.
The loan application form . There’s one important thing I don’t know the answer to: There are no rows with a loan statust that suggests that it was forgiven, though we know from reports that more than $100 billion have already been forgiven. There is a “loan status” field in the data, but that doesn’t have anything related to forgiveness, and the “PAID IN FULL” indicator doesn’t add up to nearly enough money. I’m trying to find out what the answer is there, but I wanted to warn you not to use that column until we get some clarity on it.
| id |
n |
A unique identifier supplied by the SBA |
| date_approved |
d |
Date the loan was initially approved |
| status_date |
d |
When the status was reported (NA if unreported) |
| borrower |
c |
Name of the borrower, all upper case |
| address |
c |
Address of the borrower |
| city |
c |
|
| state |
c |
|
| zip |
c |
5-digit Zip code of the borrower |
| loan_status |
c |
Blank if redacted because of EXEMPTION 4 - VERY unclear what this means! |
| term |
n |
Months to pay back |
| initial_amt |
n |
Initial amount approved by SBA |
| payroll_amt |
n |
Amount used for payroll |
| nonpayroll_amt |
n |
Total amount used for other costs |
| franchise |
c |
Name of a franchisee |
| lender |
c |
Name of originating lender (not servicing lender) |
| lender_state |
c |
|
| rural_urban |
c |
|
| business_age |
c |
How long the business had been open before the loan - |
| project_city |
c |
City where the money will be used |
| project_county |
c |
County where the money will be used |
| project_state |
c |
State where the county will be used |
| jobs_reported |
n |
Number of jobs “saved” by the loan |
| naics_code |
c |
Standard industry code used in the federal government |
| naics_sector |
c |
Sector of that standard industry code, where it was a valid one |
| naics_descript |
c |
2017 description of the industry, blank if a previous year’s version used |
| race_ethnicity |
c |
Race and ethnicity of the business owner, but usually not filled out |
| gender |
c |
Same with gender |
| veteran |
c |
Same with veteran |
| non_profit |
c |
Only filled out if the borrower is a non-profit organization |
LS0tCnRpdGxlOiAiUFBQIGxvYW5zIGluIEFyaXpvbmEiCmF1dGhvcjogIlNhcmFoIENvaGVuIgpkYXRlOiAiYHIgU3lzLkRhdGUoKWAiCm91dHB1dDogCiAgaHRtbF9kb2N1bWVudDoKICAgIHRoZW1lOiB5ZXRpCiAgICBoaWdobGlnaHQ6IGthdGUKICAgIHRvYzogdHJ1ZQogICAgdG9jX2Zsb2F0OiB0cnVlCiAgICBkZl9wcmludDogcGFnZWQKICAgIGNvZGVfZG93bmxvYWQ6IHRydWUKLS0tCgoKYGBge3Igc2V0dXAsIGluY2x1ZGU9RkFMU0UsIG1lc3NhZ2U9RkFMU0UsIHdhcm5pbmc9RkFMU0V9CgpsaWJyYXJ5KGtuaXRyKQpsaWJyYXJ5KHJtZGZvcm1hdHMpCgprbml0cjo6b3B0c19jaHVuayRzZXQoZWNobyA9IFRSVUUpCgpsaWJyYXJ5KHRpZHl2ZXJzZSkKbGlicmFyeShqYW5pdG9yKQpsaWJyYXJ5KGx1YnJpZGF0ZSkKCgpvcHRpb25zKHNjaXBlbj05OTksIGRpZ2l0cz0xNSkKCmRhdGFfdXJsIDwtICJodHRwczovL2dpdGh1Yi5jb20vY3JvbmtpdGVkYXRhL3JzdHVkeWd1aWRlL2Jsb2IvbWFzdGVyL2RhdGEvYXpfcHBwLlJkYT9yYXc9dHJ1ZSIKbG9hZCAodXJsIChkYXRhX3VybCkgKQpgYGAKCgpUaGlzIG1hcmtkb3duIGdpdmVzIHlvdSBpbnN0cnVjdGlvbnMgZm9yIGltcG9ydGluZyBhbmQgdW5kZXJzdGFuZGluZyB0aGUgUFBQIGRhdGEgZm9yIHlvdXIgbWluaS1tZW1vLiBUaGVyZSBhcmUgb3RoZXIgYmFja2dyb3VuZCBkb2N1bWVudHMgYW5kIGV4YW1wbGUgc3RvcmllcyBpbiB0aGUgQ2FudmFzIG1vZHVsZSB0aGF0IHlvdSBjYW4gYWxzbyByZXZpZXcuIAoKIyMgRG93bmxvYWQgdGhlIGRhdGEgCgoKVG8gbG9hZCB0aGUgZGF0YSBpbnRvIHlvdXIgR2xvYmFsIEVudmlyb25tZW50IGFkZCB0aGlzIGxpbmUgdG8gYSBjb2RlIGNodW5rOiAKCiAgICAgICAgZGF0YV91cmwgPC0gImh0dHBzOi8vZ2l0aHViLmNvbS9jcm9ua2l0ZWRhdGEvcnN0dWR5Z3VpZGUvYmxvYi9tYXN0ZXIvZGF0YS9hel9wcHAuUmRhP3Jhdz10cnVlIgoKICAgICAgICBsb2FkICh1cmwgKCBkYXRhX3VybCkgKQogICAgICAgIAoKClRoYXQgd2lsbCBjcmVhdGUgYSBkYXRhIGZyYW0gY2FsbGVkIGBhel9wcHBgIGluIHlvdXIgUlN0dWRpbyBzZXNzaW9uLiAgCgoKCiMjIFNvdXJjZQoKVGhlIG9yaWdpbmFsIGRhdGEgd2FzIHB1Ymxpc2hlZCBieSB0aGUgU0JBIGhlcmUgPGh0dHBzOi8vc2JhLmFwcC5ib3guY29tL3MvNW15ZDFueHV0b3E4d3hlY3gyNTYyYmFydXo3NzRzaTY+IG9uIEZlYi4gMSwgMjAyMS4gSSBkb3dubG9hZGVkIGFuZCB1bnppcHBlZCB0aGUgUFBQIHBvcnRpb24sIGFuZCByZWFkIGl0IGludG8gUiB1c2luZyBhIGRpZmZlcmVudCBwcm9ncmFtLiAKClRoZSBTQkEgaGFzIG5ldmVyIHJlbGVhc2VkIGEgcmVjb3JkIGxheW91dCBvciBkYXRhIGRpY3Rpb25hcnkgZm9yIHRoaXMgZGF0YSB0aGF0IEkgY2FuIGZpbmQuIEkgd29ya2VkIG9uIGEgcHJvamVjdCBvdmVyIHRoZSBDaHJpc3RtYXMgYnJlYWsgdXNpbmcgYW4gZWFybGllciB2ZXJzaW9uIG9mIHRoZSBkYXRhLCBzbyBJICp0aGluayogSSB1bmRlcnN0YW5kIG11Y2ggb2Ygd2hhdCBpdCBjb250YWlucy4gCgpJIHNlbGVjdGVkIGFib3V0IGhhbGYgb2YgdGhlIHZhcmlhYmxlcyB0aGF0IGFyZSBpbmNsdWRlZCBpbiB0aGUgb3JpZ2luYWwgZGF0YSwgYW5kIHNpbXBsaWZpZWQgYSBmZXcgYXMgc2hvd24gYmVsb3cuIEkgYWxzbyBjb252ZXJ0ZWQgYWxsIGNoYXJhY3RlciBmaWVsZHMgdG8gdXBwZXIgY2FzZSB0byBlYXNlIGZpbHRlcmluZywgYW5kIGNvbnZlcnRlZCB0aGUgWmlwIENvZGVzIHRvIDUtZGlnaXRzIC4gCgpUaGlzIGRhdGFzZXQgaW5jbHVkZXMgYWxsIGxvYW5zIHRoYXQgd2VyZSBtYWRlIHRvIGNvbXBhbmllcyBpbiBBcml6b25hLiBBIGhhbmRmdWwgKDYpIG9mIHRoZXNlIGFyZSBnb2luZyB0b3dhcmQgcHJvamVjdHMgaW4gb3RoZXIgc3RhdGVzLiAgIAoKSGVyZSBhcmUgdGhlIGNvbHVtbnMgSSBrZXB0LCBhbmQgd2hhdCBJIHVuZGVyc3RhbmQgdGhlbSB0byBtZWFuLiBBbnkgY2hhbmdlcyBvdGhlciB0aGFuIHRob3NlIGFib3ZlIGFyZSBpbmRpY2F0ZWQgaW4gdGhlIHJlY29yZCBsYXlvdXQuIAoKClRoZSBbbG9hbiBhcHBsaWNhdGlvbiBmb3JtXShodHRwczovL3d3dy5zYmEuZ292L3NpdGVzL2RlZmF1bHQvZmlsZXMvMjAyMC0wNC9QUFAtQm9ycm93ZXItQXBwbGljYXRpb24tRm9ybS1GaWxsYWJsZS5wZGYpIC4gVGhlcmUncyBvbmUgaW1wb3J0YW50IHRoaW5nIEkgZG9uJ3Qga25vdyB0aGUgYW5zd2VyIHRvOiBUaGVyZSBhcmUgbm8gcm93cyB3aXRoIGEgbG9hbiBzdGF0dXN0IHRoYXQgc3VnZ2VzdHMgdGhhdCBpdCB3YXMgZm9yZ2l2ZW4sIHRob3VnaCB3ZSBrbm93IGZyb20gcmVwb3J0cyB0aGF0IG1vcmUgdGhhbiAkMTAwIGJpbGxpb24gaGF2ZSBhbHJlYWR5IGJlZW4gZm9yZ2l2ZW4uIFRoZXJlIGlzIGEgImxvYW4gc3RhdHVzIiBmaWVsZCBpbiB0aGUgZGF0YSwgYnV0IHRoYXQgZG9lc24ndCBoYXZlIGFueXRoaW5nIHJlbGF0ZWQgdG8gZm9yZ2l2ZW5lc3MsIGFuZCB0aGUgIlBBSUQgSU4gRlVMTCIgaW5kaWNhdG9yIGRvZXNuJ3QgYWRkIHVwIHRvIG5lYXJseSBlbm91Z2ggbW9uZXkuIEknbSB0cnlpbmcgdG8gZmluZCBvdXQgd2hhdCB0aGUgYW5zd2VyIGlzIHRoZXJlLCBidXQgSSB3YW50ZWQgdG8gd2FybiB5b3Ugbm90IHRvIHVzZSB0aGF0IGNvbHVtbiB1bnRpbCB3ZSBnZXQgc29tZSBjbGFyaXR5IG9uIGl0LiAKCjxkaXYgc3R5bGU9IndpZHRoOjcwJSI+Cgpjb2x1bW4gbmFtZSAgICAgICB8IHR5cGUgfCBkZXNjcmlwdGlvbgotLS0tLS0tLS0tLS0tLS0tLS18IC0tLS0gfCAtLS0tLS0tLS0tLQppZCAgICAgICAgICAgICAgICB8ICBuICB8IEEgdW5pcXVlIGlkZW50aWZpZXIgc3VwcGxpZWQgYnkgdGhlIFNCQQpkYXRlX2FwcHJvdmVkICAgICB8ICBkICB8IERhdGUgdGhlIGxvYW4gd2FzIGluaXRpYWxseSBhcHByb3ZlZApzdGF0dXNfZGF0ZSAgICAgICB8ICBkICB8IFdoZW4gdGhlIHN0YXR1cyB3YXMgcmVwb3J0ZWQgKE5BIGlmIHVucmVwb3J0ZWQpCmJvcnJvd2VyICAgICAgICAgIHwgIGMgIHwgTmFtZSBvZiB0aGUgYm9ycm93ZXIsIGFsbCB1cHBlciBjYXNlCmFkZHJlc3MgICAgICAgICAgIHwgIGMgIHwgQWRkcmVzcyBvZiB0aGUgYm9ycm93ZXIKY2l0eSAgICAgICAgICAgICAgfCAgYyAgfApzdGF0ZSAgICAgICAgICAgICB8ICBjICB8CnppcCAgICAgICAgICAgICAgIHwgIGMgIHwgNS1kaWdpdCBaaXAgY29kZSBvZiB0aGUgYm9ycm93ZXIKbG9hbl9zdGF0dXMgICAgICAgfCAgYyAgfCBCbGFuayBpZiByZWRhY3RlZCBiZWNhdXNlIG9mIEVYRU1QVElPTiA0IC0gVkVSWSB1bmNsZWFyIHdoYXQgdGhpcyBtZWFucyEgCnRlcm0gICAgICAgICAgICAgIHwgIG4gIHwgTW9udGhzIHRvIHBheSBiYWNrCmluaXRpYWxfYW10ICAgICAgIHwgIG4gIHwgSW5pdGlhbCBhbW91bnQgYXBwcm92ZWQgYnkgU0JBCnBheXJvbGxfYW10ICAgICAgIHwgIG4gIHwgQW1vdW50IHVzZWQgZm9yIHBheXJvbGwKbm9ucGF5cm9sbF9hbXQgICAgfCAgbiAgfCBUb3RhbCBhbW91bnQgdXNlZCBmb3Igb3RoZXIgY29zdHMKZnJhbmNoaXNlICAgICAgICAgfCAgYyAgfCBOYW1lIG9mIGEgZnJhbmNoaXNlZSAKbGVuZGVyICAgICAgICAgICAgfCAgYyAgfCBOYW1lIG9mIG9yaWdpbmF0aW5nIGxlbmRlciAobm90IHNlcnZpY2luZyBsZW5kZXIpCmxlbmRlcl9zdGF0ZSAgICAgIHwgIGMgIHwKcnVyYWxfdXJiYW4gICAgICAgfCAgYyAgfApidXNpbmVzc19hZ2UgICAgICB8ICBjICB8IEhvdyBsb25nIHRoZSBidXNpbmVzcyBoYWQgYmVlbiBvcGVuIGJlZm9yZSB0aGUgbG9hbiAtIApwcm9qZWN0X2NpdHkgICAgICB8ICBjICB8IENpdHkgd2hlcmUgdGhlIG1vbmV5IHdpbGwgYmUgdXNlZApwcm9qZWN0X2NvdW50eSAgICB8ICBjICB8IENvdW50eSB3aGVyZSB0aGUgbW9uZXkgd2lsbCBiZSB1c2VkCnByb2plY3Rfc3RhdGUgICAgIHwgIGMgIHwgU3RhdGUgd2hlcmUgdGhlIGNvdW50eSB3aWxsIGJlIHVzZWQKam9ic19yZXBvcnRlZCAgICAgfCAgbiAgfCBOdW1iZXIgb2Ygam9icyAic2F2ZWQiIGJ5IHRoZSBsb2FuCm5haWNzX2NvZGUgICAgICAgIHwgIGMgIHwgU3RhbmRhcmQgaW5kdXN0cnkgY29kZSB1c2VkIGluIHRoZSBmZWRlcmFsIGdvdmVybm1lbnQKbmFpY3Nfc2VjdG9yICAgICAgfCAgYyAgfCBTZWN0b3Igb2YgdGhhdCBzdGFuZGFyZCBpbmR1c3RyeSBjb2RlLCB3aGVyZSBpdCB3YXMgYSB2YWxpZCBvbmUKbmFpY3NfZGVzY3JpcHQgICAgfCAgYyAgfCAyMDE3IGRlc2NyaXB0aW9uIG9mIHRoZSBpbmR1c3RyeSwgYmxhbmsgaWYgYSBwcmV2aW91cyB5ZWFyJ3MgdmVyc2lvbiB1c2VkCnJhY2VfZXRobmljaXR5ICAgIHwgIGMgIHwgUmFjZSBhbmQgZXRobmljaXR5IG9mIHRoZSBidXNpbmVzcyBvd25lciwgYnV0IHVzdWFsbHkgbm90IGZpbGxlZCBvdXQKZ2VuZGVyICAgICAgICAgICAgfCAgYyAgfCBTYW1lIHdpdGggZ2VuZGVyCnZldGVyYW4gICAgICAgICAgIHwgIGMgIHwgU2FtZSB3aXRoIHZldGVyYW4Kbm9uX3Byb2ZpdCAgICAgICAgfCAgYyAgfCBPbmx5IGZpbGxlZCBvdXQgaWYgdGhlIGJvcnJvd2VyIGlzIGEgbm9uLXByb2ZpdCBvcmdhbml6YXRpb24KIAo8L2Rpdj4KCgo=