bos_311 <- read_csv("311 Cases 2020_2024 Unrestricted.csv")
## Rows: 1671860 Columns: 31
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr   (8): SUBJECT, REASON, TYPE, LOCATION, propid, SOURCE, NSA_NAME, BRA_PD
## dbl  (21): CASE_ENQUIRY_ID, X, Y, LocationID, ObjectID, TLID, BLK_ID_10, BG_...
## date  (2): OPEN_DT, CLOSED_DT
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Command to import the data file (saved in the same folder) into RStudio.

This command creates the object bos_311, which is “assigned” the values of the csv file with our data.

{> View(bos_311)} This command opens bos_311 (which contains our data) as a table in the RStudio viewer.

{> dim(bos_311)} This shows us the number of rows (the number of individual records) and the number of columns (the number of variables) in the dataset. The result of this command was: [1] 1671860 31 (rows and columns respectively).

{> names(bos_311)} This gives us the names of each of the columns (the variables) in the dataset. These include:
[1] “CASE_ENQUIRY_ID” “OPEN_DT” “CLOSED_DT” “SUBJECT” “REASON” “TYPE” “LOCATION”
[8] “propid” “SOURCE” “X” “Y” “LocationID” “ObjectID” “TLID”
[15] “BLK_ID_10” “BG_ID_10” “CT_ID_10” “BLK_ID_20” “BG_ID_20” “CT_ID_20” “NSA_NAME”
[22] “BRA_PD” “PUBLIC” “HOUSING” “UNCIVILUSE” “BIGBUILD” “GRAFFITI” “TRASH”
[29] “PRIVATENEGLECT” “PUBLICDENIG” “PROBLEM”
Some of these are understandable by name (for example, LOCATION gives us the address of a specific case), while others require exploration. After this, I did some manual “fiddling” with the dataset to understand what each column meant. Some of them are binary (particularly the “descriptors” such as TRASH, GRAFFITI, BIGBUILD, etc., which just tell you whether that attribute is true or not for a given record), while others, such as LOCATION, are character-based variables.
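The manual check of which columns are binary could also be sketched in code. The example below is only an illustration under stated assumptions: `toy_311` is a small invented stand-in for bos_311 (its values are made up), and `is_binary` is a hypothetical helper, not part of any package.

```r
# Toy stand-in for bos_311; all values are invented for illustration only.
toy_311 <- data.frame(
  LOCATION = c("1 Example St", "2 Example St", "3 Example St"),
  TRASH    = c(0, 1, 0),
  GRAFFITI = c(1, 0, NA),
  X        = c(-71.09, -71.10, -71.11),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when a numeric column holds only 0s and 1s
# (missing values are ignored).
is_binary <- function(col) {
  vals <- unique(col[!is.na(col)])
  is.numeric(col) && length(vals) > 0 && all(vals %in% c(0, 1))
}

# Names of the columns that look like 0/1 descriptors.
binary_cols <- names(toy_311)[sapply(toy_311, is_binary)]
print(binary_cols)

# Column classes separate character variables from numeric ones.
print(sapply(toy_311, class))
```

Running the same helper over the real bos_311 object would flag candidate descriptor columns, though a column that merely happens to contain only 0s and 1s would be flagged too, so the result still needs a manual look.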

My initial plan was to use the BRA_PD column to count the cases in each neighborhood, and then create another object containing only the West Roxbury cases. I executed several commands for this, but eventually changed course and focused on Blue Hill Ave instead, using the search bar in the bos_311 table view.
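The commands I used for the neighborhood approach are not recorded here, so the following is only a sketch of what that step might look like in base R. It runs on an invented toy table, and it assumes that the neighborhood appears in BRA_PD exactly as "West Roxbury", which should be checked against the real data.

```r
# Toy stand-in for bos_311; rows are invented for illustration only.
toy_311 <- data.frame(
  BRA_PD   = c("West Roxbury", "Mattapan", "West Roxbury", "Roxbury"),
  LOCATION = c("10 Example Rd", "20 Example Ave", "12 Example Rd", "30 Example St"),
  stringsAsFactors = FALSE
)

# Number of cases per neighborhood, largest first.
cases_by_nbhd <- sort(table(toy_311$BRA_PD), decreasing = TRUE)
print(cases_by_nbhd)

# A separate object holding only the West Roxbury cases.
# which() drops rows where BRA_PD is missing instead of returning NA rows.
west_rox <- toy_311[which(toy_311$BRA_PD == "West Roxbury"), ]
print(west_rox)
```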

Manually observing the Blue Hill Ave cases, I found that one particular address (1480 Blue Hill Ave) had three different constituent calls; I proceeded to pull the cases up using the following line of code.

{bos_311[c(540368:540370), ]}
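Selecting by row position works only as long as the table keeps its current order. As an alternative sketch, the same records could be selected by matching the address text and then sorted by opening date. The toy table below is invented, its case IDs and the year 2021 are placeholders, and the exact way addresses are written in LOCATION is an assumption.

```r
# Toy stand-in for bos_311; IDs and the year are placeholders.
toy_311 <- data.frame(
  CASE_ENQUIRY_ID = c(1, 2, 3, 4),
  LOCATION = c("1480 Blue Hill Ave", "9 Example St",
               "1480 Blue Hill Ave", "1480 Blue Hill Ave"),
  OPEN_DT  = as.Date(c("2021-07-30", "2021-03-01", "2021-02-05", "2021-04-07")),
  stringsAsFactors = FALSE
)

# Keep rows whose LOCATION contains the address as a literal string.
at_address <- toy_311[grepl("1480 Blue Hill Ave", toy_311$LOCATION, fixed = TRUE), ]

# Sort the matches chronologically by opening date.
at_address <- at_address[order(at_address$OPEN_DT), ]
print(at_address)
```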

This showed me the profile of the three cases in the console, which I will summarise in chronological order of opening date:
1. Row 540368. Opened on February 5th, it was closed on November 30th! A building inspection request.
2. Row 540370 (out of row order, since it was opened before row 540369). Opened on the 7th of April, it was closed on the 14th of April - another building inspection request.
3. Row 540369. Opened on the 30th of July, it was closed on the 27th of October - an electrical issue.
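The time each case stayed open can be computed by subtracting the two date columns. In the sketch below, the days and months are the ones listed above, but the year is not stated in this write-up, so 2021 is used purely as a placeholder and every case is assumed to open and close within that same year.

```r
# Day and month come from the summary above; the year 2021 is a placeholder.
cases <- data.frame(
  row_id    = c(540368, 540370, 540369),
  OPEN_DT   = as.Date(c("2021-02-05", "2021-04-07", "2021-07-30")),
  CLOSED_DT = as.Date(c("2021-11-30", "2021-04-14", "2021-10-27"))
)

# Subtracting two Date columns gives the difference in days.
cases$days_open <- as.numeric(cases$CLOSED_DT - cases$OPEN_DT)
print(cases)
```

Under these assumptions the three cases were open for 298, 7, and 89 days respectively. In a leap year the first figure would be one day longer, and it would be far larger if the case was closed in a later year.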

While the two building inspection requests (rows 540368 and 540370, the 1st and 3rd rows as printed) are marked with a 1 in the “PUBLIC” column (meaning that they were registered that way), the electrical request has no marking.

At first glance, this particular building seems to have recurring issues, and the response times for them are not particularly prompt.

Looking online, the address currently seems to be the location of a car wash, so the building inspection requests may have come from authorities meaning to inspect the business. I am not sure why the first case was closed at a significantly later date; this could very well be an administrative error.

Looking at this case in particular, it seems difficult to judge the entire story based on the data alone. While I knew that it was not a residential building (registered as a 0 under the HOUSING column), I did not know it was a business. Applying this to a larger scale, inferences drawn from large datasets that lack crucial data points, such as whether an address is a business, may still be helpful in making decisions, although I do wonder at the thinking behind the current categorization (illegal usage, graffiti, trash, neglect, etc.). Additionally, having a BUSINESS column, or something to that effect, may help to highlight the frequency of complaints that come from or relate to businesses, as well as trends in the challenges that businesses may face.