This project is Assignment #0, the starting deliverable for Data 607.

Pre-Coding Approach

For this assignment I will go over the basics of uploading and loading a data frame for reproducibility and general use. I plan to use a data set from NYC Open Data, since its data sets tend to be easy to understand and usually come with a clear data dictionary, unlike many other sources. The data set I plan to use is NYC Wi-Fi Hotspot Locations.

I will download the file to my local drive, then use the filter and arrange functions to organize the data. Then I will drop any “unnecessary” columns, such as the X and Y coordinates, Community Board numbers, NTA (Neighborhood Tabulation Area), and any non-descriptive identifiers. These columns are important, but they do not show the information we need for an overview of how WiFi spots are distributed.
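
The planned steps can be sketched on a small stand-in table before touching the real file. This is only an illustration: the `hotspots` tibble and its values are invented, and the column names are assumed to resemble those in the real data set.

```r
library(dplyr)

# Toy stand-in for the hotspot data; all values are invented for illustration
hotspots <- tibble::tibble(
  Name     = c("Spot B", "Spot A", "Spot C"),
  Provider = c("BPL", "LinkNYC - Citybridge", "SPECTRUM"),
  City     = c("Brooklyn", "Brooklyn", "Queens"),
  X        = c(1000000, 986000, 1040000),
  Y        = c(195000, 190000, 200000),
  BoroCD   = c(301, 302, 407)
)

# Drop coordinate and identifier columns, keep Brooklyn rows, sort by name
hotspots_clean <- hotspots %>%
  select(-X, -Y, -BoroCD) %>%
  filter(City == "Brooklyn") %>%
  arrange(Name)

print(hotspots_clean)
```

Dropping columns with `select(-...)` works well when only a few columns are unwanted; with many columns, naming the ones to keep is usually shorter.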

# Loading Essential Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.5
## ✔ ggplot2   4.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.4     ✔ tibble    3.2.1
## ✔ purrr     1.0.4     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Introduction to Data

This data set is an overview of the Wi-Fi hotspots available to individuals around the city. The data contains 3319 observations and 29 columns. It was collected and published on NYC Open Data. The last update as of this assignment was September 23, 2002.

Updated Purpose

For this assignment, I focused on identifying the WiFi spots that are accessible in Brooklyn. I have noticed that while in the city I use more mobile data than I expected. So it would be interesting to see how many places in Brooklyn offer free WiFi and whether that is an alternative to using data on my phone.

# Upload Dataset from GitHub

Wifi_NYC <- read_csv("https://raw.githubusercontent.com/Mayneman000/DATA607Assignment/refs/heads/DATA/NYC_Wi-Fi_Hotspot_Locations_20260125.csv")
## Rows: 3319 Columns: 29
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): Type, Provider, Name, Location, Location_T, Remarks, City, SSID, S...
## dbl (13): OBJECTID, Borough, Latitude, Longitude, BoroCode, Council Distrcit...
## num  (2): X, Y
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Revealing the start of the dataset
head(Wifi_NYC)
## # A tibble: 6 × 29
##   OBJECTID Borough Type        Provider Name  Location Latitude Longitude      X
##      <dbl>   <dbl> <chr>       <chr>    <chr> <chr>       <dbl>     <dbl>  <dbl>
## 1    10604       4 Limited Fr… SPECTRUM Bais… Park Pe…     40.7     -73.8 1.04e6
## 2    10555       4 Limited Fr… SPECTRUM Kiss… Park Pe…     40.7     -73.8 1.03e6
## 3    12370       3 Free        Transit… Gran… Grand S…     40.7     -73.9 1.00e6
## 4     9893       3 Free        Downtow… <NA>  125 Cou…     40.7     -74.0 9.86e5
## 5    10169       1 Free        Transit… Lexi… Lexingt…     40.8     -74.0 9.94e5
## 6    10880       4 Limited Fr… SPECTRUM Kiss… Park Pe…     40.7     -73.8 1.04e6
## # ℹ 20 more variables: Y <dbl>, Location_T <chr>, Remarks <chr>, City <chr>,
## #   SSID <chr>, SourceID <chr>, Activated <chr>, BoroCode <dbl>,
## #   `Borough Name` <chr>, `Neighborhood Tabulation Area Code (NTACODE)` <chr>,
## #   `Neighborhood Tabulation Area (NTA)` <chr>, `Council Distrcit` <dbl>,
## #   Postcode <dbl>, BoroCD <dbl>, `Census Tract` <dbl>, BCTCB2010 <dbl>,
## #   BIN <dbl>, BBL <dbl>, DOITT_ID <dbl>, `Location (Lat, Long)` <chr>
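
The column-specification message printed after `read_csv()` can be silenced, but only when `show_col_types = FALSE` is passed as an argument inside the call; written as a separate statement it just creates a variable with that name. A minimal sketch, using a small inline CSV (invented for illustration) in place of the real file:

```r
library(readr)

# Inline CSV text standing in for the real file; I() marks it as literal data
csv_text <- I("Name,City,Type\nSpot A,Brooklyn,Free\nSpot B,Queens,Limited Free\n")

# show_col_types must be an argument of read_csv() to quiet the message
sample_data <- read_csv(csv_text, show_col_types = FALSE)

head(sample_data)
```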

Filtering Data

As mentioned in my pre-coding approach, I want to clean and filter this data and remove columns to make the data set easier to view. However, given how many columns the data set has, it makes more sense to select the columns I wish to keep instead. This also addresses the many redundant columns in the raw data, such as Borough Name, BoroCode, and City.

# Keeping only necessary columns
Wifi_NYCSimple <- Wifi_NYC %>% select(Type, Name, Location, City, Location_T, SSID, Provider)
# Filtering rows for visual clarity (limiting to free hotspots in Brooklyn)

Wifi_Brooklyn <- Wifi_NYCSimple %>%
  filter(City == "Brooklyn", Type == "Free")

Filtering the data gives the final count of free WiFi spots in Brooklyn and the number of providers present: 542 free spots in total across 7 providers, with LinkNYC responsible for almost half of them (257).

count(Wifi_Brooklyn)
## # A tibble: 1 × 1
##       n
##   <int>
## 1   542
Wifi_Brooklyn %>%
  count(Provider) %>%
  arrange(n)
## # A tibble: 7 × 2
##   Provider                 n
##   <chr>                <int>
## 1 AT&T                     7
## 2 City Tech               11
## 3 NYCHA                   28
## 4 BPL                     59
## 5 Transit Wireless        80
## 6 Downtown Brooklyn      100
## 7 LinkNYC - Citybridge   257
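
To put a number on "almost half", each provider's share can be computed from the counts above. The sketch below re-enters those counts by hand rather than reading the file, and the `share_pct` column name is my own choice:

```r
library(dplyr)

# Provider counts copied from the output above
provider_counts <- tibble::tibble(
  Provider = c("AT&T", "City Tech", "NYCHA", "BPL",
               "Transit Wireless", "Downtown Brooklyn", "LinkNYC - Citybridge"),
  n = c(7, 11, 28, 59, 80, 100, 257)
)

# Share of all free Brooklyn hotspots per provider, as a percentage
provider_share <- provider_counts %>%
  mutate(share_pct = round(100 * n / sum(n), 1)) %>%
  arrange(desc(n))

print(provider_share)
```

On the real data the same `mutate()` step could be appended directly after `count(Provider)`.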

Conclusions

As for future prospects, it would be interesting to place the filtered WiFi spots on a map, assuming that has not already been done. A map could serve as a visual indicator of the concentration of WiFi spots, because I believe the current spots are not evenly distributed. The findings in this assignment could also be updated if the original data set is updated again this year.
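
A rough first version of such a map could be a scatter plot of longitude against latitude. The sketch below uses a few made-up coordinates; with the real data, the Latitude and Longitude columns would need to be kept in the `select()` step, which the filtered table in this report does not do.

```r
library(ggplot2)

# Hypothetical coordinates for illustration; real values would come from
# the Latitude and Longitude columns of the original data
spots <- data.frame(
  Longitude = c(-73.99, -73.95, -73.98, -73.94),
  Latitude  = c(40.69, 40.68, 40.70, 40.65),
  Provider  = c("LinkNYC - Citybridge", "BPL", "Downtown Brooklyn", "NYCHA")
)

# Scatter plot of hotspot positions; coord_quickmap() approximates a map aspect ratio
hotspot_map <- ggplot(spots, aes(x = Longitude, y = Latitude, colour = Provider)) +
  geom_point(size = 2, alpha = 0.7) +
  coord_quickmap() +
  labs(title = "Free Wi-Fi hotspots (illustrative points)",
       x = "Longitude", y = "Latitude")

print(hotspot_map)
```

This shows only point positions with no street or borough outline; a proper basemap would need an additional mapping package.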

End of Report
