Project 2.0

Author

D Devkota

Project 2

NYC Public WiFi Hotspots

This project analyzes the distribution of public Wi-Fi hotspots across New York City to understand how internet access varies by location. The dataset includes categorical variables such as borough, provider, and location type, along with quantitative variables such as latitude and longitude for mapping. This topic is meaningful because public Wi-Fi is important for everyday connectivity and digital access today.

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.2.0     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.2     ✔ tibble    3.3.1
✔ lubridate 1.9.5     ✔ tidyr     1.3.2
✔ purrr     1.2.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(leaflet)
library(viridis)
Loading required package: viridisLite
wifi <- readr::read_csv("Project 2.csv")
Rows: 3319 Columns: 29
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (14): Type, Provider, Name, Location, Location_T, Remarks, City, SSID, S...
dbl (13): OBJECTID, Borough, Latitude, Longitude, BoroCode, Council Distrcit...
num  (2): X, Y

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(wifi)
Rows: 3,319
Columns: 29
$ OBJECTID                                      <dbl> 10604, 10555, 12370, 989…
$ Borough                                       <dbl> 4, 4, 3, 3, 1, 4, 1, 3, …
$ Type                                          <chr> "Limited Free", "Limited…
$ Provider                                      <chr> "SPECTRUM", "SPECTRUM", …
$ Name                                          <chr> "Baisley Pond Park", "Ki…
$ Location                                      <chr> "Park Perimeter", "Park …
$ Latitude                                      <dbl> 40.67486, 40.74756, 40.7…
$ Longitude                                     <dbl> -73.78412, -73.81815, -7…
$ X                                             <dbl> 1044131.9, 1034637.5, 10…
$ Y                                             <dbl> 185219.9, 211685.2, 1986…
$ Location_T                                    <chr> "Outdoor TWC Aerial", "O…
$ Remarks                                       <chr> "3 free 10 min sessions"…
$ City                                          <chr> "Queens", "Queens", "Bro…
$ SSID                                          <chr> "GuestWiFi", "GuestWiFi"…
$ SourceID                                      <chr> "0", "0", NA, NA, NA, "0…
$ Activated                                     <chr> "09/09/9999", "09/09/999…
$ BoroCode                                      <dbl> 4, 4, 3, 3, 1, 4, 1, 3, …
$ `Borough Name`                                <chr> "Queens", "Queens", "Bro…
$ `Neighborhood Tabulation Area Code (NTACODE)` <chr> "QN02", "QN22", "BK90", …
$ `Neighborhood Tabulation Area (NTA)`          <chr> "Springfield Gardens Nor…
$ `Council Distrcit`                            <dbl> 28, 20, 34, 33, 4, 20, 9…
$ Postcode                                      <dbl> 11434, 11355, 11206, 112…
$ BoroCD                                        <dbl> 412, 407, 301, 302, 108,…
$ `Census Tract`                                <dbl> 294, 845, 495, 9, 120, 1…
$ BCTCB2010                                     <dbl> 294, 845, 495, 9, 120, 1…
$ BIN                                           <dbl> 0, 0, 0, 3388736, 0, 0, …
$ BBL                                           <dbl> 0, 0, 0, 3002777501, 0, …
$ DOITT_ID                                      <dbl> 1408, 1359, 1699, 298, 5…
$ `Location (Lat, Long)`                        <chr> "(40.6748599999, -73.784…
head(wifi)
# A tibble: 6 × 29
  OBJECTID Borough Type        Provider Name  Location Latitude Longitude      X
     <dbl>   <dbl> <chr>       <chr>    <chr> <chr>       <dbl>     <dbl>  <dbl>
1    10604       4 Limited Fr… SPECTRUM Bais… Park Pe…     40.7     -73.8 1.04e6
2    10555       4 Limited Fr… SPECTRUM Kiss… Park Pe…     40.7     -73.8 1.03e6
3    12370       3 Free        Transit… Gran… Grand S…     40.7     -73.9 1.00e6
4     9893       3 Free        Downtow… <NA>  125 Cou…     40.7     -74.0 9.86e5
5    10169       1 Free        Transit… Lexi… Lexingt…     40.8     -74.0 9.94e5
6    10880       4 Limited Fr… SPECTRUM Kiss… Park Pe…     40.7     -73.8 1.04e6
# ℹ 20 more variables: Y <dbl>, Location_T <chr>, Remarks <chr>, City <chr>,
#   SSID <chr>, SourceID <chr>, Activated <chr>, BoroCode <dbl>,
#   `Borough Name` <chr>, `Neighborhood Tabulation Area Code (NTACODE)` <chr>,
#   `Neighborhood Tabulation Area (NTA)` <chr>, `Council Distrcit` <dbl>,
#   Postcode <dbl>, BoroCD <dbl>, `Census Tract` <dbl>, BCTCB2010 <dbl>,
#   BIN <dbl>, BBL <dbl>, DOITT_ID <dbl>, `Location (Lat, Long)` <chr>
wifi %>% count(`Borough Name`, sort = TRUE)
# A tibble: 5 × 2
  `Borough Name`     n
  <chr>          <int>
1 Manhattan       1672
2 Brooklyn         700
3 Queens           531
4 Bronx            316
5 Staten Island    100
wifi %>% count(Type, sort = TRUE)
# A tibble: 3 × 2
  Type             n
  <chr>        <int>
1 Free          2736
2 Limited Free   581
3 Partner Site     2
wifi %>% count(Provider, sort = TRUE) %>% 
  slice_head(n = 10)
# A tibble: 10 × 2
   Provider                    n
   <chr>                   <int>
 1 LinkNYC - Citybridge     1868
 2 SPECTRUM                  343
 3 Transit Wireless          276
 4 ALTICEUSA                 237
 5 Harlem                    101
 6 Downtown Brooklyn         100
 7 NYPL                       90
 8 QPL                        65
 9 BPL                        59
10 Manhattan Down Alliance    36
summary(wifi$Latitude)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  40.51   40.70   40.75   40.74   40.79   40.90 
summary(wifi$Longitude)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -74.24  -73.99  -73.96  -73.95  -73.93  -73.71 
ggplot(wifi, aes(x = `Borough Name`)) +
  geom_bar(fill = "steelblue") +
  labs(title ="Hotspots per Borough (Exploration)",
       x = "Borough", y = "Count") +
  theme_minimal()

ggplot(wifi, aes(x = Type)) +
  geom_bar(fill = "coral") +
  labs(title = "WiFi Type Distribution (Exploration)",
       x = "Type", y = "Count") +
  theme_minimal()

wifi_filtered <- wifi %>%
  filter(Type == "Free") %>%
  filter(!is.na(Latitude), !is.na(Longitude)) %>%
  arrange(`Borough Name`) %>%
  slice_head(n = 900)
nrow(wifi_filtered)
[1] 900
wifi_filtered %>%
  group_by(`Borough Name`) %>%
  summarize(
    Count = n(),
    Providers = n_distinct(Provider)
  ) %>%
  arrange(desc(Count))
# A tibble: 3 × 3
  `Borough Name` Count Providers
  <chr>          <int>     <int>
1 Brooklyn         540         7
2 Bronx            196         4
3 Manhattan        164         8
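
Sorting by borough name and then keeping the first 900 rows keeps only the alphabetically first boroughs, which is why just three appear in the table above. One possible alternative, shown here only as a sketch and not as the method used in this project, is to sample a fixed number of rows inside each borough. The tibble `toy_wifi` and the cap `per_borough` are hypothetical stand-ins, not the real data.

```r
library(dplyr)

# Toy stand-in for the `wifi` data (made-up rows, not the real dataset).
toy_wifi <- tibble(
  `Borough Name` = rep(c("Bronx", "Brooklyn", "Manhattan", "Queens"),
                       times = c(5, 8, 20, 7)),
  Type = "Free"
)

per_borough <- 4  # hypothetical cap on rows kept per borough

set.seed(1)  # make the random sample reproducible
toy_sample <- toy_wifi %>%
  filter(Type == "Free") %>%
  group_by(`Borough Name`) %>%
  slice_sample(n = per_borough) %>%  # sample within each borough
  ungroup()

# Every borough now contributes the same number of rows.
count(toy_sample, `Borough Name`)
```

With a per-borough sample, no borough is dropped just because its name sorts late in the alphabet.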
wifi_plot <- wifi_filtered %>%
  mutate(
    Provider_Group = case_when(
      Provider == "LinkNYC - Citybridge" ~ "LinkNYC",
      Provider == "Transit Wireless"     ~ "Transit Wireless",
      Provider == "SPECTRUM"             ~ "Spectrum",
      TRUE                               ~ "Other"
    )
  ) %>%
  group_by(`Borough Name`, Provider_Group) %>%
  summarize(Count = n(), .groups = "drop")
ggplot(wifi_plot, aes(x = reorder(`Borough Name`, -Count, FUN = sum),  # order bars by borough total
                      y = Count,
                      fill = Provider_Group)) +
  geom_col(position = "stack", color = "white", linewidth = 0.3) +
  scale_fill_viridis_d(option = "plasma", begin = 0.1, end = 0.9,
                       name = "WiFi Provider") +
  annotate("text",
           x = 1, y = 620,
           label = "LinkNYC leads\nwith most hotspots",
           size = 3.5, color = "gray20", fontface = "italic") +
  annotate("segment",
           x = 1, xend = 1, y = 595, yend = 500,
           arrow = arrow(length = unit(0.2, "cm")),
           color = "gray40") +
  labs(
    title = "Free Public WiFi Hotspots Across NYC Boroughs by Provider",
    x = "Borough",
    y = "Number of Hotspots"
  )

pal <- colorFactor(
  palette = c("#E41A1C", "#377EB8", "#4DAF4A", "#FF7F00", "#984EA3"),
  domain = wifi_filtered$`Borough Name`
)
leaflet(wifi_filtered) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(
    lng = ~Longitude,
    lat = ~Latitude,
    radius = 5,
    color = ~pal(`Borough Name`),
    stroke = FALSE,
    fillOpacity = 0.8,
    popup = ~paste(
      "<b>", ifelse(is.na(Name), "Unnamed Hotspot", Name), "</b><br>",
      "Borough: ", `Borough Name`, "<br>",
      "Provider: ", Provider, "<br>",
      "Type: ", Type, "<br>",
      "Location: ", Location_T
    )
  ) %>%
  addLegend(
    position = "bottomright",
    pal = pal,
    values = ~`Borough Name`,
    title = "Borough",
    opacity = 0.9
  ) %>%
  setView(lng = -73.97, lat = 40.73, zoom = 11)

#1 This project looks at public Wi-Fi hotspots in New York City to see how internet access is spread across different areas. The dataset includes categorical (text) variables such as borough, provider, and location type, and numeric variables, latitude and longitude, that are used for mapping. I cleaned the data by removing rows with missing coordinates, fixing text formatting, and keeping only "Free" hotspots from a subset of boroughs, which makes the dataset smaller and easier to analyze. The subset is the first 900 rows after sorting by borough name, so it covers the Bronx, Brooklyn, and only part of Manhattan. I chose this topic because Wi-Fi is important for daily life, and I wanted to see how access differs across such a busy city.
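
Cleaning steps of this kind can be sketched on a small example. This is an illustration under assumptions, not the exact code used in this project: the tibble `toy` and its values are made up, and upper-casing the provider name is just one possible way to make labels consistent.

```r
library(dplyr)
library(stringr)

# Made-up rows with messy provider labels and some missing coordinates.
toy <- tibble(
  Provider  = c("  SPECTRUM ", "spectrum", "Transit  Wireless", "NYPL"),
  Latitude  = c(40.70, 40.72, NA, 40.80),
  Longitude = c(-73.90, -73.95, -73.98, NA)
)

toy_clean <- toy %>%
  # Drop rows that cannot be placed on a map.
  filter(!is.na(Latitude), !is.na(Longitude)) %>%
  # Trim and collapse extra spaces, then use one letter case.
  mutate(Provider = str_to_upper(str_squish(Provider)))

toy_clean
```

After cleaning, the two remaining rows share the same provider label, so `count(Provider)` would treat them as one group.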

#2 The bar chart shows how many Wi-Fi hotspots are in each borough, broken down by provider, so we can compare which areas have more access. The map shows where these hotspots are located and helps us see that they are mostly in busy, crowded areas. One thing I noticed is that some areas have many more hotspots than others: in the full dataset, Manhattan has 1,672 of the 3,319 hotspots, while Staten Island has only 100. I also wish I could have added internet speed or usage data, but that information was not available in my CSV file.
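
To put the differences between boroughs in numbers, the counts from the full dataset can be turned into percentages. This is a small sketch that retypes the borough counts printed earlier in this report. The names `borough_counts` and `share_pct` are my own.

```r
library(dplyr)

# Borough counts copied from the count() output for the full dataset.
borough_counts <- tibble(
  borough = c("Manhattan", "Brooklyn", "Queens", "Bronx", "Staten Island"),
  n       = c(1672, 700, 531, 316, 100)
)

# Share of all hotspots in each borough, in percent.
borough_share <- borough_counts %>%
  mutate(share_pct = round(100 * n / sum(n), 1))

borough_share
```

Manhattan comes out at about half of all hotspots, and Staten Island at about 3 percent.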

#3 I used a public dataset from NYC Open Data and the Posit cheatsheets at https://opensource.posit.co/resources/cheatsheets/