The goal of this analysis is to answer the research question: Which division (North, Central, or South) has the highest number of soccer fields, and how does field lighting vary across them?
The dataset used for this project is the Soccer Fields dataset, which includes information about soccer fields such as their location, division, field surface, lighting availability, and other characteristics.
For this analysis, the following columns are important:
- DIVISION: the division each field belongs to. The research question covers North, Central, and South, but the column also contains the values NE, SE, and SSD.
- NAME: the name of each soccer field.
- Lights: whether the field has lighting (Yes or No).
This dataset helps explore how soccer fields are distributed across different divisions and whether lighting availability differs by region.
# Load required libraries
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.2
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# Import dataset
soccer <- read.csv("Soccer_Fields_-6381694660147513766 (1).csv")
# View dataset structure
glimpse(soccer)
## Rows: 84
## Columns: 21
## $ OBJECTID <int> 9795, 9796, 9797, 9798, 9799, 9800, 9801, 9…
## $ NAME <chr> "Ballard PG", "Lawton Park", "Georgetown PF…
## $ ADDRESS <chr> "6020 28th Ave NW", "4005 27th Ave W", "750…
## $ DIVISION <chr> "North", "Central", "South", "SSD", "South"…
## $ SOCCER <int> 1, 1, 1, 1, 2, 1, 2, 1, 1, 1, 2, 1, 1, 3, 1…
## $ Overlapping <chr> "Y", "N", "Y", "Y", "N", "Y", "Y", "Y", "N"…
## $ Surface <chr> "Grass", "Grass", "Synthetic", "Grass", "Sy…
## $ Lights <chr> "Yes", "No", "Yes", "No", "Yes", "No", "No"…
## $ PMAid <int> 497, 316, 410, NA, 409, NA, NA, NA, NA, NA,…
## $ Location.Id <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ AMWO.Id <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ RES1 <chr> "3- Sched", "1- Sched", "2- Sched", "3- Sch…
## $ RES2 <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ GLOBALID <chr> "f5cfa10b-1ad8-4032-ad08-f81f77f16777", "dd…
## $ GIS_CREATOR <chr> "SeattleParks_SeattleCityGIS", "SeattlePark…
## $ GIS_CRT_DT <chr> "10/20/2025 5:25:52 PM", "10/20/2025 5:25:5…
## $ GIS_EDITOR <chr> "SeattleParks_SeattleCityGIS", "SeattlePark…
## $ GIS_EDT_DT <chr> "10/20/2025 5:25:52 PM", "10/20/2025 5:25:5…
## $ Spatial.Data.Quality.Level <chr> "QL-D3", "QL-D3", "QL-D3", "QL-D3", "QL-D3"…
## $ x <dbl> 1256682, 1256447, 1273035, 1271875, 1283710…
## $ y <dbl> 249293.9, 243339.0, 204646.7, 264571.1, 208…
# Clean and prepare data
soccer_clean <- soccer %>%
  filter(!is.na(DIVISION)) %>%
  select(NAME, DIVISION, Lights) %>%
  mutate(Lights = tolower(Lights))
# Summarize number of fields per division
division_summary <- soccer_clean %>%
  group_by(DIVISION) %>%
  summarise(Total_NAME = n())
division_summary
## # A tibble: 6 × 2
## DIVISION Total_NAME
## <chr> <int>
## 1 Central 16
## 2 NE 2
## 3 North 17
## 4 SE 1
## 5 SSD 25
## 6 South 23
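The summary above contains six DIVISION values, while the research question names only three. A minimal sketch of narrowing the comparison to North, Central, and South follows. It rebuilds the counts from the printed output so that it runs on its own; `target_divisions` and `division_focus` are illustrative names that do not appear in the original analysis.

```r
library(dplyr)
library(tibble)

# Counts copied from the division_summary output printed above
division_summary <- tribble(
  ~DIVISION, ~Total_NAME,
  "Central", 16L,
  "NE",       2L,
  "North",   17L,
  "SE",       1L,
  "SSD",     25L,
  "South",   23L
)

# Divisions named in the research question (illustrative helper vector)
target_divisions <- c("North", "Central", "South")

# Keep only those divisions and sort from most to fewest records
division_focus <- division_summary %>%
  filter(DIVISION %in% target_divisions) %>%
  arrange(desc(Total_NAME))

division_focus
```

In the full analysis the same `filter()` call could be applied to `soccer_clean` before summarising. Whether SSD, NE, and SE should be excluded or merged into other divisions depends on how the data provider defines them, which the dataset itself does not state.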
# Count Lights availability by DIVISION
Lights_summary <- soccer_clean %>%
  group_by(DIVISION, Lights) %>%
  summarise(Count = n())
## `summarise()` has grouped output by 'DIVISION'. You can override using the
## `.groups` argument.
Lights_summary
## # A tibble: 11 × 3
## # Groups: DIVISION [6]
## DIVISION Lights Count
## <chr> <chr> <int>
## 1 Central no 10
## 2 Central yes 6
## 3 NE no 1
## 4 NE yes 1
## 5 North no 12
## 6 North yes 5
## 7 SE no 1
## 8 SSD no 21
## 9 SSD yes 4
## 10 South no 15
## 11 South yes 8
# Visualization: Number of fields per division
ggplot(division_summary, aes(x = DIVISION, y = Total_NAME, fill = DIVISION)) +
  geom_col() +
  labs(title = "Number of Soccer Fields by Division",
       x = "Division", y = "Number of Fields")
# Visualization: Lights distribution by DIVISION
ggplot(Lights_summary, aes(x = DIVISION, y = Count, fill = Lights)) +
  geom_col(position = "dodge") +
  labs(title = "Lighting Availability by Division",
       x = "Division", y = "Number of Fields", fill = "Lights")
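Among the three divisions named in the research question, South has the most field records (23), followed by North (17) and Central (16). The DIVISION column also contains three values outside that question: SSD (25 records, the largest group overall), NE (2), and SE (1).
Lighting availability also differs across divisions: 6 of 16 fields in Central are lit (about 38%), compared with 8 of 23 in South (about 35%) and 5 of 17 in North (about 29%). A higher proportion of lighted fields may indicate better infrastructure or greater investment in recreational facilities, although these counts alone cannot establish the cause.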
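The proportion of lighted fields per division can be computed directly from the lighting counts reported above. The sketch below re-enters those counts by hand so that it is self-contained; `lights_counts`, `lit_share`, and the column names `Total`, `Lit`, and `Share_Lit` are illustrative and not part of the original analysis.

```r
library(dplyr)
library(tibble)

# Counts copied from the Lights_summary output printed above
lights_counts <- tribble(
  ~DIVISION, ~Lights, ~Count,
  "Central", "no",    10L,
  "Central", "yes",    6L,
  "NE",      "no",     1L,
  "NE",      "yes",    1L,
  "North",   "no",    12L,
  "North",   "yes",    5L,
  "SE",      "no",     1L,
  "SSD",     "no",    21L,
  "SSD",     "yes",    4L,
  "South",   "no",    15L,
  "South",   "yes",    8L
)

# Share of lit fields per division; a division with no "yes" row gets Lit = 0
lit_share <- lights_counts %>%
  group_by(DIVISION) %>%
  summarise(
    Total     = sum(Count),
    Lit       = sum(Count[Lights == "yes"]),
    Share_Lit = Lit / Total,
    .groups   = "drop"
  ) %>%
  arrange(desc(Share_Lit))

lit_share
```

Shares for NE and SE rest on only two records and one record respectively, so they are best read with caution.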
For future research, it would be valuable to examine:
- The surface types (natural vs. artificial turf) in relation to divisions.
- The geographic spread of fields using coordinates.
- How lighting and field quantity correlate with community population or income levels.
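As a starting point for the surface-type question, a cross-count of DIVISION and Surface could look like the sketch below. It uses only the first four records visible in the `glimpse()` output, so the resulting counts are illustrative rather than a finding; `sample_fields` and `surface_summary` are hypothetical names.

```r
library(dplyr)
library(tibble)

# First four records as shown in the glimpse() output (illustration only)
sample_fields <- tribble(
  ~DIVISION, ~Surface,    ~Lights,
  "North",   "Grass",     "Yes",
  "Central", "Grass",     "No",
  "South",   "Synthetic", "Yes",
  "SSD",     "Grass",     "No"
)

# Number of records for each division and surface combination
surface_summary <- sample_fields %>%
  count(DIVISION, Surface, name = "Count")

surface_summary
```

Running the same `count()` call on the full `soccer` data frame would give the actual distribution.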