Geographic Information Systems (GIS) are tools used to capture, store, and analyze geographic data. Essentially, they are a way to understand “where” things are and how they relate to each other. GIS enables us to work with location-based information and to visualize spatial data in a meaningful way, often through maps. This makes complex data more accessible and interpretable. Whether it’s showing the distribution of schools in a district, identifying areas with high traffic, or tracking changes in land use, GIS provides a powerful framework for understanding spatial relationships and making data-driven decisions.
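Maps in GIS start with data, which is typically provided in the form of shapefiles or other geospatial formats such as GeoJSON or KML. These data contain two key components: the geometry, which records the location and shape of each feature, and the attributes, which describe it.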
Maps in GIS are constructed using multiple layers, each representing a different type of spatial data. These layers allow us to separate and manipulate different kinds of information, making it easier to analyze complex datasets and create meaningful visualizations.
Each layer in GIS can be:
The ability to manipulate and visualize data in separate layers is one of the most powerful aspects of GIS. It allows for complex analyses and detailed visualizations, giving users flexibility and control over their data and map outputs.
Many software applications are available for constructing geographical maps, but most require a paid license (e.g., ArcGIS, SAS, etcetera). In our case, we will use R and RStudio because they are free, open-source, and highly flexible. R provides powerful packages like ggplot2, sf, and tmap that allow us to create and analyze spatial data without licensing costs. Additionally, R supports reproducibility and automation, making it ideal for research and large-scale geographic analyses.
Visualizing relevant variables geographically offers a deeper understanding of how our programs function within the Dallas Independent School District. In this tutorial, we’ll leverage the power of R and its geospatial capabilities to construct dynamic maps that reveal crucial insights, such as attendance rates and the family and community engagement index, directly within the district’s trustee boundaries.
To achieve this, we’ll build our maps using a layered approach. Specifically, we’ll focus on two primary layers: the trustee district boundaries (polygons) and the campus locations (points).
These essential files, provided by the Dallas ISD’s Demographic Studies department, are regularly updated and publicly available for download at https://data-disd-gismaps.hub.arcgis.com/search?groupIds=6e8e4c20a9704410a659251d2987a362.
Let’s dive in and explore the world of geospatial analysis with R!
To begin, the code below clears the R environment, ensuring a clean workspace. Next, it loads essential R libraries for geospatial analysis, data visualization, and manipulation. Finally, it imports a shapefile containing Dallas ISD trustee boundary data into a spatial data frame, which prepares the data for mapping.
A shapefile is a common digital storage format for geospatial vector data, representing geographic features through geometric shapes (points, lines, and polygons) and their associated attributes.
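To make the idea of geometry plus attributes concrete, here is a minimal sketch that builds a small spatial data frame by hand. The campus names and coordinates are made up for illustration; only the sf functions are real.

```r
library(sf)

# Hypothetical toy data: three made-up campuses with longitude/latitude
toy_campuses <- data.frame(
  name = c("Campus A", "Campus B", "Campus C"),
  level = c("Elementary", "Middle", "High"),
  lon = c(-96.80, -96.75, -96.85),
  lat = c(32.78, 32.80, 32.75)
)

# Turn the table into a spatial data frame: lon/lat become point geometries
toy_sf <- st_as_sf(toy_campuses, coords = c("lon", "lat"), crs = 4326)

# Component 1: the geometry (where each feature is)
st_geometry(toy_sf)

# Component 2: the attributes (what we know about each feature)
st_drop_geometry(toy_sf)
```

Reading a shapefile with st_read() produces the same kind of object, with one row per feature and a geometry column alongside the attribute columns.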
# Here are the files we need for map construction: https://data-disd-gismaps.hub.arcgis.com/search?groupIds=6e8e4c20a9704410a659251d2987a362
# Clear all objects from memory
rm(list = ls())
# Install required packages (Uncomment if not installed)
# install.packages(c("sf", "ggplot2", "ggspatial", "rosm", "DT", "cowplot", "osmdata", "RColorBrewer", "dplyr", "patchwork", "ggrepel", "readr", "prettymapr"))
# Load required libraries
library(sf) # For reading, transforming, and visualizing geographic data
library(ggplot2) # For creating high-quality visualizations
library(ggspatial) # For adding spatial features like basemaps and compass roses
library(rosm) # Provides OpenStreetMap tiles for basemaps
library(DT) # For interactive tables
library(cowplot) # For combining multiple plots into one
library(osmdata) # For extracting OpenStreetMap data
library(RColorBrewer) # For color palettes
library(dplyr) # For data manipulation
library(patchwork) # For arranging multiple ggplot2 plots in a grid
library(ggrepel) # For improved text label placement in plots
library(readr) # For read_csv()
library(prettymapr) # For adding nice map elements
# Load geographic data for Trustee Boundaries
trustee_boundaries_map <- st_read("C:/Users/FFERRERO/OneDrive - Dallas Independent School District/2024-25/CSM 4th goal/Trustee_Boundaries/Trustee_Boundaries.shp")
## Reading layer `Trustee_Boundaries' from data source
## `C:\Users\FFERRERO\OneDrive - Dallas Independent School District\2024-25\CSM 4th goal\Trustee_Boundaries\Trustee_Boundaries.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 9 features and 7 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 2439242 ymin: 6886656 xmax: 2577621 ymax: 7046186
## Projected CRS: NAD83 / Texas North Central (ftUS)
In the output above, R confirms that it read the ‘Trustee_Boundaries.shp’ shapefile successfully. The file contains 9 polygon features, one per trustee district, along with 7 attribute fields. The data is in a projected coordinate system, NAD83 / Texas North Central (ftUS), whose units are US survey feet, and the bounding box gives the spatial extent of the data. R has now loaded the spatial information and its attributes, so let’s see what the data looks like.
# Display first 9 rows of Trustee Boundaries data in an interactive table
datatable(head(trustee_boundaries_map, 9))
Now, let’s import the campus locations layer. The code below uses the st_read function to load a shapefile containing Dallas ISD campus locations into a spatial data frame named campus_locations_map, making it ready for mapping.
# Load geographic data for Campus Locations
campus_locations_map <- st_read("C:/Users/FFERRERO/OneDrive - Dallas Independent School District/2024-25/CSM 4th goal/Campus_Locations/Campus_Locations.shp")
## Reading layer `Campus_Locations' from data source
## `C:\Users\FFERRERO\OneDrive - Dallas Independent School District\2024-25\CSM 4th goal\Campus_Locations\Campus_Locations.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 252 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -10794330 ymin: 3841213 xmax: -10746510 ymax: 3891845
## Projected CRS: WGS 84 / Pseudo-Mercator
The output above confirms the successful loading of the ‘Campus_Locations.shp’ shapefile, revealing it contains 252 point features, not polygons, representing campus locations, each with 12 attributes. Notably, the projection is WGS 84 / Pseudo-Mercator, different from the previous trustee boundaries data. Let’s now explore how the data looks.
# Drop columns that are not needed: SLN, CITY, PHONE, and WEBSITE
campus_locations_map <- campus_locations_map %>%
  select(-SLN, -CITY, -PHONE, -WEBSITE)
# Display first 10 rows of Campus Locations data in an interactive table
datatable(head(campus_locations_map, 10))
To ensure accurate spatial alignment, the code below uses st_transform() to reproject trustee_boundaries_map into the coordinate reference system (CRS) of campus_locations_map and stores the result in isd_map. Putting layers in the same CRS is standard GIS practice, because it prevents misalignment in spatial analysis and visualization. (The plots that follow pass trustee_boundaries_map directly to geom_sf(), which reprojects sf layers to a common CRS when drawing.)
# Ensure both layers use the same coordinate reference system (CRS)
isd_map <- st_transform(trustee_boundaries_map, crs = st_crs(campus_locations_map))
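As a small sketch of the same idea on toy data, the example below checks whether two layers share a CRS and reprojects one of them. The point coordinates are made up; the EPSG codes 4326 (WGS 84 longitude/latitude) and 3857 (WGS 84 / Pseudo-Mercator) are standard.

```r
library(sf)

# Hypothetical toy layers in two different coordinate reference systems
pts_wgs84 <- st_sfc(st_point(c(-96.80, 32.78)), crs = 4326)  # longitude/latitude
pts_webmerc <- st_transform(pts_wgs84, 3857)                 # WGS 84 / Pseudo-Mercator

# The layers describe the same place but with different coordinates
st_crs(pts_wgs84) == st_crs(pts_webmerc)   # FALSE

# Reproject one layer to the CRS of the other before any spatial operation
pts_back <- st_transform(pts_webmerc, st_crs(pts_wgs84))
st_crs(pts_back) == st_crs(pts_wgs84)      # TRUE
```

Comparing st_crs() values before combining layers is a cheap check that can catch a mismatch early.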
Now, let’s plot and overlay our layers! The code below generates a map of Dallas ISD trustee district boundaries using ggplot2 and sf. It layers an OpenStreetMap basemap with the trustee boundary polygons, styling the map for clarity and adding a north arrow for orientation. We’ve added comments line by line to be exhaustive.
# Create a map using ggplot2, displaying trustee district boundaries for Dallas ISD
ggplot() + # Initialize the ggplot object
annotation_map_tile(type = "osm", zoom = 11) + # Add OpenStreetMap basemap with a zoom level of 11
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) + # Plot the trustee district boundaries using SF (Simple Features) geometry
# Fill the district areas with a light green color ("#A0D6D1") and outline them with a gray color ("gray40"), size of the boundary lines is 0.5
ggtitle("Dallas ISD Trustee Districts") + # Set the title of the plot to "Dallas ISD Trustee Districts"
theme_minimal() + # Apply the minimal theme to the plot for a clean design
theme(
legend.position = "none", # Remove the legend from the plot (not needed for this map)
panel.grid = element_blank(), # Remove the grid lines for a cleaner visual
axis.text = element_blank(), # Remove axis text to focus on the map
axis.title = element_blank() # Remove axis titles to focus on the map
) +
annotation_north_arrow(
location = "tr", # Place the north arrow in the top-right corner ("tr" stands for top-right)
which_north = "true", # Ensure the arrow is pointing to true north
style = north_arrow_nautical, # Choose a nautical style for the north arrow (alternative design)
height = unit(0.5, "inches"), # Set the height of the north arrow to 0.5 inches
width = unit(0.5, "inches") # Set the width of the north arrow to 0.5 inches
)
We can also modify the map by using a different color palette to distinguish the boundaries.
# Create a map showing the trustee district boundaries with different colors
ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, aes(fill = factor(DIST)),
color = "gray40", size = 0.5) + # Plot ISD boundaries with different colors
scale_fill_brewer(palette = "Pastel1") + # Apply pastel colors to each district
geom_sf_text(data = trustee_boundaries_map, aes(label = DIST),
size = 6,
fontface = "bold",
color = "black",
nudge_y = 0.1,
nudge_x = 0.1) +
ggtitle("Dallas ISD Trustee Districts") +
theme_minimal() +
theme(legend.position = "none",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
)
We can now overlay the campus locations layer. This is achieved with a single extra line of code!
# Create the map with circles showing campus locations
ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map, color = "#211650", size = 2, shape = 16) + # Add campus locations as dark purple filled circles with size 2
ggtitle("Dallas ISD School Campus Locations with Trustee Districts") +
theme_minimal() +
theme(legend.position = "none",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
)
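The layering idea can also be tried without any downloaded files. The sketch below uses a made-up square "district" and two made-up "campuses", and leaves out the basemap so that it runs offline; the object names are illustrative only.

```r
library(sf)
library(ggplot2)

# Hypothetical toy layers: one square "district" and two "campuses" inside it
district <- st_sf(
  id = 1,
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)))),
    crs = 3857
  )
)
campuses <- st_sf(
  name = c("A", "B"),
  geometry = st_sfc(st_point(c(3, 4)), st_point(c(7, 6)), crs = 3857)
)

# Layers are drawn in the order they are added: polygons first, points on top
toy_map <- ggplot() +
  geom_sf(data = district, fill = "#A0D6D1", color = "gray40") +
  geom_sf(data = campuses, color = "#211650", size = 2, shape = 16) +
  theme_minimal()

toy_map
```

Swapping the two geom_sf() calls would draw the polygon over the points, which is why the boundaries come first in the maps above.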
We want to distinguish campuses visually by type, but first we need to do some data manipulation. The R code below modifies the campus_locations_map dataset by creating a new variable, LEVEL_GROUPED, that consolidates “Magnet,” “Choice,” and “Alternative” schools into a single category labeled “Alternative / Choice / Magnet,” while keeping all other values unchanged.
# Create a new grouped variable in campus_locations_map to combine "Magnet", "Choice", and "Alternative" into a single category
campus_locations_map <- campus_locations_map %>%
mutate(LEVEL_GROUPED = case_when(
LEVEL_ %in% c("Magnet", "Choice", "Alternative") ~ "Alternative / Choice / Magnet",
TRUE ~ LEVEL_
))
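To see what this recoding does, here is a small sketch on a made-up table (the school names are hypothetical). It applies the same case_when() rule and then cross-tabulates the old and new values.

```r
library(dplyr)

# Hypothetical toy table standing in for the campus attribute data
toy_levels <- tibble(
  SCHOOLNAME = c("School 1", "School 2", "School 3", "School 4", "School 5"),
  LEVEL_ = c("Elementary", "Magnet", "High", "Choice", "Alternative")
)

toy_levels <- toy_levels %>%
  mutate(LEVEL_GROUPED = case_when(
    LEVEL_ %in% c("Magnet", "Choice", "Alternative") ~ "Alternative / Choice / Magnet",
    TRUE ~ LEVEL_
  ))

# Cross-tabulate old and new values to confirm the mapping
count(toy_levels, LEVEL_, LEVEL_GROUPED)
```

Running the same count() on campus_locations_map is a quick way to confirm that every original level ended up in the intended group.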
…and now we are ready to display the new map. Please check the new lines added to the previous code with specific comments!
# Create a map showing campus locations with different colors based on school type, and a black outline for the points
ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map, aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 22) + # Add campus locations as filled squares, with a black outline
scale_fill_manual(values = c("Elementary" = "#D41111", # Set red for Elementary schools
"Middle" = "#56B4E9", # Set light blue for Middle schools
"High" = "#009E73", # Set green for High schools
"Alternative / Choice / Magnet" = "#F0E442"), # Set yellow for Alternative, Choice, and Magnet schools
guide = guide_legend(title = "School Level or Type", # Add legend with title "School Level or Type"
override.aes = list(shape = 21, color = "black")), # Ensure the legend shows black outlines for points
limits = c("Elementary", "Middle", "High", "Alternative / Choice / Magnet")) + # Set the order of categories in the legend
ggtitle("Dallas ISD School Campus Locations by Levels") +
theme_minimal() +
theme(panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
guides(fill = guide_legend(title = "School Level or Type")) # Add a guide for the fill aesthetic with title "School Level or Type"
Let’s go further and create four separate maps, one for each level, and arrange them in two rows and two columns! Check the code below:
# Create filtered maps for each school level
plot_elementary <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map %>% filter(LEVEL_GROUPED == "Elementary"),
aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("Elementary" = "#D41111")) +
ggtitle("Elementary Schools") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
annotation_north_arrow(location = "tr", which_north = "true", style = north_arrow_nautical, height = unit(0.5, "inches"), width = unit(0.5, "inches")) +
guides(fill = "none")
plot_middle <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map %>% filter(LEVEL_GROUPED == "Middle"),
aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("Middle" = "#56B4E9")) +
ggtitle("Middle Schools") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
annotation_north_arrow(location = "tr", which_north = "true", style = north_arrow_nautical, height = unit(0.5, "inches"), width = unit(0.5, "inches")) +
guides(fill = "none")
plot_high <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map %>% filter(LEVEL_GROUPED == "High"),
aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("High" = "#009E73")) +
ggtitle("High Schools") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
annotation_north_arrow(location = "tr", which_north = "true", style = north_arrow_nautical, height = unit(0.5, "inches"), width = unit(0.5, "inches")) +
guides(fill = "none")
plot_alt_choice_magnet <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map %>% filter(LEVEL_GROUPED == "Alternative / Choice / Magnet"),
aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("Alternative / Choice / Magnet" = "#F0E442")) +
ggtitle("Alternative / Choice / Magnet Schools") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
annotation_north_arrow(location = "tr", which_north = "true", style = north_arrow_nautical, height = unit(0.5, "inches"), width = unit(0.5, "inches")) +
guides(fill = "none")
# Arrange the four plots in a 2 x 2 grid (patchwork: "+" places plots side by side, "/" stacks rows)
(plot_elementary + plot_middle) / (plot_high + plot_alt_choice_magnet)
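The four plot definitions above differ only in the level they filter on, the fill color, and the title, so one option is to wrap the shared code in a function. The sketch below is one possible way to do that on made-up points; make_level_map() is an illustrative helper rather than a function from any package, and the basemap and north arrow are left out to keep it short.

```r
library(sf)
library(ggplot2)
library(dplyr)

# Hypothetical toy campuses; in the tutorial this would be campus_locations_map
toy_campuses <- st_as_sf(
  data.frame(
    LEVEL_GROUPED = c("Elementary", "Elementary", "Middle", "High"),
    x = c(1, 2, 3, 4),
    y = c(1, 3, 2, 4)
  ),
  coords = c("x", "y"), crs = 3857
)

# make_level_map() is an illustrative helper, not part of any package
make_level_map <- function(points, level, fill_color, title = paste(level, "Schools")) {
  ggplot() +
    geom_sf(data = filter(points, LEVEL_GROUPED == level),
            fill = fill_color, color = "black", size = 2, shape = 21) +
    ggtitle(title) +
    theme_minimal() +
    theme(panel.grid = element_blank(),
          axis.text = element_blank(),
          axis.title = element_blank())
}

level_colors <- c("Elementary" = "#D41111", "Middle" = "#56B4E9", "High" = "#009E73")

# One plot per level, built from a single definition
level_maps <- Map(function(level, fill_color) make_level_map(toy_campuses, level, fill_color),
                  names(level_colors), level_colors)
names(level_maps)
```

The resulting list could then be arranged with patchwork, for example by passing it to wrap_plots() with ncol = 2.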
In this section, we will explore how student attendance varies across each trustee district boundary using a heatmap and visualize the family engagement index for each school through a bubble map. This introduces two key educational variables into our geographic representation. Our conjecture is that family engagement is somehow related to student attendance: the more engaged a family is with the school and community, the higher the student attendance. Let’s visualize this spatially and examine our initial hypothesis.
As you will see below, we have a new dataset that includes the TEA number, campus name, level, trustee district number, attendance, and the Family and Community Engagement Index (FCEI) (a value that measures the level of family involvement in campus activities, with a range of 0-1, where 1 represents the highest possible engagement). Please note that this is a synthetic dataset created with Gemini, featuring exaggerated tendencies to simplify the example for demonstration purposes.
# Read the CSV file into a data frame
attendance_fcei_data <- read_csv("C:/Users/FFERRERO/OneDrive - Dallas Independent School District/2024-25/CSM 4th goal/Mock_Data_Attendance&FCEI_by_Campus.csv")
# Show the first 10 rows of the attendance and FCEI dataset in an interactive table
datatable(head(attendance_fcei_data, 10))
The next step is to merge our geographic file with the dataset containing the educational variables, using TEA as the primary key. Additionally, we need to review the variable types for each attribute to ensure proper alignment and compatibility.
# Merge the campus_locations_map with attendance_fcei_data by TEA (both datasets have a column named "TEA")
merged_data <- merge(campus_locations_map, attendance_fcei_data, by = "TEA")
# Check variables in the merged dataset and its type
str(merged_data)
## Classes 'sf' and 'data.frame': 256 obs. of 15 variables:
## $ TEA : int 1 2 3 4 5 6 7 8 9 10 ...
## $ FID : int 131 129 114 126 135 112 208 133 218 136 ...
## $ SCHOOLNAME : chr "Adams High School Leadership Academy" "Adamson High School" "New Tech High School" "Multiple Careers Magnet Center" ...
## $ LEVEL_ : chr "High" "High" "Choice" "Alternative" ...
## $ Latitude : num 32.8 32.7 32.7 32.8 32.7 ...
## $ Longitude : num -96.7 -96.8 -96.8 -96.8 -96.9 ...
## $ ADDRESS : chr "2101 Millmar Dr." "309 E. Ninth St." "4730 S. Lancaster Rd." "4528 Rusk Ave." ...
## $ ZIP : int 75228 75203 75216 75204 75211 75230 75229 75233 75215 75215 ...
## $ LEVEL_GROUPED: chr "High" "High" "Alternative / Choice / Magnet" "Alternative / Choice / Magnet" ...
## $ CAMPUS : chr "Adams, B" "Adamson" "New Tech" "Multiple Careers Magnet Center" ...
## $ LEVEL : chr "High" "High" "Choice" "Alternative" ...
## $ DIST : num 3 7 5 8 7 2 1 6 9 9 ...
## $ Attendance : num 78 88 92 65 90 85 75 82 95 94 ...
## $ FCEI : num 0.75 0.85 0.9 0.6 0.88 0.82 0.7 0.8 0.93 0.92 ...
## $ geometry :sfc_POINT of length 256; first list element: 'XY' num -10762383 3872385
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
## ..- attr(*, "names")= chr [1:14] "TEA" "FID" "SCHOOLNAME" "LEVEL_" ...
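The str() output above lists 256 rows, while the campus layer was read with 252 features. One possible explanation is that some TEA values appear more than once in one of the tables. The sketch below shows, on made-up tables, how duplicated or unmatched keys could be checked around a merge; the toy values are assumptions, not the tutorial's data.

```r
library(dplyr)

# Hypothetical toy tables: the key 2 appears twice in the second table
toy_campuses <- tibble(TEA = c(1, 2, 3), SCHOOLNAME = c("A", "B", "C"))
toy_scores <- tibble(TEA = c(1, 2, 2, 4), Attendance = c(90, 85, 86, 70))

# Keys that occur more than once will multiply rows in the merged result
toy_scores %>% count(TEA) %>% filter(n > 1)

# Keys present in one table but not the other are dropped by an inner merge
anti_join(toy_campuses, toy_scores, by = "TEA")   # campuses without scores
anti_join(toy_scores, toy_campuses, by = "TEA")   # scores without a campus

# merge() keeps only matching keys by default (an inner join)
toy_merged <- merge(toy_campuses, toy_scores, by = "TEA")
nrow(toy_merged)
```

Here the merged table has three rows: one for key 1 and two for the duplicated key 2, while keys 3 and 4 are dropped.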
As you can see above, both Attendance and FCEI are numeric, so there’s no need to convert them. However, if you do need to, the code is provided below. Just uncomment the lines if necessary.
# Convert Attendance from character to numeric if needed
# merged_data$Attendance <- as.numeric(merged_data$Attendance)
# Convert FCEI from character to numeric if needed
# merged_data$FCEI <- as.numeric(merged_data$FCEI)
The next step involves data manipulation. Our objective is to calculate the average attendance and FCEI for each trustee district boundary.
# Calculate the average Attendance and FCEI per trustee district (DIST)
averages_per_district <- merged_data %>%
group_by(DIST) %>%
summarise(
avg_attendance = mean(Attendance, na.rm = TRUE),
avg_fcei = mean(FCEI, na.rm = TRUE)
)
# View the resulting summary
print(averages_per_district)
## Simple feature collection with 9 features and 3 fields
## Geometry type: MULTIPOINT
## Dimension: XY
## Bounding box: xmin: -10794330 ymin: 3841213 xmax: -10746510 ymax: 3891845
## Projected CRS: WGS 84 / Pseudo-Mercator
## # A tibble: 9 × 4
## DIST avg_attendance avg_fcei geometry
## <dbl> <dbl> <dbl> <MULTIPOINT [m]>
## 1 1 79.1 0.767 ((-10794330 3878283), (-10784618 3882616), (-10…
## 2 2 87.2 0.851 ((-10779248 3874439), (-10779213 3876093), (-10…
## 3 3 79.0 0.767 ((-10771431 3880430), (-10771426 3878450), (-10…
## 4 4 83.6 0.816 ((-10763282 3861153), (-10762345 3860198), (-10…
## 5 5 91.9 0.899 ((-10783078 3866988), (-10782243 3866342), (-10…
## 6 6 84.2 0.822 ((-10786951 3857629), (-10785554 3855475), (-10…
## 7 7 91.8 0.898 ((-10789660 3860299), (-10788013 3858555), (-10…
## 8 8 90.0 0.879 ((-10788027 3863367), (-10787701 3866340), (-10…
## 9 9 95.4 0.934 ((-10775394 3867684), (-10774540 3868990), (-10…
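Because merged_data is an sf object, summarise() also combines the point geometries of each group, which is why the result above carries a MULTIPOINT column. If only the numbers are needed, one option is to drop the geometry first. The sketch below shows this on made-up data.

```r
library(sf)
library(dplyr)

# Hypothetical toy campuses with a district number and two indicators
toy_merged <- st_as_sf(
  data.frame(
    DIST = c(1, 1, 2, 2),
    Attendance = c(80, 90, 70, NA),
    FCEI = c(0.8, 0.9, 0.7, 0.6),
    x = c(1, 2, 3, 4),
    y = c(1, 2, 3, 4)
  ),
  coords = c("x", "y"), crs = 3857
)

# Dropping the geometry first returns a plain table with one row per district
toy_averages <- toy_merged %>%
  st_drop_geometry() %>%
  group_by(DIST) %>%
  summarise(
    avg_attendance = mean(Attendance, na.rm = TRUE),
    avg_fcei = mean(FCEI, na.rm = TRUE)
  )

toy_averages
```

The missing attendance value in district 2 is skipped by na.rm = TRUE, so that district's average is based on the one remaining campus.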
We will now extract these newly calculated values, sort trustee_boundaries_map by DIST so that its rows line up with them, and append them to the dataset as new columns.
# Extract the avg_attendance values
avg_attendance_values <- averages_per_district$avg_attendance
# Print the values
print(avg_attendance_values)
## [1] 79.10526 87.21053 79.04762 83.62963 91.89189 84.16667 91.81818 89.97059
## [9] 95.44444
# Sort trustee_boundaries_map by the DIST variable
trustee_boundaries_map <- trustee_boundaries_map %>%
arrange(DIST)
# Assign avg_attendance values in the correct order
trustee_boundaries_map$avg_attendance <- averages_per_district$avg_attendance
# Check the result
print(head(trustee_boundaries_map))
## Simple feature collection with 6 features and 8 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 2457841 ymin: 6886656 xmax: 2577621 ymax: 7046186
## Projected CRS: NAD83 / Texas North Central (ftUS)
## OBJECTID Census_Sch NAME DIST Shape_Leng Shape__Are Shape__Len
## 1 2 16230 Dallas 1 0.6093766 831194521 202240.6
## 2 4 16230 Dallas 2 0.6850432 701079691 230147.2
## 3 3 16230 Dallas 3 0.5796822 606163035 194774.8
## 4 6 16230 Dallas 4 0.8443502 2116276657 284144.0
## 5 1 16230 Dallas 5 1.6357313 2461573598 541572.2
## 6 9 16230 Dallas 6 0.6308621 965259853 209294.7
## geometry avg_attendance
## 1 POLYGON ((2494887 7018256, ... 79.10526
## 2 POLYGON ((2513404 6983338, ... 87.21053
## 3 POLYGON ((2532128 6984044, ... 79.04762
## 4 POLYGON ((2575682 6887012, ... 83.62963
## 5 POLYGON ((2500940 6915776, ... 91.89189
## 6 POLYGON ((2478977 6949039, ... 84.16667
# Extract the avg_FCEI values
avg_FCEI_values <- averages_per_district$avg_fcei
# Print the values
print(avg_FCEI_values)
## [1] 0.7673684 0.8510526 0.7671429 0.8162963 0.8989189 0.8216667 0.8978788
## [8] 0.8785294 0.9344444
# Sort trustee_boundaries_map by the DIST variable
trustee_boundaries_map <- trustee_boundaries_map %>%
arrange(DIST)
# Assign FCEI values in the correct order
trustee_boundaries_map$FCEI <- avg_FCEI_values
# Check the result
print(head(trustee_boundaries_map))
## Simple feature collection with 6 features and 9 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 2457841 ymin: 6886656 xmax: 2577621 ymax: 7046186
## Projected CRS: NAD83 / Texas North Central (ftUS)
## OBJECTID Census_Sch NAME DIST Shape_Leng Shape__Are Shape__Len
## 1 2 16230 Dallas 1 0.6093766 831194521 202240.6
## 2 4 16230 Dallas 2 0.6850432 701079691 230147.2
## 3 3 16230 Dallas 3 0.5796822 606163035 194774.8
## 4 6 16230 Dallas 4 0.8443502 2116276657 284144.0
## 5 1 16230 Dallas 5 1.6357313 2461573598 541572.2
## 6 9 16230 Dallas 6 0.6308621 965259853 209294.7
## avg_attendance geometry FCEI
## 1 79.10526 POLYGON ((2494887 7018256, ... 0.7673684
## 2 87.21053 POLYGON ((2513404 6983338, ... 0.8510526
## 3 79.04762 POLYGON ((2532128 6984044, ... 0.7671429
## 4 83.62963 POLYGON ((2575682 6887012, ... 0.8162963
## 5 91.89189 POLYGON ((2500940 6915776, ... 0.8989189
## 6 84.16667 POLYGON ((2478977 6949039, ... 0.8216667
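Assigning values by position works here because both tables are sorted by DIST and contain the same nine districts. An alternative that does not depend on row order is a keyed join. The sketch below illustrates it with made-up squares standing in for districts; the square() helper and the toy values are assumptions for illustration.

```r
library(sf)
library(dplyr)

# Hypothetical toy districts, deliberately stored out of order
square <- function(x0) st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
toy_districts <- st_sf(
  DIST = c(3, 1, 2),
  geometry = st_sfc(square(2), square(0), square(1), crs = 3857)
)

# Hypothetical per-district averages as a plain table
toy_averages <- tibble(DIST = c(1, 2, 3), avg_attendance = c(79.1, 87.2, 79.0))

# Match rows by key instead of by position
toy_joined <- left_join(toy_districts, toy_averages, by = "DIST")
toy_joined
```

Each district receives the value with the matching DIST regardless of how either table is sorted, and a district with no match would get NA rather than a neighbor's value.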
Perfect! We’re all set! The trustee_boundaries_map dataset now contains both the geographic and attribute data for each boundary. As a result, we can create a heatmap using the following code:
# Create the attendance heatmap with a watermark
map1 <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, aes(fill = avg_attendance),
color = "gray40", size = 0.5) +
scale_fill_gradient2(low = "#FF9999", mid = "#FFCC66", high = "#66FF66",
midpoint = mean(trustee_boundaries_map$avg_attendance, na.rm = TRUE),
guide = guide_colorbar(barwidth = 10, barheight = 0.5,
title = "Average Percentage of Attendance",
title.position = "top",
label.position = "bottom",
title.hjust = 0.5,
label.hjust = 0.5,
title.theme = element_text(size = 10),
label.theme = element_text(size = 8),
direction = "horizontal",
ticks = TRUE)) +
# District number (larger font)
geom_sf_text(data = trustee_boundaries_map,
aes(label = DIST),
size = 6,
fontface = "bold",
color = "black",
vjust = 0.5, # Center vertically
hjust = 0.5) + # Center horizontally
# Percentage (smaller font inside parentheses) below the district number
geom_sf_text(data = trustee_boundaries_map,
aes(label = paste("(", round(avg_attendance, 1), "%)", sep = "")),
size = 3.5,
fontface = "italic",
color = "black",
vjust = 2, # Move percentage below the district number, adjust this value as needed.
hjust = 0.5) + # Center horizontally
ggtitle("DISD Average Attendance Heatmap by District Trustee") +
labs(fill = "Average Percentage of Attendance") +
theme_minimal() +
theme(legend.position = c(0, -0.008),
legend.justification = c(0, 0),
legend.background = element_rect(fill = "white", color = "black", size = 0.5),
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
# Watermark addition with the legend "MOCK DATA"
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"), # Place watermark in bottom right corner
hjust = 1.1, # Adjust horizontal position
vjust = -5, # Adjust vertical position
alpha = 0.3, # Make it semi-transparent
size = 10, # Adjust size
color = "red" # Adjust color
)
print(map1)
Let’s now focus on the Family and Community Engagement Index per school. The idea is to create a bubble map, where each campus is represented by a bubble. The size and color of the bubbles will indicate the level of engagement: larger and greener circles represent a higher FCEI.
# Create the map with an adjusted bubble size and color gradient based on FCEI
map2 <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = merged_data, aes(size = FCEI, fill = FCEI), shape = 21, stroke = 1, color = "gray40") + # Changed fill and added border
scale_size_continuous(range = c(0.01, 4), name = "FCEI", guide = "none") +
scale_fill_gradientn(colors = c("red", "green"), name = "FCEI") + # Changed to fill gradient
ggtitle("DISD Family & Community Engagement Index (FCEI) by Campus") +
theme_minimal() +
theme(legend.position = "right",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"),
hjust = 1.1,
vjust = -5,
alpha = 0.3,
size = 10,
color = "red"
)
print(map2)
Let’s enhance the previous map to make it interactive. By hovering over each school, we can display its attendance and FCEI values.
# Interactive Map
library(plotly)
# Create the base ggplot object
map2_base <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = merged_data,
aes(size = FCEI, fill = FCEI,
text = paste("Campus:", CAMPUS, "<br>Attendance:", Attendance, "<br>FCEI:", FCEI)), # Changed to "Attendance"
shape = 21, stroke = 1, color = "gray40") +
scale_size_continuous(range = c(0.01, 4), name = "FCEI", guide = "none") +
scale_fill_gradientn(colors = c("red", "green"), name = "FCEI") +
ggtitle("DISD Family & Community Engagement Index (FCEI) by Campus") +
theme_minimal() +
theme(legend.position = "right",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"),
hjust = 1.1,
vjust = -5,
alpha = 0.3,
size = 10,
color = "red"
)
# Convert the ggplot object to an interactive plotly object
interactive_map <- ggplotly(map2_base, tooltip = "text")
# Display the interactive map
interactive_map
But which campuses have the highest FCEI? We will filter the 15 campuses with the highest FCEI, extract their geographic coordinates, plot them on a map, and add their names using geom_text_repel for better clarity.
# Select the top 15 campuses based on FCEI
top_15_campuses <- merged_data %>%
arrange(desc(FCEI)) %>%
slice(1:15)
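# Extract label coordinates in the CRS of trustee_boundaries_map.
# That layer is the first sf layer in the plot below, so its CRS becomes the plot CRS,
# and geom_text_repel() reads x and y as plain numbers in that CRS.
label_coords <- st_coordinates(st_transform(top_15_campuses, st_crs(trustee_boundaries_map)))
top_15_campuses <- top_15_campuses %>%
  mutate(Longitude = label_coords[, 1], Latitude = label_coords[, 2])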
# Create the map with only the top 15 campuses and labels
map3 <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = top_15_campuses, aes(size = FCEI), color = "#211650", shape = 16) +
scale_size_continuous(range = c(3.5, 4), name = "FCEI") +
ggtitle("Top 15 DISD Campuses by FCEI") +
theme_minimal() +
theme(legend.position = "none",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
# Label the top 15 campuses using geom_text_repel
geom_text_repel(data = top_15_campuses, aes(x = Longitude, y = Latitude, label = CAMPUS),
color = "#211650", size = 3, fontface = "bold",
box.padding = unit(0.35, "lines"),
point.padding = unit(0.3, "lines"),
segment.color = "gray50",
segment.size = 0.2) +
# Watermark addition
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"),
hjust = 1.1,
vjust = -5,
alpha = 0.3,
size = 10,
color = "red"
)
print(map3)
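The coordinate extraction step used for the labels can be tried in isolation. The sketch below uses two made-up points to show what st_coordinates() returns and how the numbers change with the CRS.

```r
library(sf)

# Hypothetical toy points in longitude/latitude
toy_pts <- st_as_sf(
  data.frame(CAMPUS = c("A", "B"), lon = c(-96.80, -96.75), lat = c(32.78, 32.80)),
  coords = c("lon", "lat"), crs = 4326
)

# st_coordinates() returns a matrix with one row per point and columns X and Y
xy <- st_coordinates(toy_pts)
colnames(xy)

# The numbers depend on the CRS: the same points expressed in Web Mercator (metres)
xy_merc <- st_coordinates(st_transform(toy_pts, 3857))
xy_merc
```

Since layers such as geom_text_repel() take plain x and y values, the coordinates passed to them need to be in the same CRS as the rest of the plot.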
To compare the maps, we need to position them next to each other. There may be a relationship between our variables that becomes more apparent when we view them side by side.
# Combine the maps using patchwork
combined_maps1 <- map1 + map2
combined_maps2 <- map1 + map3
# Display the combined maps
print(combined_maps1)
print(combined_maps2)
It seems that campuses with higher FCEI are located in trustee district boundaries with higher student attendance! Interesting finding! Let’s verify this with numbers and calculate the correlation.
# Calculate the correlation
correlation <- cor(merged_data$Attendance, merged_data$FCEI, use = "complete.obs")
# Print the correlation coefficient
print(paste("Correlation between Attendance and FCEI:", correlation))
## [1] "Correlation between Attendance and FCEI: 0.998779807347982"
Correlation is almost perfect! But… do not forget… it’s simulated data for storytelling purposes, LOL!
# Visualize the correlation with a scatter plot
ggplot(merged_data, aes(x = Attendance, y = FCEI)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) + # Add a linear regression line
ggtitle("DISD Attendance vs. FCEI") +
xlab("Attendance") +
ylab("FCEI") +
theme_minimal()
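A single coefficient says nothing about its uncertainty. As a sketch, base R's `cor.test()` reports a confidence interval and a p-value alongside the estimate. The data frame `mock_campuses` and all of its values are hypothetical, simulated only for this example; they are not the tutorial's dataset.

```r
set.seed(42)

# Hypothetical campus-level data, simulated for illustration only
mock_campuses <- data.frame(Attendance = runif(30, min = 85, max = 98))
mock_campuses$FCEI <- 0.5 * mock_campuses$Attendance + rnorm(30, sd = 2)

# Pearson correlation with a 95% confidence interval and p-value
result <- cor.test(mock_campuses$Attendance, mock_campuses$FCEI,
                   method = "pearson")

print(result$estimate)   # correlation coefficient
print(result$conf.int)   # 95% confidence interval
print(result$p.value)    # p-value for the null hypothesis of zero correlation
```

With real data, a wide confidence interval would be a signal to treat the pattern seen on the maps with caution.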
A common use of maps in education is to visualize how a variable changes over time across different geographic areas. This approach allows us to observe both temporal trends and their spatial distribution. Let’s look at a mock example using average attendance.
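One way to sketch this idea is to stack the data in long format (one row per district per year) and let `facet_wrap()` draw one panel per year on a shared color scale. This is an alternative layout, not necessarily the approach taken elsewhere in this tutorial. Everything in the example is hypothetical: `make_square()`, `mock_districts`, `mock_long`, the toy square "districts", and the attendance values.

```r
library(sf)
library(ggplot2)

# Helper that builds a unit square with its lower-left corner at (x0, y0)
make_square <- function(x0, y0) {
  st_polygon(list(rbind(
    c(x0, y0), c(x0 + 1, y0), c(x0 + 1, y0 + 1), c(x0, y0 + 1), c(x0, y0)
  )))
}

# Two hypothetical "districts" (toy squares, not real trustee boundaries)
mock_districts <- st_sf(
  DIST = c("1", "2"),
  geometry = st_sfc(make_square(0, 0), make_square(1, 0))
)

# One copy of the geometry per year, each with made-up attendance values
year_1 <- mock_districts
year_1$year <- "2022-23"
year_1$attendance <- c(88, 91)

year_2 <- mock_districts
year_2$year <- "2023-24"
year_2$attendance <- c(90, 94)

mock_long <- rbind(year_1, year_2)

# One panel per year, all sharing a single fill scale
p <- ggplot(mock_long) +
  geom_sf(aes(fill = attendance), color = "gray40") +
  facet_wrap(~ year) +
  scale_fill_gradient(low = "white", high = "#211650", limits = c(75, 100)) +
  theme_minimal()

print(p)
```

Fixing `limits` on the fill scale keeps colors comparable across years, so a darker district in one panel means higher attendance than a lighter district in any other panel.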
Feature | WGS 84 | NAD83 |
---|---|---|
Core Concept | Global Geographic Coordinate System (GCS) | North American Geographic Coordinate System (GCS) |
Primary Application | GPS, Global Web Mapping | North American Mapping & Surveying |
Geographic Focus | Worldwide | North America |
Accuracy in Dallas | Generally good for global context | Higher accuracy for local analysis in Dallas |
Key Distinction | Global standard, default for GPS data | Optimized for North American precision |
Dallas-Relevant Examples | Often the underlying system for global base maps | Preferred for detailed Dallas-area datasets |
Understanding Geographic Coordinate Systems (GCS) is fundamental in GIS. The two you’ll encounter most often are WGS 84 and NAD83.
WGS 84 is the global standard: it is the reference system of the Global Positioning System (GPS) and underpins most web mapping platforms. Because it is a global reference that is not fixed to the North American tectonic plate, its coordinates for a place like Dallas can differ from NAD83 coordinates, typically by about a meter or so.
NAD83, on the other hand, is tailored to North America. It is tied to the North American plate and is widely used for mapping and surveying within the region, which makes it the usual choice for detailed spatial analysis within Dallas and surrounding areas.
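As a small sketch of how these two systems appear in R, the example below creates one point in WGS 84 (EPSG:4326) and reprojects it to NAD83 (EPSG:4269) with `st_transform()` from sf. The coordinates are an approximate location for downtown Dallas, and the object names `dallas_wgs84` and `dallas_nad83` are made up for this illustration.

```r
library(sf)

# Approximate longitude/latitude of downtown Dallas, stored as WGS 84 (EPSG:4326)
dallas_wgs84 <- st_sfc(st_point(c(-96.797, 32.777)), crs = 4326)

# Reproject the same point to NAD83 (EPSG:4269)
dallas_nad83 <- st_transform(dallas_wgs84, crs = 4269)

# Inspect which CRS each object carries
print(st_crs(dallas_wgs84)$epsg)
print(st_crs(dallas_nad83)$epsg)

# Both are geographic (longitude/latitude) systems, so units are degrees
print(st_is_longlat(dallas_wgs84))
print(st_is_longlat(dallas_nad83))

# Coordinates before and after the transformation
print(st_coordinates(dallas_wgs84))
print(st_coordinates(dallas_nad83))
```

Depending on the installed PROJ version and its transformation data, the two coordinate pairs may print as identical or differ only in the far decimal places. What matters in practice is that every layer on a map declares its CRS, so that layers can be aligned correctly.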
# Geographic Information System: A Tutorial with R
# 3/21/2025
# Federico Ferrero
# Here are the files we need for map construction: https://data-disd-gismaps.hub.arcgis.com/search?groupIds=6e8e4c20a9704410a659251d2987a362
# Clear all objects from memory
rm(list = ls())
# Install required packages (Uncomment if not installed)
# install.packages(c("sf", "ggplot2", "ggspatial", "rosm", "DT", "cowplot", "osmdata", "RColorBrewer", "dplyr", "patchwork", "ggrepel", "readr", "prettymapr"))
# Load required libraries
library(sf) # For reading, transforming, and visualizing geographic data
library(ggplot2) # For creating high-quality visualizations
library(ggspatial) # For adding spatial features like basemaps and compass roses
library(rosm) # Provides OpenStreetMap tiles for basemaps
library(DT) # For interactive tables
library(cowplot) # For combining multiple plots into one
library(osmdata) # For extracting OpenStreetMap data
library(RColorBrewer) # For color palettes
library(dplyr) # For data manipulation
library(patchwork) # For arranging multiple ggplot2 plots in a grid
library(ggrepel) # For improved text label placement in plots
library(readr) # For read_csv()
library(prettymapr) # For adding nice map elements
# Load geographic data for Trustee Boundaries
trustee_boundaries_map <- st_read("C:/Users/FFERRERO/OneDrive - Dallas Independent School District/2024-25/CSM 4th goal/Trustee_Boundaries/Trustee_Boundaries.shp")
# Display first 9 rows of Trustee Boundaries data in an interactive table
datatable(head(trustee_boundaries_map, 9))
# Load geographic data for Campus Locations
campus_locations_map <- st_read("C:/Users/FFERRERO/OneDrive - Dallas Independent School District/2024-25/CSM 4th goal/Campus_Locations/Campus_Locations.shp")
# Drop columns that are not needed: SLN, CITY, PHONE, and WEBSITE
campus_locations_map <- campus_locations_map %>%
select(-SLN, -CITY, -PHONE, -WEBSITE)
# Display first 10 rows of Campus Locations data in an interactive table
datatable(head(campus_locations_map, 10))
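# Reproject the trustee boundaries to the campus CRS and store the result in isd_map
# (the maps below plot trustee_boundaries_map directly; geom_sf() aligns layers with different CRSs when drawing)
isd_map <- st_transform(trustee_boundaries_map, crs = st_crs(campus_locations_map))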
# Create a map using ggplot2, displaying trustee district boundaries for Dallas ISD
ggplot() + # Initialize the ggplot object
annotation_map_tile(type = "osm", zoom = 11) + # Add OpenStreetMap basemap with a zoom level of 11
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) + # Plot the trustee district boundaries using SF (Simple Features) geometry
# Fill the district areas with a light green color ("#A0D6D1") and outline them with a gray color ("gray40"), size of the boundary lines is 0.5
ggtitle("Dallas ISD Trustee Districts") + # Set the title of the plot to "Dallas ISD Trustee Districts"
theme_minimal() + # Apply the minimal theme to the plot for a clean design
theme(
legend.position = "none", # Remove the legend from the plot (not needed for this map)
panel.grid = element_blank(), # Remove the grid lines for a cleaner visual
axis.text = element_blank(), # Remove axis text to focus on the map
axis.title = element_blank() # Remove axis titles to focus on the map
) +
annotation_north_arrow(
location = "tr", # Place the north arrow in the top-right corner ("tr" stands for top-right)
which_north = "true", # Ensure the arrow is pointing to true north
style = north_arrow_nautical, # Choose a nautical style for the north arrow (alternative design)
height = unit(0.5, "inches"), # Set the height of the north arrow to 0.5 inches
width = unit(0.5, "inches") # Set the width of the north arrow to 0.5 inches
)
# Create a map showing the trustee district boundaries with different colors
ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, aes(fill = factor(DIST)),
color = "gray40", size = 0.5) + # Plot ISD boundaries with different colors
scale_fill_brewer(palette = "Pastel1") + # Apply pastel colors to each district
geom_sf_text(data = trustee_boundaries_map, aes(label = DIST),
size = 6,
fontface = "bold",
color = "black",
nudge_y = 0.1,
nudge_x = 0.1) +
ggtitle("Dallas ISD Trustee Districts") +
theme_minimal() +
theme(legend.position = "none",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
)
# Create the map with circles showing campus locations
ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map, color = "#211650", size = 2, shape = 16) + # Add campus locations as dark purple filled circles with size 2
ggtitle("Dallas ISD School Campus Locations with Trustee Districts") +
theme_minimal() +
theme(legend.position = "none",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
)
# Create a new grouped variable in campus_locations_map to combine "Magnet", "Choice", and "Alternative" into a single category
campus_locations_map <- campus_locations_map %>%
mutate(LEVEL_GROUPED = case_when(
LEVEL_ %in% c("Magnet", "Choice", "Alternative") ~ "Alternative / Choice / Magnet",
TRUE ~ LEVEL_
))
# Create a map showing campus locations with different colors based on school type, and a black outline for the points
ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map, aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 22) + # Add campus locations as filled squares, with a black outline
scale_fill_manual(values = c("Elementary" = "#D41111", # Set red for Elementary schools
"Middle" = "#56B4E9", # Set light blue for Middle schools
"High" = "#009E73", # Set green for High schools
"Alternative / Choice / Magnet" = "#F0E442"), # Set yellow for Alternative, Choice, and Magnet schools
guide = guide_legend(title = "School Level or Type", # Add legend with title "School Level or Type"
override.aes = list(shape = 21, color = "black")), # Ensure the legend shows black outlines for points
limits = c("Elementary", "Middle", "High", "Alternative / Choice / Magnet")) + # Set the order of categories in the legend
ggtitle("Dallas ISD School Campus Locations by Levels") +
theme_minimal() +
theme(panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
guides(fill = guide_legend(title = "School Level or Type")) # Add a guide for the fill aesthetic with title "School Level or Type"
# Create filtered maps for each school level
plot_elementary <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map %>% filter(LEVEL_GROUPED == "Elementary"),
aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("Elementary" = "#D41111")) +
ggtitle("Elementary Schools") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
annotation_north_arrow(location = "tr", which_north = "true", style = north_arrow_nautical, height = unit(0.5, "inches"), width = unit(0.5, "inches")) +
guides(fill = "none")
plot_middle <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map %>% filter(LEVEL_GROUPED == "Middle"),
aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("Middle" = "#56B4E9")) +
ggtitle("Middle Schools") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
annotation_north_arrow(location = "tr", which_north = "true", style = north_arrow_nautical, height = unit(0.5, "inches"), width = unit(0.5, "inches")) +
guides(fill = "none")
plot_high <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map %>% filter(LEVEL_GROUPED == "High"),
aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("High" = "#009E73")) +
ggtitle("High Schools") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
annotation_north_arrow(location = "tr", which_north = "true", style = north_arrow_nautical, height = unit(0.5, "inches"), width = unit(0.5, "inches")) +
guides(fill = "none")
plot_alt_choice_magnet <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = campus_locations_map %>% filter(LEVEL_GROUPED == "Alternative / Choice / Magnet"),
aes(fill = LEVEL_GROUPED), color = "black", size = 2, shape = 21) +
scale_fill_manual(values = c("Alternative / Choice / Magnet" = "#F0E442")) +
ggtitle("Alternative / Choice / Magnet Schools") +
theme_minimal() +
theme(panel.grid = element_blank(), axis.text = element_blank(), axis.title = element_blank()) +
annotation_north_arrow(location = "tr", which_north = "true", style = north_arrow_nautical, height = unit(0.5, "inches"), width = unit(0.5, "inches")) +
guides(fill = "none")
# Arrange the four plots in a 2 x 2 grid
(plot_elementary + plot_middle) / (plot_high + plot_alt_choice_magnet)
# Read the CSV file into a data frame
attendance_fcei_data <- read_csv("C:/Users/FFERRERO/OneDrive - Dallas Independent School District/2024-25/CSM 4th goal/Mock_Data_Attendance&FCEI_by_Campus.csv")
# Show the first 10 rows of the attendance and FCEI dataset in an interactive table
datatable(head(attendance_fcei_data, 10)) # Display attendance and FCEI data by campus
# Merge the campus_locations_map with attendance_fcei_data by TEA (both datasets have a column named "TEA")
merged_data <- merge(campus_locations_map, attendance_fcei_data, by = "TEA")
# Check variables in the merged dataset and its type
str(merged_data)
# Convert ATTENDANCE from character to numeric if needed
# merged_data$Attendance <- as.numeric(merged_data$Attendance)
# Convert FCEI from character to numeric if needed
# merged_data$FCEI <- as.numeric(merged_data$FCEI)
# Calculate the average of Attendance and FCEI per TRUSTEE
averages_per_district <- merged_data %>%
group_by(DIST) %>%
summarise(
avg_attendance = mean(Attendance, na.rm = TRUE),
avg_fcei = mean(FCEI, na.rm = TRUE)
)
# View the resulting summary
print(averages_per_district)
# Extract the avg_attendance values
avg_attendance_values <- averages_per_district$avg_attendance
# Print the values
print(avg_attendance_values)
# Sort trustee_boundaries_map by the DIST variable
trustee_boundaries_map <- trustee_boundaries_map %>%
arrange(DIST)
# Assign avg_attendance values in the correct order
trustee_boundaries_map$avg_attendance <- averages_per_district$avg_attendance
# Check the result
print(head(trustee_boundaries_map))
# Extract the avg_FCEI values
avg_FCEI_values <- averages_per_district$avg_fcei
# Print the values
print(avg_FCEI_values)
# Sort trustee_boundaries_map by the DIST variable
trustee_boundaries_map <- trustee_boundaries_map %>%
arrange(DIST)
# Assign FCEI values in the correct order
trustee_boundaries_map$FCEI <- avg_FCEI_values
# Check the result
print(head(trustee_boundaries_map))
# Create the attendance heatmap with a watermark
map1 <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, aes(fill = avg_attendance),
color = "gray40", size = 0.5) +
scale_fill_gradient2(low = "#FF9999", mid = "#FFCC66", high = "#66FF66",
midpoint = mean(trustee_boundaries_map$avg_attendance, na.rm = TRUE),
guide = guide_colorbar(barwidth = 10, barheight = 0.5,
title = "Average Percentage of Attendance",
title.position = "top",
label.position = "bottom",
title.hjust = 0.5,
label.hjust = 0.5,
title.theme = element_text(size = 10),
label.theme = element_text(size = 8),
direction = "horizontal",
ticks = TRUE)) +
# District number (larger font)
geom_sf_text(data = trustee_boundaries_map,
aes(label = DIST),
size = 6,
fontface = "bold",
color = "black",
vjust = 0.5, # Center vertically
hjust = 0.5) + # Center horizontally
# Percentage (smaller font inside parentheses) below the district number
geom_sf_text(data = trustee_boundaries_map,
aes(label = paste("(", round(avg_attendance, 1), "%)", sep = "")),
size = 3.5,
fontface = "italic",
color = "black",
vjust = 2, # Move percentage below the district number, adjust this value as needed.
hjust = 0.5) + # Center horizontally
ggtitle("DISD Average Attendance Heatmap by District Trustee") +
labs(fill = "Average Percentage of Attendance") +
theme_minimal() +
theme(legend.position = c(0, -0.008),
legend.justification = c(0, 0),
legend.background = element_rect(fill = "white", color = "black", size = 0.5),
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
# Watermark addition with the legend "MOCK DATA"
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"), # Place watermark in bottom right corner
hjust = 1.1, # Adjust horizontal position
vjust = -5, # Adjust vertical position
alpha = 0.3, # Make it semi-transparent
size = 10, # Adjust size
color = "red" # Adjust color
)
print(map1)
# Create the map with an adjusted bubble size and color gradient based on FCEI
map2 <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = merged_data, aes(size = FCEI, fill = FCEI), shape = 21, stroke = 1, color = "gray40") + # Filled circles sized and colored by FCEI, with a gray border
scale_size_continuous(range = c(0.01, 4), name = "FCEI", guide = "none") +
scale_fill_gradientn(colors = c("red", "green"), name = "FCEI") + # Red-to-green fill gradient for FCEI
ggtitle("DISD Family & Community Engagement Index (FCEI) by Campus") +
theme_minimal() +
theme(legend.position = "right",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"),
hjust = 1.1,
vjust = -5,
alpha = 0.3,
size = 10,
color = "red"
)
print(map2)
# Interactive Map: first we need the plotly library
# install.packages("plotly") # Uncomment if not installed
library(plotly)
# Create the base ggplot object
map2_base <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = merged_data,
aes(size = FCEI, fill = FCEI,
text = paste("Campus:", CAMPUS, "<br>Attendance:", Attendance, "<br>FCEI:", FCEI)), # Tooltip text shown by plotly
shape = 21, stroke = 1, color = "gray40") +
scale_size_continuous(range = c(0.01, 4), name = "FCEI", guide = "none") +
scale_fill_gradientn(colors = c("red", "green"), name = "FCEI") +
ggtitle("DISD Family & Community Engagement Index (FCEI) by Campus") +
theme_minimal() +
theme(legend.position = "right",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"),
hjust = 1.1,
vjust = -5,
alpha = 0.3,
size = 10,
color = "red"
)
# Convert the ggplot object to an interactive plotly object
interactive_map <- ggplotly(map2_base, tooltip = "text")
# Display the interactive map
interactive_map
# Select the top 15 campuses based on FCEI
top_15_campuses <- merged_data %>%
arrange(desc(FCEI)) %>%
slice(1:15)
# Extract coordinates
top_15_campuses <- top_15_campuses %>%
mutate(coords = st_coordinates(.)) %>%
mutate(Longitude = coords[, 1], Latitude = coords[, 2])
# Create the map with only the top 15 campuses and labels
map3 <- ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = trustee_boundaries_map, fill = "#A0D6D1", color = "gray40", size = 0.5) +
geom_sf(data = top_15_campuses, aes(size = FCEI), color = "#211650", shape = 16) +
scale_size_continuous(range = c(3.5, 4), name = "FCEI") +
ggtitle("Top 15 DISD Campuses by FCEI") +
theme_minimal() +
theme(legend.position = "none",
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank()) +
annotation_north_arrow(
location = "tr",
which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"),
width = unit(0.5, "inches")
) +
# Label the top 15 campuses using geom_text_repel
geom_text_repel(data = top_15_campuses, aes(x = Longitude, y = Latitude, label = CAMPUS),
color = "#211650", size = 3, fontface = "bold",
box.padding = unit(0.35, "lines"),
point.padding = unit(0.3, "lines"),
segment.color = "gray50",
segment.size = 0.2) +
# Watermark addition
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"),
hjust = 1.1,
vjust = -5,
alpha = 0.3,
size = 10,
color = "red"
)
print(map3)
# Combine the maps using patchwork
combined_maps1 <- map1 + map2
combined_maps2 <- map1 + map3
# Display the combined maps
print(combined_maps1)
print(combined_maps2)
# Calculate the correlation
correlation <- cor(merged_data$Attendance, merged_data$FCEI, use = "complete.obs")
# Print the correlation coefficient
print(paste("Correlation between Attendance and FCEI:", correlation))
# Visualize the correlation with a scatter plot
ggplot(merged_data, aes(x = Attendance, y = FCEI)) +
geom_point() +
geom_smooth(method = "lm", se = FALSE) + # Add a linear regression line
ggtitle("DISD Attendance vs. FCEI") +
xlab("Attendance") +
ylab("FCEI") +
theme_minimal()
# For temporal progression of maps simulate attendance progression
set.seed(123)
attendance_progression <- trustee_boundaries_map %>%
mutate(attendance_2020 = pmax(avg_attendance - runif(n(), 5, 10), 75),
attendance_2021 = pmax(attendance_2020 + runif(n(), 1, 4), 75),
attendance_2022 = pmax(attendance_2021 + runif(n(), 1, 3), 75),
attendance_2023 = avg_attendance)
# Function to generate individual map *without* legend
create_map <- function(data, attendance_var, subtitle) {
ggplot() +
annotation_map_tile(type = "osm", zoom = 11) +
geom_sf(data = data, aes(fill = .data[[attendance_var]]),
color = "gray40", size = 0.5) +
scale_fill_gradient(low = "white", high = "#211650", na.value = "grey90",
limits = c(75, 100),
guide = "none") +
geom_sf_text(data = data, aes(label = DIST),
size = 6, fontface = "bold", color = "black") +
geom_sf_text(data = data,
aes(label = paste0("(", round(.data[[attendance_var]], 1), "%)")),
size = 3.5, fontface = "italic", color = "black", vjust = 2) +
ggtitle(subtitle) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),
panel.grid = element_blank(),
axis.text = element_blank(),
axis.title = element_blank(),
legend.position = "none"
) +
annotation_north_arrow(
location = "tr", which_north = "true",
style = north_arrow_nautical,
height = unit(0.5, "inches"), width = unit(0.5, "inches")
) +
geom_text(
aes(x = Inf, y = -Inf, label = "MOCK DATA"),
hjust = 1.1, vjust = -5, alpha = 0.3,
size = 10, color = "red"
)
}
# Create maps without legends
map_2020 <- create_map(attendance_progression, "attendance_2020", "2020–21")
map_2021 <- create_map(attendance_progression, "attendance_2021", "2021–22")
map_2022 <- create_map(attendance_progression, "attendance_2022", "2022–23")
map_2023 <- create_map(attendance_progression, "attendance_2023", "2023–24")
# Combine maps horizontally without any legend
final_plot <- (map_2020 + map_2021 + map_2022 + map_2023) +
plot_layout(ncol = 4) +
plot_annotation(
title = "DISD Average Student Attendance by District Trustee Boundary",
theme = theme(
plot.title = element_text(size = 16, face = "bold", hjust = 0.5)
)
)
print(final_plot)
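The script above attaches the district averages to the boundaries by sorting on `DIST` and assigning by row position, which relies on every district appearing in both tables in the same order. A keyed join is one alternative that does not depend on row order. The sketch below uses two hypothetical tables (`mock_districts` and `mock_averages`, with made-up values) and dplyr's `left_join()`.

```r
library(dplyr)

# Hypothetical district table; district 3 has no campus data on purpose
mock_districts <- data.frame(DIST = c(1, 2, 3))

# Hypothetical averages, deliberately listed in a different order
mock_averages <- data.frame(DIST = c(2, 1), avg_attendance = c(93.5, 90.2))

# Match rows by the DIST key instead of by position
joined <- left_join(mock_districts, mock_averages, by = "DIST")

print(joined)
```

A district without data ends up with `NA` instead of silently receiving another district's value. The sf package provides dplyr join methods for sf objects, so the same call should also work when the left-hand table is a boundary layer and the right-hand table is a plain data frame.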