Assignment 05

Author

Andry Sermeño

INTRODUCTION

For this project I chose the urban transportation sector in Latin America, mainly because it offers a lot of data that can be visualized and interpreted.

I started by analyzing taxi trip data from three major cities: Mexico City, Quito, and Monterrey. The data set includes pickup and drop-off coordinates, trip duration, trip distance, and wait times.

Using a combination of visualizations built with Plotly and Leaflet, the project explores how taxi services vary across cities, including differences in trip patterns, geographic concentration, and service efficiency.

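library(dplyr)    # filter(), mutate(), between()
library(plotly)   # plot_ly(), layout()
library(leaflet)  # leaflet(), addMarkers()

# Load the raw trip records
data <- read.csv('/Users/andrysermeno/Desktop/Mexico,Data,Project.csv')

# Keep only rows with valid coordinates, then convert units
data_clean <- data |>
  filter(
    between(pickup_latitude, -90, 90),
    between(pickup_longitude, -180, 180),
    between(dropoff_latitude, -90, 90),
    between(dropoff_longitude, -180, 180)
  ) |>
  mutate(
    trip_duration_min = trip_duration / 60,  # seconds to minutes
    dist_km = dist_meters / 1000             # meters to kilometers
  )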

SCATTERPLOT:

This section compares trip duration in minutes with trip distance in kilometers. As expected, longer distances generally take more time, and the points follow a rising trend.

Some trips were short in distance but long in duration, potentially due to heavy traffic or to carrying more than one passenger. The colors represent the different cities.
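
The rising trend and the short-but-slow trips described above could be checked numerically. The following is a minimal sketch in base R on made-up rows, not the project data; the `trips` values and the 5 km/h cutoff are assumptions chosen only for illustration.

```r
# Hypothetical toy trips (not taken from the project file)
trips <- data.frame(
  vendor_id = c("Mexico City", "Mexico City", "Quito",
                "Quito", "Monterrey", "Monterrey"),
  dist_km = c(2.0, 8.0, 1.5, 6.0, 3.0, 1.0),
  trip_duration_min = c(10, 30, 8, 25, 14, 40)
)

# Linear trend: extra minutes per additional kilometer
fit <- lm(trip_duration_min ~ dist_km, data = trips)
slope <- unname(coef(fit)["dist_km"])

# Average speed in km/h; very low values mark short-but-slow trips
trips$speed_kmh <- trips$dist_km / (trips$trip_duration_min / 60)
slow_trips <- trips[trips$speed_kmh < 5, ]  # 5 km/h is an arbitrary cutoff

print(slope)
print(slow_trips)
```

A positive `slope` matches the rising trend in the scatterplot, and `slow_trips` lists the rows that would appear as points high on the duration axis but close to zero on the distance axis.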

BOXPLOT:

The box plot shows the distribution of trip durations for each city, allowing a side-by-side comparison. Mexico City shows a high concentration of shorter trips, while Monterrey displays a wider spread of trip durations. Quito has fewer observations but still shows variability.
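
The comparison read off the box plot (concentration, spread, and number of observations per city) could also be put in a small table. This is a sketch in base R on made-up durations, assuming, as in the plots, that `vendor_id` identifies the city; the numbers are hypothetical.

```r
# Hypothetical toy durations (not taken from the project file)
trips <- data.frame(
  vendor_id = c("Mexico City", "Mexico City", "Mexico City",
                "Monterrey", "Monterrey", "Monterrey",
                "Quito", "Quito"),
  trip_duration_min = c(8, 10, 12, 5, 20, 45, 9, 30)
)

# One summary row per city: count, median, and interquartile range
city_summary <- do.call(rbind, lapply(split(trips, trips$vendor_id), function(d) {
  data.frame(
    vendor_id = d$vendor_id[1],
    n_trips = nrow(d),
    median_min = median(d$trip_duration_min),
    iqr_min = IQR(d$trip_duration_min)
  )
}))

print(city_summary)
```

The median corresponds to the line inside each box and the interquartile range to the height of the box, so a larger `iqr_min` means a wider spread of trip durations.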

plot_ly(data_clean,
        x = ~dist_km,
        y = ~trip_duration_min,
        type = "scatter",
        mode = "markers",
        color = ~vendor_id,
        text = ~paste("City:", vendor_id,
                      "<br>Distance:", round(dist_km, 2), "km",
                      "<br>Duration:", round(trip_duration_min, 1), "min",
                      "<br>Wait Time:", wait_sec, "sec"),
        marker = list(size = 6)) |>
  layout(title = "Trip Duration vs Distance",
         xaxis = list(title = "Distance (km)"),
         yaxis = list(title = "Duration (minutes)"))

LEAFLET MAP:

The Leaflet map displays pickup locations across all three cities. Clustering is used to group nearby points and reduce visual clutter, especially in urban centers like Mexico City. The map also helps contextualize the trip data geographically and makes it easier to find high-demand areas or areas where pickups are more spread out.

leaflet(data_clean) |>
  addProviderTiles("CartoDB.Positron") |>
  setView(lng = -99.1332, lat = 19.4326, zoom = 5) |>
  addMarkers(
    lng = ~pickup_longitude,
    lat = ~pickup_latitude,
    clusterOptions = markerClusterOptions(),
    popup = ~paste0(
      "<b>City:</b> ", vendor_id,
      "<br><b>Trip Distance:</b> ", round(dist_km, 2), " km",
      "<br><b>Trip Duration:</b> ", round(trip_duration_min, 1), " min"
    )
  )
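
The high-demand areas that the clustered map shows visually could also be ranked with a simple grid count. This is a rough sketch in base R on made-up coordinates, not the method used for the map; the `pickups` rows and the 0.01 degree `cell_size` are assumptions for illustration.

```r
# Hypothetical toy pickups (not taken from the project file)
pickups <- data.frame(
  pickup_latitude = c(19.4326, 19.4331, 19.4329, 19.5012, -0.1807, -0.1811),
  pickup_longitude = c(-99.1332, -99.1338, -99.1329, -99.2001, -78.4678, -78.4674)
)

# Assign each pickup to a grid cell; 0.01 degrees is roughly 1 km here
cell_size <- 0.01
pickups$cell_lat <- floor(pickups$pickup_latitude / cell_size)
pickups$cell_lng <- floor(pickups$pickup_longitude / cell_size)

# Count pickups per cell and sort so the busiest cells come first
cell_counts <- aggregate(
  n_pickups ~ cell_lat + cell_lng,
  data = transform(pickups, n_pickups = 1),
  FUN = sum
)
cell_counts <- cell_counts[order(-cell_counts$n_pickups), ]

print(cell_counts)
```

Cells with the highest `n_pickups` correspond to the largest clusters on the map, while long tails of cells with a single pickup indicate areas where pickups are spread out.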
data_clean |>
  filter(wait_sec < 3600) |>  # Exclude extreme outliers (1hr+ wait)
  plot_ly(
    x = ~wait_sec,
    type = "histogram",
    nbinsx = 50,
    marker = list(color = "darkorange")
  ) |>
  layout(title = "Distribution of Wait Times",
         xaxis = list(title = "Wait Time (seconds)"),
         yaxis = list(title = "Number of Trips"))
data_clean |>
  plot_ly(
    x = ~vendor_id,
    y = ~trip_duration_min,
    type = "box",
    color = ~vendor_id,
    boxpoints = "all",
    jitter = 0.4,
    pointpos = 0
  ) |>
  layout(title = "Trip Duration by City",
         xaxis = list(title = "City"),
         yaxis = list(title = "Trip Duration (minutes)"))

SUMMARY:

This project analyzes taxi trip data from Mexico City, Monterrey, and Quito. Across more than 90,000 trips, the analysis showed that most taxi rides follow a specific pattern between distance and duration, although a significant number of trips experienced delays or long wait times.

The data also showed patterns that point to inefficiencies due to traffic or dispatching. Together, the visualizations provide insight into how taxi systems operate in different urban environments and how data can inform improvements in mobility services. Another interesting finding is that the busy months for taxis in these cities are June through August, which has to do with tourism in those cities.

REFERENCES: