data <- read.csv('/Users/andrysermeno/Desktop/Mexico,Data,Project.csv')
Assignment 05
INTRODUCTION
For this project I chose the urban transportation sector in Latin America, mainly because it offers a lot of data that can be visualized and interpreted.
I started by analyzing taxi trip data from three major cities: Mexico City, Quito, and Monterrey. The data set includes pickup and dropoff coordinates, trip duration, trip distance, and wait times.
Using interactive visualizations built with Plotly and Leaflet, the project explores how taxi services vary across cities in trip patterns, geographic concentration, and service efficiency.
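library(dplyr)  # provides filter(), mutate(), and between()
# Keep only rows whose coordinates are valid latitudes and longitudes,
# then convert duration and distance to more readable units.
data_clean <- data |>
  filter(
    between(pickup_latitude, -90, 90),
    between(pickup_longitude, -180, 180),
    between(dropoff_latitude, -90, 90),
    between(dropoff_longitude, -180, 180)
  ) |>
  mutate(
    trip_duration_min = trip_duration / 60,  # duration in minutes
    dist_km = dist_meters / 1000             # meters to kilometers
  )
SCATTERPLOT: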
This section compares trip duration in minutes with trip distance in kilometers. Longer distances generally take more time, so the points follow a rising trend.
Some trips were short in distance but long in duration, potentially due to heavy traffic or to carrying more than one passenger. The different colors represent the different cities.
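One way to pick out the short-but-long trips is to compute an average speed for each trip and flag the ones that fall under a distance cutoff and over a duration cutoff. The sketch below is only illustrative: it uses a small made-up data frame in place of data_clean, and the function name flag_slow_trips and the cutoffs of 2 km and 30 minutes are my assumptions, not values taken from the data.

```r
library(dplyr)

# Toy rows standing in for data_clean; the values are made up for illustration.
trips <- data.frame(
  vendor_id         = c("Mexico City", "Quito", "Monterrey", "Mexico City"),
  dist_km           = c(1.2, 8.5, 0.8, 15.0),
  trip_duration_min = c(45, 20, 6, 35)
)

# Hypothetical helper: adds an average speed and a flag for trips that are
# shorter than max_km but take longer than min_minutes.
flag_slow_trips <- function(df, max_km = 2, min_minutes = 30) {
  df |>
    mutate(
      avg_speed_kmh  = dist_km / (trip_duration_min / 60),
      short_but_long = dist_km < max_km & trip_duration_min > min_minutes
    )
}

flagged <- flag_slow_trips(trips)

# Show only the flagged trips
filter(flagged, short_but_long)
```

With the real data, the same function could be called as flag_slow_trips(data_clean), and the cutoffs adjusted after looking at the scatterplot.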
BOXPLOT:
The box plot shows the distribution of trip durations for each city, allowing a side-by-side comparison. Mexico City shows a high concentration of shorter trips, while Monterrey displays a wider spread of trip durations. Quito has fewer observations but still shows variability.
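A box plot like the one described above could be built roughly as follows. This is a minimal sketch, not the exact code behind my figure: it uses a small made-up data frame in place of data_clean, it assumes vendor_id holds the city label, and the names duration_summary and box_fig are only for illustration. The summary table gives the trip count, median, and interquartile range per city, which are the numbers the boxes summarize.

```r
library(dplyr)
library(plotly)

# Toy rows standing in for data_clean; the values are made up for illustration.
trips <- data.frame(
  vendor_id = c(rep("Mexico City", 4), rep("Monterrey", 4), rep("Quito", 2)),
  trip_duration_min = c(5, 8, 10, 12, 6, 15, 30, 50, 9, 25)
)

# Trip count, median, and spread per city
duration_summary <- trips |>
  group_by(vendor_id) |>
  summarise(
    n_trips    = n(),
    median_min = median(trip_duration_min),
    iqr_min    = IQR(trip_duration_min)
  )

# One box per city, colored by the same variable as the scatterplot
box_fig <- plot_ly(trips,
                   x = ~vendor_id,
                   y = ~trip_duration_min,
                   color = ~vendor_id,
                   type = "box") |>
  layout(title = "Trip Duration by City",
         xaxis = list(title = "City"),
         yaxis = list(title = "Duration (minutes)"))

duration_summary
box_fig
```

Swapping trips for data_clean should give the plot for the full data set, as long as the column names match.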
library(plotly)
# Interactive scatterplot of trip duration against distance, one color per city
plot_ly(data_clean,
        x = ~dist_km,
        y = ~trip_duration_min,
        type = "scatter",
        mode = "markers",
        color = ~vendor_id,  # vendor_id is used as the city label
        # Hover text shown for each point
        text = ~paste("City:", vendor_id,
                      "<br>Distance:", round(dist_km, 2), "km",
                      "<br>Duration:", round(trip_duration_min, 1), "min",
                      "<br>Wait Time:", wait_sec, "sec"),
        marker = list(size = 6)) |>
  layout(title = "Trip Duration vs Distance",
         xaxis = list(title = "Distance (km)"),
         yaxis = list(title = "Duration (minutes)"))