Exploring Airbnb Pricing Across U.S. Cities

Final Project | Data Science 1 with R (STAT 301-1)

Author

Gauri Adarsh

Published

December 10, 2025

Github Repo Link

Introduction

“In short order, they inflict emotional and mental damage to those neighborhoods in exchange for a few bucks in their pockets.”

This quote, from a frustrated resident living near a short-term rental with especially rowdy behavior, captures the emotional intensity surrounding Airbnb’s presence in U.S. cities. As someone interested in government and housing policy, I wanted to look past anecdotes and examine where Airbnb has expanded, how expensive different markets have become, and how these patterns vary across cities and neighborhoods. By analyzing Airbnb data from 2020 and 2023, this project explores where short-term rentals are most concentrated, how prices differ spatially, and what these trends might imply for equity-minded housing regulation.

Data overview & quality

This project uses two Airbnb listing datasets collected in 2020 and 2023 across multiple U.S. cities. The 2020 dataset contains 222,449 listings and 16 variables, while the 2023 dataset contains 228,398 listings and 17 variables, allowing comparisons between the pandemic period and its aftermath. Together, the datasets offer two snapshots of short-term rental activity in markets across the United States.

Key variables include nightly price (int), number of reviews (int), reviews per month (num), room type (categorical factor), neighbourhood and neighbourhood group (categorical), and geographic latitude and longitude (num). Both datasets also contain host-level information such as calculated host listings count, minimum stay requirements, and annual availability. City identifiers are stored as categorical factors, enabling multi-city comparisons.

Missingness is concentrated in one variable only: reviews_per_month, with 48,602 missing values in 2020 and 49,085 in 2023. These missing values occur when listings have zero reviews, so they were treated as structural zeros rather than removed. A small number of listings with zero prices and extreme outliers in price and minimum nights were removed to ensure realistic and interpretable distributions.

Below is a random sample from both datasets for examination!

Random Sample of Cleaned Airbnb 2020 Data
id | name | host_id | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | city
18876264 | MODERN DOWNTOWN TOP FLOOR PENTHOUSE WITH BAY VIEWS | 23165254 | Downtown | Belltown | 47.61572 | -122.34663 | Entire home/apt | 599 | 1 | 40 | 2020-02-28 | 1.13 | 18 | 320 | Seattle
6776235 | Honua Kai - K112 Ground Floor w/ oversized lanai | 24563934 | Maui | Lahaina | 20.94438 | -156.68982 | Hotel room | 415 | 6 | 3 | 2020-02-02 | 0.09 | 7 | 1 | Hawaii
15042311 | COMFY SINGLE in Shared Room #3! | 41676630 | NA | West Town | 41.90701 | -87.68935 | Shared room | 39 | 1 | 95 | 2020-07-08 | 1.97 | 2 | 10 | Chicago
27263329 | Conveniently located SOMA Studio | 48005494 | NA | South of Market | 37.78290 | -122.39846 | Entire home/apt | 70 | 30 | 2 | 2020-01-15 | 0.09 | 111 | 245 | San Francisco
31625086 | Denver Getaway Studio in Contemporary Rustic Style | 28939848 | NA | Regis | 39.78644 | -105.03172 | Entire home/apt | 105 | 2 | 59 | 2020-05-11 | 4.06 | 1 | 102 | Denver

Random Sample of Cleaned Airbnb 2023 Data
id | name | host_id | neighbourhood_group | neighbourhood | latitude | longitude | room_type | price | minimum_nights | number_of_reviews | last_review | reviews_per_month | calculated_host_listings_count | availability_365 | number_of_reviews_ltm | city
1.839532e+07 | Fort Lauderdale beach condo! | 76217164 | NA | Fort Lauderdale | 26.13320 | -80.10908 | Entire home/apt | 191 | 3 | 233 | NA | 3.25 | 1 | 75 | 47 | Broward County
5.310828e+07 | Grace Ranch #5 | 267485780 | Unincorporated Areas | Northeast Antelope Valley | 34.72179 | -118.02511 | Entire home/apt | 105 | 2 | 14 | NA | 1.11 | 8 | 363 | 13 | Los Angeles
8.117132e+17 | Luxe-Modern South Bay Studio | 434290762 | Other Cities | Torrance | 33.83046 | -118.36647 | Entire home/apt | 160 | 2 | 0 | NA | 0.00 | 2 | 22 | 0 | Los Angeles
7.622673e+17 | The Perfect Escape @Lake Ridge | 108186866 | NA | Fort Lauderdale | 26.13887 | -80.13052 | Entire home/apt | 101 | 1 | 13 | NA | 3.02 | 35 | 305 | 13 | Broward County
2.013737e+07 | Hollywood Hotspot | 16148120 | City of Los Angeles | Hollywood | 34.10143 | -118.33985 | Entire home/apt | 80 | 30 | 8 | NA | 0.12 | 1 | 0 | 0 | Los Angeles

Part 1: Exploring the Airbnb landscape circa 2020

To understand how Airbnb impacts housing markets across cities, I start by looking at overall price distributions and room types to characterize the basic structure of each market. Is it tourist-heavy, or maybe a hot-spot for commuters? To keep the analysis manageable, most graphs show only NYC, LA, Seattle, Hawaii, and Rhode Island, because these areas have values for neighbourhood_group, which I use for a high-level comparison across the five. Next, I analyze how location within a city influences Airbnb pricing patterns. Finally, I analyze how price relates to demand. As an economist by trade, I looked for the best available substitute for Airbnb demand and settled on review activity as a proxy for identifying the types of listings that drive the most economic activity.

Before diving into neighborhoods and demand, I first ask a simple question: what does an “average” Airbnb even look like across different cities? To answer that, I examine how prices and room types are distributed in each market.

Figure 1: Data was provided by Kaggle.

In Figure 1, one thing becomes immediately clear: there is no such thing as a “typical” Airbnb price. Most listings in every city cluster below $300, but a small number of luxury units stretch far into the upper tail. Hawaii and Rhode Island, in particular, lean much more heavily into these high-end outliers, reflecting their vacation-driven markets. This early snapshot already hints at a key theme of the analysis: that averages alone can be misleading when most real market activity happens at much lower prices.

Figure 2: Data was provided by Kaggle.

In Figure 2, the story shifts from price to what is actually being rented, and the answer is overwhelmingly full homes and apartments. While early concerns held that Airbnb would primarily compete with hotels, these patterns suggest the platform instead competes most directly with the long-term rental housing market, which would intensify housing scarcity rather than simply redistribute tourist demand. As our troubled resident said, “damage to […] neighborhoods” indeed.

To check whether the skew seen in Figure 1 distorts the summary statistics, I plotted Figure 3 and Figure 4, and the problem becomes even clearer.

Figure 3: Data was provided by Kaggle.
Figure 4: Data was provided by Kaggle.

In Figure 3, that distortion becomes unmistakable: in every city, the mean sits well above the median, pulled upward by a small number of extremely expensive listings. This tells us that “average” Airbnb prices often reflect the luxury fringe more than the typical stay. For cities debating whether to expand or restrict short-term rentals, this is an important finding, because regulation aimed at the average may miss where most residents and renters are actually affected. The same inequality becomes even more visible when prices are plotted on a log scale in Figure 4.
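To make the mean-versus-median gap concrete, here is a minimal sketch in base R. The prices are made-up values chosen to mimic a right-skewed market (they are not taken from the Kaggle data), and the variable names are my own.

```r
# Hypothetical nightly prices: mostly modest, plus two luxury outliers
price <- c(60, 75, 80, 95, 110, 120, 150, 180, 900, 2500)

mean_price   <- mean(price)    # pulled upward by the two expensive listings
median_price <- median(price)  # midpoint of the sorted prices

# A log10 transform compresses the long right tail, which is why a
# log-scaled axis makes the bulk of the distribution easier to read
log_price <- log10(price)

cat("mean:", mean_price, "| median:", median_price, "\n")
```

With these toy numbers the mean is several times the median, even though eight of the ten listings cost under $200.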

Before jumping into distance and pricing, I mapped every listing by joining its coordinates to state boundaries and coloring points by price, basically creating a heatmap of tourism pressure. Each area revealed its own Airbnb fingerprint: Hawaii (Figure 5) glowing along the coasts, Rhode Island (Figure 8) clustering around beaches and college towns, Los Angeles (Figure 6) lighting up the Westside, Seattle (Figure 9) hugging the urban core, and New York City (Figure 7) radiating from Manhattan. These maps show where listings concentrate before distance enters the picture.

Figure 5: Data was provided by Kaggle.
Figure 6: Data was provided by Kaggle.
Figure 7: Data was provided by Kaggle.
Figure 8: Data was provided by Kaggle.
Figure 9: Data was provided by Kaggle.

Across every city, Airbnb activity is overwhelmingly coastal or tied to high-amenity areas, reinforcing the platform’s role in transforming prime-location housing into short-term tourist stock. The densest clusters appear exactly where long-term renters already face tight markets. These patterns set the stage for the rest of the analysis: once we know where Airbnb concentrates, it becomes much easier to understand how prices scale with distance, neighborhood identity, and demand.

So now let’s evaluate within-city differentials even further. I matched every Airbnb listing to its city’s geographic center and then calculated the straight-line distance between the listing and that center using latitude and longitude (for Hawaii I used Honolulu, and for Rhode Island I used Providence). This gives a distance in kilometers for every single listing, allowing me to move from a visual, map-based intuition to a precise, comparable spatial gradient across cities.
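One common way to turn latitude and longitude into kilometers is the haversine (great-circle) formula. The sketch below assumes that approach; the report does not say which formula was actually used. The function name `haversine_km`, the `centers` lookup table, and the approximate center coordinates are all illustrative assumptions, while the two listing coordinates come from the 2020 sample above.

```r
# Great-circle distance in km between two points given in decimal degrees.
# radius_km = 6371 is the commonly used mean Earth radius.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Assumed (approximate) center coordinates; Honolulu stands in for Hawaii
centers <- data.frame(
  city       = c("Seattle", "Hawaii"),
  center_lat = c(47.6062, 21.3069),
  center_lon = c(-122.3321, -157.8583)
)

# Two listings from the 2020 sample (Belltown and Lahaina)
listings <- data.frame(
  city      = c("Seattle", "Hawaii"),
  latitude  = c(47.61572, 20.94438),
  longitude = c(-122.34663, -156.68982)
)

# Attach each listing's city center, then compute its distance in km
listings <- merge(listings, centers, by = "city")
listings$dist_km <- haversine_km(
  listings$latitude, listings$longitude,
  listings$center_lat, listings$center_lon
)
print(listings[, c("city", "dist_km")])
```

Because the function is vectorized, the same call works unchanged on a full data frame of listings.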

Figure 10: Data was provided by Kaggle.

Now that each listing has a measurable distance from its city center, Figure 10 lets us ask a classic urban economics question in a very modern context: does Airbnb follow the same price gradients as traditional housing? In New York City, Los Angeles, and Seattle, the answer is largely yes! Prices tend to be highest near the urban core and generally fall as listings move farther away. This mirrors long-established central business district theory and Marshallian forces of urban agglomeration, where proximity to jobs, transit, and cultural amenities commands a premium.

However, New York City reveals a surprising second rise in prices at farther distances from the center. This pattern likely reflects the influence of waterfront and destination neighborhoods, where access to beaches, harbors, and skyline views can substitute for centrality itself. In other words, a listing in a far-out but scenic area may command just as much as one near the Upper East Side. Meanwhile, Hawaii and Rhode Island tell an entirely different story: here, prices do not decline with distance at all, and in some cases actually increase. These markets appear to be governed by tourism rather than commuting, with beach access and resort proximity driving value instead of downtown access.

Together, these patterns show that Airbnb does not behave like a single unified market. In some cities it follows traditional urban housing logic; in others it operates more like a spatially distributed hotel economy. From a policy perspective, this distinction is crucial, because it suggests that regulation designed for commuter cities may fail entirely in tourism-driven markets, even when both are labeled “urban.”

After seeing how price changes with distance from the city center, the next natural question is: where exactly within cities does Airbnb concentrate, and how expensive are those areas?

Figure 11: Data was provided by Kaggle.
Figure 12: Data was provided by Kaggle.

Figure 11 shows that average prices spike in centrally branded and tourism-oriented districts like Downtown LA, Maui, Waikiki, and Newport, placing them in a completely different price tier than most residential areas. This shows how strongly Airbnb pricing is tied to destination value.

Figure 12 shifts from price to volume, revealing that the largest clusters of listings appear both in tourism hubs like Clark County (Las Vegas) and Hawaii’s urban centers, and in dense residential neighborhoods such as Williamsburg, Harlem, Bushwick, and Bedford-Stuyvesant. Together, these two plots show that Airbnb pressure is strongest where high prices and high listing density overlap, helping explain why these same neighborhoods often sit at the center of housing and regulatory debates.

Finally, Figure 13 answers the question that ties everything together: which Airbnbs are actually doing the most market work?

Figure 13: Data was provided by Kaggle.

The answer is unmistakable. Lower-priced listings receive far more reviews, signaling higher booking volume and faster turnover. In the $0–200 range especially, review activity is dense, while luxury units appear far quieter by comparison. With reviews as a proxy for demand, this pattern indicates that the economic engine of Airbnb is not its elite properties but its high-volume, affordable listings. These are the units that most directly substitute for long-term rental housing, and therefore the ones that matter most when we think about pressure on the housing market.

Part 2: The post-pandemic shuffle: tracking Airbnb’s urban power shifts

If Part 1 showed us what Airbnb looked like, Part 2 asks the more complex question: how is it changing, and who feels the impact? From 2020 to 2023, Airbnb markets didn’t just recover - they completely shifted. I start by comparing cities to see which ones opened the door wider to short-term rentals and which pulled back.

Then I zoom into neighborhoods, tracking joint changes in price and demand to reveal the winners and losers of Airbnb’s post-pandemic reshuffling. Some areas surged with higher prices and heavier traffic, while others saw demand evaporate. These shifts highlight where Airbnb pressures local housing markets most.

Finally, I identify the neighborhoods with the steepest increases and declines: places where Airbnb is reshaping economic activity fast enough that regulation either becomes essential or potentially disruptive.

I begin by comparing the 10 cities with the most Airbnb listings and seeing how many listings were added to (or subtracted from) their 2020 totals.

Figure 14: Data was provided by Kaggle.

Airbnb listings surged back in nearly every major city between 2020 and 2023, as Figure 14 shows, signaling a strong post-pandemic rebound in tourism and hosting. Los Angeles and Broward County, in particular, explode in scale, plausibly reflecting lighter regulation and fast-recovering travel demand. New York City is the outlier: its listings drop, most likely because of Local Law 18 (2022), a de facto ban meant to combat housing shortages that forces host registration, requires on-site presence for short stays, and blocks platforms from processing unregistered bookings.

We understand what the change looks like from a high level, but let’s zoom in more. I first wanted to see where price had increased (an economist’s first question always).

Figure 15: Data was provided by Kaggle.

What’s interesting about Figure 15 is the two completely different worlds represented. On one side, neighborhoods like North Lawndale (Chicago), Bay Terrace & Encanto (San Diego), and High Point (Seattle) are historically under-resourced, residential, or actively gentrifying areas. A 100%+ jump in Airbnb prices in these places is not just a market shift. It may signal that short-term rentals are accelerating housing pressure in communities already vulnerable to displacement.

On the other side, you have Carson (Los Angeles), Colonial Village (Washington D.C.), and Atherton (San Mateo County), which are extremely wealthy or high property-value areas where rising prices reflect a very different mechanism: constrained supply and strong demand rather than neighborhood-level vulnerability. These are places where Airbnb increases don’t threaten housing stability in the same way, but instead highlight how lucrative the platform becomes in already exclusive markets.
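For readers who want to reproduce a ranking like the one in Figure 15, here is a rough base R sketch of a neighborhood-level percent change in price. It assumes the comparison uses the mean price per neighborhood in each year, which the report does not state explicitly. The data frames and their values are invented for illustration.

```r
# Toy listings for each year (hypothetical values, not from the Kaggle data)
listings_2020 <- data.frame(
  city          = c("A", "A", "A", "B"),
  neighbourhood = c("North", "North", "South", "East"),
  price         = c(100, 120, 80, 200)
)
listings_2023 <- data.frame(
  city          = c("A", "A", "A", "B"),
  neighbourhood = c("North", "North", "South", "West"),
  price         = c(220, 242, 60, 150)
)

# Mean nightly price per city and neighbourhood in each year
mean_2020 <- aggregate(price ~ city + neighbourhood, data = listings_2020, FUN = mean)
mean_2023 <- aggregate(price ~ city + neighbourhood, data = listings_2023, FUN = mean)
names(mean_2020)[names(mean_2020) == "price"] <- "price_2020"
names(mean_2023)[names(mean_2023) == "price"] <- "price_2023"

# merge() is an inner join by default, so neighbourhoods that appear
# in only one year ("East", "West") drop out of the comparison
change <- merge(mean_2020, mean_2023, by = c("city", "neighbourhood"))
change$pct_change <- 100 * (change$price_2023 - change$price_2020) / change$price_2020

# Largest increases first
change <- change[order(-change$pct_change), ]
print(change)
```

In practice it may also be worth requiring a minimum number of listings per neighborhood in both years, since a mean based on one or two listings can swing wildly.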

Given how interesting these results were, I wanted to look at the inverse: neighborhoods where the average Airbnb price had fallen.

Figure 16: Data was provided by Kaggle.

Figure 16 surprised me even more. Downtown Columbus, Denver’s Civic Center, and several Portland neighborhoods are not what I picture when I think “tourist hotspot.” They may be tied more to business travel or local traffic that simply has not rebounded since 2020. Altogether, it hints at a growing split in the Airbnb landscape: some neighborhoods are recovering and even booming, while others feel like they’re still trying to find their footing.

If price is fluctuating, perhaps our demand proxy can help us understand the different kinds of markets here. I plotted the neighborhoods with the largest increases and the largest decreases in reviews.

Figure 17: Data was provided by Kaggle.
Figure 18: Data was provided by Kaggle.

Figure 17 once again shows us how unexpected the hotspots are. Neighborhoods like Mesquite and South Prescott, which are not exactly classic tourist hubs, are seeing huge jumps in reviews, suggesting travelers are drifting into cheaper or previously overlooked areas. At the same time, wealthy places like Atherton and Hillsboro Beach also appear, hinting that demand is growing at both the budget end and the luxury end of the market.

The decline side (Figure 18) paints a different picture: several LA neighborhoods (Pico-Union, East Hollywood, Pico-Robertson) all show steep drops in reviews, and New York and Seattle have similar cases. These feel like places where tourism hasn’t bounced back, or where supply may have outpaced demand. It again looks like Airbnb’s post-pandemic recovery is stretching in two very different directions.

In my head, neighborhoods were falling into distinct categories: some were seeing higher demand and higher prices, others the opposite, and still others rising demand with falling prices or falling demand with rising prices. I couldn’t keep these combinations straight, so I turned to a graph to help me.

Figure 19: Data was provided by Kaggle.

The quadrant plot finally made the patterns snap into place. Neighborhoods weren’t just rising or falling; they were clustering into four distinct market worlds:

  1. “Gold Rushes” (Higher price, higher demand): These are neighborhoods riding a full post-pandemic rebound: prices are climbing, tourists are coming. Tourism, gentrification pressure, and city spillover are often all happening here.

  2. “Bargain Surges” (Lower price, higher demand): Here, demand is rising but prices have softened. These feel like the value-hunter magnets where travelers go when premium markets get too expensive.

  3. “Luxury Goes Light” (Higher price, lower demand): These markets might be hitting affordability ceilings, overshooting demand, or aging out of traveler preferences - price has gone up but demand has gone down.

  4. “Demand Crashes” (Lower price, lower demand): Tourists left, hosts cut prices, and the market didn’t bounce back. Many of the LA and New York declines fall here, suggesting neighborhood-specific regulatory concerns or broader urban recovery challenges.

Together, these four market personalities finally made the story legible: Airbnb’s post-2020 recovery wasn’t one narrative but four, unfolding at the same time, often within the same metro area.
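The four-way split above boils down to the signs of two numbers per neighborhood. Below is a minimal sketch of that classification in base R. The function name `classify_quadrant`, the column names, and the choice to count a change of exactly zero as “higher” are my assumptions for illustration; review change stands in for demand, as elsewhere in this report.

```r
# Assign each neighbourhood to a quadrant from the sign of its
# 2020-to-2023 change in price and in reviews (the demand proxy).
# Assumption: a change of exactly zero counts as "higher".
classify_quadrant <- function(price_change, demand_change) {
  ifelse(price_change >= 0 & demand_change >= 0, "Gold Rushes",
  ifelse(price_change <  0 & demand_change >= 0, "Bargain Surges",
  ifelse(price_change >= 0 & demand_change <  0, "Luxury Goes Light",
                                                 "Demand Crashes")))
}

# Hypothetical percent changes for four made-up neighbourhoods
changes <- data.frame(
  neighbourhood = c("A", "B", "C", "D"),
  price_change  = c(25, -10, 40, -15),
  review_change = c(30, 12, -20, -35)
)
changes$quadrant <- classify_quadrant(changes$price_change, changes$review_change)
print(changes)
```

Missing changes propagate as NA rather than being forced into a quadrant, which keeps neighborhoods without data in both years out of the plot.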

Conclusion

This analysis showed that Airbnb’s impact is hyper-local. I was shocked to see the biggest price jumps in under-resourced or gentrifying neighborhoods, while demand exploded in places far outside classic tourist zones. The quadrant plot made it clear: neighborhoods are experiencing four completely different Airbnb trajectories, from booming “Gold Rushes” to severe “Demand Crashes.”

These differences are critical for policymaking. Citywide rules, like caps, taxes, or registration laws, won’t land evenly. Vulnerable neighborhoods may need stronger protections, while wealthier or tourism-oriented areas might be better served by targeted fees or enforcement rather than broad bans of the kind NYC’s Local Law 18 effectively imposed.

A natural next step would be linking these patterns to rent changes, host types, or demographic shifts to understand who is most at risk. My core takeaway: Airbnb regulation must be neighborhood-specific, because Airbnb’s impact is too uneven to govern with a single citywide approach.

References

Seth, K. (2023). US Airbnb Open Data. Kaggle. https://www.kaggle.com/datasets/kritikseth/us-airbnb-open-data

Thibeault, J. (2024, June 24). Airbnb: A plague in American neighborhoods? Medium. Retrieved from https://medium.com/_jasonthibeault/airbnb-a-plague-in-american-neighborhoods-e453974410fe

Appendix

This dataset required quite a lot of cleaning, so let me go through a little bit of the process! To prepare my dataset for analysis:

  • I first addressed missing and improperly formatted entries. The neighbourhood_group variable contained blank strings instead of actual missing values, so I converted those to NA.
  • I then kept two versions of the data: one that retains listings with NA for neighbourhood_group, and one with those listings removed.
  • Several key variables required type conversion: last_review was transformed from a character string into a true date using lubridate::dmy(), and the categorical variables (room_type, city, and neighbourhood_group) were converted into factors for clearer interpretation.
  • I then removed host_name, a text-heavy identifier column that provides no analytical value.
  • To ensure realistic modeling, I filtered out listings with a price of $0 and trimmed extreme outliers by removing observations above the 99th percentile of both price and minimum_nights.
  • Finally, I replaced missing values in reviews_per_month with zero, which is appropriate because listings with no reviews simply have no monthly review activity.
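
The steps above can be sketched in base R roughly as follows. This is an illustration under assumptions, not the exact project code: the function name `clean_listings` and the toy data are invented, base R’s `as.Date()` stands in for `lubridate::dmy()`, the raw date format is assumed to be day/month/year, and a listing is dropped if it exceeds the 99th percentile of either price or minimum_nights.

```r
# Toy raw data (hypothetical rows mimicking the raw columns)
raw <- data.frame(
  host_name           = c("Ana", "Ben", "Cy", "Di", "Eve"),
  neighbourhood_group = c("Downtown", "", "Maui", "", "Brooklyn"),
  room_type           = c("Entire home/apt", "Shared room", "Hotel room",
                          "Entire home/apt", "Private room"),
  city                = c("Seattle", "Chicago", "Hawaii", "Denver", "New York City"),
  price               = c(599, 39, 415, 0, 90),
  minimum_nights      = c(1, 1, 6, 30, 2),
  last_review         = c("28/02/2020", "08/07/2020", "02/02/2020", NA, NA),
  reviews_per_month   = c(1.13, 1.97, 0.09, NA, NA),
  stringsAsFactors    = FALSE
)

clean_listings <- function(df) {
  # Blank strings in neighbourhood_group become true missing values
  df$neighbourhood_group[df$neighbourhood_group %in% ""] <- NA
  # Parse dates (assumed day/month/year) and convert categoricals to factors
  df$last_review <- as.Date(df$last_review, format = "%d/%m/%Y")
  for (col in c("room_type", "city", "neighbourhood_group")) {
    df[[col]] <- factor(df[[col]])
  }
  # Drop the host name column
  df$host_name <- NULL
  # Remove zero-price listings, then trim above the 99th percentiles
  df <- df[df$price > 0, ]
  price_cut  <- quantile(df$price, 0.99)
  nights_cut <- quantile(df$minimum_nights, 0.99)
  df <- df[df$price <= price_cut & df$minimum_nights <= nights_cut, ]
  # Listings with no reviews get zero monthly review activity
  df$reviews_per_month[is.na(df$reviews_per_month)] <- 0
  df
}

clean <- clean_listings(raw)
print(clean)
```

With only a handful of rows, the 99th-percentile cut removes the largest value in each trimmed column, so the toy output is much smaller than the input; on the full datasets the same cut removes roughly the top 1% of each.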

Thank you for an amazing quarter! I really learned so much in this class. Have a great break!