Author

LiuChenshu

Published

February 24, 2025

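The core data structure in GeoPandas is geopandas.GeoDataFrame, a subclass of pandas.DataFrame that can store geometry columns and perform spatial operations. geopandas.GeoSeries is a subclass of pandas.Series that handles geometries. A GeoDataFrame is therefore a combination of pandas.Series columns, which hold traditional data (numbers, booleans, text, and so on), and geopandas.GeoSeries columns, which hold geometries (points, polygons, and so on).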

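Each GeoSeries can contain any geometry type (you can even mix them within a single array) and has a GeoSeries.crs attribute, which stores information about the projection (CRS stands for Coordinate Reference System). Each GeoSeries in a GeoDataFrame can therefore use a different projection, which lets you keep, for example, several versions of the same geometry in different CRSs.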

Only one GeoSeries in a GeoDataFrame is considered the _active_ geometry, which means that all geometric operations applied to the GeoDataFrame operate on this active column.
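To make the idea of an active geometry column concrete, here is a minimal sketch on a toy GeoDataFrame. The square, the column names, and the EPSG:3857 CRS are illustrative assumptions, not part of the dataset used below.

```python
import geopandas
from shapely.geometry import Polygon

# A toy square; coordinates and CRS are assumptions chosen for illustration
square = Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])
toy = geopandas.GeoDataFrame({"name": ["square"]}, geometry=[square], crs="EPSG:3857")

# Store a second geometry column next to the first one
toy["centroid"] = toy.centroid

# The active geometry column is the one returned by the .geometry accessor
active_before = toy.geometry.name

# set_geometry returns a new GeoDataFrame whose active column is "centroid"
toy_points = toy.set_geometry("centroid")
active_after = toy_points.geometry.name

print(active_before, active_after)
```

Geometric operations such as toy_points.area now run on the points, not on the square.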

1 Reading and writing files

Code
from geodatasets import get_path
import geopandas

path_to_data = get_path("nybb")
gdf = geopandas.read_file(path_to_data)

gdf
BoroCode BoroName Shape_Leng Shape_Area geometry
0 5 Staten Island 330470.010332 1.623820e+09 MULTIPOLYGON (((970217.022 145643.332, 970227....
1 4 Queens 896344.047763 3.045213e+09 MULTIPOLYGON (((1029606.077 156073.814, 102957...
2 3 Brooklyn 741080.523166 1.937479e+09 MULTIPOLYGON (((1021176.479 151374.797, 102100...
3 1 Manhattan 359299.096471 6.364715e+08 MULTIPOLYGON (((981219.056 188655.316, 980940....
4 2 Bronx 464392.991824 1.186925e+09 MULTIPOLYGON (((1012821.806 229228.265, 101278...
Code
gdf.to_file("my_file.geojson", driver="GeoJSON")

2 Simple accessors and methods

2.1 Measuring area

Code
gdf = gdf.set_index("BoroName")
gdf["area"] = gdf.area
gdf["area"]
BoroName
Staten Island    1.623822e+09
Queens           3.045214e+09
Brooklyn         1.937478e+09
Manhattan        6.364712e+08
Bronx            1.186926e+09
Name: area, dtype: float64
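The values returned by the area accessor are in squared units of the coordinates. A small sketch with shapes whose areas are easy to check by hand (the shapes and labels are assumptions for illustration):

```python
import geopandas
from shapely.geometry import Polygon

rectangle = Polygon([(0, 0), (2, 0), (2, 3), (0, 3)])  # 2 x 3 units
triangle = Polygon([(0, 0), (4, 0), (0, 4)])           # right triangle with legs of 4 units

shapes = geopandas.GeoSeries([rectangle, triangle], index=["rectangle", "triangle"])

# .area is computed per geometry and keeps the index
areas = shapes.area
print(areas)
```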

2.2 Getting polygon boundaries and centroids

Code
# Get the boundary of each polygon (a LineString or MultiLineString) via GeoDataFrame.boundary.
gdf['boundary'] = gdf.boundary
gdf['boundary']
BoroName
Staten Island    MULTILINESTRING ((970217.022 145643.332, 97022...
Queens           MULTILINESTRING ((1029606.077 156073.814, 1029...
Brooklyn         MULTILINESTRING ((1021176.479 151374.797, 1021...
Manhattan        MULTILINESTRING ((981219.056 188655.316, 98094...
Bronx            MULTILINESTRING ((1012821.806 229228.265, 1012...
Name: boundary, dtype: geometry

Because the boundary was saved as a new column, the same GeoDataFrame now has two geometry columns.

You can also create new geometries, for example a buffered version of the original geometry (e.g. GeoDataFrame.buffer(10)) or its centroid.

Code
gdf['centroid'] = gdf.centroid
gdf['centroid']
BoroName
Staten Island      POINT (941639.45 150931.991)
Queens           POINT (1034578.078 197116.604)
Brooklyn          POINT (998769.115 174169.761)
Manhattan         POINT (993336.965 222451.437)
Bronx              POINT (1021174.79 249937.98)
Name: centroid, dtype: geometry
Code
gdf['buffer'] = gdf.buffer(5)
gdf['buffer']
BoroName
Staten Island    MULTIPOLYGON (((970219.129 145648.05, 970227.6...
Queens           MULTIPOLYGON (((994983.514 209031.804, 994974....
Brooklyn         MULTIPOLYGON (((1030445.277 166507.837, 103044...
Manhattan        MULTIPOLYGON (((1008144.925 258094.305, 100814...
Bronx            MULTIPOLYGON (((1047075.607 249822.682, 104707...
Name: buffer, dtype: geometry
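The three derived geometries can be checked on a shape small enough to verify by eye. This is a sketch on an assumed 2 x 2 square and a single point, not on the borough data.

```python
import math

import geopandas
from shapely.geometry import Point, Polygon

square = geopandas.GeoSeries([Polygon([(0, 0), (2, 0), (2, 2), (0, 2)])])

outline = square.boundary  # the closed outline of the square, 8 units long
center = square.centroid   # the point (1, 1)

# Buffering a point gives a polygon that approximates a circle from the inside,
# so its area is slightly smaller than pi * r**2
disc = geopandas.GeoSeries([Point(0, 0)]).buffer(1.0)

print(outline.length.iloc[0], center.iloc[0], disc.area.iloc[0], math.pi)
```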

2.3 Measuring distance

Code
first_point = gdf['centroid'].iloc[0]
gdf['distance'] = gdf['centroid'].distance(first_point)
gdf['distance']
BoroName
Staten Island         0.000000
Queens           103781.535276
Brooklyn          61674.893421
Manhattan         88247.742789
Bronx            126996.283623
Name: distance, dtype: float64
Code
gdf['distance'].mean()
76140.09102166798
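distance() measures the planar distance from every geometry in the series to the geometry passed in. A minimal sketch with assumed points whose distances form 3-4-5 triangles:

```python
import geopandas
from shapely.geometry import Point

points = geopandas.GeoSeries(
    [Point(0, 0), Point(3, 4), Point(6, 8)],
    index=["origin", "near", "far"],
)

# Distance from every point to the first one, in coordinate units
distances = points.distance(points.iloc[0])

print(distances)
print(distances.mean())
```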

3 Making maps

Code
gdf.plot("area", legend=True)
# Plots the active geometry column, colour-coded by the "area" column, and shows a legend (legend=True).

You can also explore your data interactively with GeoDataFrame.explore(), which behaves like plot() but returns an interactive map instead.

Code
# explore() depends on folium, matplotlib and mapclassify being installed
import matplotlib.pyplot as plt
import folium
import mapclassify
gdf.explore("area", legend=False)

Switching the active geometry (GeoDataFrame.set_geometry) to the centroids lets us plot the same data using point geometries.

Code
gdf = gdf.set_geometry("centroid")
gdf.plot("area", legend=True)

We can also layer the two GeoSeries on top of each other. We only need to pass the axes of the first plot to the second.

Code
ax = gdf["geometry"].plot()
gdf["centroid"].plot(ax=ax, color="black")
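The layering pattern relies on plot() returning the Matplotlib axes it drew on. A self-contained sketch on two assumed squares, using the off-screen Agg backend so that it runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no window is opened

import geopandas
from shapely.geometry import Polygon

squares = geopandas.GeoSeries([
    Polygon([(0, 0), (2, 0), (2, 2), (0, 2)]),
    Polygon([(3, 0), (5, 0), (5, 2), (3, 2)]),
])

# First layer: the polygons; plot() returns the axes it drew on
base_ax = squares.plot(color="lightblue", edgecolor="grey")

# Second layer: the centroids, drawn on the same axes
top_ax = squares.centroid.plot(ax=base_ax, color="black")
```

Both calls return the same axes object, so any further layer can be added the same way.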

Set the active geometry back to the original GeoSeries.

Code
gdf = gdf.set_geometry("geometry")

4 Creating geometries

4.1 Convex hull

Code
gdf["convex_hull"] = gdf.convex_hull
Code
ax = gdf["convex_hull"].plot(alpha=.5)  # saving the first plot as an axis and setting alpha (transparency) to 0.5
gdf["boundary"].plot(ax=ax, color="white", linewidth=.5)  # passing the first plot and setting linewidth to 0.5
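A convex hull is the smallest convex polygon that contains a geometry, so it is never smaller than the geometry itself. The sketch below uses an assumed L-shaped polygon to show the hull filling in the notch.

```python
import geopandas
from shapely.geometry import Polygon

# L-shaped polygon: a 2 x 2 square with the upper-right 1 x 1 corner removed
l_shape = Polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)])
shapes = geopandas.GeoSeries([l_shape])

hull = shapes.convex_hull

shape_area = shapes.area.iloc[0]  # 3.0
hull_area = hull.area.iloc[0]     # 3.5: the hull cuts the notch diagonally
print(shape_area, hull_area)
```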

4.2 Buffer

Code
# buffering the active geometry by 10 000 feet (geometry is already in feet)
gdf["buffered"] = gdf.buffer(10000)

# buffering the centroid geometry by 10 000 feet (geometry is already in feet)
gdf["buffered_centroid"] = gdf["centroid"].buffer(10000)
Code
ax = gdf["buffered"].plot(alpha=.5)  # saving the first plot as an axis and setting alpha (transparency) to 0.5
gdf["buffered_centroid"].plot(ax=ax, color="red", alpha=.5)  # passing the first plot as an axis to the second
gdf["boundary"].plot(ax=ax, color="white", linewidth=.5)  # passing the first plot and setting linewidth to 0.5

5 Geometry relations

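We can also ask about the spatial relations between geometries. Using the geometries above, we can check which buffered boroughs intersect the original geometry of Brooklyn, i.e. which boroughs lie within 10 000 feet of Brooklyn.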

First, get the polygon of Brooklyn.

Code
brooklyn = gdf.loc["Brooklyn", "geometry"]
brooklyn

Code
type(brooklyn)
shapely.geometry.multipolygon.MultiPolygon
Code
gdf["buffered"].intersects(brooklyn)
# Check which geometries in gdf["buffered"] intersect the Brooklyn polygon.
BoroName
Staten Island     True
Queens            True
Brooklyn          True
Manhattan         True
Bronx            False
dtype: bool
Code
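# Check which buffered centroids lie entirely within the original borough polygons.
# Here both GeoSeries are aligned on their index, so the check runs row by row:
# gdf["buffered_centroid"].within(gdf) tests each buffered centroid against the
# active geometry of the matching row in gdf.
# The boolean result (True or False) is stored in the new column gdf["within"] and then displayed.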
gdf["within"] = gdf["buffered_centroid"].within(gdf)
gdf["within"]
BoroName
Staten Island     True
Queens            True
Brooklyn         False
Manhattan        False
Bronx            False
Name: within, dtype: bool
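The two calls above use the predicates in two different modes: one geometry against every row, and two aligned GeoSeries compared row by row. A minimal sketch of both modes, where the square() helper, the zones, and the discs are hypothetical shapes made up for the example:

```python
import geopandas
from shapely.geometry import Point, Polygon


def square(x, y, size=2):
    """Hypothetical helper: axis-aligned square with lower-left corner (x, y)."""
    return Polygon([(x, y), (x + size, y), (x + size, y + size), (x, y + size)])


zones = geopandas.GeoSeries(
    [square(0, 0), square(5, 0), square(10, 0)], index=["a", "b", "c"]
)

# Mode 1: a single geometry is tested against every row
probe = square(1, 1)
hits = zones.intersects(probe)

# Mode 2: two GeoSeries with the same index are compared row by row
discs = geopandas.GeoSeries(
    [Point(1, 1).buffer(0.5), Point(7, 1).buffer(0.5), Point(20, 20).buffer(0.5)],
    index=["a", "b", "c"],
)
inside = discs.within(zones)

print(hits)
print(inside)
```

Disc "b" straddles the edge of its zone, so it intersects the zone but is not within it, which is why within() is the stricter of the two predicates.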
Code
# Set the active geometry of the GeoDataFrame to the "buffered_centroid" column
gdf = gdf.set_geometry("buffered_centroid")

# Plot the "within" column as a categorical map, with the legend in the upper-left corner
ax = gdf.plot("within", legend=True, categorical=True, legend_kwds={'loc': "upper left"})
# categorical=True treats the values of "within" as discrete categories rather than a continuous range.
# legend_kwds is a dict of extra keyword arguments forwarded to the Matplotlib legend;
# its "loc" key sets the legend position, here "upper left".
# Draw the "boundary" column on the same axes with black lines of width 0.5
gdf["boundary"].plot(ax=ax, color="black", linewidth=.5)

6 Projections

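Each GeoSeries has its own Coordinate Reference System (CRS), accessible as GeoSeries.crs. The CRS tells GeoPandas where the coordinates of the geometries are located on the Earth's surface. In some cases the CRS is geographic, which means that the coordinates are latitude and longitude. The most common geographic CRS is WGS84, with the authority code EPSG:4326.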

Code
gdf.crs
<Projected CRS: EPSG:2263>
Name: NAD83 / New York Long Island (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.
- bounds: (-74.26, 40.47, -71.8, 41.3)
Coordinate Operation:
- name: SPCS83 New York Long Island zone (US survey foot)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich

The geometries are in EPSG:2263, with coordinates in feet. A GeoSeries can easily be reprojected to another CRS, such as EPSG:4326, using GeoSeries.to_crs().

Code
gdf = gdf.set_geometry("geometry")
boroughs_4326 = gdf.to_crs("EPSG:4326")
boroughs_4326.plot()

Code
boroughs_4326.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich

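Notice the difference in the coordinates along the axes of the plot. Where we previously had 120 000 - 280 000 (feet), we now have 40.5 - 40.9 (degrees). In this case, boroughs_4326 has a "geometry" column in WGS84, but all the other geometry columns (the centroids and so on) remain in the original CRS.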
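That GeoDataFrame.to_crs() reprojects only the active geometry column can be verified on a toy frame. In this sketch the square and the choice of EPSG:3857 as the starting CRS are assumptions for illustration.

```python
import geopandas
from shapely.geometry import Polygon

# Toy frame in a projected CRS with a second geometry column
block = Polygon([(0, 0), (1000, 0), (1000, 1000), (0, 1000)])
toy = geopandas.GeoDataFrame({"name": ["block"]}, geometry=[block], crs="EPSG:3857")
toy["centroid"] = toy.centroid

# Only the active "geometry" column is reprojected
toy_4326 = toy.to_crs("EPSG:4326")
active_epsg = toy_4326.crs.to_epsg()
centroid_epsg = toy_4326["centroid"].crs.to_epsg()
print(active_epsg, centroid_epsg)

# Any other geometry column has to be reprojected explicitly
toy_4326["centroid"] = toy_4326["centroid"].to_crs("EPSG:4326")
centroid_epsg_after = toy_4326["centroid"].crs.to_epsg()
```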

Code
boroughs_4326
BoroCode Shape_Leng Shape_Area geometry area boundary centroid distance buffer convex_hull buffered buffered_centroid within
BoroName
Staten Island 5 330470.010332 1.623820e+09 MULTIPOLYGON (((-74.05051 40.56642, -74.05047 ... 1.623822e+09 MULTILINESTRING ((970217.022 145643.332, 97022... POINT (941639.45 150931.991) 0.000000 MULTIPOLYGON (((970219.129 145648.05, 970227.6... POLYGON ((915517.688 120121.881, 915467.035 12... POLYGON ((903234.894 123347.784, 903178.057 12... POLYGON ((951639.45 150931.991, 951591.298 149... True
Queens 4 896344.047763 3.045213e+09 MULTIPOLYGON (((-73.83668 40.59495, -73.83678 ... 3.045214e+09 MULTILINESTRING ((1029606.077 156073.814, 1029... POINT (1034578.078 197116.604) 103781.535276 MULTIPOLYGON (((994983.514 209031.804, 994974.... POLYGON ((1000721.532 136681.776, 994611.996 2... POLYGON ((1066963.473 157602.686, 1067059.264 ... POLYGON ((1044578.078 197116.604, 1044529.926 ... True
Brooklyn 3 741080.523166 1.937479e+09 MULTIPOLYGON (((-73.86706 40.58209, -73.86769 ... 1.937478e+09 MULTILINESTRING ((1021176.479 151374.797, 1021... POINT (998769.115 174169.761) 61674.893421 MULTIPOLYGON (((1030445.277 166507.837, 103044... POLYGON ((988872.821 146772.032, 983670.606 14... POLYGON ((962679.12 165570.385, 962651.33 1658... POLYGON ((1008769.115 174169.761, 1008720.962 ... False
Manhattan 1 359299.096471 6.364715e+08 MULTIPOLYGON (((-74.01093 40.68449, -74.01193 ... 6.364712e+08 MULTILINESTRING ((981219.056 188655.316, 98094... POINT (993336.965 222451.437) 88247.742789 MULTIPOLYGON (((1008144.925 258094.305, 100814... POLYGON ((977855.445 188082.322, 971830.134 19... POLYGON ((980499.119 178448.735, 979864.868 17... POLYGON ((1003336.965 222451.437, 1003288.812 ... False
Bronx 2 464392.991824 1.186925e+09 MULTIPOLYGON (((-73.89681 40.79581, -73.89694 ... 1.186926e+09 MULTILINESTRING ((1012821.806 229228.265, 1012... POINT (1021174.79 249937.98) 126996.283623 MULTIPOLYGON (((1047075.607 249822.682, 104707... POLYGON ((1017949.978 225426.885, 1015563.562 ... POLYGON ((992724.911 240962.362, 992700.941 24... POLYGON ((1031174.79 249937.98, 1031126.637 24... False

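Operations that rely on distance or area need a projected CRS (in metres, feet, kilometres, etc.) rather than a geographic one (in degrees). GeoPandas operations are planar, whereas degrees describe a position on a sphere, so spatial operations on degrees may not yield correct results. For example, gdf.area.sum() (projected CRS) gives 8 429 911 572 square feet, but boroughs_4326.area.sum() (geographic CRS) gives 0.083.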
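The same effect can be reproduced on a single 1 x 1 degree cell at the equator. This sketch compares the planar "area" in degrees with the area after reprojecting to an equal-area CRS; EPSG:6933 is an assumption chosen for the example and is not used elsewhere in this post.

```python
import warnings

import geopandas
from shapely.geometry import Polygon

# One 1 x 1 degree cell at the equator, in WGS84
cell = geopandas.GeoSeries(
    [Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])], crs="EPSG:4326"
)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # GeoPandas warns about areas in a geographic CRS
    area_degrees = cell.area.iloc[0]  # planar area of the raw coordinates, in "square degrees"

# Reproject to an equal-area CRS (metres) before measuring
area_m2 = cell.to_crs("EPSG:6933").area.iloc[0]

print(area_degrees, area_m2 / 1e6)  # second value is in square kilometres
```

The first number is just 1.0 and has no physical meaning, while the second is roughly 12 300 square kilometres.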