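The core data structure in GeoPandas is geopandas.GeoDataFrame, a subclass of pandas.DataFrame that can store geometry columns and perform spatial operations. geopandas.GeoSeries is a subclass of pandas.Series that handles geometries. A GeoDataFrame is therefore a combination of pandas.Series, which hold traditional data (numbers, booleans, text, and so on), and geopandas.GeoSeries, which hold geometries (points, polygons, and so on).
Each GeoSeries can contain any geometry type (you can even mix them within a single array) and has a GeoSeries.crs attribute, which stores information about the projection (CRS stands for Coordinate Reference System). Each GeoSeries in a GeoDataFrame can therefore be in a different projection, which lets you keep, for example, several versions of the same geometry in different CRSs.
Only one GeoSeries in a GeoDataFrame is considered the _active_ geometry, meaning that all geometric operations applied to the GeoDataFrame operate on this active column.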
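As a small illustration of the active geometry, the sketch below builds a toy GeoDataFrame with two geometry columns. The data and the column names `name` and `alt_location` are made up for this example and are not part of the nybb dataset used later.

```python
import geopandas
from shapely.geometry import Point

# Hypothetical toy data: two records with a primary point location.
gdf = geopandas.GeoDataFrame(
    {"name": ["A", "B"]},
    geometry=[Point(0, 0), Point(1, 1)],
)

# A second geometry column; it is stored but is not the active one.
gdf["alt_location"] = geopandas.GeoSeries([Point(10, 10), Point(20, 20)])

print(gdf.geometry.name)                   # name of the active geometry column
print(type(gdf["name"]).__name__)          # ordinary column: a pandas Series
print(type(gdf["alt_location"]).__name__)  # geometry column: a GeoSeries

# set_geometry returns a new GeoDataFrame with a different active column.
gdf_alt = gdf.set_geometry("alt_location")
print(gdf_alt.geometry.name)
```

Geometric methods called on `gdf` use the `geometry` column, while the same calls on `gdf_alt` use `alt_location`.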
Reading and writing files
Code
from geodatasets import get_path
import geopandas
path_to_data = get_path("nybb")
gdf = geopandas.read_file(path_to_data)
gdf
   BoroCode       BoroName     Shape_Leng    Shape_Area                                           geometry
0         5  Staten Island  330470.010332  1.623820e+09  MULTIPOLYGON (((970217.022 145643.332, 970227....
1         4         Queens  896344.047763  3.045213e+09  MULTIPOLYGON (((1029606.077 156073.814, 102957...
2         3       Brooklyn  741080.523166  1.937479e+09  MULTIPOLYGON (((1021176.479 151374.797, 102100...
3         1      Manhattan  359299.096471  6.364715e+08  MULTIPOLYGON (((981219.056 188655.316, 980940....
4         2          Bronx  464392.991824  1.186925e+09  MULTIPOLYGON (((1012821.806 229228.265, 101278...
Code
gdf.to_file("my_file.geojson", driver="GeoJSON")
Simple accessors and methods
Measuring area
Code
gdf = gdf.set_index("BoroName")
gdf["area"] = gdf.area
gdf["area"]
BoroName
Staten Island 1.623822e+09
Queens 3.045214e+09
Brooklyn 1.937478e+09
Manhattan 6.364712e+08
Bronx 1.186926e+09
Name: area, dtype: float64
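Getting polygon boundaries and centroids
Code
# Get the boundary of each polygon (a LineString, or a MultiLineString for multi-part polygons) via GeoDataFrame.boundary.
gdf["boundary"] = gdf.boundary
gdf["boundary"]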
BoroName
Staten Island MULTILINESTRING ((970217.022 145643.332, 97022...
Queens MULTILINESTRING ((1029606.077 156073.814, 1029...
Brooklyn MULTILINESTRING ((1021176.479 151374.797, 1021...
Manhattan MULTILINESTRING ((981219.056 188655.316, 98094...
Bronx MULTILINESTRING ((1012821.806 229228.265, 1012...
Name: boundary, dtype: geometry
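Because the boundary was saved as a new column, the same GeoDataFrame now holds two geometry columns.
You can also create new geometries, for example a buffered version of the original geometry (GeoDataFrame.buffer(10)) or its centroid.
Code
gdf["centroid"] = gdf.centroid
gdf["centroid"]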
BoroName
Staten Island POINT (941639.45 150931.991)
Queens POINT (1034578.078 197116.604)
Brooklyn POINT (998769.115 174169.761)
Manhattan POINT (993336.965 222451.437)
Bronx POINT (1021174.79 249937.98)
Name: centroid, dtype: geometry
Code
gdf["buffer"] = gdf.buffer(5)
gdf["buffer"]
BoroName
Staten Island MULTIPOLYGON (((970219.129 145648.05, 970227.6...
Queens MULTIPOLYGON (((994983.514 209031.804, 994974....
Brooklyn MULTIPOLYGON (((1030445.277 166507.837, 103044...
Manhattan MULTIPOLYGON (((1008144.925 258094.305, 100814...
Bronx MULTIPOLYGON (((1047075.607 249822.682, 104707...
Name: buffer, dtype: geometry
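To get a feel for what buffer() returns, the sketch below buffers a single point and compares the result with the area of a true circle. The point and the radius are arbitrary values chosen for this illustration.

```python
import math

import geopandas
from shapely.geometry import Point

points = geopandas.GeoSeries([Point(0, 0)])
radius = 10.0

# Buffering a point yields a polygon that approximates a circle
# with straight segments.
circle = points.buffer(radius)

approx_area = circle.area.iloc[0]
exact_area = math.pi * radius ** 2
print(circle.geom_type.iloc[0])
print(approx_area, exact_area)
```

Because the curve is approximated by segments inside the true circle, the polygon's area comes out slightly smaller than pi * radius squared.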
Measuring distance
Code
first_point = gdf["centroid"].iloc[0]
gdf["distance"] = gdf["centroid"].distance(first_point)
gdf["distance"]
BoroName
Staten Island 0.000000
Queens 103781.535276
Brooklyn 61674.893421
Manhattan 88247.742789
Bronx 126996.283623
Name: distance, dtype: float64
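The same pattern can be checked by hand on a toy example. The points and index labels below are invented so that the expected distances are easy to verify (a 3-4-5 triangle and its double).

```python
import geopandas
from shapely.geometry import Point

points = geopandas.GeoSeries(
    [Point(0, 0), Point(3, 4), Point(6, 8)],
    index=["origin", "near", "far"],
)

# Distance from every point to one reference geometry (planar, in CRS units).
reference = points.iloc[0]
distances = points.distance(reference)
print(distances)
```

The result is in the units of the coordinates, which is why the borough distances above are in feet.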
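Making maps
Code
gdf.plot("area", legend=True)
# Plots the active geometry column, colored by the "area" column, and shows a legend (legend=True).
You can also explore your data interactively with GeoDataFrame.explore(), which behaves like plot() but returns an interactive map instead.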
Code
import matplotlib.pyplot as plt
import folium
import mapclassify
gdf.explore("area", legend=False)
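By switching the active geometry to the centroids (GeoDataFrame.set_geometry), we can plot the same data using point geometries.
Code
gdf = gdf.set_geometry("centroid")
gdf.plot("area", legend=True)
We can also layer two GeoSeries on top of each other. We only need to pass the axes of the first plot to the second.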
Code
ax = gdf["geometry"].plot()
gdf["centroid"].plot(ax=ax, color="black")
Set the active geometry back to the original GeoSeries.
Code
gdf = gdf.set_geometry("geometry")
Creating geometries
Convex hull
Code
gdf["convex_hull"] = gdf.convex_hull
Code
ax = gdf["convex_hull"].plot(alpha=0.5)  # keep the axes of the first plot and set alpha (transparency) to 0.5
gdf["boundary"].plot(ax=ax, color="white", linewidth=0.5)  # draw on the same axes with a line width of 0.5
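To see what the convex hull does to a concave shape, the sketch below uses an L-shaped polygon. The shape is made up for this example; its area is 3, and its hull cuts across the notch.

```python
import geopandas
from shapely.geometry import Polygon

# Hypothetical L-shaped polygon (concave), area = 3.
l_shape = Polygon([(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)])
shapes = geopandas.GeoSeries([l_shape])

# The convex hull is the smallest convex polygon containing the shape.
hulls = shapes.convex_hull

shape_area = shapes.area.iloc[0]
hull_area = hulls.area.iloc[0]
print(shape_area, hull_area)
```

The hull always contains the original geometry, so its area is never smaller than the original's.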
Buffering
Code
# buffer the active geometry by 10 000 feet (the geometry is already in feet)
gdf["buffered"] = gdf.buffer(10000)
# buffer the centroid geometry by 10 000 feet (the geometry is already in feet)
gdf["buffered_centroid"] = gdf["centroid"].buffer(10000)
Code
ax = gdf["buffered"].plot(alpha=0.5)  # keep the axes of the first plot and set alpha (transparency) to 0.5
gdf["buffered_centroid"].plot(ax=ax, color="red", alpha=0.5)  # draw the second layer on the same axes
gdf["boundary"].plot(ax=ax, color="white", linewidth=0.5)  # draw the boundaries on the same axes with a line width of 0.5
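Geometry relations
We can also ask about the spatial relations between geometries. Using the geometries above, we can check which of the buffered boroughs intersect the original geometry of Brooklyn, that is, which boroughs lie within 10 000 feet of Brooklyn.
First, get the polygon of Brooklyn.
Code
brooklyn = gdf.loc["Brooklyn", "geometry"]
brooklyn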
Code
type(brooklyn)
shapely.geometry.multipolygon.MultiPolygon
Code
gdf["buffered"].intersects(brooklyn)
# Check which geometries in gdf["buffered"] intersect the Brooklyn polygon.
BoroName
Staten Island True
Queens True
Brooklyn True
Manhattan True
Bronx False
dtype: bool
Code
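# Check which buffered centroids lie entirely within the original borough polygons.
# Here both GeoSeries are aligned on their index, and the check is performed row by row.
# gdf["buffered_centroid"].within(gdf) tests each buffered centroid against the active geometry of gdf in the same row.
# The boolean result (True or False) is stored in the new column gdf["within"].
# The last line displays that column.
gdf["within"] = gdf["buffered_centroid"].within(gdf)
gdf["within"]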
BoroName
Staten Island True
Queens True
Brooklyn False
Manhattan False
Bronx False
Name: within, dtype: bool
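The two calls above use different comparison modes: intersects(brooklyn) tests every row against one geometry, while within(gdf) pairs rows by index. A minimal sketch of the difference, using made-up zones and sites (the names and coordinates are assumptions for this example):

```python
import geopandas
from shapely.geometry import Point, box

# Hypothetical data: two square zones and one site per zone.
zones = geopandas.GeoSeries(
    [box(0, 0, 2, 2), box(10, 10, 12, 12)],
    index=["west", "east"],
)
sites = geopandas.GeoSeries(
    [Point(1, 1), Point(11, 11)],
    index=["west", "east"],
)

# One geometry against all rows: every zone is tested against the same point.
hits_single = zones.intersects(Point(1, 1))

# Row-wise: each site is tested against the zone with the same index label.
inside_own_zone = sites.within(zones)

print(hits_single)
print(inside_own_zone)
```

Only the west zone contains the point (1, 1), but each site lies inside its own zone, so the row-wise check is True for both rows.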
Code
# Make "buffered_centroid" the active geometry of the GeoDataFrame.
gdf = gdf.set_geometry("buffered_centroid")
# Plot the "within" column as categorical data, with a legend placed in the upper left corner.
ax = gdf.plot("within", legend=True, categorical=True, legend_kwds={"loc": "upper left"})
# categorical=True tells plot() to treat the column as categories rather than as continuous values.
# legend_kwds is a dictionary of extra keyword arguments passed on to the matplotlib legend.
# The key "loc" sets the position of the legend.
# The value "upper left" places it in the upper left corner of the figure.
# Draw the "boundary" column on the same axes, in black with a line width of 0.5.
gdf["boundary"].plot(ax=ax, color="black", linewidth=0.5)
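Projections
Each GeoSeries has its own coordinate reference system (CRS), accessible as GeoSeries.crs. The CRS tells GeoPandas where the coordinates of the geometries are located on the Earth's surface. In some cases the CRS is geographic, which means the coordinates are latitude and longitude. The most common such CRS is WGS84, with the authority code EPSG:4326.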
Code
gdf.crs
<Projected CRS: EPSG:2263>
Name: NAD83 / New York Long Island (ftUS)
Axis Info [cartesian]:
- X[east]: Easting (US survey foot)
- Y[north]: Northing (US survey foot)
Area of Use:
- name: United States (USA) - New York - counties of Bronx; Kings; Nassau; New York; Queens; Richmond; Suffolk.
- bounds: (-74.26, 40.47, -71.8, 41.3)
Coordinate Operation:
- name: SPCS83 New York Long Island zone (US survey foot)
- method: Lambert Conic Conformal (2SP)
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
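The geometries are in EPSG:2263, with coordinates in feet. A GeoSeries can easily be reprojected to another CRS, such as EPSG:4326, using GeoSeries.to_crs(); GeoDataFrame.to_crs() does the same for the active geometry column.
Code
gdf = gdf.set_geometry("geometry")
boroughs_4326 = gdf.to_crs("EPSG:4326")
boroughs_4326.plot()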
Code
boroughs_4326.crs
<Geographic 2D CRS: EPSG:4326>
Name: WGS 84
Axis Info [ellipsoidal]:
- Lat[north]: Geodetic latitude (degree)
- Lon[east]: Geodetic longitude (degree)
Area of Use:
- name: World.
- bounds: (-180.0, -90.0, 180.0, 90.0)
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
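Note the difference in coordinates along the axes of the plot. Previously we had 120 000 - 280 000 (feet); now we have 40.5 - 40.9 (degrees). In this case, boroughs_4326 has a "geometry" column in WGS84, but all the other geometry columns (centroids and so on) remain in the original CRS.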
Code
boroughs_4326
BoroName: Staten Island
    BoroCode: 5
    Shape_Leng: 330470.010332
    Shape_Area: 1.623820e+09
    geometry: MULTIPOLYGON (((-74.05051 40.56642, -74.05047 ...
    area: 1.623822e+09
    boundary: MULTILINESTRING ((970217.022 145643.332, 97022...
    centroid: POINT (941639.45 150931.991)
    distance: 0.000000
    buffer: MULTIPOLYGON (((970219.129 145648.05, 970227.6...
    convex_hull: POLYGON ((915517.688 120121.881, 915467.035 12...
    buffered: POLYGON ((903234.894 123347.784, 903178.057 12...
    buffered_centroid: POLYGON ((951639.45 150931.991, 951591.298 149...
    within: True
BoroName: Queens
    BoroCode: 4
    Shape_Leng: 896344.047763
    Shape_Area: 3.045213e+09
    geometry: MULTIPOLYGON (((-73.83668 40.59495, -73.83678 ...
    area: 3.045214e+09
    boundary: MULTILINESTRING ((1029606.077 156073.814, 1029...
    centroid: POINT (1034578.078 197116.604)
    distance: 103781.535276
    buffer: MULTIPOLYGON (((994983.514 209031.804, 994974....
    convex_hull: POLYGON ((1000721.532 136681.776, 994611.996 2...
    buffered: POLYGON ((1066963.473 157602.686, 1067059.264 ...
    buffered_centroid: POLYGON ((1044578.078 197116.604, 1044529.926 ...
    within: True
BoroName: Brooklyn
    BoroCode: 3
    Shape_Leng: 741080.523166
    Shape_Area: 1.937479e+09
    geometry: MULTIPOLYGON (((-73.86706 40.58209, -73.86769 ...
    area: 1.937478e+09
    boundary: MULTILINESTRING ((1021176.479 151374.797, 1021...
    centroid: POINT (998769.115 174169.761)
    distance: 61674.893421
    buffer: MULTIPOLYGON (((1030445.277 166507.837, 103044...
    convex_hull: POLYGON ((988872.821 146772.032, 983670.606 14...
    buffered: POLYGON ((962679.12 165570.385, 962651.33 1658...
    buffered_centroid: POLYGON ((1008769.115 174169.761, 1008720.962 ...
    within: False
BoroName: Manhattan
    BoroCode: 1
    Shape_Leng: 359299.096471
    Shape_Area: 6.364715e+08
    geometry: MULTIPOLYGON (((-74.01093 40.68449, -74.01193 ...
    area: 6.364712e+08
    boundary: MULTILINESTRING ((981219.056 188655.316, 98094...
    centroid: POINT (993336.965 222451.437)
    distance: 88247.742789
    buffer: MULTIPOLYGON (((1008144.925 258094.305, 100814...
    convex_hull: POLYGON ((977855.445 188082.322, 971830.134 19...
    buffered: POLYGON ((980499.119 178448.735, 979864.868 17...
    buffered_centroid: POLYGON ((1003336.965 222451.437, 1003288.812 ...
    within: False
BoroName: Bronx
    BoroCode: 2
    Shape_Leng: 464392.991824
    Shape_Area: 1.186925e+09
    geometry: MULTIPOLYGON (((-73.89681 40.79581, -73.89694 ...
    area: 1.186926e+09
    boundary: MULTILINESTRING ((1012821.806 229228.265, 1012...
    centroid: POINT (1021174.79 249937.98)
    distance: 126996.283623
    buffer: MULTIPOLYGON (((1047075.607 249822.682, 104707...
    convex_hull: POLYGON ((1017949.978 225426.885, 1015563.562 ...
    buffered: POLYGON ((992724.911 240962.362, 992700.941 24...
    buffered_centroid: POLYGON ((1031174.79 249937.98, 1031126.637 24...
    within: False
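That GeoDataFrame.to_crs() reprojects only the active geometry column can be checked on a small example. The single point, the `name` and `original` columns, and the choice of EPSG:3857 as the starting CRS are assumptions made for this sketch.

```python
import geopandas
from shapely.geometry import Point

# Hypothetical point about one degree east of the prime meridian,
# given in Web Mercator metres (EPSG:3857).
gdf = geopandas.GeoDataFrame(
    {"name": ["p"]},
    geometry=[Point(111319.49, 0)],
    crs="EPSG:3857",
)

# A second geometry column holding a copy of the same geometry.
gdf["original"] = gdf.geometry

# Only the active geometry column is transformed.
reprojected = gdf.to_crs("EPSG:4326")

active_epsg = reprojected.geometry.crs.to_epsg()
other_epsg = reprojected["original"].crs.to_epsg()
longitude = reprojected.geometry.iloc[0].x
print(active_epsg, other_epsg, longitude)
```

To bring another geometry column into the new CRS as well, it would have to be converted separately, for example with GeoSeries.to_crs() on that column.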
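For operations that rely on distance or area, you need a projected CRS (in meters, feet, kilometers, and so on) rather than a geographic one (in degrees). GeoPandas operations are planar, whereas degrees describe a position on a sphere, so spatial operations on degrees may not give correct results. For example, gdf.area.sum() (projected CRS) gives 8 429 911 572 square feet, but boroughs_4326.area.sum() (geographic CRS) gives 0.083, a value in squared degrees that has no physical meaning.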
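The effect can be reproduced with two one-degree cells at different latitudes. The cells and the choice of EPSG:6933 (a global equal-area projection) are assumptions made for this sketch; any suitable projected CRS could be used instead.

```python
import warnings

import geopandas
from shapely.geometry import box

# Two hypothetical 1 x 1 degree cells: one at the equator, one at 60 degrees north.
cells = geopandas.GeoSeries(
    [box(0, 0, 1, 1), box(0, 60, 1, 61)],
    index=["equator", "60N"],
    crs="EPSG:4326",
)

# Planar area on degrees: GeoPandas warns about this, silenced here on purpose.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    area_degrees = cells.area

# Area after reprojecting to an equal-area CRS, in square metres.
area_m2 = cells.to_crs("EPSG:6933").area
ratio = area_m2["60N"] / area_m2["equator"]

print(area_degrees)
print(area_m2)
print(ratio)
```

Both cells have the same area in squared degrees, yet on the ground the northern cell covers only about half the area of the equatorial one.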