Slice a pandas DataFrame containing lat/lng by a given GeoJSON



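I have a DataFrame with columns lat and lng. I also have a GeoJSON FeatureCollection containing polygons. Given a polygon, how can I slice df efficiently and select only the rows that lie inside it? I want to avoid looping over df and checking each element by hand.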

import pandas as pd

d = {'lat': [0, 0.1, -0.1, 0.4],
     'lng': [50, 50.1, 49.6, 49.5]}

df = pd.DataFrame(d)

Here is a feature collection with 1 polygon and 4 points. As you can see, only the last point is outside the polygon.

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0, 50]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0.1, 50.1]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [-0.1, 49.6]}
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {"type": "Point", "coordinates": [0.4, 49.5]}
    }
  ]
}

This map shows the polygon and the points.
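As a starting point, a FeatureCollection like the one above can be split into polygon rings and point coordinates using only the standard json module. This is a minimal sketch: the helper name split_feature_collection is hypothetical, it keeps only each polygon's outer ring (holes are ignored), and it leaves coordinates in whatever order the file uses. Note that RFC 7946 GeoJSON positions are [longitude, latitude], whereas this sample writes [lat, lng] to match df.

```python
import json


def split_feature_collection(fc):
    """Hypothetical helper: return (outer rings of polygons, point coordinates)."""
    rings, points = [], []
    for feature in fc["features"]:
        geom = feature["geometry"]
        if geom["type"] == "Polygon":
            rings.append(geom["coordinates"][0])  # outer ring only; holes ignored
        elif geom["type"] == "Point":
            points.append(geom["coordinates"])    # order kept exactly as in the file
    return rings, points


# A trimmed copy of the sample collection: one polygon and one point
sample = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {}, "geometry": {"type": "Polygon",
   "coordinates": [[[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]]}},
  {"type": "Feature", "properties": {}, "geometry": {"type": "Point",
   "coordinates": [0.4, 49.5]}}]}"""

rings, points = split_feature_collection(json.loads(sample))
print(len(rings), points)
```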

EDIT: Here is the code I have so far, but as you might expect, it is very slow.

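from shapely.geometry import shape, Point

# Check each polygon to see whether it contains each point
for feature in feature_collection['features']:
    polygon = shape(feature['geometry'])
    # One Point and one contains() call per row: this inner loop is what makes it slow
    for index, row in dfr.iterrows():
        # (lng, lat) = (x, y) is the order GeoJSON uses; the sample above stores (lat, lng) instead
        point = Point(row.location_lng, row.location_lat)
        if polygon.contains(point):
            print('Found containing polygon:', feature)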

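dfr is my DataFrame with columns location_lat and location_lng. feature_collection is a GeoJSON feature collection that contains only polygons (note that the GeoJSON example above is only there to explain the problem: it has just one polygon, plus some points to illustrate it).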
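If the goal is only to get rid of the per-row iterrows loop, one alternative (a swapped-in technique, not the shapely approach used in the answer below) is the even-odd ray-casting test vectorised with NumPy, so the Python loop runs over the polygon's edges instead of the DataFrame's rows. A minimal sketch, assuming a single outer ring without holes and the sample's (lat, lng) = (x, y) order; the function name points_in_ring is hypothetical, and points lying exactly on an edge or vertex may be classified either way.

```python
import numpy as np


def points_in_ring(xs, ys, ring):
    """Even-odd (ray-casting) point-in-polygon test, vectorised over points.

    xs, ys : 1-D sequences of point coordinates (same length)
    ring   : (x, y) vertices of one outer ring; the closing vertex may be
             repeated or omitted
    """
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    ring = np.asarray(ring, dtype=float)
    ax, ay = ring[:, 0], ring[:, 1]
    bx, by = np.roll(ax, -1), np.roll(ay, -1)  # edge i runs from vertex i to vertex i+1

    inside = np.zeros(xs.shape, dtype=bool)
    # The Python loop runs over polygon edges, not DataFrame rows
    for x1, y1, x2, y2 in zip(ax, ay, bx, by):
        # Points whose y lies between the edge's endpoints (half-open interval)
        straddles = (y1 > ys) != (y2 > ys)
        # x where the edge crosses each point's horizontal line; horizontal
        # edges divide by zero, but those points never straddle, so mask them out
        with np.errstate(divide="ignore", invalid="ignore"):
            x_cross = x1 + (ys - y1) * (x2 - x1) / (y2 - y1)
        # Flip the state of points whose rightward ray crosses this edge
        inside ^= straddles & (xs < x_cross)
    return inside


# Usage with the sample data (x = lat, y = lng, as in the sample GeoJSON)
ring = [[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]
mask = points_in_ring([0, 0.1, -0.1, 0.4], [50, 50.1, 49.6, 49.5], ring)
print(mask.tolist())  # [True, True, True, False]
```

With the question's df, the selection would then be df[points_in_ring(df['lat'], df['lng'], ring)].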

Suppose you have the DataFrame dfr as follows:

   location_lat  location_lng
0           0.0          50.0
1           0.1          50.1
2          -0.1          49.6
3           0.4          49.5

and a feature_collection containing polygons, for example:

{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0,49],[0.6,50],[0.1,52],[-1,51],[0,49]]]
      }
    },
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "type": "Polygon",
        "coordinates": [[[0,50],[0.6,50],[0.1,52],[-1,51],[0,50]]]
      }
    }
  ]
}

In the second polygon I changed the first (and closing) vertex from [0, 49] to [0, 50] so that it leaves out the other points.
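Moving that vertex to [0, 50] also puts the first row, (0, 50), exactly on a vertex of the second polygon. shapely's contains returns True only for points in the interior, so a point on the boundary counts as outside, while covers does accept boundary points. A small check, assuming the (lat, lng) = (x, y) order used throughout this example:

```python
from shapely.geometry import Point, Polygon

# Second polygon from the answer; shapely closes the ring automatically
poly2 = Polygon([(0, 50), (0.6, 50), (0.1, 52), (-1, 51)])
vertex = Point(0, 50)  # first row of dfr, now lying on a vertex of poly2

print(poly2.contains(vertex))  # False: the point is on the boundary, not in the interior
print(poly2.touches(vertex))   # True: they share only boundary, no interior
print(poly2.covers(vertex))    # True: covers() also accepts boundary points
```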

First, you can create a column that holds a shapely point for each row of dfr:

# Using Point from shapely with apply (one Point per row)
from shapely.geometry import Point
dfr['point'] = dfr[['location_lat', 'location_lng']].apply(Point, axis=1)

# Or use MultiPoint, which is faster; .geoms is needed on shapely >= 2.0,
# where multi-part geometries can no longer be iterated directly
from shapely.geometry import MultiPoint
dfr['point'] = list(MultiPoint(dfr[['location_lat', 'location_lng']].values).geoms)

The second method seems faster even on a small DataFrame, so I would use it for larger DataFrames too.
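On shapely 2.0 or later (assumed here; this vectorised function does not exist in shapely 1.x), shapely.points builds the whole array of points in one call from two coordinate arrays, with no per-row Python work. A sketch that keeps the answer's (lat, lng) = (x, y) order:

```python
import pandas as pd
import shapely  # assumes shapely >= 2.0

dfr = pd.DataFrame({
    "location_lat": [0.0, 0.1, -0.1, 0.4],
    "location_lng": [50.0, 50.1, 49.6, 49.5],
})

# x = latitude, y = longitude, matching the coordinate order of the example polygons
dfr["point"] = shapely.points(
    dfr["location_lat"].to_numpy(),
    dfr["location_lng"].to_numpy(),
)
print(dfr)
```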

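Now you can create, for each polygon in feature_collection, a column that records whether each point belongs to that feature. I would do this by looping over the features: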

from shapely.geometry import shape

for i, feature in enumerate(feature_collection['features']):
    # Build each polygon once, then test every point in the column against it
    polygon = shape(feature['geometry'])
    dfr['feature_{}'.format(i)] = list(map(polygon.contains, dfr['point']))

Then dfr looks like this:

   location_lat  location_lng              point  feature_0  feature_1
0           0.0          50.0       POINT (0 50)       True      False
1           0.1          50.1   POINT (0.1 50.1)       True       True
2          -0.1          49.6  POINT (-0.1 49.6)       True      False
3           0.4          49.5   POINT (0.4 49.5)      False      False
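The map(polygon.contains, ...) step above still makes one Python call per point. On shapely 2.0 or later (an assumption; contains_xy does not exist in 1.x), shapely.contains_xy tests whole coordinate arrays against one polygon in a single vectorised call, so the point column is not needed at all. A sketch with the two example polygons, using x = location_lat and y = location_lng as in the rest of the answer; swap the arrays if your GeoJSON follows the standard [longitude, latitude] order.

```python
import json

import pandas as pd
import shapely  # assumes shapely >= 2.0 for contains_xy
from shapely.geometry import shape

feature_collection = json.loads("""{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {}, "geometry": {"type": "Polygon",
   "coordinates": [[[0,49],[0.6,50],[0.1,52],[-1,51],[0,49]]]}},
  {"type": "Feature", "properties": {}, "geometry": {"type": "Polygon",
   "coordinates": [[[0,50],[0.6,50],[0.1,52],[-1,51],[0,50]]]}}]}""")

dfr = pd.DataFrame({
    "location_lat": [0.0, 0.1, -0.1, 0.4],
    "location_lng": [50.0, 50.1, 49.6, 49.5],
})

x = dfr["location_lat"].to_numpy()
y = dfr["location_lng"].to_numpy()
for i, feature in enumerate(feature_collection["features"]):
    polygon = shape(feature["geometry"])
    # One vectorised call per polygon instead of one Python call per row;
    # like contains(), points on the boundary are reported as False
    dfr[f"feature_{i}"] = shapely.contains_xy(polygon, x, y)

print(dfr)
```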

To select the points that belong to a feature (here feature_1), do:

print(dfr.loc[dfr['feature_1'], ['location_lat', 'location_lng']])
   location_lat  location_lng
1           0.1          50.1
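Because each feature_i column is a boolean mask, ordinary pandas reductions extend the selection to several polygons at once. A sketch, assuming the feature_ column naming used above; the frame is rebuilt from the flag values in the table (without the point column) so the snippet runs on its own:

```python
import pandas as pd

# Flags copied from the table above; the point column is omitted
dfr = pd.DataFrame({
    "location_lat": [0.0, 0.1, -0.1, 0.4],
    "location_lng": [50.0, 50.1, 49.6, 49.5],
    "feature_0": [True, True, True, False],
    "feature_1": [False, True, False, False],
})

flags = dfr.filter(like="feature_")  # only the per-polygon boolean columns
cols = ["location_lat", "location_lng"]

in_any = dfr.loc[flags.any(axis=1), cols]  # rows inside at least one polygon
in_all = dfr.loc[flags.all(axis=1), cols]  # rows inside every polygon
print(in_any)
print(in_all)
```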
