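I have a DataFrame with lat and lng columns. I also have a GeoJSON FeatureCollection that contains a polygon. Given this polygon, how can I subset df and efficiently select only the rows that fall inside the polygon? I want to avoid looping over df and checking each element manually.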
import pandas as pd

d = {'lat': [0, 0.1, -0.1, 0.4],
     'lng': [50, 50.1, 49.6, 49.5]}
df = pd.DataFrame(d)
Here is a feature collection showing 1 polygon and 4 points. As you can see, only the last point lies outside the polygon.
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {},
"geometry": {
"type": "Polygon",
"coordinates": [
[
[
0,
49
],
[
0.6,
50
],
[
0.1,
52
],
[
-1,
51
],
[
0,
49
]
]
]
}
},
{
"type": "Feature",
"properties": {},
"geometry": {
"type": "Point",
"coordinates": [
0,
50
]
}
},
{
"type": "Feature",
"properties": {},
"geometry": {
"type": "Point",
"coordinates": [
0.1,
50.1
]
}
},
{
"type": "Feature",
"properties": {},
"geometry": {
"type": "Point",
"coordinates": [
-0.1,
49.6
]
}
},
{
"type": "Feature",
"properties": {},
"geometry": {
"type": "Point",
"coordinates": [
0.4,
49.5
]
}
}
]
}
This map shows the polygon and the points.
Edit: Below is the code I currently have but, as you would expect, it is very slow.
from shapely.geometry import shape, Point
# check each polygon to see if it contains the point
for feature in feature_collection['features']:
    polygon = shape(feature['geometry'])
    for index, row in dfr.iterrows():
        point = Point(row.location_lng, row.location_lat)
        if polygon.contains(point):
            print('Found containing polygon:', feature)
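dfr is my DataFrame, which contains location_lat and location_lng. feature_collection is a GeoJSON feature collection that contains only polygons (note that the GeoJSON example above is only meant to explain the problem: it has just 1 polygon, plus a few points for illustration).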
Suppose you have the DataFrame dfr as follows:
location_lat location_lng
0 0.0 50.0
1 0.1 50.1
2 -0.1 49.6
3 0.4 49.5
and a feature_collection that contains polygons, for example:
{
"type": "FeatureCollection",
"features": [
{
"type": "Feature",
"properties": {},
"geometry": {
"type": "Polygon",
"coordinates": [[[0,49],[0.6,50],[0.1,52],[-1,51],[0,49]]]
}
},
{
"type": "Feature",
"properties": {},
"geometry": {
"type": "Polygon",
"coordinates": [[[0,50],[0.6,50],[0.1,52],[-1,51],[0,50]]]
}
}]
}
I changed the 49 to 50 in the second polygon so that the other points fall outside it.
You can first create a column containing the points in dfr:
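# Option 1: build one shapely Point per row with apply
from shapely.geometry import Point
dfr['point'] = dfr[['location_lat', 'location_lng']].apply(Point, axis=1)
# Option 2: build all points at once with MultiPoint (faster).
# .geoms is used because Shapely 2.x multi-part geometries are not directly iterable.
from shapely.geometry import MultiPoint
dfr['point'] = list(MultiPoint(dfr[['location_lat', 'location_lng']].values).geoms)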
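The second method seemed faster on a small DataFrame, so I would use it for larger DataFrames as well.
Now you can create one column per polygon in feature_collection that records whether each point lies inside that feature, by looping over the features: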
from shapely.geometry import shape
for i, feature in enumerate(feature_collection['features']):
    dfr['feature_{}'.format(i)] = list(map(shape(feature['geometry']).contains, dfr['point']))
dfr then looks like:
location_lat location_lng point feature_0 feature_1
0 0.0 50.0 POINT (0 50) True False
1 0.1 50.1 POINT (0.1 50.1) True True
2 -0.1 49.6 POINT (-0.1 49.6) True False
3 0.4 49.5 POINT (0.4 49.5) False False
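Once the boolean feature_i columns exist, they can also be combined rather than read one at a time. The following is a minimal sketch, assuming the columns are named feature_0, feature_1, ... as in the table above; the names feature_cols, in_any and n_hits are only illustrative, and the DataFrame is rebuilt by hand so the snippet runs on its own.

```python
import pandas as pd

# Rebuild the table shown above (boolean columns copied by hand).
dfr = pd.DataFrame({
    'location_lat': [0.0, 0.1, -0.1, 0.4],
    'location_lng': [50.0, 50.1, 49.6, 49.5],
    'feature_0': [True, True, True, False],
    'feature_1': [False, True, False, False],
})

# Collect every per-polygon membership column.
feature_cols = [c for c in dfr.columns if c.startswith('feature_')]

# True where a point lies inside at least one polygon.
in_any = dfr[feature_cols].any(axis=1)

# Number of polygons that contain each point.
n_hits = dfr[feature_cols].sum(axis=1)

# Rows that fall inside at least one polygon.
inside = dfr.loc[in_any, ['location_lat', 'location_lng']]
print(inside)
```

Replacing any with all would keep only the points that lie inside every polygon.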
To select the points that belong to a given feature:
print(dfr.loc[dfr['feature_1'], ['location_lat', 'location_lng']])
location_lat location_lng
1 0.1 50.1
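If building one Point object per row is still too slow, the test itself can be vectorized over NumPy arrays. The sketch below is not the shapely approach used above but a swapped-in technique: a plain even-odd ray-casting test against the exterior ring of one polygon. The function name points_in_ring is hypothetical, holes are ignored, and points lying exactly on an edge or vertex may be classified differently from shapely's contains, so treat it as an illustration rather than a drop-in replacement.

```python
import numpy as np

def points_in_ring(x, y, ring):
    """Even-odd ray casting: test many points against one closed ring.

    x, y : array-likes holding the point coordinates
    ring : sequence of [x, y] vertices whose first and last entries match
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ring = np.asarray(ring, dtype=float)
    inside = np.zeros(x.shape, dtype=bool)
    # Walk over consecutive vertex pairs (x1, y1) -> (x2, y2).
    for (x1, y1), (x2, y2) in zip(ring[:-1], ring[1:]):
        if y1 == y2:
            # A horizontal edge never crosses a horizontal ray.
            continue
        # Edges that straddle the horizontal line through each point.
        straddles = (y1 > y) != (y2 > y)
        # x-coordinate where the edge meets that horizontal line.
        x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        # Toggle the flag for each crossing to the right of the point.
        inside ^= straddles & (x < x_cross)
    return inside

# Exterior ring of the first polygon and the four sample points,
# using the same coordinate order as the 'point' column above.
ring = [[0, 49], [0.6, 50], [0.1, 52], [-1, 51], [0, 49]]
xs = [0.0, 0.1, -0.1, 0.4]
ys = [50.0, 50.1, 49.6, 49.5]
mask = points_in_ring(xs, ys, ring)
print(mask)
```

With the DataFrame from the answer, xs and ys would be dfr['location_lat'].to_numpy() and dfr['location_lng'].to_numpy(), and the resulting mask can be passed directly to dfr.loc. The Python-level loop now runs over polygon edges instead of DataFrame rows, which is usually the smaller of the two.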