I have this DataFrame and want to take a subset of it. One of its columns is called nodes, and each entry is an array of ids.
type id lat lon tags.name tags.highway nodes tags.maxspeed tags.surface
5956 way 6985834 NaN NaN None motorway [45754482, 6342434412, 835929794, 835929795, 8... 100 asphalt
5957 way 6995407 NaN NaN Breemeentje unclassified [45643091, 45643497, 45644332, 45645616, 45654... 60 None
5958 way 6995429 NaN NaN Frans Halslaan residential [45735685, 45737360] 30 paving_stones
5959 way 6995430 NaN NaN Eemhoeve service [45728947, 4995231625, 4995231624, 4995231623,... None None
5960 way 6995431 NaN NaN de Ruijterlaan residential [45727385, 45728969] None None
... ... ... ... ... ... ... ... ... ...
7611 way 759373165 NaN NaN None footway [943684026, 943683892, 943683902, 943684103, 9... None grass
7612 way 759373167 NaN NaN None footway [7093157390, 943684026] None grass
7613 way 759373949 NaN NaN None footway [943684024, 943683886, 943683973, 943684158, 9... None grass
7614 way 759373952 NaN NaN None footway [943684070, 943683935, 943684024] None grass
7615 way 759373953 NaN NaN None footway [7093157347, 943684066, 7093157350, 943684170,... None grass
[1660 rows x 9 columns]
roadsInBound is a subset of a similar DataFrame, one that holds the individual nodes with their lon/lat coordinates.
type id lat lon tags.name tags.highway nodes tags.maxspeed tags.surface
580 node 45706236 52.207980 5.288690 None None None None None
588 node 45706631 52.208070 5.284730 None None None None None
591 node 45706825 52.208100 5.289390 None None None None None
599 node 45707053 52.208160 5.289680 None None None None None
610 node 45707746 52.208353 5.284343 None None None None None
611 node 45707748 52.208311 5.285264 None None None None None
619 node 45708108 52.208420 5.285910 None None None None None
622 node 45708160 52.208431 5.284128 None None None None None
So I am trying to retrieve all rows whose nodes column contains one or more ids from that other list, like this.
out = df.loc[(df['nodes'].isin(roadsInBound['id']))]
The resulting error is
TypeError: unhashable type: 'list'
...
...
File "pandas\_libs\hashtable_func_helper.pxi", line 445, in pandas._libs.hashtable.ismember_object
SystemError: <built-in method view of numpy.ndarray object at 0x000001BB3DE29EE0> returned a result with an error set
I really don't know how to go about this, since I'm fairly new to DataFrames. Any suggestions would be great.
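You are implicitly asking isin to test every element of each df.nodes entry, which pandas cannot do: each cell is a list, and lists are unhashable, hence the TypeError. You can instead map the standard Python set method isdisjoint over the column (and negate the result), as follows: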
Simple example
import pandas as pd
test = {1,2,3}
df = pd.DataFrame({
'A': ['a','b','c','d'],
'B': [[1,2,4],[4,5,6],[1,2,3],[3,4,5]]})
df.loc[~df.B.map(test.isdisjoint)]
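To make the mask easier to follow, here is the same example with the intermediate steps spelled out. The variable names disjoint, mask and result are mine, added only for illustration.

```python
import pandas as pd

test = {1, 2, 3}
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd'],
    'B': [[1, 2, 4], [4, 5, 6], [1, 2, 3], [3, 4, 5]]})

# True where the list in B shares no element with `test`
disjoint = df.B.map(test.isdisjoint)

# Negate it: True where at least one element is shared
mask = ~disjoint

# Keep only the rows that overlap with `test`
result = df.loc[mask]
print(result.A.tolist())  # ['a', 'c', 'd']
```

Only row 'b' is dropped, because [4, 5, 6] has nothing in common with {1, 2, 3}.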
Applied to your case
out = df.loc[ ~df['nodes'].map(set(roadsInBound['id']).isdisjoint) ]
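For a runnable check, here is a minimal sketch on made-up data that borrows the column names from the question; the ids and the two tiny frames are assumptions, not your real data. It also shows a possible alternative based on explode, which is not part of the approach above but should give the same rows.

```python
import pandas as pd

# Made-up stand-ins for the two frames in the question
df = pd.DataFrame({
    'id': [6995429, 6995431, 759373167],
    'nodes': [[45735685, 45737360],
              [45727385, 45728969],
              [7093157390, 943684026]],
})
roadsInBound = pd.DataFrame({'id': [45737360, 943684026, 45706236]})

# Build the set once so it is not recreated for every row
ids = set(roadsInBound['id'])

# Approach from above: keep rows whose node list shares an id with `ids`
out = df.loc[~df['nodes'].map(ids.isdisjoint)]

# Alternative sketch: one row per node id, test membership,
# then collapse back to one boolean per original row
mask = df['nodes'].explode().isin(ids).groupby(level=0).any()
out_explode = df.loc[mask]

print(out['id'].tolist())
print(out_explode['id'].tolist())
```

One thing to watch for: if some rows hold None instead of a list in nodes, isdisjoint will raise a TypeError on them, whereas the explode variant simply treats them as non-matching. The explode variant also assumes the index of df is unique.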