Find all rows in a DataFrame that match any value from a given list



I have this DataFrame and want to take a subset of it. One of its columns, nodes, holds an array of ids in each row.

type         id  lat  lon       tags.name  tags.highway nodes tags.maxspeed   tags.surface
5956  way    6985834  NaN  NaN            None      motorway  [45754482, 6342434412, 835929794, 835929795, 8...           100        asphalt
5957  way    6995407  NaN  NaN     Breemeentje  unclassified  [45643091, 45643497, 45644332, 45645616, 45654...            60           None
5958  way    6995429  NaN  NaN  Frans Halslaan   residential                               [45735685, 45737360]            30  paving_stones
5959  way    6995430  NaN  NaN        Eemhoeve       service  [45728947, 4995231625, 4995231624, 4995231623,...          None           None
5960  way    6995431  NaN  NaN  de Ruijterlaan   residential                               [45727385, 45728969]          None           None
...   ...        ...  ...  ...             ...           ...                                                ...           ...            ...
7611  way  759373165  NaN  NaN            None       footway  [943684026, 943683892, 943683902, 943684103, 9...          None          grass
7612  way  759373167  NaN  NaN            None       footway                            [7093157390, 943684026]          None          grass
7613  way  759373949  NaN  NaN            None       footway  [943684024, 943683886, 943683973, 943684158, 9...          None          grass
7614  way  759373952  NaN  NaN            None       footway                  [943684070, 943683935, 943684024]          None          grass
7615  way  759373953  NaN  NaN            None       footway  [7093157347, 943684066, 7093157350, 943684170,...          None          grass
[1660 rows x 9 columns]

roadsInBound is a subset of a similar DataFrame that holds the individual nodes with their lon/lat coordinates.

type          id        lat       lon tags.name tags.highway nodes tags.maxspeed tags.surface
580   node    45706236  52.207980  5.288690      None         None  None          None         None
588   node    45706631  52.208070  5.284730      None         None  None          None         None
591   node    45706825  52.208100  5.289390      None         None  None          None         None
599   node    45707053  52.208160  5.289680      None         None  None          None         None
610   node    45707746  52.208353  5.284343      None         None  None          None         None
611   node    45707748  52.208311  5.285264      None         None  None          None         None
619   node    45708108  52.208420  5.285910      None         None  None          None         None
622   node    45708160  52.208431  5.284128      None         None  None          None         None

So I tried to retrieve all rows whose nodes column contains one or more ids from that other list, like this:

out = df.loc[(df['nodes'].isin(roadsInBound['id']))]

The resulting error is:

TypeError: unhashable type: 'list'
...
...
File "pandas\_libs\hashtable_func_helper.pxi", line 445, in pandas._libs.hashtable.ismember_object
SystemError: <built-in method view of numpy.ndarray object at 0x000001BB3DE29EE0> returned a result with an error set

I really don't know how to approach this, as I am fairly new to DataFrames, so any suggestions would be great.

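You are implicitly trying to apply the isin condition to every element inside each df.nodes entry, which pandas does not do: isin compares each cell as a whole value, and here each cell is a list, which is unhashable, hence the TypeError. Instead, you can map the standard Python set method isdisjoint over the column (and negate the result), as follows:
Simple example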

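import pandas as pd

test = {1, 2, 3}  # ids to look for
df = pd.DataFrame({
    'A': ['a', 'b', 'c', 'd'],
    'B': [[1, 2, 4], [4, 5, 6], [1, 2, 3], [3, 4, 5]]})
# isdisjoint is True when a list shares no element with test; negate to keep overlaps
df.loc[~df.B.map(test.isdisjoint)]  # keeps rows a, c and d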

Applied to your case

out = df.loc[ ~df['nodes'].map(set(roadsInBound['id']).isdisjoint) ]
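
As an alternative to the isdisjoint approach above (a different technique, not part of the original answer), the list column can be flattened with Series.explode so that isin sees one id per row, and the result collapsed back to one boolean per original row. This is only a minimal sketch: the frames `ways` and `nodes_in_bound` and their values are hypothetical stand-ins for `df` and `roadsInBound`.

```python
import pandas as pd

# Hypothetical stand-ins for the question's `df` and `roadsInBound`
ways = pd.DataFrame({
    "id": [1, 2, 3],
    "nodes": [[10, 11], [12, 13], [11, 14]],
})
nodes_in_bound = pd.DataFrame({"id": [11, 99]})

# One row per (way, node) pair; the original index label is repeated
exploded = ways["nodes"].explode()

# For each original row: does at least one of its nodes appear in the lookup ids?
mask = exploded.isin(nodes_in_bound["id"]).groupby(level=0).any()

out = ways.loc[mask]
print(out)
```

With this sample data, the ways with id 1 and 3 are kept, since both contain node 11. The set-based version is likely the simpler choice for a one-off filter; the exploded form may be handy if the flattened (way, node) pairs are needed for other steps, such as a merge against the node coordinates.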
