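I have the following problem. I want to build a query dynamically to filter a pandas DataFrame, without having to specify the columns by hand beforehand.
I found the following code snippet. All of the data is of type String: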
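columnList = ["col1", "col1", "col1", "col2"]
filterList = ["val1", "val2", "val3", "val4"]
# One "`column` != 'value'" clause per pair; backticks quote the column name, !r quotes the value
query = " & ".join(f"`{i}` != {k!r}" for i, k in zip(columnList, filterList))
df = df.query(query)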
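To see exactly what `df.query` receives, it can help to print the string the comprehension builds. A minimal sketch with placeholder names (`col1`, `val1`, ... are stand-ins, not real data; `build_query` is a hypothetical helper):

```python
# Placeholder column/value lists mirroring the snippet above (hypothetical names)
columnList = ["col1", "col1", "col1", "col2"]
filterList = ["val1", "val2", "val3", "val4"]


def build_query(columns, values):
    """Join one `column != value` clause per pair with ' & '."""
    # {v!r} renders each value via repr(), so strings come out quoted: 'val1'
    # Backticks let query() accept column names that are not valid identifiers
    return " & ".join(f"`{c}` != {v!r}" for c, v in zip(columns, values))


query = build_query(columnList, filterList)
print(query)
# `col1` != 'val1' & `col1` != 'val2' & `col1` != 'val3' & `col2` != 'val4'
```

Every clause is a "not equal" test, and all of them must hold for a row to be kept.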
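However, when I run my code, the resulting DataFrame looks as if the filters were applied one after another, rather than as a single "filter chain" joined by the `&` operator. It looks as if I am getting a "NAND".
In pandas, `and` and `&` should both work like a logical NOR, right?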
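A small check of the premise, on a hypothetical toy frame (the same data the answer below uses): joining the `!=` clauses with `&` in one query, applying them one at a time, and dropping every row that matches any of the pairs all keep the same rows. By De Morgan's law, an AND of `!=` tests is a NOR of the corresponding `==` tests, so "one after another" and "one combined `&` chain" are the same filter. This is a sketch under those assumptions, not the asker's real data.

```python
import pandas as pd

# Hypothetical toy frame, same values as the datar example in the answer
df = pd.DataFrame({"c1": list("abcde"), "c2": list("abcde")})
columnList = ["c1", "c1", "c1", "c2"]
filterList = ["a", "b", "c", "d"]

# 1) All clauses in one query string, joined with '&'
query = " & ".join(f"`{c}` != {v!r}" for c, v in zip(columnList, filterList))
combined = df.query(query)

# 2) The same clauses applied one after another
sequential = df
for c, v in zip(columnList, filterList):
    sequential = sequential[sequential[c] != v]

# 3) Drop every row equal to ANY pair: NOT (eq1 OR eq2 OR ...), i.e. a NOR
any_match = pd.Series(False, index=df.index)
for c, v in zip(columnList, filterList):
    any_match = any_match | (df[c] == v)
nor = df[~any_match]

print(combined)  # only row 4 ("e", "e") survives in all three versions
```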
With `datar`, you can build the expression used for filtering directly:
>>> from datar.all import f, tibble, filter
[2022-03-28 10:49:15][datar][WARNING] Builtin name "filter" has been overriden by datar.
>>> df = tibble(
...     c1 = ["a", "b", "c", "d", "e"],
...     c2 = ["a", "b", "c", "d", "e"],
... )
>>>
>>> columnList=["c1", "c1", "c1", "c2"]
>>> filterList=["a", "b", "c", "d"]
>>>
>>> expr = None
>>> for col, val in zip(columnList, filterList):
...     if expr is None:
...         expr = f[col] != val
...     else:
...         expr = expr & (f[col] != val)
...
>>> df >> filter(expr)
        c1       c2
  <object> <object>
4        e        e
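For comparison, the same fold can be done in plain pandas, without `datar` and without building a query string: create one boolean Series per (column, value) pair and combine them with `functools.reduce` and `operator.and_`. A sketch reusing the toy data from the session above; since no string is parsed, the quoting problems of `query` strings do not arise.

```python
import operator
from functools import reduce

import pandas as pd

# Same toy data as the datar session above
df = pd.DataFrame({"c1": list("abcde"), "c2": list("abcde")})
columnList = ["c1", "c1", "c1", "c2"]
filterList = ["a", "b", "c", "d"]

# One boolean Series per pair, playing the role of f[col] != val in datar
conditions = [df[col] != val for col, val in zip(columnList, filterList)]

# Fold them together with element-wise '&', like the expr loop above
mask = reduce(operator.and_, conditions)
result = df[mask]
print(result)
```

`reduce` replaces the `expr is None` bookkeeping: it starts from the first condition and `&`-combines each following one in turn.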