Filtering a DataFrame with str.contains() and a list when some list items contain two or more words

I am filtering a column with a list and have been using

str.contains("".format("|".join(towns)))

This works for "Atlanta" but not for "New York", because it is looking for "New" and "York" separately. Is there a way around this?

Reproducible example (they all return True):

import pandas as pd

array = ["New Jersey", "Atlanta", "New York", "Washington"]
df = pd.DataFrame({"col1": array})
towns = ["Atlanta", "New York"]
df["col1"].str.contains("".format("|".join(towns)))

For your example data, Series.isin works (it checks for exact matches):

>>> df["col1"].isin(towns)
0    False
1     True
2     True
3    False
Name: col1, dtype: bool
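Series.isin tests whether each whole value equals one of the list items, whereas str.contains tests whether a pattern occurs anywhere inside the value. A minimal sketch of the difference, with an extra value "New York City" added to the example data:

```python
import pandas as pd

towns = ["Atlanta", "New York"]
s = pd.Series(["New Jersey", "Atlanta", "New York", "Washington", "New York City"])

# isin: True only when the whole value equals one of the list items
exact = s.isin(towns)

# str.contains: True when either alternative occurs somewhere in the value
partial = s.str.contains("|".join(towns))

print(pd.DataFrame({"value": s, "isin": exact, "contains": partial}))
```

Only the last row differs: "New York City" is not an exact member of towns, but it does contain "New York".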

If the Series is a little different and you need to use a regular expression:

>>> dg = pd.DataFrame({"col1": ["New Jersey", "Atlanta", "New York",
...                             "Washington", "The New York Times"]})
>>> dg
                 col1
0          New Jersey
1             Atlanta
2            New York
3          Washington
4  The New York Times
>>>
>>> rex = "|".join(towns)
>>> dg['col1'].str.contains(rex)
0    False
1     True
2     True
3    False
4     True
Name: col1, dtype: bool

>>> df
         col1
0  New Jersey
1     Atlanta
2    New York
3  Washington
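In the question's code the pattern is "".format(...), and calling .format on an empty string returns an empty string whatever arguments it receives, because the template has no {} fields. An empty regular expression matches every string, which is why every row came back True. The space in "New York" is not the cause: "|".join(towns) already keeps "New York" as a single alternative, as the regex output above shows. A quick check:

```python
import pandas as pd

towns = ["Atlanta", "New York"]
df = pd.DataFrame({"col1": ["New Jersey", "Atlanta", "New York", "Washington"]})

# "".format(...) ignores its arguments: the template has no {} placeholders
broken_pattern = "".format("|".join(towns))
print(repr(broken_pattern))  # ''

# An empty regex matches at the start of any string, so every row is True
print(df["col1"].str.contains(broken_pattern).tolist())  # [True, True, True, True]

# Passing the joined string directly gives the intended filter
fixed_pattern = "|".join(towns)  # 'Atlanta|New York'
print(df["col1"].str.contains(fixed_pattern).tolist())  # [False, True, True, False]
```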

Try this:

import pandas as pd

array = ["New Jersey", "Atlanta", "New York", "Washington", "New York City"]
df = pd.DataFrame({"col1": array})
towns = ["Atlanta", "New York"]
# Count how many of the towns occur as substrings of each value
df["Town Check"] = df["col1"].apply(lambda x: len([i for i in towns if i in x]))
# Keep rows with at least one match, drop the helper column, renumber from 0
df1 = df[df["Town Check"] > 0].drop(columns="Town Check").reset_index(drop=True)

Output of df1:

            col1
0        Atlanta
1       New York
2  New York City
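The apply/lambda approach runs a plain substring test (i in x) for every town on every row. A vectorized sketch of the same filter: escape each town with re.escape so that regex metacharacters such as "." would be matched literally (an assumption about what real town names might contain; the example names have none), join them with "|", and pass the pattern to str.contains. On the data above it should select the same rows as df1:

```python
import re

import pandas as pd

array = ["New Jersey", "Atlanta", "New York", "Washington", "New York City"]
df = pd.DataFrame({"col1": array})
towns = ["Atlanta", "New York"]

# re.escape turns each town into a literal pattern; "|" separates the alternatives
pattern = "|".join(re.escape(t) for t in towns)

# Keep rows that contain at least one town, then renumber the index from 0
df1 = df[df["col1"].str.contains(pattern)].reset_index(drop=True)
print(df1)
```

Like the original answer, this match is case-sensitive and also matches partial words (for example "New York City").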
