Filter rows based on an element of a split string (Pandas)

I have a Pandas DataFrame with a column of semicolon-separated location names:

index   locations
39951   Credit; Mount Pleasant GO
40976   Ajax GO; Whitby GO; Credit; Oshawa GO; Bayly
14961   Credit; Mount Pleasant GO; Port Credit GO
...

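What I'd like to do is filter the rows on whether a given location appears in the semicolon-separated list, by first splitting the string (on ;) and then checking whether the location is in the resulting list.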

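A plain str.contains() doesn't work here because some location names overlap (for example, Credit appears in both Credit and Port Credit), and rows with a single location have no ;, so I can't search for Credit; either. I've tried things like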

df["Credit" in df.locations.str.split("; ")]

but that doesn't seem to work.

Any suggestions?

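You can use the regular expression (^|;) *Credit(;|$) to make sure the match is bounded by delimiters on both sides: Credit must either start the string or directly follow a ; (plus optional spaces), and must either end the string or be directly followed by a ;: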

df
index                                     locations
0  39951                     Credit; Mount Pleasant GO
1  40976  Ajax GO; Whitby GO; Credit; Oshawa GO; Bayly
2  14961             Mount Pleasant GO; Port Credit GO
df.locations.str.contains('(^|;) *Credit(;|$)')
#0     True
#1     True
#2    False
#Name: locations, dtype: bool
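
For reference, here is a self-contained sketch of the same idea. It assumes the three-row frame shown above; the names `target`, `pattern`, `mask`, and `filtered` are illustrative and not from the original answer. It swaps the capturing groups for non-capturing ones, `(?:...)`, because `str.contains` tends to emit a warning about match groups otherwise, and it passes the location name through `re.escape` in case it contains regex metacharacters.

```python
import re

import pandas as pd

# Sample data copied from the frame shown above.
df = pd.DataFrame({
    "index": [39951, 40976, 14961],
    "locations": [
        "Credit; Mount Pleasant GO",
        "Ajax GO; Whitby GO; Credit; Oshawa GO; Bayly",
        "Mount Pleasant GO; Port Credit GO",
    ],
})

target = "Credit"  # illustrative: the location to look for

# Same pattern as above, with non-capturing groups and an escaped target.
pattern = rf"(?:^|;) *{re.escape(target)}(?:;|$)"

mask = df["locations"].str.contains(pattern)
filtered = df[mask]  # keeps only the rows where the mask is True
print(filtered)
```

Indexing with the boolean mask, `df[mask]`, is what turns the True/False Series into the filtered DataFrame.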

If you also want to ignore case, add the inline flag (?i) at the start of the pattern:

df.locations.str.contains('(?i)(^|;) *credit(;|$)')
#0     True
#1     True
#2    False
#Name: locations, dtype: bool
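
As an alternative to the inline flag, `str.contains` also accepts `case=False` or `flags=re.IGNORECASE`. The sketch below assumes the same three-row frame; `mask_case` and `mask_flags` are illustrative names, and both should give the same result as the `(?i)` version.

```python
import re

import pandas as pd

# Sample data copied from the frame shown above.
df = pd.DataFrame({
    "index": [39951, 40976, 14961],
    "locations": [
        "Credit; Mount Pleasant GO",
        "Ajax GO; Whitby GO; Credit; Oshawa GO; Bayly",
        "Mount Pleasant GO; Port Credit GO",
    ],
})

# Lower-case pattern without the inline (?i) flag.
pattern = r"(?:^|;) *credit(?:;|$)"

# Option 1: let pandas handle case-insensitivity.
mask_case = df["locations"].str.contains(pattern, case=False)

# Option 2: pass the regex flag explicitly.
mask_flags = df["locations"].str.contains(pattern, flags=re.IGNORECASE)

print(df[mask_case])
```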

You can also try this (without a regular expression):

#split and explode the dataframe:
m=df['locations'].str.split('; ').explode()
#check your condition and get index where condition satisfies:
m=m[m.isin(['Credit'])].index.unique()
#Finally filter out dataframe:
out=df.loc[m]

Now if you print out, you will get the filtered DataFrame.
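
A variant closer to the attempt in the question is to test membership row by row. As far as I can tell, `"Credit" in df.locations.str.split("; ")` fails because `in` on a Series checks the index labels and returns a single bool rather than one value per row. The sketch below is only one way to do the per-row check; it assumes the three-row frame from the first answer, and `target`, `parts`, `mask`, and `out` are illustrative names. It splits on `;` alone and strips each piece, so inconsistent spacing around the delimiter does not matter.

```python
import pandas as pd

# Sample data copied from the frame in the first answer.
df = pd.DataFrame({
    "index": [39951, 40976, 14961],
    "locations": [
        "Credit; Mount Pleasant GO",
        "Ajax GO; Whitby GO; Credit; Oshawa GO; Bayly",
        "Mount Pleasant GO; Port Credit GO",
    ],
})

target = "Credit"  # illustrative: the location to look for

# Split each row into a list of names.
parts = df["locations"].str.split(";")

# Per-row membership test against the stripped names (exact match only).
mask = parts.apply(lambda names: target in [name.strip() for name in names])

out = df[mask]
print(out)
```

Unlike `df.loc[m]` with exploded index labels, a boolean mask does not rely on the DataFrame index being unique.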
