Pandas str.contains does not give a valid result when the string ends with . (dot) and the pattern uses a word boundary

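I have a column value that ends with a ., for example New York .. When I search for it with a word boundary (\b), I get an invalid result.

Please find the code snippet below.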

# importing pandas as pd
import pandas as pd
# importing re for regular expressions
import re
# Creating the Series
sr = pd.Series(['The New York . City'])
# Creating the index
idx = ['City 1']
# set the index
sr.index = idx
# Print the series
print(sr)

# check whether the pattern matches
result = sr.str.contains(pat = '\\bNew York \\.\\b')
# print the result
print(result)

Expected result:

City 1    The New York . City
dtype: object
City 1    True

Actual result:

City 1    The New York . City
dtype: object
City 1    False
dtype: bool

Use

result = sr.str.contains(pat = '\\bNew York \\.')

without the final \b. As the documentation states:

\b
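Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string.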

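The period is not a word character, and neither is the space that follows it, so there is no word boundary after the period and a trailing \b cannot match there. If you need to make sure the period is followed by whitespace, add \s instead.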
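To see the boundary behaviour in isolation, here is a minimal sketch using the re module directly on the string from the question. It assumes that str.contains applies the pattern with the same semantics as re.search, which is its default regex behaviour; the variable names are only for illustration.

```python
import re

text = "The New York . City"

# Trailing \b: the period and the space after it are both non-word
# characters, so there is no word boundary between them.
with_trailing_boundary = re.search(r"\bNew York \.\b", text)

# Dropping the trailing \b lets the pattern match.
without_trailing_boundary = re.search(r"\bNew York \.", text)

# Requiring whitespace after the period with \s also matches;
# note that the whitespace character becomes part of the match.
followed_by_space = re.search(r"\bNew York \.\s", text)

print(with_trailing_boundary)
print(without_trailing_boundary.group())
print(repr(followed_by_space.group()))
```

The first search returns None, while the other two find a match, which mirrors the False and True results from str.contains.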

For your sanity, use raw strings, which avoids the double escaping:

result = sr.str.contains(pat = r'\bNew York \.')

(Note the r prefix in front of the string. Again, see the documentation.)
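The difference the r prefix makes can be checked with a short sketch. The variable names below are hypothetical; the point is that an unescaped '\b' in an ordinary string literal is the backspace character, not the regex word-boundary token.

```python
import re

plain = '\bNew'      # '\b' here is a single backspace character
escaped = '\\bNew'   # backslash doubled by hand
raw = r'\bNew'       # raw string: the backslash is kept as written

print(len(plain), len(escaped), len(raw))
print(plain[0] == '\x08')   # the first character is a backspace
print(escaped == raw)       # both spell the regex token \b

text = 'The New York . City'
no_match = re.search(plain, text)   # looks for a literal backspace
match = re.search(raw, text)        # looks for a word boundary
print(no_match)
print(match.group())
```

The doubled-backslash and raw forms are the same pattern, so choosing the raw string is purely a readability decision.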
