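I have a column value that ends with a period, for example 'New York .' (with a space before the period). When I search for it using a word boundary (\b), I get an unexpected result.
Please see the code snippet below.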
# importing pandas as pd
import pandas as pd
# importing re for regular expressions
import re
# Creating the Series
sr = pd.Series(['The New York . City'])
# Creating the index
idx = ['City 1']
# set the index
sr.index = idx
# Print the series
print(sr)
# check whether the pattern is present
result = sr.str.contains(pat = '\\bNew York \\.\\b')
# print the result
print(result)
Expected result:
City 1 The New York . City
dtype: object
City 1 True
Actual result:
City 1 The New York . City
dtype: object
City 1 False
dtype: bool
Use
result = sr.str.contains(pat = '\\bNew York \\.')
without the final \b. As the documentation states:
\b
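Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning or end of the string.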
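Since a period is not a word character, a \b placed after the period does not match here: the period is followed by a space, so there is no word character on either side of that position. If you need to make sure the period is followed by whitespace, add \s instead.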
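The behaviour described above can be checked directly with the re module. This is a minimal sketch, not part of the original question: it uses plain re.search on the sample string instead of Series.str.contains (which, for ordinary object-dtype strings, is documented as being based on re.search), and the variable names are only illustrative. The patterns are written as raw strings.

```python
import re

text = 'The New York . City'

# Trailing \b fails: the period and the space after it are both
# non-word characters, so there is no word boundary between them.
with_trailing_boundary = re.search(r'\bNew York \.\b', text)

# Dropping the trailing \b lets the pattern match.
without_trailing_boundary = re.search(r'\bNew York \.', text)

# Requiring whitespace after the period also matches.
with_whitespace = re.search(r'\bNew York \.\s', text)

# \b after the period does match when a word character follows directly.
followed_by_word = re.search(r'\bNew York \.\b', 'The New York .City')

print(with_trailing_boundary)                  # None
print(without_trailing_boundary is not None)   # True
print(with_whitespace is not None)             # True
print(followed_by_word is not None)            # True
```

The last case shows that \b after a period is not wrong in general; it simply needs a word character on the other side.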
For your sanity, use raw strings, which avoid double escaping:
result = sr.str.contains(pat = r'\bNew York \.')
(Note the r prefix in front of the string. Again, see the documentation.)
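To see why the r prefix matters, here is a small sketch comparing the literals themselves. It assumes only the standard re module and the sample string from the question; the variable names are hypothetical.

```python
import re

text = 'The New York . City'

# In a normal string literal, '\b' is a single backspace character.
backspace = '\b'
# In a raw string literal, r'\b' is a backslash followed by 'b',
# which the re module interprets as a word boundary.
boundary = r'\b'

print(len(backspace), len(boundary))   # 1 2
print(backspace == '\x08')             # True
print(boundary == '\\b')               # True

# The raw-string pattern and the double-escaped pattern are the same string.
raw_pattern = r'\bNew York \.'
escaped_pattern = '\\bNew York \\.'
print(raw_pattern == escaped_pattern)  # True

# The word-boundary pattern matches the sample text.
print(re.search(raw_pattern, text) is not None)    # True
# A literal backspace never occurs in the text, so this finds nothing.
print(re.search('\bNew York', text) is not None)   # False
```

Both spellings produce the same pattern; the raw string is just easier to read and harder to get wrong.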