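Trying to use a regular expression to select only clothing sizes
I'm new to Python and I'm trying to select the rows that contain clothing sizes, but they get mixed up with other words. I'm using a regular expression, but I can't get the result I want.
Code: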
df = pd.DataFrame({"description":["Silver","Red","GOLD","Black Leather","S","L","S","XL","XXL","Noir Matt"," 150x160L","140M"]})
df.description.apply(lambda x : x if re.findall(r"(?!s+d+)(S|M|X*L)(?!s+d+)",str(x)) else np.nan).unique()
Output:
array(['Silver', nan, 'Black Leather', 'S', 'L', 'XL', 'XXL', 'Noir Matt',
' 150x160L', '140M'], dtype=object)
Expected:
array([ 'S', 'L', 'XL', 'XXL',nan], dtype=object)
I think you need to use
import pandas as pd
df = pd.DataFrame({"description":["Silver","Red","GOLD","Black Leather","S","L","S","XL","XXL","Noir Matt"," 150x160L","140M"]})
df['description'][df['description'].str.match(r'^(?:S|M|X*L)$')].unique()
# => array(['S', 'L', 'XL', 'XXL'], dtype=object)
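Series.str.match(r'^(?:S|M|X*L)$') keeps only the rows of the description column whose whole value is S, M, or zero or more X characters followed by L. The ^ and $ anchors are what exclude values such as Silver or 140M, which merely contain one of those letters.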
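The expected output in the question also contains nan for the values that are not sizes, which the filter above drops. If that is what you want, one possible sketch is to keep the same anchored pattern and mask the non-matching rows with Series.where instead of filtering them out. The names is_size and sizes_or_nan are only illustrative, and I'm assuming every value in the column is a string.

```python
import pandas as pd

df = pd.DataFrame({"description": ["Silver", "Red", "GOLD", "Black Leather", "S", "L", "S",
                                   "XL", "XXL", "Noir Matt", " 150x160L", "140M"]})

# Boolean mask: True only when the whole value is S, M, or zero or more X followed by L.
is_size = df["description"].str.match(r"^(?:S|M|X*L)$")

# Keep the matching values and replace everything else with a missing value.
sizes_or_nan = df["description"].where(is_size)

# unique() preserves first-appearance order, so the missing value comes first here
# because the first row ("Silver") is not a size.
result = sizes_or_nan.unique()
print(result)
```

The missing value appears once in the result, followed by 'S', 'L', 'XL' and 'XXL' in the order they first occur in the column.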