Applying a regular expression to a pandas DataFrame column



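I have written some regular expressions that I can run against a single variable, but I want to apply them to a DataFrame column and pass the results to new columns.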

df["Details"] is my DataFrame column. It contains text similar to the details string I created below.

import re
details = '1st: Batman 01:12.98 11.5L'
position = re.search(r'\w\w\w:\s', details)
distance = re.search(r'(\s\d\d.[0-9]L)', details)
time = re.search(r'\d{2}:\d{2}.\d{2}', details)
print(position.group(0))
print(distance.group(0))
print(time.group(0))
The output is then:
    1st: 
    11.5L
    01:12.98

I would like to add these values to the DataFrame as new columns named position, distance, and time, matching the outputs above.

I believe you need Series.str.extract:

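import pandas as pd

details = '1st: Batman 01:12.98 11.5L'
df = pd.DataFrame({"Details": [details, details, details]})
# expand=False returns a Series when the pattern has a single capture group
df['position'] = df['Details'].str.extract(r'(\w\w\w:\s)', expand=False)
df['distance'] = df['Details'].str.extract(r'(\s\d\d.[0-9]L)', expand=False)
df['time'] = df['Details'].str.extract(r'(\d{2}:\d{2}.\d{2})', expand=False)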
print(df)
                      Details position distance      time
0  1st: Batman 01:12.98 11.5L    1st:     11.5L  01:12.98
1  1st: Batman 01:12.98 11.5L    1st:     11.5L  01:12.98
2  1st: Batman 01:12.98 11.5L    1st:     11.5L  01:12.98
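
As a possible variation, Series.str.extract can also fill several columns in one call when the pattern uses named groups, because each named group becomes a column. The sketch below is not from the original answer: the combined pattern, the second sample row, and the `parts` variable are assumptions for illustration. It also assumes the time always comes before the distance in the text, and it escapes the dots so they match only a literal ".".

```python
import pandas as pd

details = '1st: Batman 01:12.98 11.5L'
# The second row is a made-up value that matches nothing.
df = pd.DataFrame({"Details": [details, 'no match here']})

# One pattern with three named groups; each group name becomes a column name.
pattern = (
    r'(?P<position>\w+:)\s.*?'          # e.g. "1st:"
    r'(?P<time>\d{2}:\d{2}\.\d{2})\s'   # e.g. "01:12.98"
    r'(?P<distance>\d+\.\d+L)'          # e.g. "11.5L"
)

parts = df['Details'].str.extract(pattern)  # DataFrame with 3 columns
df = df.join(parts)
print(df)
```

Rows where the pattern does not match get NaN in all three columns rather than raising an error.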

Apply the extraction inside a lambda function:

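import re
# A plain Python str has no .extract method, so call re.search inside the lambda (it fails on rows with no match)
df['position'] = df['Details'].apply(lambda x: re.search(r'\w\w\w:\s', str(x)).group(0))
df['distance'] = df['Details'].apply(lambda x: re.search(r'\s\d\d.[0-9]L', str(x)).group(0))
df['time'] = df['Details'].apply(lambda x: re.search(r'\d{2}:\d{2}.\d{2}', str(x)).group(0))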
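
When combining apply with the re module, rows where the pattern does not match need some care, since re.search returns None for them. A minimal sketch of one way to handle that is below. The `first_match` helper, the `TIME_RE` name, and the 'scratched' sample row are hypothetical and not part of the original answer.

```python
import re
import pandas as pd

# Compile once and reuse for every row; the dots are escaped to match a literal ".".
TIME_RE = re.compile(r'\d{2}:\d{2}\.\d{2}')

def first_match(text, pattern):
    """Return the first match of pattern in text, or None if there is none."""
    m = pattern.search(str(text))
    return m.group(0) if m else None

# 'scratched' is a made-up row that contains no time.
df = pd.DataFrame({"Details": ['1st: Batman 01:12.98 11.5L', 'scratched']})
df['time'] = df['Details'].apply(lambda x: first_match(x, TIME_RE))
print(df)
```

For large columns, Series.str.extract is usually the simpler choice, since it handles missing matches on its own.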
