I have a DataFrame like the following:
df = pd.DataFrame(
data=
[['22 take away', 'something'],
['takeaway 56', 'I see'],
['45 takeaway street', ' This is blue'],
['right street', ' This is white']],
columns=['V1', 'V2']
)
I want to use a regex pattern to extract the numbers in V1 into a separate variable in this DataFrame. So far I have the following:
pattern = r'\d{1,2}'
for i in df.V1:
    num = re.search(pattern, i)
    if num:
        print(num.group(0))
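This prints the numbers, but so far every attempt I have made to put them into a separate variable has failed. My goal is to have the following DataFrame: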
dfgoal = pd.DataFrame(
data=
[['22 take away', 'something', '22'],
['takeaway 56', 'I see', '56'],
['45 takeaway street', ' This is blue', '45'],
['right street', ' This is white', ' ']],
columns=['V1', 'V2', 'V3']
)
Thanks a lot!
Instead of using print on the values, append them to a list, then assign that list to df["V3"]. When no num is found, remember to append an empty string.
import pandas as pd
import re
df = pd.DataFrame(
data=[
['22 take away', 'something'],
['takeaway 56', 'I see'],
['45 takeaway street', ' This is blue'],
['right street', ' This is white']
],
columns=['V1', 'V2']
)
# ------------
results = []
pattern = r'\d{1,2}'
for i in df.V1:
    num = re.search(pattern, i)
    if num:
        #print(num.group(0))
        results.append(num.group(0))
    else:
        # no number found: keep the list aligned with the rows
        results.append("")
df['V3'] = results
# ------------
print(df)
Result:
                   V1              V2  V3
0        22 take away       something  22
1         takeaway 56           I see  56
2  45 takeaway street    This is blue  45
3        right street   This is white
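The loop above can also be folded into a small helper plus a list comprehension. This is only a sketch: first_number is a hypothetical name that does not appear in the original answer, and the list it produces would still have to be assigned with df['V3'] = results.

```python
import re

# Hypothetical helper (not part of the original answer): returns the first
# run of one or two digits found in `text`, or "" when there is none.
def first_number(text, pattern=r'\d{1,2}'):
    match = re.search(pattern, text)
    return match.group(0) if match else ""

# Same values as the V1 column in the question.
values = ['22 take away', 'takeaway 56', '45 takeaway street', 'right street']
results = [first_number(v) for v in values]
print(results)
```

Note that \d{1,2} matches at most two digits, so a longer number would be cut short; \d+ would be one way to capture the whole run if that matters.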
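Edit:
Using .str.extract(pattern) is simpler (as @Barmar suggested in the comments). But the pattern needs ( ) around it to capture the value. When no number is found it puts NaN, so it needs .fillna("") to replace NaN with an empty string.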
pattern = r'(\d{1,2})'
df['V3'] = df['V1'].str.extract(pattern).fillna("")
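For reference, here is a self-contained sketch of the .str.extract approach that can be run on its own. Passing expand=False is an addition not in the original answer; it makes extract return a Series rather than a one-column DataFrame when the pattern has a single capture group.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ['22 take away', 'something'],
        ['takeaway 56', 'I see'],
        ['45 takeaway street', ' This is blue'],
        ['right street', ' This is white'],
    ],
    columns=['V1', 'V2'],
)

# The parentheses form the capture group that extract() returns.
# expand=False (an assumption, not in the original answer) yields a Series.
# Rows without a match come back as NaN, which fillna("") turns into "".
df['V3'] = df['V1'].str.extract(r'(\d{1,2})', expand=False).fillna("")
print(df['V3'].tolist())
```

As with the loop version, only the first match in each row is taken, and \d{1,2} stops after two digits.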