Populating a variable from printed output in a loop



I have a DataFrame like this:

df = pd.DataFrame(
    data=[
        ['22 take away', 'something'],
        ['takeaway 56', 'I see'],
        ['45 takeaway street', ' This is blue'],
        ['right street', ' This is white']
    ],
    columns=['V1', 'V2']
)

I want to use a regex pattern to extract the numbers in V1 into a separate variable in this DataFrame. So far I have the following:

import re

pattern = r'\d{1,2}'
for i in df.V1:
    num = re.search(pattern, i)
    if num:
        print(num.group(0))

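This prints the numbers, but every attempt I have made so far to store them in a separate variable has failed. My goal is to end up with the following DataFrame: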

dfgoal = pd.DataFrame(
    data=[
        ['22 take away', 'something', '22'],
        ['takeaway 56', 'I see', '56'],
        ['45 takeaway street', ' This is blue', '45'],
        ['right street', ' This is white', ' ']
    ],
    columns=['V1', 'V2', 'V3']
)

Thanks a lot!

Instead of printing each value, append it to a list, then assign that list to df["V3"].

Remember to append an empty string when no number is found, so that the list stays the same length as the DataFrame.

import pandas as pd
import re

df = pd.DataFrame(
    data=[
        ['22 take away', 'something'],
        ['takeaway 56', 'I see'],
        ['45 takeaway street', ' This is blue'],
        ['right street', ' This is white']
    ],
    columns=['V1', 'V2']
)
# ------------
results = []
pattern = r'\d{1,2}'  # one or two consecutive digits
for i in df.V1:
    num = re.search(pattern, i)
    if num:
        #print(num.group(0))
        results.append(num.group(0))
    else:
        # keep one entry per row, even when nothing matches
        results.append("")

df['V3'] = results
# ------------
print(df)

Result:

                   V1              V2  V3
0        22 take away       something  22
1         takeaway 56           I see  56
2  45 takeaway street    This is blue  45
3        right street   This is white
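The same loop can also be written more compactly. The following is only a sketch: the helper name `first_number` is made up for illustration, and a plain list stands in for `df.V1` so the snippet runs without pandas.

```python
import re

# Compile once; matches the first run of one or two digits.
PATTERN = re.compile(r'\d{1,2}')


def first_number(text):
    """Return the first one- or two-digit match in text, or '' if none.

    Hypothetical helper, not part of pandas or re.
    """
    match = PATTERN.search(text)
    return match.group(0) if match else ''


# Stand-in for the values of df.V1.
values = ['22 take away', 'takeaway 56', '45 takeaway street', 'right street']
results = [first_number(v) for v in values]
print(results)
```

With the DataFrame from the question, the equivalent assignment would be `df['V3'] = [first_number(v) for v in df.V1]`.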

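EDIT:

Using .str.extract(pattern) is simpler (as @Barmar suggested in the comments).

But the pattern needs ( ), a capture group, to return the value.

When it cannot find a number it puts NaN, so you need .fillna("") to replace NaN with an empty string.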

pattern = r'(\d{1,2})'

df['V3'] = df['V1'].str.extract(pattern).fillna("")
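For completeness, here is a self-contained sketch of the `.str.extract` approach using the DataFrame from the question. Passing `expand=False` is my own addition, not part of the original answer: with a single capture group it makes `extract` return a Series rather than a one-column DataFrame, which keeps the column assignment unambiguous.

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ['22 take away', 'something'],
        ['takeaway 56', 'I see'],
        ['45 takeaway street', ' This is blue'],
        ['right street', ' This is white'],
    ],
    columns=['V1', 'V2'],
)

# The parentheses form the capture group that extract() returns.
pattern = r'(\d{1,2})'

# expand=False (an assumption added here) yields a Series for one group;
# rows without a match become NaN and are then replaced by ''.
df['V3'] = df['V1'].str.extract(pattern, expand=False).fillna('')

print(df)
```

Note that the extracted values are strings, as in the goal DataFrame; converting them to integers would need a separate step and a decision about what to do with the empty row.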
