I want to extract all the numbers that come right before the symbol "->". So far I only have this:
df['New'] = df['Companies'].str.findall(r'(\d+(?:\.\d+)?)').str[-1]
It only extracts the number before the last "->".
So I modified it slightly to this:
df['New'] = df['Companies'].str.findall(r'(\d+(?:\.\d+)?)')
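A quick check of why this falls short, as a minimal sketch on two of the sample strings from the table of desired output (the pattern is the one above, written with the backslashes that appear to have been lost from the post). findall returns every number in the string, including the digit in a name like "Company 1", and .str[-1] then keeps only the last one:

```python
import pandas as pd

# Two of the sample strings from the question
df = pd.DataFrame({'Companies': [
    '-> Company A 100->Company B 23-> Company D',
    '-> Company 1 100->Company B 30-> Company D',
]})

# findall collects every number, whether or not it is followed by '->'
found = df['Companies'].str.findall(r'(\d+(?:\.\d+)?)')
print(found.tolist())
# [['100', '23'], ['1', '100', '30']]

# .str[-1] keeps only the last element of each list
print(found.str[-1].tolist())
# ['23', '30']
```

The "1" from "Company 1" shows up because nothing in this pattern ties a number to the "->" that follows it.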
But that does not give me what I want. What I am after is something like this:
                                           Companies  New  New2  New3
0  -> Company A 100->Company B 60->Company C 80->...  100    60    80
1  -> Company A 100->Company B 53.1->Company C 82...  100  53.1    82
2         -> Company A 100->Company B 23-> Company D  100    23
3         -> Company 1 100->Company B 30-> Company D  100    30
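Note that there can be more than 3 New columns, depending on how many "->" appear in the string. Also, some Company names contain integers (e.g. "Company 1"), which I do not want included in the new columns.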
Can you help me with this?
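Use Series.str.extractall with a pattern that captures an integer or a float immediately before "->", then Series.unstack to spread the matches into columns and DataFrame.add_prefix to name them: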
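import pandas as pd

# integer or float directly followed by '->'; digits not followed by '->' (e.g. "Company 1") are skipped
pat = r'(\d*\.\d+|\d+\.?)->'
df = df.join(df['Companies'].str.extractall(pat)[0].unstack().add_prefix('New'))
print(df)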
                                        Companies  New0  New1  New2
0  -> Company A 100->Company B 60->Company C 80->   100    60    80
1  -> Company A 100->Company B 53.1->Company C 82   100  53.1   NaN
2  -> Company A 100->Company B 23-> Company D ...   100    23   NaN
3      -> Company 1 100->Company B 30-> Company D   100    30   NaN
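To see what each step does, it can help to print the intermediate result. This is a minimal sketch on the four sample strings (row 2 is copied whole from the question, since it is truncated in the printout above), using the pattern with its backslashes restored. extractall returns one row per match with a two-level index, the original row label plus a "match" counter (0, 1, 2, ...); unstack moves that counter into the columns, which is why rows with fewer matches end up with NaN:

```python
import pandas as pd

pat = r'(\d*\.\d+|\d+\.?)->'

df = pd.DataFrame({'Companies': [
    '-> Company A 100->Company B 60->Company C 80->',
    '-> Company A 100->Company B 53.1->Company C 82',
    '-> Company A 100->Company B 23-> Company D',
    '-> Company 1 100->Company B 30-> Company D',
]})

# Long format: one row per match, indexed by (original row, 'match')
matches = df['Companies'].str.extractall(pat)[0]
print(matches)

# Wide format: the 'match' level becomes columns 0, 1, 2 -> New0, New1, New2
wide = matches.unstack().add_prefix('New')
print(wide)
```

The trailing "82" in row 1 is not captured because no "->" follows it, so that row gets NaN in New2.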
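Alternatively, if you need floats (starting again from the original df, since joining a second time would clash with the existing New columns), convert the matches with astype(float) before unstacking: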
df = df.join(df['Companies'].str.extractall(pat)[0].astype(float).unstack().add_prefix('New'))
print(df)
                                        Companies   New0  New1  New2
0  -> Company A 100->Company B 60->Company C 80->  100.0  60.0  80.0
1  -> Company A 100->Company B 53.1->Company C 82  100.0  53.1   NaN
2  -> Company A 100->Company B 23-> Company D ...  100.0  23.0   NaN
3      -> Company 1 100->Company B 30-> Company D  100.0  30.0   NaN
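If this is needed in more than one place, the float variant can be wrapped in a small helper. This is only a sketch: the name numbers_before_arrow and the extra row without any match are hypothetical, added to show that join, which aligns on the index, keeps rows where extractall found nothing and fills them with NaN:

```python
import pandas as pd


def numbers_before_arrow(s: pd.Series, prefix: str = 'New') -> pd.DataFrame:
    """Hypothetical helper: every integer/float directly followed by '->',
    as floats, one column per match."""
    pat = r'(\d*\.\d+|\d+\.?)->'
    return (s.str.extractall(pat)[0]
             .astype(float)       # '100' -> 100.0, '53.1' -> 53.1
             .unstack()           # 'match' level -> columns 0, 1, 2, ...
             .add_prefix(prefix))


df = pd.DataFrame({'Companies': [
    '-> Company A 100->Company B 60->Company C 80->',
    '-> Company A 100->Company B 53.1->Company C 82',
    '-> Company 1 100->Company B 30-> Company D',
    '-> Company D',  # hypothetical row with no number before '->'
]})

# Rows missing from the extractall result are kept by join and filled with NaN
out = df.join(numbers_before_arrow(df['Companies']))
print(out)
```

The number of New columns is driven by the row with the most matches, so it grows automatically with the number of "->" in the data.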