I have a pandas DataFrame, shown below.
import pandas as pd
import numpy as np
df = pd.DataFrame([['Bay of Plenty', 'Bell Rd, Nukuhou','Nukuhou, Bay of Plenty'],[1.0, 0.5,1.0]]).T
df.columns = ['col1','col2']
                     col1 col2
0           Bay of Plenty    1
1        Bell Rd, Nukuhou  0.5
2  Nukuhou, Bay of Plenty    1
I want to get the following output.
col1           sum
Bay of Plenty  2.0
Nukuhou        1.5
Bell Rd        0.5
I tried the following.
df["splited"]=df["col1"].str.split(",")
df = (df.explode("splited").reset_index(drop=True))
                     col1 col2        splited
0           Bay of Plenty    1  Bay of Plenty
1        Bell Rd, Nukuhou  0.5        Bell Rd
2        Bell Rd, Nukuhou  0.5        Nukuhou
3  Nukuhou, Bay of Plenty    1        Nukuhou
4  Nukuhou, Bay of Plenty    1  Bay of Plenty
df.groupby(['splited']).sum().reset_index()
But this does not give the expected totals.
Your values are separated by a comma followed by a space, so split on ", " rather than ",":
#whitespaces
print(df["col1"].str.split(",").tolist())
[['Bay of Plenty'], ['Bell Rd', ' Nukuhou'], ['Nukuhou', ' Bay of Plenty']]
                                ^^^                      ^^^
#no whitespaces
print(df["col1"].str.split(", ").tolist())
[['Bay of Plenty'], ['Bell Rd', 'Nukuhou'], ['Nukuhou', 'Bay of Plenty']]
df["splited"]=df["col1"].str.split(", ")
df = df.explode("splited")
df = df.groupby('splited')['col2'].sum().reset_index()
print(df)
         splited  col2
0  Bay of Plenty   2.0
1        Bell Rd   0.5
2        Nukuhou   1.5
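For reference, here is the same approach as one self-contained snippet. It is only a sketch: I build the frame from a dict so that col2 is a numeric column (an assumption on my side, the question's `.T` construction yields object dtype), and the final sort is added only to match the row order of the desired output.

```python
import pandas as pd

# Assumption: col2 is stored as floats (built from a dict instead of .T).
df = pd.DataFrame({
    "col1": ["Bay of Plenty", "Bell Rd, Nukuhou", "Nukuhou, Bay of Plenty"],
    "col2": [1.0, 0.5, 1.0],
})

result = (
    df.assign(splited=df["col1"].str.split(", "))  # split on comma + space
      .explode("splited")                          # one row per place name
      .groupby("splited", as_index=False)["col2"]
      .sum()                                       # total per place name
      .sort_values("col2", ascending=False, ignore_index=True)
)
print(result)
```

Using `assign` keeps the original `df` unchanged, which can be handy if you still need the unsplit column later.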
Another idea is to use Series.str.strip to remove the leading and trailing whitespace:
df["splited"]=df["col1"].str.split(",")
df = df.explode("splited")
df = df.groupby(df['splited'].str.strip())['col2'].sum().reset_index()
print(df)
         splited  col2
0  Bay of Plenty   2.0
1        Bell Rd   0.5
2        Nukuhou   1.5
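To see why the strip matters, it may help to compare the group keys before and after stripping. This is a small illustrative check on the question's data; the variable names `raw_keys` and `clean_keys` are mine.

```python
import pandas as pd

col1 = pd.Series(["Bay of Plenty", "Bell Rd, Nukuhou", "Nukuhou, Bay of Plenty"])

# Splitting on "," alone keeps the space that follows each comma.
exploded = col1.str.split(",").explode()

raw_keys = sorted(exploded.unique())                # keys groupby would see
clean_keys = sorted(exploded.str.strip().unique())  # keys after stripping

print(raw_keys)    # 'Nukuhou' and ' Nukuhou' appear as different keys
print(clean_keys)
```

Without the strip there are five distinct keys instead of three, so rows that look identical when printed end up in different groups.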
EDIT:
If you need to split on a comma followed by any amount of whitespace (including none), you can use a regex:
df = pd.DataFrame([['Bay of Plenty', 'Bell Rd, Nukuhou',
'Nukuhou,Bay of Plenty'],[1.0, 0.5,1.0]]).T
df.columns = ['col1','col2']
# \s* matches zero or more whitespace characters after the comma
df["splited"]=df["col1"].str.split(r",\s*")
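The remaining steps are the same as before. A minimal end-to-end sketch with the regex split, again assuming col2 holds floats (built from a dict here, which differs from the `.T` construction above):

```python
import pandas as pd

# Mixed separators: ", " in the second row and "," in the third.
df = pd.DataFrame({
    "col1": ["Bay of Plenty", "Bell Rd, Nukuhou", "Nukuhou,Bay of Plenty"],
    "col2": [1.0, 0.5, 1.0],
})

# A pattern longer than one character is treated as a regular expression.
df["splited"] = df["col1"].str.split(r",\s*")

totals = df.explode("splited").groupby("splited")["col2"].sum()
print(totals)
```

If leading or trailing whitespace can also occur at the start or end of a whole value, combining this with Series.str.strip on the exploded column is probably the safer choice.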