I currently have a dataframe (df) like this:
name info
alpha foo,bar
alpha bar,foo
beta foo,bar
beta bar,foo
beta baz,qux
I would like to create a dataframe like this:
name info
alpha (foo,bar),(bar,foo)
beta (foo,bar),(bar,foo),(baz,qux)
I'm getting close with groupby.apply(list). For example:
new_df=df.groupby('name')['info'].apply(list)
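However, I can't figure out how to get the output in the original dataframe format (i.e. with two columns, as in the example). I think I need reset_index and unstack? Any help is appreciated!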
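A minimal sketch, assuming the sample data above is stored with plain strings in `info`, of why `apply(list)` alone does not give the desired layout. It returns a Series indexed by `name`; `reset_index()` turns that Series into a two-column DataFrame, but each cell holds a list of the original strings rather than the parenthesised text.

```python
import pandas as pd

# Sample data from the question; each info cell is a plain string
df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

# groupby + apply(list) gives a Series whose index is the group key
grouped = df.groupby("name")["info"].apply(list)
print(type(grouped).__name__)  # Series

# reset_index() moves the "name" index back into a regular column
new_df = grouped.reset_index()
print(new_df)
```

So `reset_index()` is enough to get two columns back (no `unstack` needed); what remains is turning each list of strings into the `(foo,bar),(bar,foo)` text.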
Try the following for loop:
import pandas as pd

uniqnames = df.name.unique()  # get unique names
newdata = []  # data list for output dataframe
for u in uniqnames:  # for each unique name
    subdf = df[df.name == u]  # get rows with this unique name
    s = ""
    for i in subdf['info']:
        s += "(" + i + "),"  # join all info cells for that name
    newdata.append([u, s[:-1]])  # remove trailing comma from infos & add row to data list
newdf = pd.DataFrame(data=newdata, columns=['name', 'info'])
print(newdf)
The output is exactly as desired:
name info
0 alpha (foo,bar),(bar,foo)
1 beta (foo,bar),(bar,foo),(baz,qux)
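The same loop can be written without building the string by hand and trimming the trailing comma, because `str.join` places the separator only between items. This is a sketch using the question's sample data; the helper name `join_infos` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

def join_infos(values):
    # Hypothetical helper: wrap each info string in parentheses and
    # separate them with commas; join() adds no trailing comma
    return ",".join("(" + v + ")" for v in values)

newdata = []
for u in df["name"].unique():  # unique names, in order of first appearance
    subdf = df[df["name"] == u]  # rows with this name
    newdata.append([u, join_infos(subdf["info"])])

newdf = pd.DataFrame(data=newdata, columns=["name", "info"])
print(newdf)
```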
IIUC
df.assign(info='('+df['info']+')').groupby('name')['info'].apply(','.join).to_frame('info')
Out[267]:
info
name
alpha (foo,bar),(bar,foo)
beta (foo,bar),(bar,foo),(baz,qux)
# df.assign(info='('+df['info']+')')  # wrap each string in ( and ) so it matches the desired output
# .groupby('name')  # group by name, since the info values under the same name need to be merged
# .apply(','.join).to_frame('info')  # combine the info strings of each group into one string
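The expression above keeps `name` as the index. If you want `name` back as an ordinary column, as in the question's desired output, one option (a sketch using the question's sample data) is to end the chain with `reset_index()` instead of `to_frame('info')`:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

new_df = (
    df.assign(info="(" + df["info"] + ")")  # wrap each string in parentheses
      .groupby("name")["info"]
      .apply(",".join)                       # join the strings within each group
      .reset_index()                         # turn the "name" index into a column
)
print(new_df)
```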
IIUC:
import pandas as pd

df = pd.DataFrame({'name': ['alpha']*2 + ['beta']*3,
                   'info': [['foo', 'bar'], ['bar', 'foo'],
                            ['foo', 'bar'], ['bar', 'foo'],
                            ['baz', 'qux']]})
print(df)
Input:
info name
0 [foo, bar] alpha
1 [bar, foo] alpha
2 [foo, bar] beta
3 [bar, foo] beta
4 [baz, qux] beta
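This answer assumes each `info` cell already holds a list. If your data looks like the question's, with comma-separated strings, a sketch of converting it first with `Series.str.split`:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alpha", "alpha", "beta", "beta", "beta"],
    "info": ["foo,bar", "bar,foo", "foo,bar", "bar,foo", "baz,qux"],
})

# Split each "a,b" string on the comma into a Python list ["a", "b"]
df["info"] = df["info"].str.split(",")
print(df)
```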
Now group, apply, and then call reset_index() to get a dataframe back:
new_df = df.groupby('name')['info'].apply(list)
new_df = new_df.reset_index()
print(new_df)
Output:
name info
0 alpha [[foo, bar], [bar, foo]]
1 beta [[foo, bar], [bar, foo], [baz, qux]]
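The cells above are nested lists. If you need the exact text from the question, `(foo,bar),(bar,foo)`, one way (a sketch; `to_text` is a hypothetical helper name) is to format each nested list after grouping:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alpha"] * 2 + ["beta"] * 3,
    "info": [["foo", "bar"], ["bar", "foo"],
             ["foo", "bar"], ["bar", "foo"],
             ["baz", "qux"]],
})

def to_text(pairs):
    # Hypothetical helper: ["foo", "bar"] -> "(foo,bar)", then join all groups with commas
    return ",".join("(" + ",".join(p) + ")" for p in pairs)

new_df = df.groupby("name")["info"].apply(list).reset_index()
new_df["info"] = new_df["info"].apply(to_text)  # format each cell's list of lists
print(new_df)
```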