Python, pandas DataFrame: merge rows that share the same values in two columns and aggregate the data into one row



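I have this DataFrame and I am trying to merge any rows that share the same workDate and ID. I don't know how many columns the DataFrame has. It could have hundreds, so I am looking for a way to merge the rows without having to specify all the column names.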

    |workDate   |ID       | Hours |Groundsman  |names2  |Teachers       |Profs
---------------------------------------------------------------------------------
0   |2020-01-09 |13702    | 1.0   |   Ted      |        |               |
1   |2020-01-09 |13702    | 1.0   |            |Seline  |               |
2   |2020-01-10 |13702    | 20.0  |            |Ted,Sam |               |
3   |2020-01-10 |13702    | 20.0  |            |        |Pete,Norm,Tim  |
4   |2020-01-10 |13702    | 20.0  |            |        |               |Joe

Desired output:

    |workDate   |ID       | Hours |Groundsman  |names2  |Teachers       |Profs
---------------------------------------------------------------------------------
0   |2020-01-09 |13702    | 1.0   |   Ted      |Seline  |               |
1   |2020-01-10 |13702    | 20.0  |            |Ted,Sam |Pete,Norm,Tim  |Joe

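I have tried several ways of grouping but have not gotten the output I want. As I mentioned, the DataFrame may have many other columns, so I can't use a solution that expects every column name to be listed. It is basically a group-by on workDate and ID, but I also want to keep all the values in the other columns.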

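Try groupby with last(), which keeps the last non-null value of every other column within each group, so only the two key columns need to be named: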

import pandas as pd
import numpy as np  # only required because the example df creation needs np.nan

# Rebuild the example frame from the question (column names and values as posted).
df = pd.DataFrame({
    'workDate': {0: '2020-01-09',
                 1: '2020-01-09',
                 2: '2020-01-10',
                 3: '2020-01-10',
                 4: '2020-01-10'},
    'ID': {0: 13702, 1: 13702, 2: 13702, 3: 13702, 4: 13702},
    ' Hours': {0: 1.0, 1: 1.0, 2: 20.0, 3: 20.0, 4: 20.0},
    'Groundsman': {0: '   Ted', 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
    'names2': {0: np.nan, 1: 'Seline', 2: 'Ted,Sam', 3: np.nan, 4: np.nan},
    'Teachers': {0: np.nan, 1: np.nan, 2: np.nan, 3: 'Pete,Norm,Tim', 4: np.nan},
    'Profs': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: 'Joe'},
})

# Group on the two key columns only; last() skips NaN, so each remaining
# column collapses to its last non-null value per group.
print(df.groupby(by=['workDate', 'ID']).last().reset_index())

Output:

     workDate     ID   Hours Groundsman   names2       Teachers Profs
0  2020-01-09  13702     1.0        Ted   Seline           None  None
1  2020-01-10  13702    20.0       None  Ted,Sam  Pete,Norm,Tim   Joe
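
One caveat with last(): if a group holds two different non-null values in the same column, only the later one survives. The following is a minimal sketch of an alternative that joins such values instead of dropping them. The sample data and the helper name join_unique are hypothetical and not part of the original answer. It still names only the two key columns.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first group has two different non-null values in names2.
df = pd.DataFrame({
    "workDate": ["2020-01-09", "2020-01-09", "2020-01-10"],
    "ID": [13702, 13702, 13702],
    "Hours": [1.0, 1.0, 20.0],
    "Groundsman": ["Ted", np.nan, np.nan],
    "names2": ["Seline", "Sam", "Ted,Sam"],
})


def join_unique(values: pd.Series):
    """Collapse one column of one group (hypothetical helper)."""
    present = values.dropna().drop_duplicates()
    if present.empty:
        return np.nan           # nothing recorded for this group
    if len(present) == 1:
        return present.iloc[0]  # single value: keep it unchanged
    # Several distinct values: keep all of them, comma-separated.
    return ",".join(str(v) for v in present)


# Only the key columns are listed; agg applies join_unique to every other column.
merged = df.groupby(["workDate", "ID"], as_index=False).agg(join_unique)
print(merged)
```

Here the first group keeps both names as "Seline,Sam" in names2, whereas last() would have kept only "Sam". When each column has at most one non-null value per group, as in the question's data, both approaches give the same values.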
