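I have this DF and I'm trying to merge any rows that share the same workDate and ID. I don't know how many columns the DF has. It could have hundreds, so I'm looking for a way to merge without having to specify all the column names.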
|workDate |ID | Hours |Groundsman |names2 |Teachers |Profs
---------------------------------------------------------------------------------
0 |2020-01-09 |13702 | 1.0 | Ted | | |
1 |2020-01-09 |13702 | 1.0 | |Seline | |
2 |2020-01-10 |13702 | 20.0 | |Ted,Sam | |
3 |2020-01-10 |13702 | 20.0 | | |Pete,Norm,Tim |
4 |2020-01-10 |13702 | 20.0 | | | |Joe
Desired output:
|workDate |ID | Hours |Groundsman |names2 |Teachers |Profs
---------------------------------------------------------------------------------
0 |2020-01-09 |13702 | 1.0 | Ted |Seline | |
1 |2020-01-10 |13702 | 20.0 | |Ted,Sam |Pete,Norm,Tim |Joe
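I've tried several ways of grouping, but haven't gotten the output I want. As mentioned, the DF may have many other columns, so I can't use a solution that expects all the column names to be listed. It's basically a group-by on workDate and ID, but I also want to keep all the values in the other columns.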
Try:
import pandas as pd
import numpy as np #only required because the example df creation needs np.nan
df = pd.DataFrame({'workDate': {0: '2020-01-09',
                                1: '2020-01-09',
                                2: '2020-01-10',
                                3: '2020-01-10',
                                4: '2020-01-10'},
                   'ID': {0: 13702, 1: 13702, 2: 13702, 3: 13702, 4: 13702},
                   ' Hours': {0: 1.0, 1: 1.0, 2: 20.0, 3: 20.0, 4: 20.0},
                   'Groundsman': {0: ' Ted', 1: np.nan, 2: np.nan, 3: np.nan, 4: np.nan},
                   'names2': {0: np.nan, 1: 'Seline', 2: 'Ted,Sam', 3: np.nan, 4: np.nan},
                   'Teachers': {0: np.nan, 1: np.nan, 2: np.nan, 3: 'Pete,Norm,Tim', 4: np.nan},
                   'Profs': {0: np.nan, 1: np.nan, 2: np.nan, 3: np.nan, 4: 'Joe'}})
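# last() keeps the last non-null value of every remaining column within each group,
# so only the grouping keys have to be named
print(df.groupby(by=['workDate', 'ID']).last().reset_index())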
Output:
workDate ID Hours Groundsman names2 Teachers Profs
0 2020-01-09 13702 1.0 Ted Seline None None
1 2020-01-10 13702 20.0 None Ted,Sam Pete,Norm,Tim Joe
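Two caveats may matter on real data. last() only skips true missing values, so if the blank cells are actually empty strings they would be kept, and if two rows of the same group both hold a value in one column only the last one survives. The following is a minimal sketch for that situation, not a definitive solution: it assumes blanks are empty or whitespace-only strings and that joining distinct values with a comma is acceptable. The names merge_rows and combine are hypothetical helpers made up for this illustration.

```python
import numpy as np
import pandas as pd


def merge_rows(df, keys):
    """Collapse rows sharing `keys`, keeping every distinct non-null value per column."""
    # Assumption: blank cells are empty/whitespace-only strings; treat them as missing.
    cleaned = df.replace(r'^\s*$', np.nan, regex=True)

    def combine(values):
        # Distinct non-null values, in order of first appearance.
        distinct = values.dropna().drop_duplicates()
        if distinct.empty:
            return np.nan
        if len(distinct) == 1:
            return distinct.iloc[0]  # single value: return it unchanged (e.g. a float)
        # Several different values in one group: join them instead of dropping any.
        return ','.join(str(v) for v in distinct)

    # Only the grouping keys are named; every other column is aggregated automatically.
    return cleaned.groupby(keys, as_index=False).agg(combine)


# Small made-up frame: blanks are empty strings and names2 has two values in one group.
df = pd.DataFrame({
    'workDate': ['2020-01-10', '2020-01-10', '2020-01-10'],
    'ID': [13702, 13702, 13702],
    'Hours': [20.0, 20.0, 20.0],
    'names2': ['Ted,Sam', '', 'Ann'],
    'Profs': ['', 'Joe', ''],
})
merged = merge_rows(df, ['workDate', 'ID'])
print(merged)
```

A Python-level aggregation like this is slower than the built-in last(), so on a frame with hundreds of columns it is probably only worth using when values really can collide within a group.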