Get the first and last date for each unique element in a pandas DataFrame



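I have a DataFrame with the columns A-Party, Date & Time and IMEI (A-Party 3022275), and I want the first and last date for each IMEI.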

Use named aggregation in GroupBy.agg with first and last (or min and max):

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'First Date': ('Date & Time', 'first'),
                'Last Date': ('Date & Time', 'last')}))
df.insert(1,'Total IMEI', df.groupby('A-Party')['A-Party'].transform('size'))
print (df)
A-Party  Total IMEI IMEI          First Date            Last Date
0  3022275           2    A  2021-05-01 7:20:05   2022-08-01 6:20:05
1  3022275           2    B  2021-03-02 9:20:05  2022-05-01 10:20:05
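The choice between first/last and min/max matters: first and last return the first and last non-null value in the group's current row order, so they only give the earliest and latest timestamps if the frame is already sorted by time, while min and max look at the values themselves. A minimal sketch on a small, hypothetical frame (only the column names come from the question):

```python
import pandas as pd

# Hypothetical sample data; rows for IMEI 'A' are deliberately out of time order
df = pd.DataFrame({
    'A-Party': [3022275] * 4,
    'Date & Time': pd.to_datetime(['2022-08-01 06:20:05', '2021-05-01 07:20:05',
                                   '2021-03-02 09:20:05', '2022-05-01 10:20:05']),
    'IMEI': ['A', 'A', 'B', 'B'],
})

# 'first' follows row order, 'min'/'max' follow the timestamp values
out = (df.groupby(['A-Party', 'IMEI'], as_index=False)
         .agg(**{'First (row order)': ('Date & Time', 'first'),
                 'First Date': ('Date & Time', 'min'),
                 'Last Date': ('Date & Time', 'max')}))
print(out)
```

For IMEI 'A' the row-order value is 2022-08-01 while min gives 2021-05-01. Sorting with df.sort_values('Date & Time') before the groupby makes first/last agree with min/max.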

If you need blanks instead of repeated values (not recommended, because the numbers get mixed with the empty string ''):

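# cast to object first so '' can be stored next to the numbers
df = df.astype({'A-Party': object, 'Total IMEI': object})
df.loc[df.duplicated(['A-Party','Total IMEI']), ['A-Party','Total IMEI']] = ''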
print (df)
A-Party Total IMEI IMEI          First Date            Last Date
0  3022275          2    A  2021-05-01 7:20:05   2022-08-01 6:20:05
1                        B  2021-03-02 9:20:05  2022-05-01 10:20:05
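The reason blanks are discouraged is that the column ends up holding both integers and strings, so any later numeric work needs a conversion step. A small sketch on a hypothetical two-row frame shaped like the output above:

```python
import pandas as pd

# Hypothetical frame shaped like the aggregated output
df = pd.DataFrame({'A-Party': [3022275, 3022275],
                   'Total IMEI': [2, 2],
                   'IMEI': ['A', 'B']})
cols = ['A-Party', 'Total IMEI']

# Cast to object so '' can be stored, then blank out repeated values
df = df.astype({'A-Party': object, 'Total IMEI': object})
df.loc[df.duplicated(cols), cols] = ''

print(df['Total IMEI'].tolist())  # int and str now share one column
# Numeric work needs a conversion; '' is coerced to NaN
print(pd.to_numeric(df['Total IMEI'], errors='coerce').sum())
```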

If you need a MultiIndex:

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'First Date': ('Date & Time', 'first'),
                'Last Date': ('Date & Time', 'last')}))
df.insert(1,'Total IMEI', df.groupby('A-Party')['A-Party'].transform('size'))
df = df.set_index(['A-Party','Total IMEI','IMEI'])
print (df)
                                 First Date            Last Date
A-Party Total IMEI IMEI                                         
3022275 2          A     2021-05-01 7:20:05   2022-08-01 6:20:05
                   B     2021-03-02 9:20:05  2022-05-01 10:20:05
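Unlike the blank-string approach, the MultiIndex only hides repeated labels when printing; every row still carries its full key. A sketch with a hypothetical frame matching the columns above:

```python
import pandas as pd

# Hypothetical aggregated result with the same columns as above
df = pd.DataFrame({'A-Party': [3022275, 3022275],
                   'Total IMEI': [2, 2],
                   'IMEI': ['A', 'B'],
                   'First Date': pd.to_datetime(['2021-05-01 07:20:05', '2021-03-02 09:20:05']),
                   'Last Date': pd.to_datetime(['2022-08-01 06:20:05', '2022-05-01 10:20:05'])})
mi = df.set_index(['A-Party', 'Total IMEI', 'IMEI'])

# Repeated 3022275 / 2 labels are only suppressed in the printed output
print(mi)

# Selection by level still works because the full key is stored
print(mi.xs('B', level='IMEI'))

# reset_index restores the repeated values as ordinary columns
flat = mi.reset_index()
print(flat)
```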

EDIT: add GroupBy.size to the named aggregation:

df = (df.groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'Total IMEI': ('IMEI', 'size'),      # number of rows in each (A-Party, IMEI) group
                'First Date': ('Date & Time', 'first'),
                'Last Date': ('Date & Time', 'last')}))
print (df)
A-Party IMEI  Total IMEI          First Date            Last Date
0  3022275    A           3  2021-05-01 7:20:05   2022-08-01 6:20:05
1  3022275    B           3  2021-03-02 9:20:05  2022-05-01 10:20:05
df = df.set_index(['A-Party','Total IMEI','IMEI'])
print (df)
                                 First Date            Last Date
A-Party Total IMEI IMEI                                         
3022275 3          A     2021-05-01 7:20:05   2022-08-01 6:20:05
                   B     2021-03-02 9:20:05  2022-05-01 10:20:05
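Because the groups are (A-Party, IMEI) pairs, 'size' here counts the records per IMEI, not the number of IMEIs. If some timestamps can be missing, 'count' (non-null values only) may be the better choice. A sketch on hypothetical data where IMEI 'B' has one missing timestamp; the column names 'Rows' and 'Non-null dates' are made up for illustration:

```python
import pandas as pd

# Hypothetical data: IMEI 'B' has one row without a timestamp
df = pd.DataFrame({'A-Party': [3022275] * 5,
                   'IMEI': ['A', 'A', 'A', 'B', 'B'],
                   'Date & Time': pd.to_datetime(['2021-05-01 07:20:05', '2021-09-01 08:00:00',
                                                  '2022-08-01 06:20:05', '2021-03-02 09:20:05',
                                                  None])})

out = (df.groupby(['A-Party', 'IMEI'], as_index=False)
         .agg(**{'Rows': ('Date & Time', 'size'),             # every row, NaT included
                 'Non-null dates': ('Date & Time', 'count'),  # rows with a timestamp
                 'First Date': ('Date & Time', 'min'),
                 'Last Date': ('Date & Time', 'max')}))
print(out)
```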

EDIT1: also count the number of unique days in each group:

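import pandas as pd

# .dt only works on a datetime64 column, so convert first if the dates are strings
df['Date & Time'] = pd.to_datetime(df['Date & Time'])
df = (df.assign(dates = df['Date & Time'].dt.date)
        .groupby(['A-Party','IMEI'], as_index=False)
        .agg(**{'Total IMEI': ('IMEI', 'size'),
                'No unique days': ('dates', 'nunique'),
                'First Date': ('Date & Time', 'first'),
                'Last Date': ('Date & Time', 'last')}))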
print (df)
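To see what 'No unique days' counts: several records on the same calendar day collapse to one day after .dt.date. A sketch on hypothetical records where IMEI 'A' is seen twice on 2021-05-01 and once later (the size is taken from 'Date & Time' here rather than the grouping key):

```python
import pandas as pd

# Hypothetical records, stored as strings as they might come from a CSV
df = pd.DataFrame({'A-Party': [3022275] * 4,
                   'IMEI': ['A', 'A', 'A', 'B'],
                   'Date & Time': ['2021-05-01 07:20:05', '2021-05-01 18:00:00',
                                   '2022-08-01 06:20:05', '2021-03-02 09:20:05']})
# Parse to datetime64 so the .dt accessor is available
df['Date & Time'] = pd.to_datetime(df['Date & Time'])

out = (df.assign(dates=df['Date & Time'].dt.date)
         .groupby(['A-Party', 'IMEI'], as_index=False)
         .agg(**{'Total IMEI': ('Date & Time', 'size'),
                 'No unique days': ('dates', 'nunique')}))
print(out)
```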

I just build each column manually and then create the DataFrame.

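import numpy as np
import pandas as pd

# Assumes one A-Party (first column) with exactly two IMEIs, hence one '' per column
A_Party = [df.iloc[0,0],'']
Total_IMEI = [df['IMEI'].nunique(),'']
First_Date = df.groupby('IMEI')['Date & Time'].first()
Last_Date = df.groupby('IMEI')['Date & Time'].last()
# Take IMEI from the groupby index (sorted keys) so it lines up with the dates
IMEI = First_Date.index.to_numpy()
col = ['A-Party','Total IMEI','IMEI','First Date','Last Date']
df1 = pd.DataFrame(np.array([A_Party,Total_IMEI,IMEI,First_Date.to_numpy(),Last_Date.to_numpy()], dtype=object).T,columns=col)
print(df1)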
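One pitfall when building columns by hand is ordering: Series.unique() returns values in order of appearance, while groupby sorts its keys by default, so pairing unique() with groupby results can misalign IMEIs and dates. A sketch on hypothetical data where 'B' appears before 'A':

```python
import pandas as pd

# Hypothetical data where IMEI 'B' appears before 'A'
df = pd.DataFrame({'IMEI': ['B', 'A', 'B'],
                   'Date & Time': pd.to_datetime(['2021-03-02 09:20:05', '2021-05-01 07:20:05',
                                                  '2022-05-01 10:20:05'])})

print(list(df['IMEI'].unique()))   # order of appearance: B first
first = df.groupby('IMEI')['Date & Time'].first()
print(first.index.tolist())        # sorted group keys: A first

# Taking the labels from the groupby result keeps IMEI and date aligned
aligned = pd.DataFrame({'IMEI': first.index, 'First Date': first.to_numpy()})
print(aligned)
```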
