Create start date and end date columns from the given date records



Suppose my dataset is:

Name date
A    15-01-01
A    15-01-31
A    16-02-02
A    16-03-04
B    17-04-05
B    17-05-08
B    17-07-09
C    18-01-02
C    18-02-03
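To run the answers below, the sample data can be built as a pandas DataFrame. This is a minimal sketch that assumes the dates are kept as strings in yy-mm-dd form, exactly as shown above. The name `df` matches the one the answers use.

```python
import pandas as pd

# Sample data from the question; dates are strings in yy-mm-dd form
df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'date': ['15-01-01', '15-01-31', '16-02-02', '16-03-04',
             '17-04-05', '17-05-08', '17-07-09',
             '18-01-02', '18-02-03'],
})
print(df)
```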

I want to add each person's first and last date as new columns.

This is the result I want:

Name date     startdate enddate
A    15-01-01 15-01-01  16-03-04
A    15-01-31 15-01-01  16-03-04
A    16-02-02 15-01-01  16-03-04
A    16-03-04 15-01-01  16-03-04
B    17-04-05 17-04-05  17-07-09
B    17-05-08 17-04-05  17-07-09
B    17-07-09 17-04-05  17-07-09
C    18-01-02 18-01-02  18-02-03
C    18-02-03 18-01-02  18-02-03

Is there a way to do this? I would appreciate it if you could let me know.

Thanks for reading.

GroupBy.transformGroupBy.firstGroupBy.last一起使用并添加到新列:

import pandas as pd

# Group the dates by name; transform returns one value per original row
g = df.groupby('Name')['date']
df = df.assign(startdate = g.transform('first'), enddate = g.transform('last'))
print (df)
Name      date startdate   enddate
0    A  15-01-01  15-01-01  16-03-04
1    A  15-01-31  15-01-01  16-03-04
2    A  16-02-02  15-01-01  16-03-04
3    A  16-03-04  15-01-01  16-03-04
4    B  17-04-05  17-04-05  17-07-09
5    B  17-05-08  17-04-05  17-07-09
6    B  17-07-09  17-04-05  17-07-09
7    C  18-01-02  18-01-02  18-02-03
8    C  18-02-03  18-01-02  18-02-03
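`transform('first')` and `transform('last')` take the first and last row of each group in the DataFrame's current row order, so they give the earliest and latest date only if the rows are already sorted by date within each name (as they are in the sample). A minimal sketch, assuming the rows may arrive unsorted (the shuffled order below is hypothetical): parse the dates, sort, then apply the same transform.

```python
import pandas as pd

# Same records as the question, deliberately shuffled
df = pd.DataFrame({
    'Name': ['A', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A'],
    'date': ['16-03-04', '17-05-08', '15-01-01', '18-02-03', '17-04-05',
             '16-02-02', '18-01-02', '17-07-09', '15-01-31'],
})

# Parse the two-digit-year strings so that sorting is chronological
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')

# Sort by name, then date, so 'first'/'last' mean earliest/latest
df = df.sort_values(['Name', 'date']).reset_index(drop=True)

g = df.groupby('Name')['date']
df = df.assign(startdate=g.transform('first'), enddate=g.transform('last'))
print(df)
```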

Or, if you need the minimum and maximum dates, use GroupBy.min and GroupBy.max:

# Parse the yy-mm-dd strings into real datetimes before comparing them
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')
g = df.groupby('Name')['date']
df = df.assign(startdate = g.transform('min'), enddate = g.transform('max'))
print (df)
Name       date  startdate    enddate
0    A 2015-01-01 2015-01-01 2016-03-04
1    A 2015-01-31 2015-01-01 2016-03-04
2    A 2016-02-02 2015-01-01 2016-03-04
3    A 2016-03-04 2015-01-01 2016-03-04
4    B 2017-04-05 2017-04-05 2017-07-09
5    B 2017-05-08 2017-04-05 2017-07-09
6    B 2017-07-09 2017-04-05 2017-07-09
7    C 2018-01-02 2018-01-02 2018-02-03
8    C 2018-02-03 2018-01-02 2018-02-03
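Because `min` and `max` compare values rather than positions, this variant does not need the rows to be sorted. A small sketch contrasting the two on a hypothetical two-row group where the later date comes first:

```python
import pandas as pd

# Hypothetical unsorted rows for one name: the later date comes first
df = pd.DataFrame({'Name': ['A', 'A'], 'date': ['16-03-04', '15-01-01']})
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')

g = df.groupby('Name')['date']
by_order = g.transform('first')  # first row in the current order
by_value = g.transform('min')    # earliest date regardless of order

print(by_order.iloc[0].date(), by_value.iloc[0].date())
# → 2016-03-04 2015-01-01
```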

GroupBy.aggDataFrame.join的替代方案:

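# One row per name, indexed by Name (named aggregation)
df1 = df.groupby('Name').agg(startdate = ('date','first'), enddate = ('date','last'))
# Match each row's Name against df1's index and append the new columns
df = df.join(df1, on='Name')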
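The join variant first builds a small per-name summary table and then broadcasts it back to every row. A self-contained sketch, assuming pandas 0.25 or later (needed for named aggregation); it swaps in `'min'`/`'max'` for `'first'`/`'last'` so the result does not depend on row order.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'date': ['15-01-01', '15-01-31', '16-02-02', '16-03-04',
             '17-04-05', '17-05-08', '17-07-09', '18-01-02', '18-02-03'],
})
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')

# Summary table: one row per name, indexed by Name
df1 = df.groupby('Name').agg(startdate=('date', 'min'), enddate=('date', 'max'))

# join(on='Name') looks up each row's Name in df1's index
out = df.join(df1, on='Name')
print(out)
```

The summary table `df1` has only three rows here, so this form is also handy when you want the per-name start/end dates on their own as well as on every row.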

Here is a loop-based way to do it. I'm not completely sure, but it should work.

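import pandas as pd

df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')

# Earliest and latest date seen so far for each name
start_date = {}
end_date = {}
for name, d in zip(df['Name'], df['date']):
    if name not in start_date:
        # First row for this name: it is both the start and the end so far
        start_date[name] = d
        end_date[name] = d
    elif d > end_date[name]:
        end_date[name] = d
    elif d < start_date[name]:
        start_date[name] = d

# Broadcast each name's start/end back to all of its rows
df['startdate'] = df['Name'].map(start_date)
df['enddate'] = df['Name'].map(end_date)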
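To sanity-check a loop-based version, one option is to compare it against the vectorised `groupby` min/max on the same data. A minimal self-contained sketch; the dictionaries `start` and `end` are hypothetical names used only here.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'date': ['15-01-01', '15-01-31', '16-02-02', '16-03-04',
             '17-04-05', '17-05-08', '17-07-09', '18-01-02', '18-02-03'],
})
df['date'] = pd.to_datetime(df['date'], format='%y-%m-%d')

start, end = {}, {}
for name, d in zip(df['Name'], df['date']):
    # Keep the smallest and largest date seen so far for this name
    start[name] = min(start.get(name, d), d)
    end[name] = max(end.get(name, d), d)

loop_start = df['Name'].map(start)
loop_end = df['Name'].map(end)

# Element-wise comparison with the vectorised result
g = df.groupby('Name')['date']
assert (loop_start == g.transform('min')).all()
assert (loop_end == g.transform('max')).all()
print('loop and groupby agree')
```

For large frames the `groupby` form is usually preferable, since the Python loop visits every row one at a time.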
