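I need to solve a data science problem using only Python and Pandas. The input consists of a check-in date, a check-out date, and the payment for each individual booking: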
check_in check_out payment
2020-02-28 2020-03-02 66
2020-02-27 2020-02-29 100
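I need to show the daily earnings I received from those two bookings. My idea is to split each payment across the dates of the stay: 66/3 spread over 02-28 up to (but not including) 03-02, and 100/2 spread over 02-27 up to (but not including) 02-29 for the second booking. That would give a result like this: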
date earnings
2020-02-27 50
2020-02-28 22+50
2020-02-29 22
2020-03-01 22
Summed up, it would look like this:
date earnings
2020-02-27 50
2020-02-28 72
2020-02-29 22
2020-03-01 22
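I have already separated the payment column from the input and used the code below to build a new DataFrame with two new columns: day_stay, the number of days the guest stayed, and daily_earn, the amount I earned per day:
df["day_stay"] = df["check_out"] - df["check_in"]  # Length of stay as a Timedelta
df["day_stay"] = df["day_stay"].dt.days  # Convert to a whole number of days
df["daily_earn"] = df["payment"] / df["day_stay"]  # Payment per day of the stay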
The result then looks like this:
check_in check_out payment day_stay daily_earn
2020-02-28 2020-03-02 66 3 22
2020-02-27 2020-02-29 100 2 50
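The last step is to distribute the values in daily_earn over the corresponding dates from 02-27 to 03-01, but I don't know how to do that. I tried df.groupby("check_in").sum(), but because the number of rows differs between the input and the output, it doesn't produce the desired result. Any ideas?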
Let's try a different approach:
df['date'] = df.apply(
    # inclusive='left' needs pandas >= 1.4; older versions use closed='left'
    lambda r: pd.date_range(r['check_in'], r['check_out'], inclusive='left'),
    axis=1)
df = df.explode('date')
df['earnings'] = df['payment'] / df.groupby(level=0)['date'].transform('count')
df = df.groupby('date', as_index=False)['earnings'].agg('sum')
df:
date earnings
0 2020-02-27 50.0
1 2020-02-28 72.0
2 2020-02-29 22.0
3 2020-03-01 22.0
Step-by-step breakdown:
Apply pd.date_range to each row to get the dates between check-in and check-out (check-out day excluded):
df['date'] = df.apply(
    # inclusive='left' needs pandas >= 1.4; older versions use closed='left'
    lambda r: pd.date_range(r['check_in'], r['check_out'], inclusive='left'),
    axis=1)
check_in check_out payment date
0 2020-02-28 2020-03-02 66 DatetimeIndex(['2020-02-28', '2020-02-29', '2020-03-01'], dtype='datetime64[ns]', freq='D')
1 2020-02-27 2020-02-29 100 DatetimeIndex(['2020-02-27', '2020-02-28'], dtype='datetime64[ns]', freq='D')
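The keyword for half-open ranges has changed across pandas releases (`closed` in older versions, `inclusive` from 1.4 onward). As a small sketch, assuming you would rather not depend on either keyword, subtracting one day from the check-out date should give the same nights for a single booking:

```python
import pandas as pd

check_in = pd.Timestamp("2020-02-28")
check_out = pd.Timestamp("2020-03-02")

# date_range includes both endpoints by default, so stop one day before check-out.
nights = pd.date_range(check_in, check_out - pd.Timedelta(days=1))

print(nights.strftime("%Y-%m-%d").tolist())
# ['2020-02-28', '2020-02-29', '2020-03-01']
```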
Then explode the date column into rows:
df = df.explode('date')
check_in check_out payment date
0 2020-02-28 2020-03-02 66 2020-02-28
0 2020-02-28 2020-03-02 66 2020-02-29
0 2020-02-28 2020-03-02 66 2020-03-01
1 2020-02-27 2020-02-29 100 2020-02-27
1 2020-02-27 2020-02-29 100 2020-02-28
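Note that explode repeats the original row label (0, 0, 0, 1, 1) on every generated row, which is what lets the next step group by index level 0. A toy illustration, with made-up placeholder values standing in for the dates:

```python
import pandas as pd

# Hypothetical toy frame: each row holds a list, like the 'date' column above.
toy = pd.DataFrame({"payment": [66, 100],
                    "date": [["a", "b", "c"], ["d", "e"]]})

exploded = toy.explode("date")
print(exploded.index.tolist())  # [0, 0, 0, 1, 1]

# Counting per original row label recovers the length of each list.
per_row_count = exploded.groupby(level=0)["date"].transform("count")
print(per_row_count.tolist())  # [3, 3, 3, 2, 2]
```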
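Then use groupby transform to count the date values per original row, which gives the number of days, and divide payment by that count to get the daily earnings: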
df['earnings'] = df['payment'] / df.groupby(level=0)['date'].transform('count')
check_in check_out payment date earnings
0 2020-02-28 2020-03-02 66 2020-02-28 22.0
0 2020-02-28 2020-03-02 66 2020-02-29 22.0
0 2020-02-28 2020-03-02 66 2020-03-01 22.0
1 2020-02-27 2020-02-29 100 2020-02-27 50.0
1 2020-02-27 2020-02-29 100 2020-02-28 50.0
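The per-row count here should equal the number of days between check-in and check-out, so the daily rate could also be computed before exploding, straight from the date difference. A minimal sketch of that alternative (the names `bookings`, `nights` and `rate` are only illustrative):

```python
import pandas as pd

bookings = pd.DataFrame({
    "check_in": pd.to_datetime(["2020-02-28", "2020-02-27"]),
    "check_out": pd.to_datetime(["2020-03-02", "2020-02-29"]),
    "payment": [66, 100],
})

# Subtracting datetime columns gives Timedeltas; .dt.days turns them into integers.
nights = (bookings["check_out"] - bookings["check_in"]).dt.days
rate = bookings["payment"] / nights

print(nights.tolist())  # [3, 2]
print(rate.tolist())    # [22.0, 50.0]
```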
Then use groupby agg to sum the earnings on date, giving the total for each date:
df = df.groupby('date', as_index=False)['earnings'].agg('sum')
date earnings
0 2020-02-27 50.0
1 2020-02-28 72.0
2 2020-02-29 22.0
3 2020-03-01 22.0
DataFrame and imports used:
import pandas as pd
df = pd.DataFrame({'check_in': {0: '2020-02-28', 1: '2020-02-27'},
                   'check_out': {0: '2020-03-02', 1: '2020-02-29'},
                   'payment': {0: 66, 1: 100}})
df['check_in'] = pd.to_datetime(df['check_in'])
df['check_out'] = pd.to_datetime(df['check_out'])
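For convenience, the steps can be wrapped into one function. This is only a sketch: the name `daily_earnings` is made up here, the input is copied rather than modified in place, and the date ranges are built by subtracting one day from check-out so the code does not depend on the `closed`/`inclusive` keyword of a particular pandas version.

```python
import pandas as pd


def daily_earnings(bookings: pd.DataFrame) -> pd.DataFrame:
    """Spread each payment evenly over the days of the stay, then sum per date."""
    out = bookings.copy()
    one_day = pd.Timedelta(days=1)
    # One list of dates per booking: check-in up to the day before check-out.
    out["date"] = pd.Series(
        [list(pd.date_range(ci, co - one_day))
         for ci, co in zip(out["check_in"], out["check_out"])],
        index=out.index, dtype=object)
    out = out.explode("date")
    out["date"] = pd.to_datetime(out["date"])
    # Rows from the same booking share an index label, so count per label.
    out["earnings"] = out["payment"] / out.groupby(level=0)["date"].transform("count")
    return out.groupby("date", as_index=False)["earnings"].sum()


df = pd.DataFrame({"check_in": pd.to_datetime(["2020-02-28", "2020-02-27"]),
                   "check_out": pd.to_datetime(["2020-03-02", "2020-02-29"]),
                   "payment": [66, 100]})
result = daily_earnings(df)
print(result)

# Sanity check: splitting payments must not change the overall total.
assert result["earnings"].sum() == df["payment"].sum()
```

The final assertion is a cheap way to catch an off-by-one in the date range, since a wrong number of days per booking would change the total.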