我正在尝试使用两个数据帧和一个分组代码进行一些计算。但是,只有当我的变量"date_int"大于或等于特定数字(例如,20180501;相当于日期"2018-05-01"(时,我才无法找到进行这些计算的方法。
换句话说,我正在使用的 groupby 代码不仅考虑相关组合(从 2018-05-01 开始的组合(,并且对以前的组合进行所有计算。我的观点是节省时间,并有一个代码,该代码仅计算我从 2018-05-01 开始寻找的组合。
下面我给出两个数据帧、计算(代码的冲突部分(和预期结果。
数据帧 1 (df(:
idx = [np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18','Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18',]),np.array(['A', 'B', 'B', 'A', 'B', 'C', 'D', 'E', 'B', 'A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])]
data = [{'place': 1}, {'place': 5}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3},{'place': 1}, {'place': 6}, {'place': 3}, {'place': 5}, {'place': 2}, {'place': 3},{'place': 1}, {'place': 9}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3}, {'place': 6}, {'place': 8}, {'place': 2}, {'place': 7}, {'place': 9}]
df = pd.DataFrame(data, index=idx, columns=['place'])
df.index.names=['date','name']
df=df.reset_index()
df['date'] = pd.to_datetime(df['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df.reset_index(inplace=True)
df['place'] = df.place.astype('float')
df['date_int'] = df['date'].astype('str').str.replace('-','').astype('int64')
df.set_index(['date_int','name'], inplace = True)
数据帧 2 (df2(:
idx = [np.array(['Jan-18', 'Jan-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18','Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Dec-18', 'Dec-18',]),
np.array(['A', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C','C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'A', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'B', 'C']),
np.array(['B', 'A', 'B', 'C', 'D', 'E', 'A', 'C', 'D', 'E', 'A', 'B','D', 'E', 'A', 'B', 'C', 'E', 'A', 'B', 'C', 'D', 'B', 'C', 'A', 'C', 'A', 'B', 'C', 'B', 'B', 'A', 'A', 'B', 'C', 'B', 'C', 'A', 'C', 'B'])]
data = [{'xx': -4, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -5, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 4, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1},{'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -6, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -1, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': 5, 'win': 1}, {'xx': 3, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': -6, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 2, 'win': 1}]
df2 = pd.DataFrame(data, index=idx, columns=['xx','win'])
df2.index.names=['date','i1', 'i2']
df2=df2.reset_index()
df2['date'] = pd.to_datetime(df2['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df2.reset_index(inplace=True)
df2['xx'] = df2.xx.astype('float')
df2['date_int'] = df2['date'].astype('str').str.replace('-','').astype('int64')
df2=df2.drop(['date','index'], axis=1)
df2['date_int2']=df2['date_int']
df2.set_index(['date_int','i1','i2'], inplace = True)
冲突代码(对于大于或等于数字 20180501 的变量"date_int"的值不这样做(
df10=df.copy()
if (df10['date']>='2018-05-01').any():
df10['output'] = (df2.assign(to_this = df2['xx'][df2['date_int2']>=20180501])).groupby(level=[1,2]).to_this.cumcount().sum(level=[0,1])
df10['output'].fillna(0,inplace=True)
预期成果:
index date place output
date_int name
20180501 B 8 2018-05-01 3.0 0.0
20180601 A 9 2018-06-01 5.0 3.0
B 10 2018-06-01 2.0 3.0
C 11 2018-06-01 3.0 2.0
20180701 A 12 2018-07-01 1.0 0.0
20180801 B 13 2018-08-01 9.0 2.0
C 14 2018-08-01 3.0 2.0
20180901 A 15 2018-09-01 2.0 3.0
B 16 2018-09-01 7.0 3.0
20181001 C 17 2018-10-01 3.0 5.0
A 18 2018-10-01 6.0 6.0
B 19 2018-10-01 8.0 7.0
20181101 A 20 2018-11-01 2.0 0.0
20181201 B 21 2018-12-01 7.0 4.0
C 22 2018-12-01 9.0 4.0
如果您可以详细说明代码以使其仅在我的变量"date_int"大于或等于特定值时才起作用,那将很有用,因为它将为我节省大量时间。
df['date'].apply(pd.Timestamp.timestamp).astype(int)
或
pd.to_datetime(df['date']).values.astype(int)
解释:
下面的更新代码使用不同的方法。它不会在创建数据后从DataFrame
中删除数据,而是根本不使用该数据。这是通过mask_it()
函数完成的。此函数可用于基于date_int
创建boolean
mask
。此mask
可以应用于所有arrays/lists
,例如idx
,data
等,以仅保留那些在date_int
上或之后的数据点。接下来,这些索引/数据的子集用于创建DataFrame
。
注释:
- 下面的代码 1 似乎是正确的,但与问题中发布的 ouptut 不匹配。在这里,
df1
和df2
是在任何分组之前从已经masked
索引列表创建的。因此,仅使用date_int
当天或之后的日期。因此,.cumcount()
可能是正确的。这可能很重要,因为.group()
level=[1,2]
不包括任何日期。
输出 - 下面的代码 2 与发布有问题的输出匹配,但可能不正确。因为,这里的分组是在"删除任何
dates
之前"完成的。因此,level=[1,2]
的grouping
可以包括date_int
之前的日期。这可以通过使用df2.groupby(level=[1,2])['xx'].groups()
进行检查。
旁注:问题中更新的代码为我提供了与问题中发布的不同的预期df10
输出
代码-1:
### Import libraries
import numpy as np
import pandas as pd
import datetime
### Create function for mask
def mask_it(date_int, xarray):
# Create 'start_date'
dt = str(date_int) # convert integer 'date_int' to string
dt = datetime.datetime(year=int(dt[0:4]), month=int(dt[4:6]), day=int(dt[6:8])) # convert to date
dtyear = dt.strftime("%y") # get last two digits of year
dtmonth = dt.strftime("%B")[0:3] # get first three characters of month
start_date = str(dtmonth)+'-'+str(dtyear)
start_date = datetime.datetime.strptime(start_date, '%b-%y')
# Convert array dates from string to datetime
xarray_NEW = np.array([ datetime.datetime.strptime(i, '%b-%y') for i in xarray])
print(type(start_date))
# Create boolean mask
mask = [i >= start_date for i in xarray_NEW]
return mask
### Function to create `df1`
def start_date_df(date_int):
# Index
df1_idx_date = np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18','Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18',])
df1_idx_char = np.array(['A', 'B', 'B', 'A', 'B', 'C', 'D', 'E', 'B', 'A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])
# Data
df1_data = np.array([{'place': 1}, {'place': 5}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3},{'place': 1}, {'place': 6}, {'place': 3}, {'place': 5}, {'place': 2}, {'place': 3},{'place': 1}, {'place': 9}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3}, {'place': 6}, {'place': 8}, {'place': 2}, {'place': 7}, {'place': 9}])
#:::::::::::::::::::::::::::::::::::::::::::::
### Subset by using mask_it
#:::::::::::::::::::::::::::::::::::::::::::::
# Get mask
mask1 = mask_it(date_int, df1_idx_date)
# Use mask
df1_idx_date = df1_idx_date[mask1]
df1_idx_char = df1_idx_char[mask1]
df1_data = df1_data[mask1]
#:::::::::::::::::::::::::::::::::::::::::::::
### Get 'idx' and 'data'
idx = [df1_idx_date, df1_idx_char]
data = list(df1_data)
df = pd.DataFrame(data, index=idx, columns=['place'])
df.index.names=['date','name']
df=df.reset_index()
df['date'] = pd.to_datetime(df['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df.reset_index(inplace=True)
df['place'] = df.place.astype('float')
df['date_int'] = df['date'].astype('str').str.replace('-','').astype('int64')
df.set_index(['date_int','name'], inplace = True)
return df
### Function to create 'df2'
def start_date_df2(date_int):
# Index
df2_idx_date = np.array(['Jan-18', 'Jan-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18','Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Dec-18', 'Dec-18',])
df2_idx_char1 = np.array(['A', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C','C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'A', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'B', 'C'])
df2_idx_char2 = np.array(['B', 'A', 'B', 'C', 'D', 'E', 'A', 'C', 'D', 'E', 'A', 'B','D', 'E', 'A', 'B', 'C', 'E', 'A', 'B', 'C', 'D', 'B', 'C', 'A', 'C', 'A', 'B', 'C', 'B', 'B', 'A', 'A', 'B', 'C', 'B', 'C', 'A', 'C', 'B'])
# Data
df2_data = np.array([{'xx': -4, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -5, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 4, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1},{'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -6, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -1, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': 5, 'win': 1}, {'xx': 3, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': -6, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 2, 'win': 1}])
#:::::::::::::::::::::::::::::::::::::::::::::
### Subset by using mask_it
#:::::::::::::::::::::::::::::::::::::::::::::
# Get mask
mask2 = mask_it(date_int, df2_idx_date)
# Use mask
df2_idx_date = df2_idx_date[mask2]
df2_idx_char1 = df2_idx_char1[mask2]
df2_idx_char2 = df2_idx_char2[mask2]
df2_data = df2_data[mask2]
#:::::::::::::::::::::::::::::::::::::::::::::
### Get 'idx' and 'data'
idx = [df2_idx_date, df2_idx_char1, df2_idx_char2]
data = list(df2_data)
df2 = pd.DataFrame(data, index=idx, columns=['xx','win'])
df2.index.names=['date','i1', 'i2']
df2=df2.reset_index()
df2['date'] = pd.to_datetime(df2['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df2.reset_index(inplace=True)
df2['xx'] = df2.xx.astype('float')
df2['date_int'] = df2['date'].astype('str').str.replace('-','').astype('int64')
df2=df2.drop(['date','index'], axis=1)
df2.set_index(['date_int','i1','i2'], inplace = True)
#........................
# Limit by date_int
df2 = df2.groupby(level=[1,2])['xx'].cumcount().sum(level=[0,1])
#df2 = df2.loc[date_int:,:]
#........................
return df2
### Get DataFrames based on 'date_int'
df = start_date_df(20180501)
df['output'] = start_date_df2(20180501)
df['output'].fillna(0,inplace=True)
输出 - 1
print(df)
index date place output
date_int name
20180501 B 0 2018-05-01 3.0 0.0
20180601 A 1 2018-06-01 5.0 0.0
B 2 2018-06-01 2.0 0.0
C 3 2018-06-01 3.0 0.0
20180701 A 4 2018-07-01 1.0 0.0
20180801 B 5 2018-08-01 9.0 1.0
C 6 2018-08-01 3.0 1.0
20180901 A 7 2018-09-01 2.0 1.0
B 8 2018-09-01 7.0 1.0
20181001 C 9 2018-10-01 3.0 3.0
A 10 2018-10-01 6.0 3.0
B 11 2018-10-01 8.0 4.0
20181101 A 12 2018-11-01 2.0 0.0
20181201 B 13 2018-12-01 7.0 3.0
C 14 2018-12-01 9.0 3.0
<小时 /><小时 />代码 - 2:
### Import libraries
import numpy as np
import pandas as pd
import datetime
### Create function for mask
def mask_it(date_int, xarray):
# Create 'start_date'
dt = str(date_int) # convert integer 'date_int' to string
dt = datetime.datetime(year=int(dt[0:4]), month=int(dt[4:6]), day=int(dt[6:8])) # convert to date
dtyear = dt.strftime("%y") # get last two digits of year
dtmonth = dt.strftime("%B")[0:3] # get first three characters of month
start_date = str(dtmonth)+'-'+str(dtyear)
start_date = datetime.datetime.strptime(start_date, '%b-%y')
# Convert array dates from string to datetime
xarray_NEW = np.array([ datetime.datetime.strptime(i, '%b-%y') for i in xarray])
print(type(start_date))
# Create boolean mask
mask = [i >= start_date for i in xarray_NEW]
return mask
### Function to create `df1`
def start_date_df(date_int):
# Index
df1_idx_date = np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18','Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18',])
df1_idx_char = np.array(['A', 'B', 'B', 'A', 'B', 'C', 'D', 'E', 'B', 'A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])
# Data
df1_data = np.array([{'place': 1}, {'place': 5}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3},{'place': 1}, {'place': 6}, {'place': 3}, {'place': 5}, {'place': 2}, {'place': 3},{'place': 1}, {'place': 9}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3}, {'place': 6}, {'place': 8}, {'place': 2}, {'place': 7}, {'place': 9}])
#:::::::::::::::::::::::::::::::::::::::::::::
### Subset by using mask_it
#:::::::::::::::::::::::::::::::::::::::::::::
# Get mask
mask1 = mask_it(date_int, df1_idx_date)
# Use mask
df1_idx_date = df1_idx_date[mask1]
df1_idx_char = df1_idx_char[mask1]
df1_data = df1_data[mask1]
#:::::::::::::::::::::::::::::::::::::::::::::
### Get 'idx' and 'data'
idx = [df1_idx_date, df1_idx_char]
data = list(df1_data)
df = pd.DataFrame(data, index=idx, columns=['place'])
df.index.names=['date','name']
df=df.reset_index()
df['date'] = pd.to_datetime(df['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df.reset_index(inplace=True)
df['place'] = df.place.astype('float')
df['date_int'] = df['date'].astype('str').str.replace('-','').astype('int64')
df.set_index(['date_int','name'], inplace = True)
return df
### Function to create 'df2'
def start_date_df2(date_int):
# Index
df2_idx_date = np.array(['Jan-18', 'Jan-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18','Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Dec-18', 'Dec-18',])
df2_idx_char1 = np.array(['A', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C','C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'A', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'B', 'C'])
df2_idx_char2 = np.array(['B', 'A', 'B', 'C', 'D', 'E', 'A', 'C', 'D', 'E', 'A', 'B','D', 'E', 'A', 'B', 'C', 'E', 'A', 'B', 'C', 'D', 'B', 'C', 'A', 'C', 'A', 'B', 'C', 'B', 'B', 'A', 'A', 'B', 'C', 'B', 'C', 'A', 'C', 'B'])
# Data
df2_data = np.array([{'xx': -4, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -5, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 4, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1},{'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -6, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -1, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': 5, 'win': 1}, {'xx': 3, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': -6, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 2, 'win': 1}])
"""
#:::::::::::::::::::::::::::::::::::::::::::::
### Subset by using mask_it
#:::::::::::::::::::::::::::::::::::::::::::::
# Get mask
mask2 = mask_it(date_int, df2_idx_date)
# Use mask
df2_idx_date = df2_idx_date[mask2]
df2_idx_char1 = df2_idx_char1[mask2]
df2_idx_char2 = df2_idx_char2[mask2]
df2_data = df2_data[mask2]
#:::::::::::::::::::::::::::::::::::::::::::::
"""
### Get 'idx' and 'data'
idx = [df2_idx_date, df2_idx_char1, df2_idx_char2]
data = list(df2_data)
df2 = pd.DataFrame(data, index=idx, columns=['xx','win'])
df2.index.names=['date','i1', 'i2']
df2=df2.reset_index()
df2['date'] = pd.to_datetime(df2['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df2.reset_index(inplace=True)
df2['xx'] = df2.xx.astype('float')
df2['date_int'] = df2['date'].astype('str').str.replace('-','').astype('int64')
df2=df2.drop(['date','index'], axis=1)
df2.set_index(['date_int','i1','i2'], inplace = True)
#........................
# Limit by date_int
df2 = df2.groupby(level=[1,2])['xx'].cumcount().sum(level=[0,1])
#df2 = df2.loc[date_int:,:]
#........................
return df2
### Get DataFrames based on 'date_int'
df = start_date_df(20180501)
df['output'] = start_date_df2(20180501)
df['output'].fillna(0,inplace=True)
输出 - 2:
print(df)
index date place output
date_int name
20180501 B 0 2018-05-01 3.0 0.0
20180601 A 1 2018-06-01 5.0 3.0
B 2 2018-06-01 2.0 3.0
C 3 2018-06-01 3.0 2.0
20180701 A 4 2018-07-01 1.0 0.0
20180801 B 5 2018-08-01 9.0 2.0
C 6 2018-08-01 3.0 2.0
20180901 A 7 2018-09-01 2.0 3.0
B 8 2018-09-01 7.0 3.0
20181001 C 9 2018-10-01 3.0 5.0
A 10 2018-10-01 6.0 6.0
B 11 2018-10-01 8.0 7.0
20181101 A 12 2018-11-01 2.0 0.0
20181201 B 13 2018-12-01 7.0 4.0
C 14 2018-12-01 9.0 4.0