仅针对数据帧的一部分 - 熊猫进行分组计算



我正在尝试使用两个数据帧和一个分组代码进行一些计算。但是,只有当我的变量"date_int"大于或等于特定数字(例如,20180501;相当于日期"2018-05-01"(时,我才无法找到进行这些计算的方法。

换句话说,我正在使用的 groupby 代码不仅考虑相关组合(从 2018-05-01 开始的组合(,并且对以前的组合进行所有计算。我的观点是节省时间,并有一个代码,该代码仅计算我从 2018-05-01 开始寻找的组合。

下面我给出两个数据帧、计算(代码的冲突部分(和预期结果。

数据帧 1 (df(:

idx = [np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18','Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18',]),np.array(['A', 'B', 'B', 'A', 'B', 'C', 'D', 'E', 'B', 'A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])]
data = [{'place': 1}, {'place': 5}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3},{'place': 1}, {'place': 6}, {'place': 3}, {'place': 5}, {'place': 2}, {'place': 3},{'place': 1}, {'place': 9}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3}, {'place': 6}, {'place': 8}, {'place': 2}, {'place': 7}, {'place': 9}]
df = pd.DataFrame(data, index=idx, columns=['place'])
df.index.names=['date','name']
df=df.reset_index()
df['date'] = pd.to_datetime(df['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df.reset_index(inplace=True)
df['place'] = df.place.astype('float')
df['date_int'] = df['date'].astype('str').str.replace('-','').astype('int64')
df.set_index(['date_int','name'], inplace = True)

数据帧 2 (df2(:

idx = [np.array(['Jan-18', 'Jan-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18','Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Aug-18',  'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Dec-18', 'Dec-18',]),
np.array(['A', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C','C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'A', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'B', 'C']),
np.array(['B', 'A', 'B', 'C', 'D', 'E', 'A', 'C', 'D', 'E', 'A', 'B','D', 'E', 'A', 'B', 'C', 'E', 'A', 'B', 'C', 'D', 'B', 'C', 'A', 'C', 'A', 'B', 'C', 'B', 'B', 'A', 'A', 'B', 'C', 'B', 'C', 'A', 'C', 'B'])]
data = [{'xx': -4, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -5, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 4, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1},{'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -6, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -1, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': 5, 'win': 1}, {'xx': 3, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': -6, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 2, 'win': 1}]
df2 = pd.DataFrame(data, index=idx, columns=['xx','win'])
df2.index.names=['date','i1', 'i2']
df2=df2.reset_index()
df2['date'] = pd.to_datetime(df2['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df2.reset_index(inplace=True)
df2['xx'] = df2.xx.astype('float')
df2['date_int'] = df2['date'].astype('str').str.replace('-','').astype('int64')
df2=df2.drop(['date','index'], axis=1)
df2['date_int2']=df2['date_int']
df2.set_index(['date_int','i1','i2'], inplace = True)

冲突代码(对于大于或等于数字 20180501 的变量"date_int"的值不这样做(

df10=df.copy()
if (df10['date']>='2018-05-01').any():
df10['output'] = (df2.assign(to_this = df2['xx'][df2['date_int2']>=20180501])).groupby(level=[1,2]).to_this.cumcount().sum(level=[0,1])
df10['output'].fillna(0,inplace=True)

预期成果:

index       date  place  output
date_int name                                 
20180501 B         8 2018-05-01    3.0     0.0
20180601 A         9 2018-06-01    5.0     3.0
B        10 2018-06-01    2.0     3.0
C        11 2018-06-01    3.0     2.0
20180701 A        12 2018-07-01    1.0     0.0
20180801 B        13 2018-08-01    9.0     2.0
C        14 2018-08-01    3.0     2.0
20180901 A        15 2018-09-01    2.0     3.0
B        16 2018-09-01    7.0     3.0
20181001 C        17 2018-10-01    3.0     5.0
A        18 2018-10-01    6.0     6.0
B        19 2018-10-01    8.0     7.0
20181101 A        20 2018-11-01    2.0     0.0
20181201 B        21 2018-12-01    7.0     4.0
C        22 2018-12-01    9.0     4.0

如果您可以详细说明代码以使其仅在我的变量"date_int"大于或等于特定值时才起作用,那将很有用,因为它将为我节省大量时间。

df['date'].apply(pd.Timestamp.timestamp).astype(int)                                                              

pd.to_datetime(df['date']).values.astype(int)  

解释

下面的更新代码使用不同的方法。它不会在创建数据后从DataFrame中删除数据,而是根本不使用该数据。这是通过mask_it()函数完成的。此函数可用于基于date_int创建booleanmask。此mask可以应用于所有arrays/lists,例如idxdata等,以仅保留那些在date_int上或之后的数据点。接下来,这些索引/数据的子集用于创建DataFrame

注释

  1. 下面的代码 1 似乎是正确的,但与问题中发布的 ouptut 不匹配。在这里,df1df2是在任何分组之前从已经masked索引列表创建的。因此,仅使用date_int当天或之后的日期。因此,.cumcount()可能是正确的。这可能很重要,因为.group()level=[1,2]不包括任何日期。
  2. 输出
  3. 下面的代码 2 与发布有问题的输出匹配,但可能不正确。因为,这里的分组是在"删除任何dates之前"完成的。因此,level=[1,2]grouping可以包括date_int之前的日期。这可以通过使用df2.groupby(level=[1,2])['xx'].groups()进行检查。

旁注:问题中更新的代码为我提供了与问题中发布的不同的预期df10输出

代码-1

### Import libraries
import numpy as np
import pandas as pd
import datetime
### Create function for mask
def mask_it(date_int, xarray):
# Create 'start_date'
dt = str(date_int) # convert integer 'date_int' to string
dt = datetime.datetime(year=int(dt[0:4]), month=int(dt[4:6]), day=int(dt[6:8])) # convert to date
dtyear = dt.strftime("%y") # get last two digits of year
dtmonth = dt.strftime("%B")[0:3] # get first three characters of month
start_date = str(dtmonth)+'-'+str(dtyear)
start_date =  datetime.datetime.strptime(start_date, '%b-%y')

# Convert array dates from string to datetime
xarray_NEW = np.array([ datetime.datetime.strptime(i, '%b-%y') for i in xarray])
print(type(start_date))
# Create boolean mask
mask = [i >= start_date for i in xarray_NEW]

return mask

### Function to create `df1`
def start_date_df(date_int):

# Index
df1_idx_date = np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18','Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18',])
df1_idx_char = np.array(['A', 'B', 'B', 'A', 'B', 'C', 'D', 'E', 'B', 'A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])
# Data
df1_data = np.array([{'place': 1}, {'place': 5}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3},{'place': 1}, {'place': 6}, {'place': 3}, {'place': 5}, {'place': 2}, {'place': 3},{'place': 1}, {'place': 9}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3}, {'place': 6}, {'place': 8}, {'place': 2}, {'place': 7}, {'place': 9}])    

#:::::::::::::::::::::::::::::::::::::::::::::
### Subset by using mask_it
#:::::::::::::::::::::::::::::::::::::::::::::
# Get mask
mask1 = mask_it(date_int, df1_idx_date) 
# Use mask
df1_idx_date = df1_idx_date[mask1]
df1_idx_char = df1_idx_char[mask1]
df1_data = df1_data[mask1]
#:::::::::::::::::::::::::::::::::::::::::::::   

### Get 'idx' and 'data'
idx = [df1_idx_date, df1_idx_char]
data = list(df1_data)

df = pd.DataFrame(data, index=idx, columns=['place'])
df.index.names=['date','name']
df=df.reset_index()
df['date'] = pd.to_datetime(df['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df.reset_index(inplace=True)
df['place'] = df.place.astype('float')
df['date_int'] = df['date'].astype('str').str.replace('-','').astype('int64')
df.set_index(['date_int','name'], inplace = True)

return df

### Function to create 'df2'
def start_date_df2(date_int):

# Index
df2_idx_date = np.array(['Jan-18', 'Jan-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18','Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Aug-18',  'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Dec-18', 'Dec-18',])
df2_idx_char1 = np.array(['A', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C','C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'A', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'B', 'C'])
df2_idx_char2 = np.array(['B', 'A', 'B', 'C', 'D', 'E', 'A', 'C', 'D', 'E', 'A', 'B','D', 'E', 'A', 'B', 'C', 'E', 'A', 'B', 'C', 'D', 'B', 'C', 'A', 'C', 'A', 'B', 'C', 'B', 'B', 'A', 'A', 'B', 'C', 'B', 'C', 'A', 'C', 'B'])

# Data
df2_data = np.array([{'xx': -4, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -5, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 4, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1},{'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -6, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -1, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': 5, 'win': 1}, {'xx': 3, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': -6, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 2, 'win': 1}])

#:::::::::::::::::::::::::::::::::::::::::::::
### Subset by using mask_it
#:::::::::::::::::::::::::::::::::::::::::::::
# Get mask
mask2 = mask_it(date_int, df2_idx_date) 
# Use mask
df2_idx_date = df2_idx_date[mask2]
df2_idx_char1 = df2_idx_char1[mask2]
df2_idx_char2 = df2_idx_char2[mask2]
df2_data = df2_data[mask2]
#::::::::::::::::::::::::::::::::::::::::::::: 

### Get 'idx' and 'data'
idx = [df2_idx_date, df2_idx_char1, df2_idx_char2]
data = list(df2_data)


df2 = pd.DataFrame(data, index=idx, columns=['xx','win'])
df2.index.names=['date','i1', 'i2']
df2=df2.reset_index()
df2['date'] = pd.to_datetime(df2['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df2.reset_index(inplace=True)
df2['xx'] = df2.xx.astype('float')
df2['date_int'] = df2['date'].astype('str').str.replace('-','').astype('int64')
df2=df2.drop(['date','index'], axis=1)
df2.set_index(['date_int','i1','i2'], inplace = True)


#........................
# Limit by date_int
df2 = df2.groupby(level=[1,2])['xx'].cumcount().sum(level=[0,1]) 

#df2 = df2.loc[date_int:,:]
#........................

return df2

### Get DataFrames based on 'date_int'
df = start_date_df(20180501)
df['output'] = start_date_df2(20180501)
df['output'].fillna(0,inplace=True)

输出 - 1

print(df)
index       date  place  output
date_int name                                 
20180501 B         0 2018-05-01    3.0     0.0
20180601 A         1 2018-06-01    5.0     0.0
B         2 2018-06-01    2.0     0.0
C         3 2018-06-01    3.0     0.0
20180701 A         4 2018-07-01    1.0     0.0
20180801 B         5 2018-08-01    9.0     1.0
C         6 2018-08-01    3.0     1.0
20180901 A         7 2018-09-01    2.0     1.0
B         8 2018-09-01    7.0     1.0
20181001 C         9 2018-10-01    3.0     3.0
A        10 2018-10-01    6.0     3.0
B        11 2018-10-01    8.0     4.0
20181101 A        12 2018-11-01    2.0     0.0
20181201 B        13 2018-12-01    7.0     3.0
C        14 2018-12-01    9.0     3.0
<小时 /><小时 />

代码 - 2

### Import libraries
import numpy as np
import pandas as pd
import datetime
### Create function for mask
def mask_it(date_int, xarray):
# Create 'start_date'
dt = str(date_int) # convert integer 'date_int' to string
dt = datetime.datetime(year=int(dt[0:4]), month=int(dt[4:6]), day=int(dt[6:8])) # convert to date
dtyear = dt.strftime("%y") # get last two digits of year
dtmonth = dt.strftime("%B")[0:3] # get first three characters of month
start_date = str(dtmonth)+'-'+str(dtyear)
start_date =  datetime.datetime.strptime(start_date, '%b-%y')

# Convert array dates from string to datetime
xarray_NEW = np.array([ datetime.datetime.strptime(i, '%b-%y') for i in xarray])
print(type(start_date))
# Create boolean mask
mask = [i >= start_date for i in xarray_NEW]

return mask

### Function to create `df1`
def start_date_df(date_int):

# Index
df1_idx_date = np.array(['Jan-18', 'Jan-18', 'Feb-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'May-18', 'Jun-18', 'Jun-18', 'Jun-18','Jul-18', 'Aug-18', 'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18', 'Nov-18', 'Dec-18', 'Dec-18',])
df1_idx_char = np.array(['A', 'B', 'B', 'A', 'B', 'C', 'D', 'E', 'B', 'A', 'B', 'C','A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'])
# Data
df1_data = np.array([{'place': 1}, {'place': 5}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3},{'place': 1}, {'place': 6}, {'place': 3}, {'place': 5}, {'place': 2}, {'place': 3},{'place': 1}, {'place': 9}, {'place': 3}, {'place': 2}, {'place': 7}, {'place': 3}, {'place': 6}, {'place': 8}, {'place': 2}, {'place': 7}, {'place': 9}])    

#:::::::::::::::::::::::::::::::::::::::::::::
### Subset by using mask_it
#:::::::::::::::::::::::::::::::::::::::::::::
# Get mask
mask1 = mask_it(date_int, df1_idx_date) 
# Use mask
df1_idx_date = df1_idx_date[mask1]
df1_idx_char = df1_idx_char[mask1]
df1_data = df1_data[mask1]
#:::::::::::::::::::::::::::::::::::::::::::::   

### Get 'idx' and 'data'
idx = [df1_idx_date, df1_idx_char]
data = list(df1_data)

df = pd.DataFrame(data, index=idx, columns=['place'])
df.index.names=['date','name']
df=df.reset_index()
df['date'] = pd.to_datetime(df['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df.reset_index(inplace=True)
df['place'] = df.place.astype('float')
df['date_int'] = df['date'].astype('str').str.replace('-','').astype('int64')
df.set_index(['date_int','name'], inplace = True)

return df

### Function to create 'df2'
def start_date_df2(date_int):

# Index
df2_idx_date = np.array(['Jan-18', 'Jan-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18', 'Mar-18', 'Mar-18', 'Mar-18','Mar-18','Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Jun-18', 'Aug-18',  'Aug-18', 'Sep-18', 'Sep-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Oct-18','Oct-18', 'Dec-18', 'Dec-18',])
df2_idx_char1 = np.array(['A', 'B', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C','C', 'C', 'D', 'D', 'D', 'D', 'E', 'E', 'E', 'E', 'A', 'A', 'B', 'B', 'C', 'C', 'B', 'C', 'A', 'B', 'C', 'C', 'A', 'A', 'B', 'B', 'B', 'C'])
df2_idx_char2 = np.array(['B', 'A', 'B', 'C', 'D', 'E', 'A', 'C', 'D', 'E', 'A', 'B','D', 'E', 'A', 'B', 'C', 'E', 'A', 'B', 'C', 'D', 'B', 'C', 'A', 'C', 'A', 'B', 'C', 'B', 'B', 'A', 'A', 'B', 'C', 'B', 'C', 'A', 'C', 'B'])

# Data
df2_data = np.array([{'xx': -4, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -5, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 4, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': 1, 'win': 1}, {'xx': -4, 'win': -1},{'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -6, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 4, 'win': 1}, {'xx': -1, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': 5, 'win': 1}, {'xx': 3, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -1, 'win': -1}, {'xx': -2, 'win': -1}, {'xx': 1, 'win': 1}, {'xx': 6, 'win': 1}, {'xx': -6, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': -3, 'win': -1}, {'xx': -5, 'win': -1}, {'xx': 3, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 5, 'win': 1}, {'xx': 2, 'win': 1}, {'xx': -2, 'win': -1}, {'xx': 2, 'win': 1}])
"""
#:::::::::::::::::::::::::::::::::::::::::::::
### Subset by using mask_it
#:::::::::::::::::::::::::::::::::::::::::::::
# Get mask
mask2 = mask_it(date_int, df2_idx_date) 
# Use mask
df2_idx_date = df2_idx_date[mask2]
df2_idx_char1 = df2_idx_char1[mask2]
df2_idx_char2 = df2_idx_char2[mask2]
df2_data = df2_data[mask2]
#::::::::::::::::::::::::::::::::::::::::::::: 
"""

### Get 'idx' and 'data'
idx = [df2_idx_date, df2_idx_char1, df2_idx_char2]
data = list(df2_data)


df2 = pd.DataFrame(data, index=idx, columns=['xx','win'])
df2.index.names=['date','i1', 'i2']
df2=df2.reset_index()
df2['date'] = pd.to_datetime(df2['date'],format = '%b-%y') # http://strftime.org/
#df=df.set_index(['date','type'])
df2.reset_index(inplace=True)
df2['xx'] = df2.xx.astype('float')
df2['date_int'] = df2['date'].astype('str').str.replace('-','').astype('int64')
df2=df2.drop(['date','index'], axis=1)
df2.set_index(['date_int','i1','i2'], inplace = True)


#........................
# Limit by date_int
df2 = df2.groupby(level=[1,2])['xx'].cumcount().sum(level=[0,1]) 

#df2 = df2.loc[date_int:,:]
#........................

return df2

### Get DataFrames based on 'date_int'
df = start_date_df(20180501)
df['output'] = start_date_df2(20180501)
df['output'].fillna(0,inplace=True)

输出 - 2:

print(df)
index       date  place  output
date_int name                                 
20180501 B         0 2018-05-01    3.0     0.0
20180601 A         1 2018-06-01    5.0     3.0
B         2 2018-06-01    2.0     3.0
C         3 2018-06-01    3.0     2.0
20180701 A         4 2018-07-01    1.0     0.0
20180801 B         5 2018-08-01    9.0     2.0
C         6 2018-08-01    3.0     2.0
20180901 A         7 2018-09-01    2.0     3.0
B         8 2018-09-01    7.0     3.0
20181001 C         9 2018-10-01    3.0     5.0
A        10 2018-10-01    6.0     6.0
B        11 2018-10-01    8.0     7.0
20181101 A        12 2018-11-01    2.0     0.0
20181201 B        13 2018-12-01    7.0     4.0
C        14 2018-12-01    9.0     4.0

最新更新