Dividing dates into time ranges in pandas


14  [2018-03-14, 2018-03-13, 2017-03-06, 2017-02-13]
15  [2017-07-26, 2017-06-09, 2017-02-24]
16  [2018-09-06, 2018-07-06, 2018-07-04, 2017-10-20]
17  [2018-10-03, 2018-09-13, 2018-09-12, 2018-08-3]
18  [2017-02-08]

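This is my data: each ID has its own dates, ranging from 2017-02-05 to 2018-06-30. I need to split the dates into 5 time ranges of 4 months each, so that for the first 4 months each ID keeps only the dates inside that range (from 2017-02-05 to 2017-06-05), like this: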

14  [2017-03-06, 2017-02-13]
15  [2017-02-24]
16  [null] # or delete empty rows, it doesn't matter
17  [null]
18  [2017-02-08]

Then 2017-06-05 to 2017-10-05, and so on, every 4 months. Also, I can't use nested for loops because the data is too large. This is what I have tried so far:

months_4 = individual_dates.copy()
for _ in months_4['Date']:
    _ = np.where(pd.to_datetime(_) <= pd.to_datetime('2017-9-02'), _, np.datetime64('NaT'))

months_8 = individual_dates.copy()
range_8 = pd.date_range(start='2017-9-02', end='2017-11-02')
for _ in months_8['Date']:
    _ = _[np.isin(_, range_8)]

This has no effect at all; the data stays unchanged whatever I do.

Update: I did what you suggested:

individual_dates['Date'] = individual_dates['Date'].str.strip('[]').str.split(', ')

df = pd.DataFrame({
    'Date': list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID': individual_dates['ClientId'].repeat(individual_dates['Date'].str.len())
})
df

This is the result:

Date    ID
0   '2018-06-30T00:00:00.000000000' '2018-06-29T00...   14
1   '2017-03-28T00:00:00.000000000' '2017-03-27T00...   15
2   '2018-03-14T00:00:00.000000000' '2018-03-13T00...   16
3   '2017-12-14T00:00:00.000000000' '2017-03-28T00...   17
4   '2017-05-30T00:00:00.000000000' '2017-05-22T00...   18
5   '2017-03-28T00:00:00.000000000' '2017-03-27T00...   19
6   '2017-03-27T00:00:00.000000000' '2017-03-26T00...   20
7   '2017-12-15T00:00:00.000000000' '2017-11-20T00...   21
8   '2017-07-05T00:00:00.000000000' '2017-07-04T00...   22
9   '2017-12-12T00:00:00.000000000' '2017-04-06T00...   23
10  '2017-05-21T00:00:00.000000000' '2017-05-07T00...   24
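
A note on the output above: each row still holds all of an ID's dates as one string, which suggests the raw values are separated by spaces and wrapped in quotes rather than joined by `', '`, so the split leaves them in one piece. A minimal sketch of one way around this, assuming the raw strings look like the ones printed above (the two sample rows here are hypothetical), is to extract every `YYYY-MM-DD` token with a regular expression and then explode the lists into rows:

```python
import pandas as pd

# Hypothetical sample mimicking the string format seen in the output above.
individual_dates = pd.DataFrame({
    "ClientId": [14, 18],
    "Date": [
        "['2018-03-14T00:00:00.000000000' '2017-03-06T00:00:00.000000000']",
        "['2017-02-08T00:00:00.000000000']",
    ],
})

# Pull every YYYY-MM-DD token out of each string, whatever the separator is.
dates = individual_dates["Date"].str.findall(r"\d{4}-\d{2}-\d{2}")

# One row per (ID, date); explode keeps the original index, so reset it.
df = (
    individual_dates[["ClientId"]]
    .assign(Date=dates)
    .explode("Date")
    .rename(columns={"ClientId": "ID"})
    .reset_index(drop=True)
)
df["Date"] = pd.to_datetime(df["Date"])
```

This relies on `DataFrame.explode`, which needs pandas 0.25 or later; the time-of-day part of each timestamp is dropped, which seems harmless here because every value is at midnight.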

For better performance, I suggest converting the lists into a column by flattening them, then filtering with isin and boolean indexing:

from itertools import chain
import pandas as pd

# Flatten the per-ID lists into one row per (Date, ID) pair
df = pd.DataFrame({
    'Date': list(chain.from_iterable(individual_dates['Date'].tolist())),
    'ID': individual_dates['ID'].repeat(individual_dates['Date'].str.len())
})
# All days in the first 4-month range
range_8 = pd.date_range(start='2017-02-05', end='2017-06-05')
df['Date'] = pd.to_datetime(df['Date'])
# Keep only the rows whose date falls inside the range
df = df[df['Date'].isin(range_8)]
print(df)
Date  ID
0 2017-03-06  14
0 2017-02-13  14
1 2017-02-24  15
4 2017-02-08  18
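
The filter above handles one range at a time. To label all five 4-month ranges in a single pass, one option is `pd.cut` with datetime bin edges. This is only a sketch: the column name `window`, the names `edges`, `per_window` and `first_window`, and the choice to treat each range as start-inclusive and end-exclusive are my assumptions, not something stated in the question.

```python
import pandas as pd

# Flattened frame as in the answer above (sample rows taken from the question).
df = pd.DataFrame({
    "ID": [14, 14, 14, 14, 15, 15, 15, 18],
    "Date": pd.to_datetime([
        "2018-03-14", "2018-03-13", "2017-03-06", "2017-02-13",
        "2017-07-26", "2017-06-09", "2017-02-24",
        "2017-02-08",
    ]),
})

# Six edges give five 4-month windows starting at 2017-02-05.
edges = pd.date_range(start="2017-02-05", periods=6, freq=pd.DateOffset(months=4))

# right=False makes each window [start, end); this boundary choice is an assumption.
# labels=False returns the window number (0..4), or NaN for dates outside all windows.
df["window"] = pd.cut(df["Date"], bins=edges, right=False, labels=False)

# Collect each ID's dates per window; (window, ID) pairs with no dates are simply absent.
per_window = (
    df.dropna(subset=["window"])
      .groupby(["window", "ID"])["Date"]
      .apply(list)
)

# Dates of the first window (2017-02-05 up to, but not including, 2017-06-05), keyed by ID.
first_window = per_window.loc[0]
```

With these sample rows, `first_window` contains IDs 14, 15 and 18 with the same dates as the expected output in the question, and no Python-level loop over IDs or dates is needed.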

Latest update