Python: return the subset of a DataFrame whose records appear in the same order as a list



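I have a DataFrame with more than a thousand records, and I want to return a slice of it in which the values of one column appear in the same order as a given list.

For example: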

lst = [0,1,0,0,0,1]

Input

date season hot_or_cold
0   2012-01-01 Winter 0
1   2012-01-02 Winter 1
2   2012-01-03 Winter 0
3   2012-01-04 Winter 0
4   2012-01-05 Winter 0
5   2012-01-06 Winter 1
6   2012-01-07 Winter 1
7   2012-01-08 Winter 1
8   2012-01-09 Winter 0
9   2012-01-10 Winter 1
10   2012-01-11 Winter 0
# 1 - hot
# 0 - cold

Output

date season hot_or_cold
0   2012-01-01 Winter 0
1   2012-01-02 Winter 1
2   2012-01-03 Winter 0
3   2012-01-04 Winter 0
4   2012-01-05 Winter 0
5   2012-01-06 Winter 1

Thanks in advance.

The underlying problem is finding a pattern in a DataFrame. I found an approach for that here and implemented it:

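import numpy as np
import pandas as pd

arr = [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 1, 1]
df = pd.DataFrame(data=arr, columns=['binary'])
pattern = [0, 1, 0, 0, 0, 1]

# Flag every row that ends a window equal to the pattern (NaN until the first full window)
matched = df.rolling(len(pattern)).apply(lambda x: all(np.equal(x, pattern)))
matched = matched.sum(axis=1).astype(bool)   # sum across columns acts as a boolean OR
idx_matched = np.where(matched)[0]           # position of the last row of each match
# Expand each match end into the range of row positions it covers
subset = [range(match - len(pattern) + 1, match + 1) for match in idx_matched]
# Note: pd.concat raises ValueError when there are no matches at all
result = pd.concat([df.iloc[subs, :] for subs in subset], axis=0)
result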
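
As a point of comparison, the same windowed comparison can be sketched without `rolling().apply()`, using NumPy's `sliding_window_view` (available in NumPy 1.20 and later). This is a swapped-in technique, not the code above; the DataFrame is rebuilt from the question's sample data, and the names `values`, `windows`, `starts`, and `fragments` are illustrative only.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Sample data rebuilt from the question (assumed layout)
df = pd.DataFrame({
    "date": pd.date_range("2012-01-01", periods=11),
    "season": "Winter",
    "hot_or_cold": [0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 0],
})
lst = [0, 1, 0, 0, 0, 1]

values = df["hot_or_cold"].to_numpy()
# One row per window of len(lst) consecutive values; needs len(values) >= len(lst)
windows = sliding_window_view(values, len(lst))
# Start positions of the windows that equal lst element by element
starts = np.flatnonzero((windows == lst).all(axis=1))
# One DataFrame slice per match
fragments = [df.iloc[s:s + len(lst)] for s in starts]
```

With the sample data there is a single match starting at position 0, so `fragments[0]` holds the first six rows.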

Define the following two functions:

  1. Find a match between s (a Series, the longer one) and lst (a list, the shorter one).

    def fndMatch(s, lst):
        len1 = s.size
        len2 = len(lst)
        for i1 in range(len1 - len2 + 1):
            i2 = i1 + len2
            if s.iloc[i1:i2].eq(lst).all():
                return (i1, i2)
        return (None, None)
    

    When a match is found, the result is the pair of slice bounds; otherwise it is a pair of None values.

  2. Get the fragment of df in which the hot_or_cold column matches lst.

    def getFragment():
        i1, i2 = fndMatch(df.hot_or_cold, lst)
        if i1 is None:
            return None
        else:
            return df.iloc[i1:i2]
    

When you call it (getFragment()), the result is:

date  season  hot_or_cold
0  2012-01-01  Winter            0
1  2012-01-02  Winter            1
2  2012-01-03  Winter            0
3  2012-01-04  Winter            0
4  2012-01-05  Winter            0
5  2012-01-06  Winter            1
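
The function above stops at the first match. If every occurrence is needed, one possible variation is a generator that keeps scanning. This is only a sketch: `find_matches` is a hypothetical name, and the short Series here is made-up illustration data rather than the question's DataFrame.

```python
import pandas as pd

def find_matches(s, lst):
    """Yield (start, stop) slice bounds for every window of s equal to lst."""
    k = len(lst)
    for start in range(s.size - k + 1):
        # Compare plain Python lists so the check is a single equality test
        if s.iloc[start:start + k].tolist() == list(lst):
            yield start, start + k

# Illustration data with two overlapping occurrences of the pattern
s = pd.Series([0, 1, 0, 0, 0, 1, 0, 0, 0, 1])
lst = [0, 1, 0, 0, 0, 1]
matches = list(find_matches(s, lst))
```

Here `matches` contains two overlapping occurrences, and each pair can be passed straight to `df.iloc[start:stop]`. When s is shorter than lst the range is empty, so the generator yields nothing.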

Another way, using accumulate:

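from itertools import accumulate
import pandas as pd

def accum(x):
    # Running concatenation: [[a], [b], ...] -> [[a], [a, b], ...]
    return list(accumulate(x))

lst = [0, 1, 0, 0, 0, 1]
f = lambda x: accum([[i] for i in x])
# df is the DataFrame from the question; build all prefixes of hot_or_cold per season
b = df.groupby(['season'])['hot_or_cold'].apply(f)
# Label each row by whether the last len(lst) values up to that row equal lst
df['col_accum2'] = [(('Match ' if item[-len(lst):] == lst else 'NotMatch') if len(item) >= len(lst) else 'small list') for subitem in b for item in subitem]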
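
To make the mechanism easier to follow, here is a minimal sketch of what accumulate does with single-element lists, without pandas or groupby. The input `values` is a made-up example, and `prefixes` and `labels` are illustrative names.

```python
from itertools import accumulate

lst = [0, 1, 0, 0, 0, 1]
values = [0, 1, 0, 0, 0, 1, 1]  # made-up input for illustration

# accumulate adds items pairwise by default, so lists are concatenated:
# [0], [0, 1], [0, 1, 0], ...
prefixes = list(accumulate([v] for v in values))

# A row "matches" when the tail of its prefix equals lst
labels = [
    "small list" if len(p) < len(lst)
    else ("Match" if p[-len(lst):] == lst else "NotMatch")
    for p in prefixes
]
```

Each prefix is a new list, so memory use grows quadratically with the number of rows; that is probably fine for about a thousand records but worth keeping in mind for larger data.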
