Search for 3 or more consecutive values in a NumPy array column, then get a value from another column



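Essentially, I want to scan a column of a NumPy array for 3 or more consecutive identical values. If there are 3 or more consecutive values, I want to take a value from a different column in the same row, at the positions where the run of consecutive values starts and ends.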

Example

arr = np.array([
    [2, 7, 2, 1],
    [1, 2, 3, 4],
    [4, 6, 6, 4],
    [8, 2, 6, 4],
    [9, 3, 1, 4],
    [2, 7, 2, 1],
])

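From the array above, I want to scan the 4th column and see whether a value appears 3 or more times in a row. If so, I want to take the values from the second column at the rows where the run starts and ends, and store them in another array. In this case, that would be 3 and 1.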

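You can do this with pandas by shifting the column you want to compare to detect changes, and then counting how many times each value repeats in a row.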

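You did not specify what should happen if the same run of numbers occurs more than once, so I will provide a general solution. If you know in advance that the same run cannot repeat, you may be able to simplify it.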

# Imports and define data
import numpy as np
import pandas as pd
data = [[2, 7, 2, 1],
[1, 2, 3, 4],
[4, 6, 6, 4],
[8, 2, 6, 4],
[9, 3, 1, 4],
[2, 7, 2, 1]]
df = pd.DataFrame(data, columns=['A', 'B', 'C', 'D'])
# Compare the last column, see where we have a change and label it 1
df['shift'] = df['D'].shift()
df['change'] = np.where(df['D'] == df['shift'], 0, 1)
# Assign a group number for each change (in case same sequence repeats later)
df['group'] = df['change'].cumsum()
# Build a dictionary mapping no. of repeats to group number and assign back to df
consecutives = df.groupby('group')['D'].count()
df['num_consecutives'] = df['group'].map(consecutives)
# Specify the number of consecutives to filter by, group by the "group" col
# and the last col in case there are repeats, so you can identify each instance
# of the first and last appearances, then find the first and last values of 
# the col of interest. You mention 3 and 1, so I assume that's the third col.
# Use >= 3 so that runs of exactly three values are included ("3 or more").
df[df['num_consecutives'] >= 3].groupby(['group', 'D'])['C'].agg(['first', 'last'])
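
Since the question is about a NumPy array, the same run detection can also be sketched without pandas. This is only an illustration, not part of the answer above: the helper name `run_endpoints` and its parameters `scan_col`, `value_col` and `min_run` are hypothetical. Like the pandas code, it assumes the values to collect are in the third column (index 2), because that is where 3 and 1 appear.

```python
import numpy as np


def run_endpoints(arr, scan_col, value_col, min_run=3):
    """Hypothetical helper: for each run of equal values in scan_col that is
    at least min_run rows long, return (first, last) taken from value_col."""
    arr = np.asarray(arr)
    if arr.shape[0] == 0:
        return []
    col = arr[:, scan_col]
    # A run starts at row 0 and at every row that differs from the previous row.
    starts = np.flatnonzero(np.r_[True, col[1:] != col[:-1]])
    # Each run ends on the row just before the next run starts.
    ends = np.r_[starts[1:], len(col)] - 1
    lengths = ends - starts + 1
    keep = lengths >= min_run
    return [
        (arr[s, value_col].item(), arr[e, value_col].item())
        for s, e in zip(starts[keep], ends[keep])
    ]


arr = np.array([
    [2, 7, 2, 1],
    [1, 2, 3, 4],
    [4, 6, 6, 4],
    [8, 2, 6, 4],
    [9, 3, 1, 4],
    [2, 7, 2, 1],
])

# Scan the 4th column (index 3); collect from the 3rd column (index 2).
result = run_endpoints(arr, scan_col=3, value_col=2)
print(result)  # [(3, 1)]
```

Because each run is identified by its start index rather than by its value, a value that forms several separate runs yields one pair per run, which plays the same role as the `group` column in the pandas version.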
