I am using scipy.stats.binned_statistic_2d to get the frequency of points in each bin, like so:
h, xedge, yedge, binindex = scipy.stats.binned_statistic_2d(X, Y, Y, statistic='count', bins=160)
I am able to filter for certain bins using:
filter = list(np.argwhere(h > 5).flatten())
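From this, I can get the bin edges/centers of the data I am interested in from xedge and yedge.
What is the most Pythonic way to get the original data back out of these bins of interest? For example, how do I get the original data contained in the bins that hold more than 5 points?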
Yes, this is possible with some indexing magic. I am not sure it is the most Pythonic way, but it should be close.
1D solution using stats.binned_statistic:
from scipy import stats
import numpy as np
x = np.array([1, 1, 1, 4, 7, 7, 7])
values = np.ones_like(x)  # placeholder, ignored when statistic is 'count'
statistic, bin_edges, binnumber = stats.binned_statistic(x, values, 'count', bins=3)
print(statistic)
print(bin_edges)
print(binnumber)
# find the bins with equal or more than three events
# if you are using custom bins where events can be lower or
# higher than your specified bins -> handle this
# get the bin numbers according to some condition
idx_bin = np.where(statistic >= 3)[0]
print(idx_bin)
# A binnumber of i means the corresponding value is
# between (bin_edges[i-1], bin_edges[i]).
# -> increment the bin indices by one
idx_bin += 1
print(idx_bin)
# the rest is easy, get the boolean mask and apply it
is_event = np.isin(binnumber, idx_bin)  # np.isin supersedes the deprecated np.in1d
events = x[is_event]
print(events)
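The comments above mention custom bins where values can fall below or above the specified edges. Here is a minimal sketch of one way to handle that. It relies on the documented convention that such values get bin number 0 (below the first edge) or len(bin_edges) (beyond the last edge), and pads the boolean selection by one slot on each side so that binnumber can index it directly. The data and edges are made up for illustration.

```python
import numpy as np
from scipy import stats

# made-up data: 0.5 and 9.0 lie outside the custom bin edges
x = np.array([0.5, 1, 1, 1, 4, 7, 7, 7, 9.0])
edges = [1, 3, 5, 7]

# the values argument is ignored for 'count', so x is passed as a placeholder
count, bin_edges, binnumber = stats.binned_statistic(x, x, 'count', bins=edges)

# one slot per regular bin plus one outlier slot on each side;
# the outlier slots stay False, so out-of-range values are never selected
keep = np.zeros(len(bin_edges) + 1, dtype=bool)
keep[1:-1] = count >= 3

# binnumber indexes the padded mask directly, no "+ 1" shift needed
events = x[keep[binnumber]]
print(events)
```

Because the mask is indexed per point, this also avoids the separate np.where and membership test.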
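For 2D or nD, you can apply the solution above once per dimension and combine the per-dimension is_event masks with np.logical_and (2D) or np.logical_and.reduce((x, y, z)) (nD). Note that and-ing per-dimension masks keeps every combination of the selected bin indices, so it is exact only when the selected bins form a full grid of those indices; otherwise the bin indices of all dimensions have to be matched together.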
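For the nD case, here is a rough sketch using stats.binned_statistic_dd with expand_binnumbers=True, where the per-dimension bin numbers of each point are looked up jointly in a padded boolean array. The random sample, the bin count of 4 and the threshold of 20 are arbitrary assumptions for illustration only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(size=(500, 3))  # 500 made-up points in 3 dimensions

ret = stats.binned_statistic_dd(sample,
                                np.ones(len(sample)),  # ignored by 'count'
                                'count',
                                bins=4,
                                expand_binnumbers=True)

# pad by one cell on every side so the 1-based bin numbers
# (and the outlier numbers 0 and nbins + 1) index the array directly
keep = np.pad(ret.statistic >= 20, 1, constant_values=False)

# ret.binnumber has shape (ndim, npoints); indexing with the tuple
# looks up each point's (i, j, k) bin as one unit
is_event = keep[tuple(ret.binnumber)]
events = sample[is_event]
print(events.shape)
```

Looking the indices up together means a point is selected only if its own bin meets the condition, not merely a bin that shares one of its indices.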
The 2D solution using stats.binned_statistic_2d is basically the same:
from scipy import stats
import numpy as np
x = np.array([1, 1.5, 2.0, 4, 5.5, 1.5, 7, 1])
y = np.array([1.0, 7.0, 1.0, 3, 7, 7, 7, 1])
values = np.ones_like(x) # not used with 'count'
# check keyword expand_binnumbers, use non-linearized
# as they can be used as indices without flattening
ret = stats.binned_statistic_2d(x,
                                y,
                                values,
                                'count',
                                bins=2,
                                expand_binnumbers=True)
print(ret.statistic)
print('binnumber', ret.binnumber)
binnumber = ret.binnumber
statistic = ret.statistic
# find the bins with equal or more than three events
# if you are using custom bins where events can be lower or
# higher than your specified bins -> handle this
# get the bin numbers according to some condition
idx_bin_x, idx_bin_y = np.where(statistic >= 3)
print(idx_bin_x)
print(idx_bin_y)
# A binnumber of i means the corresponding value is
# between (bin_edges[i-1], bin_edges[i]).
# -> increment the bin indices by one
idx_bin_x += 1
idx_bin_y += 1
print(idx_bin_x)
print(idx_bin_y)
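# the rest is easy, get the boolean mask and apply it
# mark the selected (x, y) bin pairs in a lookup table that also
# covers the two outlier bins per axis, then index it per point
selected = np.zeros((statistic.shape[0] + 2, statistic.shape[1] + 2), dtype=bool)
selected[idx_bin_x, idx_bin_y] = True
is_event_xy = selected[binnumber[0], binnumber[1]]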
events_x = x[is_event_xy]
events_y = y[is_event_xy]
print('x', events_x)
print('y', events_y)
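To tie this back to the question ("bins containing more than 5 points"), here is a possible shortcut, offered as a sketch rather than the canonical way. With the default expand_binnumbers=False, binnumber is a single linearized label per point, so np.bincount gives the population of each label and indexing it with binnumber gives every point the count of its own bin. The helper name points_in_dense_bins and its parameters are hypothetical. It assumes all points lie inside the binned range; with custom edges, out-of-range points share outlier labels and would be counted as well.

```python
import numpy as np
from scipy import stats


def points_in_dense_bins(x, y, min_count, bins=10):
    """Hypothetical helper: return the (x, y) points that lie in
    bins holding more than min_count points."""
    # the values argument is ignored for 'count', so x is a placeholder
    ret = stats.binned_statistic_2d(x, y, x, 'count', bins=bins)
    # population of each linearized bin label, looked up per point
    per_point_count = np.bincount(ret.binnumber)[ret.binnumber]
    mask = per_point_count > min_count
    return x[mask], y[mask]


x = np.array([1, 1.5, 2.0, 4, 5.5, 1.5, 7, 1])
y = np.array([1.0, 7.0, 1.0, 3, 7, 7, 7, 1])

# same data as above: only the lower-left bin holds more than 2 points
dense_x, dense_y = points_in_dense_bins(x, y, min_count=2, bins=2)
print('x', dense_x)
print('y', dense_y)
```

For the case in the question, this would be called with min_count=5 and bins=160.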