Randomly select a percentage of bits from each row of a binary array?



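I have a binary array, and I want to be able to pick a specific percentage of the ones from every row. Say 100 ones per row: I want to get back, at random, 20% from the 1st row, 10% from the 2nd, 40% from the 3rd, and 30% from the 4th (100% in total, of course).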

0| 00000000001000000010000000000000000000001000000100000000000000000000000000000001 ... 
1| 00000000000000010000000000001000000000000100000000000000000000000000000000000000 ... 
2| 00000000000000000000000000000010010000000000000000000000000000010000100000000000 ... 
3| 01000000000000100000000000000000000000001000100000000000000010000000000000000000 ... 

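That part is easy: just do random.choice(one_idxs, %) on every row. The problem is that the target must also contain exactly 100 bits, i.e. if some bits overlap across rows and get selected more than once, the total will differ from 100 bits.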

Also, on every row it should try to pick bits that have not been picked before, at least as an option!

Any ideas?


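For example, here is the code I use for the simple case (it does not account for selected indices repeating across rows, only within a row):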

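import numpy as np

ones_count = 100
for i in range(len(probs)):  # one pass per row; probs[i] is that row's share
    bits_cnt = int(ones_count * probs[i])
    idxs = array.get_row(i).one_idxs()  # indices of the 1 bits in row i
    selected = np.random.choice(idxs, size=bits_cnt, replace=False)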

I only need to select the ones, which is why I work with indices.

For convenience, this uses a list of strings instead of a bit array and draws 4 samples:

In [39]: data = ['10000101', 
...:         '11110000', 
...:         '00011000']                                                    
In [40]: idxs = random.sample(range(len(data[0])), 4)                           
In [41]: # 20% row 1, 30% row 2, 50% row 3                                      
In [42]: row_selections = random.choices(range(len(data)), [0.2, 0.3, 0.5], k=len(idxs))                                                               
In [43]: idxs                                                                   
Out[43]: [7, 3, 1, 4]
In [44]: row_selections                                                         
Out[44]: [0, 2, 0, 1]
In [45]: picks = [ data[r][c] for (r, c) in zip(row_selections, idxs)]          
In [46]: picks                                                                  
Out[46]: ['1', '1', '0', '0']

OK, in light of your comments, this should work better as an example of how to select proportionally from only the ones in each list/array:

import random
a1= '10001010111110001101010101'
a2= '00101010001011010010100010'
a1 = [int(t) for t in a1]
a2 = [int(t) for t in a2]
a1_one_locations= [idx for idx, v in enumerate(a1) if v==1]
a2_one_locations= [idx for idx, v in enumerate(a2) if v==1]
# lists of indices where 1 exists in each list...
print(a1_one_locations)
print(a2_one_locations)
n_samples = 6 # total desired
# 40% from a1, remainder from a2
a1_samples = int(n_samples * 0.4)
a2_samples = n_samples - a1_samples
a1_picks = random.sample(a1_one_locations, a1_samples)
a2_picks = random.sample(a2_one_locations, a2_samples)
# print results
print('indices from a1: ', a1_picks)
print('indices from a2: ', a2_picks)

Output:

[0, 4, 6, 8, 9, 10, 11, 12, 16, 17, 19, 21, 23, 25]
[2, 4, 6, 10, 12, 13, 15, 18, 20, 24]
indices from a1:  [6, 21]
indices from a2:  [10, 15, 4, 20]
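
Note that the two lists above are sampled independently, so the same index can end up in both picks (for example, 4, 6, 10 and 12 hold a 1 in both strings), and the number of distinct bits can then fall short of the target. One possible way to handle this, as a rough sketch rather than a definitive solution, is to process the rows in order and exclude indices already taken by earlier rows. The helper name `pick_distinct` and its parameters are made up for illustration. It assumes every row has enough unused ones to fill its quota and raises an error otherwise, which is stricter than the "at least as an option" wording in the question.

```python
import random

def pick_distinct(rows, fractions, total, seed=None):
    """Pick `total` distinct indices of 1 bits, split across rows by `fractions`.

    rows: list of equal-length '0'/'1' strings
    fractions: one share per row, expected to sum to 1
    (hypothetical helper, not part of any library)
    """
    rng = random.Random(seed)
    # Truncated per-row quotas; the last row absorbs the rounding remainder
    # so that the quotas always add up to `total`.
    quotas = [int(total * f) for f in fractions[:-1]]
    quotas.append(total - sum(quotas))

    used = set()   # indices already taken by earlier rows
    picks = []
    for row, quota in zip(rows, quotas):
        # Candidates: positions holding a 1 that no earlier row has taken.
        candidates = [i for i, bit in enumerate(row)
                      if bit == '1' and i not in used]
        if len(candidates) < quota:
            raise ValueError('not enough unused 1 bits in this row')
        chosen = rng.sample(candidates, quota)
        used.update(chosen)
        picks.append(chosen)
    return picks

a1 = '10001010111110001101010101'
a2 = '00101010001011010010100010'
# 40% from a1, remainder from a2, 6 distinct indices in total
picks = pick_distinct([a1, a2], [0.4, 0.6], total=6, seed=0)
print('indices from a1: ', picks[0])
print('indices from a2: ', picks[1])
```

Because earlier rows get first choice, row order affects which indices later rows can still draw from. If that bias matters, shuffling the row order before sampling is one option.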
