How to shuffle a list so that every sublist of a given length contains unique items?

Given: a randomly shuffled list of unique utterance IDs, and a list of the speakers each utterance belongs to (in the same order).

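Question: how can uttIDList be reordered so that every sublist of 32 elements, starting at the first element and stepping by 32, contains utterances from 32 different speakers? It is important that rerunning the algorithm on a differently shuffled list also produces different sublists.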

For example:

uttIDList = [1, 0, 9, 7, 100, 2, 3, 8301, ...] (length divisible by 32)
spkIDList = [0, 0, 3, 2, 1, 4, 20, 4, ...] 
sublist0 = uttIDList[0:32]
sublist1 = uttIDList[32:64]
...
sublistN = uttIDList[N-32:N]
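
The requirement above can be checked mechanically. The following is a minimal sketch of such a check; the function name speakers_unique_per_batch is made up for illustration and does not come from the question.

```python
def speakers_unique_per_batch(spk_ids, batch_size=32):
    """Return True if no speaker ID repeats inside any consecutive batch.

    spk_ids is the speaker list in its final (reordered) order; the name
    and signature are illustrative assumptions, not part of the question.
    """
    for start in range(0, len(spk_ids), batch_size):
        batch = spk_ids[start:start + batch_size]
        # A set drops duplicates, so a size mismatch means a repeated speaker.
        if len(set(batch)) != len(batch):
            return False
    return True
```

With batch_size=2, the speaker list [0, 1, 0, 2] passes the check, while [0, 0, 1, 2] fails because speaker 0 appears twice in the first batch.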

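Ignore the original ordering entirely. Build a lookup of speakers and their utterances (a dictionary, for example). Turn the problem a quarter turn: start from scratch and spread each speaker's utterances across the sublists.

Initialize k empty lists, where k = N/32.
Check your utterance dictionary: if any speaker has more than k utterances, drop the excess.
Sort the speakers in descending order of utterance count; this helps avoid running into a dead end near the end.
Repeat the following for each speaker:
- List all sublists that are not yet full (i.e. len(sublist) < 32).
- Let i = len(the speaker's utterance list).
- Use random.sample to draw i of the unfilled sublists at random.
- Append one of the speaker's utterances to each of the chosen sublists.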

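This works in most cases; you may end up with a single sublist that has 2 open slots while the last speaker still has 2 utterances to place. In practice, a simple swap fixes that anomaly.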
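
The steps above could be sketched roughly as follows. This is one possible reading of the procedure, not a definitive implementation: the function name distribute_by_speaker, its parameters, and the decision to return unplaced utterances as leftovers (instead of performing the swap) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def distribute_by_speaker(utt_ids, spk_ids, batch_size=32, rng=random):
    """Spread each speaker's utterances over k = N // batch_size sublists.

    Returns (batches, leftovers). Each batch is a list of (utt_id, spk_id)
    pairs with no repeated speaker. leftovers holds utterances that could
    not be placed because too few open sublists remained (the dead-end
    case); handling them with a swap is left out of this sketch.
    """
    k = len(utt_ids) // batch_size

    # Build the speaker -> utterances lookup.
    by_speaker = defaultdict(list)
    for utt, spk in zip(utt_ids, spk_ids):
        by_speaker[spk].append(utt)

    # Shuffle each speaker's utterances, then drop any beyond k:
    # a speaker can appear at most once in each of the k sublists.
    for utts in by_speaker.values():
        rng.shuffle(utts)
        del utts[k:]

    batches = [[] for _ in range(k)]
    leftovers = []

    # Place the speakers with the most utterances first.
    order = sorted(by_speaker, key=lambda s: len(by_speaker[s]), reverse=True)
    for spk in order:
        utts = by_speaker[spk]
        open_idx = [i for i, b in enumerate(batches) if len(b) < batch_size]
        # Sampling without replacement gives distinct sublists, so a
        # speaker never lands twice in the same sublist.
        chosen = rng.sample(open_idx, min(len(utts), len(open_idx)))
        for i, utt in zip(chosen, utts):
            batches[i].append((utt, spk))
        leftovers.extend((utt, spk) for utt in utts[len(chosen):])

    return batches, leftovers
```

Calling distribute_by_speaker(uttIDList, spkIDList) returns the k sublists plus any utterances that hit the dead end described above. Because the placement relies on random.shuffle and random.sample, repeated runs will generally produce different sublists.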

Does that get you going?

This should do it.

from random import shuffle

# first map the ID of each speaker to the list of indices that belong to that speaker
spk_indices_map = {}
for i, spk_id in enumerate(spkIDList):
    if spk_id not in spk_indices_map:
        spk_indices_map[spk_id] = []
    spk_indices_map[spk_id].append(i)
# next shuffle the order of the indices for each speaker
# this still preserves which indices correspond to each speaker
for spk_id in spk_indices_map:
    shuffle(spk_indices_map[spk_id])
# the shuffled utterance and speaker lists with the desired properties
shuffled_uttIDList = []
shuffled_spkIDList = []
# start out done if there are no speakers, otherwise the loop would never end
done = not spk_indices_map
while not done:
    # while every speaker has at least one utterance not in the shuffled lists
    for spk_id in spk_indices_map:
        # add an utterance from each speaker to the shuffled lists
        if not spk_indices_map[spk_id]:
            # stop as soon as any speaker has run out of utterances
            done = True
            break
        else:
            index = spk_indices_map[spk_id].pop()
            shuffled_uttIDList.append(uttIDList[index])
            shuffled_spkIDList.append(spkIDList[index])
print(shuffled_uttIDList)
print(shuffled_spkIDList)

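We keep track of all the indices of the utterances that belong to each speaker. Then, for each speaker, we shuffle the order of those indices. Finally, going through the speakers in order, we take one utterance at a time from each speaker's shuffled list of indices.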

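If not every speaker has the same number of utterances, the loop stops as soon as one speaker runs out, so the last sublist in the shuffled list may be smaller than the desired size and can be ignored.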
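
Ignoring the short trailing sublist could look like the sketch below. The helper name full_batches is hypothetical. Note also that the round-robin order only yields 32 distinct speakers per sublist if there are at least 32 distinct speakers; with fewer, the same speaker comes around again within one sublist.

```python
def full_batches(items, batch_size=32):
    """Split items into consecutive batches, dropping a short trailing one.

    Illustrative helper (not part of the answer's code): only complete
    batches of exactly batch_size elements are returned.
    """
    # Largest multiple of batch_size that fits into the list.
    usable = len(items) - len(items) % batch_size
    return [items[i:i + batch_size] for i in range(0, usable, batch_size)]
```

For instance, full_batches(shuffled_uttIDList) would return only the complete 32-element sublists of the shuffled utterance list.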
