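I have 99,940 rows of data in three columns: the first and second columns hold ID numbers and the third holds a weight, as in the sketch data below. There are 441 unique ID values, which repeat across columns 1 and 2. I want to group id1 so that each group holds three consecutive values, and then sum the weights of the rows whose id2 is also one of the id1 values in that group.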
data:
id1 id2 weight
1 3 10
1 4 10
1 7 10
1 8 10
2 1 10
2 5 10
3 2 10
4 3 10
4 6 10
5 3 10
6 4 10
7 2 10
8 1 10
result:
group(1)
id1 id2 weight selected
1 3 10 Yes (Because group1 has 1,2,3 and id1 is 1 and id2 is 3)
1 4 10 No
1 7 10 No
1 8 10 No
2 1 10 Yes (Because group1 has 1,2,3 and id1 is 2 and id2 is 1)
2 5 10 No
3 2 10 Yes (Because group1 has 1,2,3 and id1 is 3 and id2 is 2)
Weight = 30
group(2)
4 3 10 No
4 6 10 Yes (Because group2 has 4,5,6 and id1 is 4 and id2 is 6)
5 3 10 No
6 4 10 Yes (Because group2 has 4,5,6 and id1 is 6 and id2 is 4)
Weight = 20
group(3)
7 2 10 No
8 1 10 No
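and so on.
I have tried using grouppixels and sortrows to get the weight scores, and looking up column values based on another column, but I got stuck on creating the groups.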
You can use the ismember function like this:
First, identify the unique id1 values:
id1 = data(:, 1);
unique_id1 = unique(id1, 'stable');
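The 'stable' option keeps the IDs in order of first appearance rather than sorting them, which matters if the rows are not already sorted by id1. A minimal sketch of the difference, using a hypothetical unsorted column (id1_example is not part of the original data):

```matlab
% Hypothetical unsorted id1 column, for illustration only
id1_example = [5; 5; 2; 9; 2; 1];

% 'stable' keeps the order of first appearance
stable_ids = unique(id1_example, 'stable')';   % 5 2 9 1

% The default call returns the values sorted ascending
sorted_ids = unique(id1_example)';             % 1 2 5 9
```

If "three consecutive values" is meant numerically, dropping 'stable' may be the safer choice for unsorted data.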
Then loop over the unique IDs in groups of three, and extract from data the rows whose id1 matches any of the three IDs.
weights = [];
groups = cell(0);
n_ids = length(unique_id1);
for ii = 1:3:n_ids
    % Pull out just the id1 values in this group
    % (the last group may hold fewer than three IDs)
    ids_in_group = unique_id1(ii:min(ii+2, n_ids));
    % answer has 1 if id1 is in ids_in_group, 0 otherwise
    select_row_for_group = ismember(id1, ids_in_group);
    % Logical indexing, select only rows with 1 in select_row_for_group
    group_data = data(select_row_for_group, :);
    % Append new group to our cell array
    groups{end+1} = group_data;
    % Select a row in the group for weight calculation if its id2 is in ids_in_group
    select_row_for_weight = ismember(group_data(:, 2), ids_in_group);
    % Select only the weights we want
    selected_weights = group_data(select_row_for_weight, 3);
    % Sum the selected weights
    weightsum = sum(selected_weights);
    % Append to weights array
    weights(end+1) = weightsum;
end
Now you have:
>> groups{1}
ans =
1 3 10
1 4 10
1 7 10
1 8 10
2 1 10
2 5 10
3 2 10
>> groups{2}
ans =
4 3 10
4 6 10
5 3 10
6 4 10
>> groups{3}
ans =
7 2 10
8 1 10
>> weights
weights =
30 20 0
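For the full 99,940-row data set, a loop-free variant may be worth trying. The following is only a sketch under the same assumptions as above (groups of three, formed in order of first appearance of id1); the names group_size, rank1, rank2, group_of_id1, group_of_id2, and n_groups are introduced here for illustration and do not come from the original post. It maps each ID to a group index and sums the matching weights with accumarray:

```matlab
% Sketch data from the question (columns: id1, id2, weight)
data = [1 3 10; 1 4 10; 1 7 10; 1 8 10; 2 1 10; 2 5 10; 3 2 10; ...
        4 3 10; 4 6 10; 5 3 10; 6 4 10; 7 2 10; 8 1 10];

group_size = 3;  % assumed group size, as in the question

% Position of each row's id1 among the unique id1 values (order of first appearance)
[unique_id1, ~, rank1] = unique(data(:, 1), 'stable');
group_of_id1 = ceil(rank1 / group_size);

% Position of each row's id2 among the unique id1 values (0 if it never occurs as an id1)
[~, rank2] = ismember(data(:, 2), unique_id1);
group_of_id2 = ceil(rank2 / group_size);   % 0 stays 0, so it never matches a group

% A row is selected when its id1 and id2 fall in the same group
selected = group_of_id1 == group_of_id2;

% Sum the selected weights per group; groups with no selected rows get 0
n_groups = ceil(numel(unique_id1) / group_size);
weights = accumarray(group_of_id1(selected), data(selected, 3), [n_groups 1])';
```

On the sketch data this yields the same weights as the loop, 30 20 0. The loop version is still the better fit when the per-group rows themselves (the groups cell array) are needed.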