How to group data by one column based on another column in MATLAB



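I have 99,940 rows of data in three columns: the first and second columns hold id numbers and the third holds a weight, as in the sketch data below. There are 441 unique id values, and they repeat in columns 1 and 2. I want to group id1 so that each group covers three consecutive values, and then sum the weights of the rows whose id2 equals one of the id1 values in that group.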

data:
id1    id2     weight
1       3        10
1       4        10
1       7        10
1       8        10
2       1        10
2       5        10
3       2        10
4       3        10
4       6        10
5       3        10
6       4        10
7       2        10
8       1        10
result:
group(1)
id1    id2     weight   selected
1       3        10       Yes (Because group1 has 1,2,3 and id1 is 1 and id2 is 3)
1       4        10       No
1       7        10       No
1       8        10       No 
2       1        10       Yes (Because group1 has 1,2,3 and id1 is 2 and id2 is 1)
2       5        10       No 
3       2        10       Yes (Because group1 has 1,2,3 and id1 is 3 and id2 is 2)
Weight = 30
group(2)
4       3        10     No
4       6        10     Yes (Because group2 has 4,5,6 and id1 is 4 and id2 is 6)
5       3        10     No
6       4        10     Yes (Because group2 has 4,5,6 and id1 is 6 and id2 is 4)
Weight=20
group(3)
7       2        10     No
8       1        10     No

and so on.

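I have tried using grouppixels and sortrows to compute the weight score and to look up values in one column based on another, but I got stuck on creating the groups.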

You can use the ismember function like this.

First, identify the unique id1 values:

id1 = data(:, 1);
unique_id1 = unique(id1, 'stable');
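To try this step on the sketch data from the question, the matrix can be typed in by hand. The variable name data follows the answer and the values are copied from the question's sample; this is only a quick check, not part of the original answer.

```matlab
% Sketch data from the question: columns are id1, id2, weight
data = [1 3 10; 1 4 10; 1 7 10; 1 8 10; ...
        2 1 10; 2 5 10; 3 2 10; ...
        4 3 10; 4 6 10; 5 3 10; ...
        6 4 10; 7 2 10; 8 1 10];

id1 = data(:, 1);
% 'stable' keeps the ids in order of first appearance instead of sorting them
unique_id1 = unique(id1, 'stable');
disp(unique_id1.')   % shows the ids 1 through 8 as a row
```

Note that with 'stable' the groups follow the order in which ids first appear. If the real data is not already sorted by id1, calling unique(id1) without 'stable' returns the ids in ascending order, which is probably closer to "three consecutive values".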

Then loop over the unique ids three at a time and extract the rows of data whose id1 matches any of the three ids in the current group.

weights = [];
groups = cell(0);
% Step through the unique ids three at a time. The loop runs to the end of
% unique_id1 so that a final group with fewer than three ids is not skipped.
for ii = 1:3:length(unique_id1)
    % Pull out just the id1 values in this group (the last group may be short)
    last = min(ii + 2, length(unique_id1));
    ids_in_group = unique_id1(ii:last);

    % Logical vector: true where id1 is in ids_in_group, false otherwise
    select_row_for_group = ismember(id1, ids_in_group);
    % Logical indexing: keep only the rows flagged in select_row_for_group
    group_data = data(select_row_for_group, :);
    % Append the new group to the cell array
    groups{end+1} = group_data;
    % A row counts toward the weight if its id2 is also in ids_in_group
    select_row_for_weight = ismember(group_data(:, 2), ids_in_group);
    % Keep only the weights of the selected rows
    selected_weights = group_data(select_row_for_weight, 3);
    % Sum the selected weights (sum of an empty selection is 0)
    weightsum = sum(selected_weights);
    % Append to the weights array
    weights(end+1) = weightsum;
end
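For 99,940 rows the loop is fine, but the same sums can also be computed without a loop. The following is a minimal sketch, not the original answer's method. It assumes the ids are positive integers and that groups are defined by value (ids 1 to 3, 4 to 6, and so on) rather than by order of appearance. The names group_size, g1, g2 and selected are introduced here for illustration.

```matlab
% Sketch data from the question: columns are id1, id2, weight
data = [1 3 10; 1 4 10; 1 7 10; 1 8 10; ...
        2 1 10; 2 5 10; 3 2 10; ...
        4 3 10; 4 6 10; 5 3 10; ...
        6 4 10; 7 2 10; 8 1 10];

group_size = 3;                        % assumption: three consecutive ids per group
g1 = ceil(data(:, 1) / group_size);    % group number of each row's id1
g2 = ceil(data(:, 2) / group_size);    % group number of each row's id2

% A row is selected when id2 falls in the same group as id1
selected = (g1 == g2);

% Sum the selected weights per group; groups with no selected rows get 0
weights = accumarray(g1(selected), data(selected, 3), [max(g1) 1]).';
disp(weights)   % 30 20 0 for the sketch data
```

The two approaches agree when the ids run from 1 upward without gaps. If some id values never appear in id1, the loop groups the ids that are present, while this sketch groups by value, so the results can differ.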

Now you have:

>> groups{1}
ans =
1     3    10
1     4    10
1     7    10
1     8    10
2     1    10
2     5    10
3     2    10
>> groups{2}
ans =
4     3    10
4     6    10
5     3    10
6     4    10
>> groups{3}
ans =
7     2    10
8     1    10
>> weights
weights =
30    20     0
