SAS: keep rows by group only where the last row has a certain value

I have sorted my data as follows:

ID value1 value2 
1    A       1
1    A       2
1    A       1
2    A       2
2    B       1
3    A       1
3    B       1
3    B       1

I want to turn this into a new dataset that keeps only the IDs whose last value1 is B, i.e. it should look like this:

ID value1 value2 
2    A       2
2    B       1
3    A       1
3    B       1
3    B       1

I tried:

data want;
set have; 
by ID;
if last.value1 = 'B' then output;
run;

but it didn't work. Can anyone help me? Thanks in advance!

A technique known as a DOW loop followed by a serial loop lets you:

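Loop over a group to compute some state variable. In this case the state variable tracks one question: is value1 = 'B' in the last row of the group?
Loop over the same group a second time and use the state variable in your condition. In this case, each row is output if its group is wanted. This second loop relies on a property of the iterative DO loop: its upper limit is evaluated only once, when the loop starts, so resetting _n_ inside the loop does not change how many times it runs.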

The original data must be sorted by the group variable.

data want;
  do _n_ = 1 by 1 until (last.id);
    set have;
    by id;
  end;
  %* _n_ is group size at this point;
  _want_group = value1 = 'B';  %* state variable -- is B in last row?;
  do _n_ = 1 to _n_;   %* process the group again, using a different SET buffer;
    set have;          %* same data set as in the do/until;
    %* no BY statement required because loop limit _n_ is group size;
    if _want_group then
      OUTPUT;          %* output all rows of group as desired;
  end;
  drop _want_group;
run;

The simplest approach is to do it in several steps. The first step creates a dataset containing every ID whose last value1 is "B".

data tmp;
  set have;
  by ID value1;
  if last.ID and value1='B' then output;
run;

Dataset tmp now holds all the IDs you need, so you can select those IDs from the original dataset.

proc sql;
  create table want as
  select *
  from have
  where id in (select distinct id from tmp);
quit;

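If you want to solve this in a single DATA step, you can first sort the original table by value1 in descending order and then run the following DATA step. Because the data is sorted by ID and value1, a group ends in 'B' exactly when it contains a 'B'; after the descending sort that 'B' row moves to the top of the group, so checking the first row of each ID is enough.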

proc sort data=have;
  by id descending value1;
run;
data want(where=(keep_flag));
  set have;
  by id descending value1;
  retain keep_flag;
  if first.id and value1='B' then keep_flag=1;
  else if first.id then keep_flag=0;
run;
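The single-step version overwrites HAVE with the descending sort, leaves the B rows ahead of the A rows within each ID, and keeps keep_flag in the output. A possible variant, sketched under the assumption that HAVE looks like the sample in the question: sort into a separate dataset (HAVE_DESC is a hypothetical name), drop the flag, and re-sort WANT by id and value1 to get back the ascending order shown in the desired output. PROC SORT's default EQUALS behaviour keeps rows with identical BY values in their existing relative order.

```sas
/* Sample input, copied from the question (assumption: value1 is only A or B). */
data have;
  input id value1 $ value2;
  datalines;
1 A 1
1 A 2
1 A 1
2 A 2
2 B 1
3 A 1
3 B 1
3 B 1
;
run;

/* Sort into a copy so HAVE itself stays in its original order.             */
/* HAVE_DESC is a hypothetical name chosen for this sketch.                 */
proc sort data=have out=have_desc;
  by id descending value1;
run;

data want;
  set have_desc;
  by id descending value1;
  retain keep_flag;
  /* The first row of each ID now holds the largest value1, i.e. 'B' if the */
  /* group contains any B row.                                              */
  if first.id then keep_flag = (value1 = 'B');
  if keep_flag then output;
  drop keep_flag;   /* DROP affects only the output; RETAIN still works    */
run;

/* Restore ascending value1 order within each ID; EQUALS (the default)      */
/* keeps tied rows in their existing relative order.                        */
proc sort data=want;
  by id value1;
run;
```

With the sample data this should keep the five rows for IDs 2 and 3, in the same order as the desired output in the question.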
