用组替换下一个可用值的SAS中的空值



我试图替换在SAS中第一个非NULL条目之前发生的缺失值。我有以下数据:

StudentID           Day          TestScore
Student001          0            .
Student001          1            78
Student001          2            89
Student002          3            . 
Student002          4            .
Student002          5            .
Student002          6            95

我想修改数据,以便将零值替换为下一个可用的非零条目:

StudentID           Day          TestScore
Student001          0            78
Student001          1            78
Student001          2            89
Student002          3            95 
Student002          4            95
Student002          5            95
Student002          6            95
data scores;
length StudentID $ 10;
input StudentID $ Day TestScore;
datalines;
Student001          0            .
Student001          1            78
Student001          2            89
Student002          3            . 
Student002          4            .
Student002          5            .
Student002          6            95 
;
run;
proc sort data = scores;
by descending day;
run;
data scores;
drop addscore;
retain addscore;
set scores;
if testscore ne . then addscore = testscore;
if testscore eq . then testscore = addscore;
run;
proc sort data = scores;
by day;
run;
proc sort data = have;
  by id descending day ;
run;
data want;
  set have;
  by id;
  retain last_score;
  if first.id then call missing(last_score);
  if not missing(score) then last_score = score;
  else score = last_score;
run;
proc sort data=want;
  by id day;
run;

fyi,如果给定ID的最后一个已知分数之后,则不会设置丢失值。即,如果您有类似的东西:

Student002          5            95
Student002          6            .

然后,仅在第5天之前的ID 002记录将获得95的值。这是您的可能条件吗?如果是,此解决方案将需要轻微的修改

您可以使用DOW环识别下一个非错过的分数,以及随后的DOW环路以应用非错过分数。DOW方法不需要分类并维护原始的行顺序。

data want;
  do _n_ = 1 by 1 until (last.id or not missing(score));
    set have;
    by id;
  end;
  _score = score;
  do _n_ = 1 to _n_;
    set have;
    score = _score;
    output;
  end;
  drop _score;
run;

在SQL中,假定的订单,可以在相关的子问题中查找估算的值。

proc sql;
  create table want as
  select 
    id, day, 
    case 
      when not missing(score) then score
      else (select score from have as inner 
            where inner.id = outer.id 
              and inner.day > outer.day
              and not missing(score)
              having inner.day = min(inner.day)
            )
    end as score
  from have as outer;

相关内容

  • 没有找到相关文章

最新更新