Given the dataset "temp" below:
index | code1 | code2 | code3 |
---|---|---|---|
A | P1 | P2 | P3 |
B | P1 | P3 | P4 |
C | P2 | P4 | N1 |
How about this:
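/* Sample input: one row per index, codes spread across columns */
data have;
  input (index code1 code2 code3) ($);
  datalines;
A P1 P2 P3
B P1 P3 P4
C P2 P4 N1
;

/* Make the data tall: one row per index/code, with D = 1 as the value to transpose */
data temp;
  set have;
  array c code:;
  do over c;
    v = c;
    d = 1;
    output;
  end;
run;

/* Each distinct code becomes a column; codes that do not occur for an index are missing */
proc transpose data = temp out = want(drop = _:);
  by index;
  id v;
  var d;
run;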
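You can do this without macros by using an ARRAY and the VNAME function in a DATA step. This requires knowing the possible codes (here P1-P4 and N1) in advance.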
data want;
  set have;
  /* Define the flag variables (numeric, 3 bytes each). */
  length P1-P4 3 N1 3;
  /* Define arrays: the input codes and one flag per possible code. */
  array code [*] code1-code3;
  array flags [*] P1-P4 N1;
  /* Set each flag to 1 if its variable name appears among the codes, else 0. */
  do i = 1 to dim(flags);
    flags[i] = 0;
    do j = 1 to dim(code);
      if vname(flags[i]) = code[j] then flags[i] = 1;
    end;
  end;
  keep index P1-P4 N1;
run;
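The easiest way to turn values into variable names is with PROC TRANSPOSE. So first convert your wide dataset into a tall one. You could use PROC TRANSPOSE for that as well, but to produce your target dataset PROC TRANSPOSE needs a numeric variable to transpose. So why not use a DATA step to create the tall dataset and include a numeric variable set to 1?

The PROC TRANSPOSE step will give you a dataset whose new variables are either 1 or missing. You can then use PROC STDIZE to change the missing values to zero.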
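data have;
  input index $ (code1-code3) (:$32.);
cards;
A P1 P2 P3
B P1 P3 P4
C P2 P4 N1
;

/* One row per index/code pair, with DUMMY = 1 as the value to transpose */
data tall;
  set have;
  array code code1-code3;
  length _name_ $32 dummy 8;
  retain dummy 1;
  do column = 1 to dim(code);
    _name_ = code[column];
    if not missing(_name_) then output;
  end;
run;

/* Codes become columns; codes absent for an index are missing */
proc transpose data=tall out=want(drop=_name_);
  by index;
  id _name_;
  var dummy;
run;

/* Replace only the missing values with 0 (without OUT=, the result would go to a new DATAn dataset) */
proc stdize reponly missing=0 data=want out=want;
  var _numeric_;
run;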
Another option:
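/* Wide to long: one row per index/code, with the code value in COL1 */
proc transpose data=have out=long;
  by index;
  var code:;
run;

/* Add a numeric value to transpose back */
data long2;
  set long;
  value = 1;
run;

/* Long to wide: each code becomes a column */
proc transpose data=long2 out=wide;
  by index;
  id col1;
  var value;
run;

/* Convert missing to zeroes */
data want;
  set wide;
  array vars _NUMERIC_;
  do over vars;
    if vars = . then vars = 0;
  end;
  drop _NAME_;
run;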
Output:
index | P1 | P2 | P3 | P4 | N1 |
---|---|---|---|---|---|
A | 1 | 1 | 1 | 0 | 0 |
B | 1 | 0 | 1 | 1 | 0 |
C | 0 | 1 | 0 | 1 | 1 |