I am using the following data step to concatenate multiple observations into a single variable:
data Data_PreFinal;
set work.reasons;
by Number;
length Changes $4000.;
retain Changes;
if first.Number then Changes = EndoReason;
else Changes = catx(', ', Changes, EndoReason);
if last.Number then output;
run;
For example, I want to make sure that if the dataset reasons looks like this:
Number EndoReason
1 Bucket1
1 Bucket2
1 Bucket1
1 Bucket3
1 Bucket2
1 Bucket2
2 Bucket2
2 Bucket2
2 Bucket1
2 Bucket2
the resulting dataset Data_PreFinal looks like this:
Number EndoReason
1 Bucket1, Bucket2, Bucket3
2 Bucket2, Bucket1
rather than listing every duplicate value of the EndoReason variable.
Any help would be greatly appreciated!
Thanks!
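Just search the current Changes string for the value on the given row, and concatenate only if it is not already there. The index function is the one to use. I have also slightly modified your code to use call catx instead of catx (I think it is tidier in these situations).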
data reasons;
input Number EndoReason $;
datalines;
1 Bucket1
1 Bucket2
1 Bucket1
1 Bucket3
1 Bucket2
1 Bucket2
2 Bucket2
2 Bucket2
2 Bucket1
2 Bucket2
;
run;
data Data_PreFinal;
set work.reasons;
by Number;
length Changes $4000.;
retain Changes;
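* Reset the accumulated string at the start of each Number group;
if first.Number then call missing(Changes);
* Append EndoReason only if it does not already appear in Changes;
if not index(Changes,trim(EndoReason)) then call catx(', ', Changes, EndoReason);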
if last.Number then output;
run;
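One caveat: index does a substring match, so a value such as Bucket1 would be treated as already present once something like Bucket10 is in the list. Here is a minimal sketch of a whole-word check using findw instead. It assumes the EndoReason values themselves contain no commas or spaces, since both are treated as delimiters here.

```sas
data Data_PreFinal;
  set work.reasons;
  by Number;
  length Changes $4000;
  retain Changes;
  /* Reset the accumulated string at the start of each Number group */
  if first.Number then call missing(Changes);
  /* findw returns 0 when EndoReason is not a whole word in Changes;
     commas and blanks are both treated as word delimiters (assumption:
     the values contain neither) */
  if findw(Changes, strip(EndoReason), ', ') = 0
    then call catx(', ', Changes, EndoReason);
  if last.Number then output;
run;
```

With the sample data above this should still give "Bucket1, Bucket2, Bucket3" for Number 1 and "Bucket2, Bucket1" for Number 2, keeping the order in which each value first appears.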
Hi friend! Perhaps it would be useful to remove the duplicate observations first. For example:
data reasons;
input Number EndoReason : $30.;
datalines;
1 Bucket1
1 Bucket2
1 Bucket1
1 Bucket3
1 Bucket2
1 Bucket2
2 Bucket2
2 Bucket2
2 Bucket1
2 Bucket2
;
*Only eliminate duplicates;
proc sort data=reasons out=reasons_nodup nodup;
by Number EndoReason;
run;
data Data_PreFinal;
set work.reasons_nodup;
by Number;
length Changes $4000.;
retain Changes;
if first.Number then Changes = EndoReason;
else Changes = catx(', ', Changes, EndoReason);
if last.Number then output;
drop EndoReason;
rename Changes = EndoReason;
run;
Good luck!
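Note that sorting by EndoReason puts the values in alphabetical order, so with the sample data Number 2 comes out as "Bucket1, Bucket2" rather than the "Bucket2, Bucket1" shown in the question. If first-appearance order matters, one possible sketch is to record the row order before deduplicating and restore it afterwards. The names seq, reasons_seq and reasons_first are hypothetical helpers introduced only for this illustration.

```sas
/* Record the original row order in a helper variable */
data reasons_seq;
  set work.reasons;
  seq = _n_;
run;

/* Group identical Number/EndoReason pairs, earliest row first */
proc sort data=reasons_seq;
  by Number EndoReason seq;
run;

/* Keep only the first occurrence of each pair */
data reasons_first;
  set reasons_seq;
  by Number EndoReason;
  if first.EndoReason;
run;

/* Restore first-appearance order within each Number */
proc sort data=reasons_first;
  by Number seq;
run;

/* Concatenate as before, now reading the deduplicated rows */
data Data_PreFinal;
  set reasons_first;
  by Number;
  length Changes $4000;
  retain Changes;
  if first.Number then Changes = EndoReason;
  else Changes = catx(', ', Changes, EndoReason);
  if last.Number then output;
  drop EndoReason seq;
  rename Changes = EndoReason;
run;
```

This costs an extra sort compared with the single proc sort above, so it is only worth it when the order of the concatenated values needs to match the input.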