使用宏对数据集中的列进行按组求和



>我有一个数据集,看起来像:

Month   Cost_Center      Account    Actual    Annual_Budget
June     53410           Postage       13      234
June     53420           Postage       0       432
June     53430           Postage       48      643
June     53440           Postage       0       917
June     53710           Postage       92      662
June     53410           Phone         73      267
June     53420           Phone         103     669
June     53430           Phone         90      763
...

我想首先分别对"实际"和"年度"列求和,然后创建一个变量,如果全年的实际外推值大于"年度"列,它将在其中标记。

我有以下代码:

Data Test; 
set Combined;
%All_CC; /*MACRO TO INCLUDE ALL COST CENTERS*/
%Total_Other_Expenses;/*MACRO TO INCLUDE SPECIFIC Account Descriptions*/
Sum_Actual = sum(Actual);
Sum_Annual = sum(Annual_Budget);
Run_Rate = Sum_Actual*12;
if Run_Rate > Sum_Annual then Over_Budget_Alarm = 1;
run; 

但是,当我运行此代码时,它不会按组求和,例如,这是我得到的输出:

Account_Description    Sum_Actual Sum_Annual   Run_Rate  Over_Budget_Alarm
Postage             13      234             146           
Postage             0       432              0 
Postage             48      643             963            1
Postage             0       917             0
Postage             92      662             634            1

我正在寻找输出,其中所有"邮资"都汇总为实际和年度,只留下一行数据。

  1. 使用 PROC MEANS 汇总数据
  2. 使用数据步骤和 IF/THEN 语句创建标志。

    proc means data=have N SUM NWAY STACKODS;
    class account;
    var amount annual_budget;
    ods output summary = summary_stats1;
    output out = summary_stats2 N = SUM= / AUTONAME;
    run;
    data want;
    set summary_stats;
    if sum_actual > sum_annual_budget then flag=1; 
    else flag=0;
    run;
    

SAS DATA 步骤行为非常复杂(SAS 语言参考:概念中的"关于数据步骤执行"(。您看到的默认行为是:在每次迭代结束时(即对于每个输入行(,将行写入输出数据集,并重置 PDV - 所有数据步骤变量。

你不能指望"直观地"编写基本SAS,而不花几天时间先学习它,所以我建议使用PROC SQL,除非你有理由不这样做。

如果你真的想在数据步骤中聚合,你必须使用一种叫做 BY 组处理的东西:在确保输入数据集按 BY vars 排序后,你可以使用类似以下内容的东西:

data Test (keep = Month Account Sum_Actual Sum_Annual /*...your Run_Rate and Over_Budget_Alarm...*/);
set Combined; /* the input table */
by Month Account; /* must be sorted by these */
retain Sum_Actual Sum_Annual; /* don't clobber for each input row */
if first.account then do; /* instead do it manually for each group */
Sum_Actual = 0;
Sum_Annual = 0;
end;
/* accumulate the values from each row */
Sum_Actual = sum(Sum_Actual, Actual);
Sum_Annual = sum(Sum_Annual, Annual_Budget);
/* Note that Sum_Actual = Sum_Actual+Actual; will not work if any of the input values is 'missing'. */
if last.account then do;
/* The group has been processed.
Do any additional processing for the group as a whole, e.g.
calculate Over_Budget_Alarm. */
output; /* write one output row per group */
end;
run;
Proc SQL

对于理解聚合数据检查非常有效。 在没有看到宏的作用的情况下,我会说在输出数据集test后执行运行速率检查。

你不显示其他月份的行,但我必须假设annual_budget值在所有月份都是恒定的——如果是这样,我认为没有理由对annual_budget求和;将任何东西与sum(annual_budget)进行比较可能是在不正确的时间尺度上,没有用。

从展会数据中很难判断您是否想知道其中任何一个

  • 哪个(或如果某些(月份的run_rate超过了annual_budget
  • 哪个(或如果某些(月份run_rate超过annual_budget余额(即annual_budget减去前几个月的支出(

假设测试中的每一行是针对单个年份/月份/成本中心/帐户的 - 如果不是,则必须将基础数据聚合到该级别。

Proc SQL;
* retrieve presumed constant annual_budget values from data;
* this information might (should) already exist in another table;
* presume constant annual budget value at each cost center | account combination;
* distinct because there are multiple months with the same info;
create table annual_budgets as
select distinct Cost_Center, Account, Annual_Budget
from test;
create table account_budgets as
select account, sum(annual_budget) as annual_budget 
from annual_budgets
group by account;
* flag for some run rate condition;
create table annual_budget_mon_runrate_check as
select 
2019 as year,
account,
sum(actual) as yr_actual,  /* across all month/cost center */
min (
select annual_budget from account_budgets as inner
where inner.account = outer.account
) as account_budget,
max (
case when actual * 12 > annual_budget then 1 else 0 end
) as
excessive_runrate_flag label="At least one month had a cost center run rate that would exceed its annual_budget")
from 
test as outer
group by
year, account;

您可以添加where子句来限制处理account

将标志计算中的max更改为sum将返回运行率过高的成本中心月数。

最新更新