在WHERE子句/语句中使用INDEXW函数



我试图理解下面代码的行为。我觉得这似乎不对,但我希望有人能看看,以防我疯了或怎么了。

data testdata;
length var1 var2 $ 10;
var1 = "house";  var2 = "";       output;
var1 = "house";  var2 = "car";    output;
var1 = "house";  var2 = "house";  output;
run;
proc sql;
* Select all three obs- ok;
create table try1 as select * from testdata where indexw("house", var1);
* Selects one obs - ok;
create table try2 as select * from testdata where indexw("house", var2);
* Selects one obs - ok;
create table try3 as select * from testdata where indexw("car", var2);
* Selects all three obs - why?;
create table try4 as select * from testdata where indexw("house", var1) and indexw("house", var2);
* Selects one obs - ok;
create table try5 as select * from testdata where indexw("house", var1) and indexw("car", var2);
* Explicit comparison to zero - selects one obs - ok;
create table try6 as select * from testdata where indexw("house", var1) and (indexw("house", var2) ne 0);
* Compare to VAR2 first - selects one obs - ok;
create table try7 as select * from testdata where indexw("house", var2) and indexw("house", var1);
quit;

当我运行这段代码时,TRY1有三个观测值,而TRY2有一个观测值——VAR1="house"和VAR2 ="house"分别。这是我所期望的,基于此,我希望TRY4只包含VAR1和VAR2都是"房屋"的单个观察值。相反,TRY4从输入中选择所有三个观测值。

更奇怪的是,TRY6使用显式比较0,并且只选择预期的一个观察值,TRY7也是如此,它颠倒了比较的顺序。

类似的事情发生在数据步骤如果我使用一个声明,但如果我使用if语句:

114  data t;
115    set testdata;
116    where indexw("house", var1) and indexw("house", var2);
117  run;
NOTE: There were 3 observations read from the data set WORK.TESTDATA.
WHERE not (not INDEXW('house', var1));
NOTE: The data set WORK.T has 3 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time           0.00 seconds
cpu time            0.00 seconds

118  data t;
119    set testdata;
120    if indexw("house", var1) and indexw("house", var2);
121  run;
NOTE: There were 3 observations read from the data set WORK.TESTDATA.
NOTE: The data set WORK.T has 1 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time           0.00 seconds
cpu time            0.00 seconds

请注意,第一个数据步骤输出所有三个观测值,而第二个数据步骤只输出一个。

第一步的日志显示了一个线索——看起来SAS以一种改变其含义的方式重新解释了WHERE子句。

这是怎么回事?

在Windows 10上运行64位SAS 9.4 (TS1M7)

这个bug在SAS中至少早在9.4M5就存在了。直接联系SAS以获取有关此错误的详细信息。您可以在http://support.sas.com

处筹集门票。正如我所记得的,它与使用相同的函数有关(可能它还要求两种用法使用相同的第一个参数?),也如您的示例所示,将函数结果处理为布尔值。

两个表达式的相似性混淆了优化逻辑。您所展示的添加显式比较运算符的方法可以避免该错误。

where 0<indexw("house", var1) and 0<indexw("house", var2);

最新更新