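I've been racking my brain and searching Google everywhere to find out whether there is a simple way to get the number of variables in a data file purely using syntax. The reason is that I need to process a lot of small files, into each of which I have to merge new data. If you do this programmatically, though, a typo in a variable name can cause a new variable, one that doesn't exist in the dataset, to be added.
So I want to know the number of variables before and after the merge. To that end I tried writing a macro, but SPSS macros are really bad at counting, and it seems you can't feed them the keyword "all" (i.e. all variables).
I also looked for syntax that exports dataset information to output that directly shows the number of variables. As far as I can tell, this doesn't exist.
I have actually managed to get it working through OMS (Output Management System), but this approach is very convoluted and takes many lines of syntax per data file. See below: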
* Create datafile to check if merge went okay.
DATA LIST LIST /FileName(A50) N_PRE(F8) N_POST(F8).
DATASET NAME CheckList.
* Open basefile.
GET FILE='DATAFILE_BASE.sav'.
DATASET NAME Data WINDOW=FRONT.
* Set settings to show variable names instead of variable labels in output.
SET Small=0.0001 THREADS=AUTO TVars=Names OVars=Labels TNumbers=Labels ONumbers=Labels DIGITGROUPING=No LEADZERO=No ODISPLAY=tables.
* Use OMS to determine number of variables.
DATASET DECLARE COUNT.
OMS
/SELECT TABLES
/IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='COUNT' VIEWER=YES
/TAG='Frequencies'.
FREQUENCIES ALL.
OMSEND TAG = ['Frequencies'].
* Retain only variable names and remove duplicates.
DATASET ACTIVATE COUNT.
ADD FILES FILE *
/KEEP Label_.
EXECUTE.
SORT CASES BY Label_(A).
MATCH FILES
/FILE=*
/BY Label_
/FIRST=PrimaryFirst
/LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
/FILE=*
/DROP=PrimaryFirst InDupGrp MatchSequence.
SELECT IF (PrimaryLast=1).
EXECUTE.
* Count variables and use OMS again to determine max.
COMPUTE VarCount = $CASENUM.
DATASET DECLARE PRECOUNT.
OMS
/SELECT TABLES
/IF COMMANDS=['Descriptives'] SUBTYPES=['Descriptive Statistics']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='PRECOUNT' VIEWER=YES
/TAG='Descriptives'.
DESCRIPTIVES VARIABLES=VarCount
/STATISTICS=MAX.
OMSEND TAG = ['Descriptives'].
DATASET ACTIVATE PRECOUNT.
* Reduce to one line, cleanup and add identifiers.
SELECT IF ~SYSMIS(Maximum).
STRING FileName (A50).
COMPUTE FileName = 'FILENAME.SAV'.
RENAME VARIABLES (N = N_PRE).
ADD FILES FILE *
/KEEP FileName N_PRE.
EXECUTE.
DATASET CLOSE COUNT.
* Merge data.
GET FILE='DATAFILE_NEW.sav'.
DATASET NAME MergeData WINDOW=FRONT.
DATASET ACTIVATE Data.
ADD FILES /FILE=*
/FILE='MergeData'.
EXECUTE.
DATASET CLOSE MergeData.
* Do another OMS run to check post_N.
DATASET DECLARE COUNT.
OMS
/SELECT TABLES
/IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='COUNT' VIEWER=YES
/TAG='Frequencies'.
FREQUENCIES ALL.
OMSEND TAG = ['Frequencies'].
* Retain only variable names and remove duplicates.
DATASET ACTIVATE COUNT.
ADD FILES FILE *
/KEEP Label_.
EXECUTE.
SORT CASES BY Label_(A).
MATCH FILES
/FILE=*
/BY Label_
/FIRST=PrimaryFirst
/LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE MatchSequence=MatchSequence+1.
END IF.
LEAVE MatchSequence.
FORMATS MatchSequence (f7).
COMPUTE InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
/FILE=*
/DROP=PrimaryFirst InDupGrp MatchSequence.
SELECT IF (PrimaryLast=1).
EXECUTE.
* Count variables and use OMS again to determine max.
COMPUTE VarCount = $CASENUM.
DATASET DECLARE POSTCOUNT.
OMS
/SELECT TABLES
/IF COMMANDS=['Descriptives'] SUBTYPES=['Descriptive Statistics']
/DESTINATION FORMAT=SAV NUMBERED=TableNumber_
OUTFILE='POSTCOUNT' VIEWER=YES
/TAG='Descriptives'.
DESCRIPTIVES VARIABLES=VarCount
/STATISTICS=MAX.
OMSEND TAG = ['Descriptives'].
DATASET ACTIVATE POSTCOUNT.
* Reduce to one line, cleanup and add identifiers.
SELECT IF ~SYSMIS(Maximum).
STRING FileName (A50).
COMPUTE FileName = 'FILENAME.SAV'.
RENAME VARIABLES (N = N_POST).
ADD FILES FILE *
/KEEP FileName N_POST.
EXECUTE.
DATASET CLOSE COUNT.
* Merge the post and precount and add to checklist.
DATASET ACTIVATE PRECOUNT.
MATCH FILES /FILE=*
/TABLE='POSTCOUNT'
/BY FileName.
EXECUTE.
DATASET ACTIVATE CheckList.
ADD FILES /FILE=*
/FILE='PRECOUNT'.
EXECUTE.
DATASET CLOSE PRECOUNT.
DATASET CLOSE POSTCOUNT.
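This does what I want: it gets the variable count before the merge and the count after the merge, links them together, and adds them to a predefined checklist that is processed at the end (a simple post_n minus pre_n calculation to show which files are off). But we're talking about 50-100 small datasets, at roughly 150 lines of syntax for each one.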
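The final check described above is just a subtraction per file. As a minimal sketch in Python (the function name and the example rows are made up for illustration; the tuple layout only mirrors the FileName, N_PRE and N_POST columns of the CheckList dataset):

```python
def flag_changed_files(checklist):
    """Return (file_name, n_post - n_pre) for every file whose variable count changed."""
    flagged = []
    for file_name, n_pre, n_post in checklist:
        diff = n_post - n_pre
        if diff != 0:
            flagged.append((file_name, diff))
    return flagged


# Made-up example rows: (FileName, N_PRE, N_POST).
checklist = [
    ("FILE_A.sav", 120, 120),
    ("FILE_B.sav", 85, 87),
]
print(flag_changed_files(checklist))
```

Any file that shows up in the result gained (or lost) variables during the merge and is worth inspecting for a mistyped variable name.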
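The syntax is generated through MATLAB with some specific input from a variable database, so writing it isn't the problem. It's just a convoluted mess. Any ideas for streamlining the code?
Regards.
Edit: Thanks to @eli-k for providing a more elegant solution (the whole syntax previously took 4-5 minutes to run; this is much faster because it no longer runs the FREQUENCIES command on a large dataset in every iteration).
I updated the macro slightly to allow some extra customization (and to allow running it both before and after the merge).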
DEFINE !countVars (outputvar = !TOKENS(1)
/datasetname = !TOKENS(1))
* Figure out number of variables from Dictionary.
DATASET DECLARE tmp.
OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
/DESTINATION FORMAT=SAV OUTFILE='tmp' VIEWER=NO.
DISPLAY DICTIONARY.
OMSEND.
DATASET DECLARE !datasetname.
DATASET ACTIVATE tmp.
OMS
/SELECT TABLES /IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies']
/DESTINATION FORMAT=SAV OUTFILE=!datasetname VIEWER=NO
/TAG='Frequencies'.
FREQ COMMAND_.
OMSEND TAG = ['Frequencies'].
DATASET ACTIVATE !datasetname.
DATASET CLOSE tmp.
RENAME VARIABLES (Frequency = !outputvar).
ADD FILES FILE * /KEEP !outputvar.
EXECUTE.
!ENDDEFINE.
!countVars outputvar=N_PRE datasetname=PRECOUNT.
STRING FileName (A50).
COMPUTE FileName = 'FILENAME'.
EXECUTE.
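Since the syntax is generated by a script anyway, the per-file block shrinks to a handful of lines once the macro exists. The following is only a sketch of such a generator, written in Python rather than MATLAB. The function name `merge_check_syntax` and its arguments are hypothetical, the emitted commands are copied from the syntax above, and it assumes file names and labels contain no single quotes. It leaves out the final step of combining PRECOUNT and POSTCOUNT into the checklist.

```python
def merge_check_syntax(base_file, new_file, label):
    """Build the SPSS syntax for one file: count, merge, count again."""
    lines = [
        f"GET FILE='{base_file}'.",
        "DATASET NAME Data WINDOW=FRONT.",
        # Count before the merge; the macro leaves PRECOUNT active.
        "!countVars outputvar=N_PRE datasetname=PRECOUNT.",
        "STRING FileName (A50).",
        f"COMPUTE FileName = '{label}'.",
        "EXECUTE.",
        # Merge the new data into the base file.
        f"GET FILE='{new_file}'.",
        "DATASET NAME MergeData WINDOW=FRONT.",
        "DATASET ACTIVATE Data.",
        "ADD FILES /FILE=* /FILE='MergeData'.",
        "EXECUTE.",
        "DATASET CLOSE MergeData.",
        # Count after the merge; the macro leaves POSTCOUNT active.
        "!countVars outputvar=N_POST datasetname=POSTCOUNT.",
        "STRING FileName (A50).",
        f"COMPUTE FileName = '{label}'.",
        "EXECUTE.",
    ]
    return "\n".join(lines)


# Example with the placeholder names used in the syntax above.
jobs = [("DATAFILE_BASE.sav", "DATAFILE_NEW.sav", "FILENAME")]
script = "\n\n".join(merge_check_syntax(*job) for job in jobs)
print(script)
```

The DEFINE block itself only needs to be emitted once at the top of the generated file, before the first per-file block.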
Here is a way to do it using both OMS and a macro. The macro uses OMS to give you the number of variables in the active dataset.
define !countVars ()
dataset name orig.
DATASET DECLARE tmp.
OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information']
/DESTINATION FORMAT=SAV OUTFILE='tmp' VIEWER=no.
display dictionary.
omsend.
dataset activate tmp.
freq Command_.
dataset activate orig.
dataset close tmp.
!enddefine.
Now that the macro is defined, you can call it anywhere in your syntax, like this:
!countVars .
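Then look under "Frequency" in the output table to see the number of variables in the active dataset.
Just for completeness: getting the variable count with SPSS's built-in Python integration is very simple.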
begin program.
import spss
print(spss.GetVariableCount())
end program.
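This snippet simply prints the number of variables in the active dataset to the output window. Assigning the value to a variable at regular points throughout your script makes it very easy to track the number of variables across multiple merges.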
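To illustrate the tracking idea, here is a minimal sketch. The helper names `record_count` and `count_changes` are made up for this example, and the counter is passed in as a function so the code also runs outside SPSS. The stand-in counter and its values below are invented.

```python
def record_count(log, label, get_count):
    """Append (label, current variable count) to log and return the count."""
    count = get_count()
    log.append((label, count))
    return count


def count_changes(log):
    """Return (previous_label, label, difference) for each consecutive pair of entries."""
    return [
        (prev[0], cur[0], cur[1] - prev[1])
        for prev, cur in zip(log, log[1:])
    ]


# Stand-in for spss.GetVariableCount with invented values,
# so the sketch can be run without SPSS.
fake_counts = iter([150, 150, 152])
log = []
for label in ("base", "after merge 1", "after merge 2"):
    record_count(log, label, lambda: next(fake_counts))
print(count_changes(log))
```

Inside a `begin program` block you would pass `spss.GetVariableCount` as `get_count` and call `record_count` before and after each merge. Any non-zero difference then points at the merge that introduced unexpected variables.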