I have been trying my brain and googling all of the place to figure out if there is an easy way to get the number of variables in a datafile purely using syntax. The reason for this is that I work with a lot of small files to which I need to merge new data. However, if you do this programatically, there is a chance due to wrongly types variable names, a new variable will be added to the dataset if it does not already exist.
Hence I want to know the number of variables before a merge and after a merge. For this I have tried to write a macro, but SPSS macro's are really terrible at arimetric and you cannot seem to feed these macro's with the keyword "ALL" (IE: all variables).
Additionally I have looked for syntax which exports info about the dataset to the output where the number of variables will be directly shown. As far as I could find, this did not exist.
So I have in fact managed to make it work via OMS (output management system), but the way is super convoluted and is a lot of lines of text per datafile. See below:
* Create datafile to check if merge went okay. DATA LIST LIST /FileName(A50) N_PRE(F8) N_POST(F8). DATASET NAME CheckList. * Open basefile. GET FILE='DATAFILE_BASE.sav'. DATASET NAME Data WINDOW=FRONT. * Set settings to show variable names instead of variable labels in output. SET Small=0.0001 THREADS=AUTO TVars=Names OVars=Labels TNumbers=Labels ONumbers=Labels DIGITGROUPING=No LEADZERO=No ODISPLAY=tables. * Use OMS to determine number of variables. DATASET DECLARE COUNT. OMS /SELECT TABLES /IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies'] /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='COUNT' VIEWER=YES /TAG='Frequencies'. FREQUENCIES ALL. OMSEND TAG = ['Frequencies']. * Retain only variable names and remove duplicates. DATASET ACTIVATE COUNT. ADD FILES FILE * /KEEP Label_,. EXECUTE. SORT CASES BY Label_(A). MATCH FILES /FILE=* /BY Label_ /FIRST=PrimaryFirst /LAST=PrimaryLast. DO IF (PrimaryFirst). COMPUTE MatchSequence=1-PrimaryLast. ELSE. COMPUTE MatchSequence=MatchSequence+1. END IF. LEAVE MatchSequence. FORMATS MatchSequence (f7). COMPUTE InDupGrp=MatchSequence>0. SORT CASES InDupGrp(D). MATCH FILES /FILE=* /DROP=PrimaryFirst InDupGrp MatchSequence. SELECT IF (PrimaryLast=1). EXECUTE. * Count variables and use OMS again to determine max. COMPUTE VarCount = $CASENUM. DATASET DECLARE PRECOUNT. OMS /SELECT TABLES /IF COMMANDS=['Descriptives'] SUBTYPES=['Descriptive Statistics'] /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='PRECOUNT' VIEWER=YES /TAG='Descriptives'. DESCRIPTIVES VARIABLES=VarCount /STATISTICS=MAX. OMSEND TAG = ['Descriptives']. DATASET ACTIVATE PRECOUNT. * Reduce to one line, cleanup and add identifiers. SELECT IF ~SYSMIS(Maximum). STRING FileName (A50). COMPUTE FileName = 'FILENAME.SAV'. RENAME VARIABLES (N = N_PRE). ADD FILES FILE * /KEEP FileName N_PRE. EXECUTE. DATASET CLOSE COUNT. * Merge data. GET FILE='DATAFILE_NEW.sav'. DATASET NAME MergeData WINDOW=FRONT. DATASET ACTIVATE Data. ADD FILES /FILE=* /FILE='MergeData'. EXECUTE. DATASET CLOSE MergeData. * Do another OMS run to check post_N. DATASET DECLARE COUNT. OMS /SELECT TABLES /IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies'] /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='COUNT' VIEWER=YES /TAG='Frequencies'. FREQUENCIES ALL. OMSEND TAG = ['Frequencies']. * Retain only variable names and remove duplicates. DATASET ACTIVATE COUNT. ADD FILES FILE * /KEEP Label_,. EXECUTE. SORT CASES BY Label_(A). MATCH FILES /FILE=* /BY Label_ /FIRST=PrimaryFirst /LAST=PrimaryLast. DO IF (PrimaryFirst). COMPUTE MatchSequence=1-PrimaryLast. ELSE. COMPUTE MatchSequence=MatchSequence+1. END IF. LEAVE MatchSequence. FORMATS MatchSequence (f7). COMPUTE InDupGrp=MatchSequence>0. SORT CASES InDupGrp(D). MATCH FILES /FILE=* /DROP=PrimaryFirst InDupGrp MatchSequence. SELECT IF (PrimaryLast=1). EXECUTE. * Count variables and use OMS again to determine max. COMPUTE VarCount = $CASENUM. DATASET DECLARE POSTCOUNT. OMS /SELECT TABLES /IF COMMANDS=['Descriptives'] SUBTYPES=['Descriptive Statistics'] /DESTINATION FORMAT=SAV NUMBERED=TableNumber_ OUTFILE='POSTCOUNT' VIEWER=YES /TAG='Descriptives'. DESCRIPTIVES VARIABLES=VarCount /STATISTICS=MAX. OMSEND TAG = ['Descriptives']. DATASET ACTIVATE POSTCOUNT. * Reduce to one line, cleanup and add identifiers. SELECT IF ~SYSMIS(Maximum). STRING FileName (A50). COMPUTE FileName = 'FILENAME.SAV'. RENAME VARIABLES (N = N_POST). ADD FILES FILE * /KEEP FileName N_POST. EXECUTE. DATASET CLOSE COUNT. * Merge the post and precount and add to checklist. DATASET ACTIVATE PRECOUNT. MATCH FILES /FILE=* /TABLE='POSTCOUNT' /BY FileName. EXECUTE. DATASET ACTIVATE CheckList. ADD FILES /FILE=* /FILE='PRECOUNT'. EXECUTE. DATASET CLOSE PRECOUNT. DATASET CLOSE POSTCOUNT. This does what I want. Get a pre-merge measurement, a post-merge measurement, links them together and adds them to a predifined checklist, which is processed at the end (simple post_n minus pre_n caluclation to show which files are off). But we're talking about 50-100 small datasets to be done this way, which is 150 lines-ish per dataset.
The syntax is generated through matlab with some specific inputs from a variable database, so writing it is not the problem. It's just a convoluted mess. Any ideas to streamline this code?
Regards.
EDIT: Thanks to @eli-k for supplying a much more elegant solution (the whole syntax previously took 4-5 minutes to run, this is much quicker, since it doesn't have to do a FREQUENCIES command on a big dataset every iteration).
I updated the macro a little bit to allow for some extra customization (and to allow a pre and post merge execution).
DEFINE !countVars (outputvar = !TOKENS(1) /datasetname = !TOKENS(1)) * Figure out number of variables from Dictionary. DATASET DECLARE tmp. OMS /SELECT TABLES /IF COMMANDS=['File Information'] SUBTYPES=['Variable Information'] /DESTINATION FORMAT=SAV OUTFILE='tmp' VIEWER=NO. DISPLAY DICTIONARY. OMSEND. DATASET DECLARE !datasetname. DATASET ACTIVATE tmp. OMS /SELECT TABLES /IF COMMANDS=['Frequencies'] SUBTYPES=['Frequencies'] /DESTINATION FORMAT=SAV OUTFILE=!datasetname VIEWER=NO /TAG='Frequencies'. FREQ COMMAND_. OMSEND TAG = ['Frequencies']. DATASET ACTIVATE !datasetname. DATASET CLOSE tmp. RENAME VARIABLES (Frequency = !outputvar). ADD FILES FILE * /KEEP !outputvar. EXECUTE. !ENDDEFINE. !countVars outputvar=N_PRE datasetname=PRECOUNT. STRING FileName (A50). COMPUTE FileName = 'FILENAME'. EXECUTE.