Programmability in SPSS 14: A Radical Increase in Power

A Platform for Statistical Applications
Jon K. Peck
Technical Advisor
SPSS Inc.
[email protected]
May, 2006
Copyright (c) SPSS Inc, 2006
The Five Big Things
1. External Programming Language (BEGIN PROGRAM)
2. Multiple Datasets
3. XML Workspace and OMS Enhancements
4. Dataset and Variable Attributes
5. Drive SPSS Processor Externally
Working together, they dramatically increase the power of SPSS.
SPSS becomes a platform that enables you to build statistical/data manipulation applications.
GPL provides new programming power for graphics.
Multiple Datasets

Many datasets open at once

One is active at a time (set by syntax or UI)

DATASET ACTIVATE command

Each dataset has a Data Editor window

Copy, paste, and merge between windows

Write tabular results to a dataset using the Output Management System (OMS)


Retrieve via Programmability
No longer necessary to organize jobs linearly
XML Workspace

Store dictionary and selected results in workspace

Write results to workspace as XML with Output Management System (OMS)

Retrieve selected contents from workspace via external programming language

Persists for entire session
OMS Output: XML or Dataset

Write tabular results to datasets with OMS

  Main dataset remains active
  Prior to SPSS 14, you had to write results to a SAV file, close the active file, and open the SAV file to use the results

Tables can be accessed via workspace or as datasets

XML workspace and XPath accessors are very general

  Accessed via programmability functions

Dataset output is more familiar to SPSS users

  Accessed via programmability functions or traditional SPSS syntax
  Use with DATASET ACTIVATE command
Attributes

Extended metadata for files and variables

VARIABLE ATTRIBUTE, DATAFILE ATTRIBUTE

Keep facts and notes about data permanently with the data, e.g., validation rules, source, usage, question text, formula

Two kinds: User defined and SPSS defined

Saved with the data in the SAV file

Can be used in program logic
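
As a rough illustration of attaching an attribute from a program, the sketch below only builds the VARIABLE ATTRIBUTE command text. The helper name variable_attribute_command and the attribute name QuestionText are hypothetical, and the spss.Submit call is left as a comment so the sketch runs without SPSS.

```python
def variable_attribute_command(varnames, attrname, value):
    """Build a VARIABLE ATTRIBUTE command (hypothetical helper, not part of the spss module)."""
    # Double any single quotes so the value is a valid SPSS string literal.
    quoted = "'" + str(value).replace("'", "''") + "'"
    return "VARIABLE ATTRIBUTE VARIABLES=%s ATTRIBUTE=%s(%s)." % (
        " ".join(varnames), attrname, quoted)

cmd = variable_attribute_command(["educ"], "QuestionText",
                                 "Years of education completed")
print(cmd)
# Inside BEGIN PROGRAM this would be followed by: spss.Submit(cmd)
```

Keeping the command construction in a small function makes it easy to apply the same attribute to a list of variables chosen by program logic.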
Programmability

Integrates external programming language into SPSS syntax

  BEGIN PROGRAM … END PROGRAM
  Set of functions to communicate with SPSS

SPSS has integrated the Python language

  SDK enabling other languages available
  New: VB.NET available soon

External processes can drive SPSS Processor

  VB.NET works only in this mode

SPSS Developer Central has SDK, Python Integration Plug-In, and many extension modules

Available for all SPSS 14 platforms
The Python Language

Free, portable, elegant, object oriented, versatile,
widely supported, easy to learn,…

Download from Python.org.

Version 2.4.1 or later required

Python tutorial

Python user discussion list

The Cheeseshop: Third-party modules
Legal Notice

SPSS is not the owner or licensor of the Python
software. Any user of Python must agree to the
terms of the Python license agreement located on
the Python web site. SPSS is not making any
statement about the quality of the Python program.
SPSS fully disclaims all liability associated with
your use of the Python program.
Programmability Enables…

Generalized jobs by controlling logic based on

  Variable dictionary
  Procedure output (XML or datasets)
  Case data (requires SPSS 14.0.1)
  Environment

Enhanced data management

Manipulation of output

Computations not built in to SPSS

Use of intelligent Python IDE driving SPSS (14.0.1)

  Statement completion, syntax checking, and debugging

External control of SPSS Processor
Programmability Makes Obsolete…

SPSS Macro

  except as a shorthand for lists or constants
  Learning Python is much easier than learning Macro

SaxBasic

  except for autoscripts
  but autoscripts become less important

These have not gone away.

The SPSS transformation language continues to be important.
Demonstration
Code and supporting modules can be downloaded from SPSS Developer Central.
Examples are on the CD.
Initialization for Examples

* SPSS Directions, May 2006.
* In preparation for the examples, specify where SPSS
standard data files reside.
BEGIN PROGRAM.
import spss, spssaux
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.

This program creates a File Handle pointing to the SPSS installation
directory, where the sample files are installed
Example 0: Hello, world
* EXAMPLE 0: My first program.
BEGIN PROGRAM.
import spss
print "Hello, world!"
END PROGRAM.

Inside BEGIN PROGRAM, you write Python code.

import spss connects program to SPSS.

Import needed once per session.

Output goes to Viewer log items.

Executed when END PROGRAM reached.
Example 1: Run SPSS Command
*Run an SPSS command from a program; create file handle.
BEGIN PROGRAM.
import spss, spssaux
spss.Submit("SHOW ALL.")
spssaux.GetSPSSInstallDir("SPSSDIR")
END PROGRAM.

Submit, in the spss module, runs one or more SPSS commands from within BEGIN PROGRAM.

It is one of many functions (APIs) that interact with SPSS.

GetSPSSInstallDir, in the spssaux module, creates a FILE HANDLE to that directory.
Example 2: Some API's
* Print useful information in the Viewer and then get help on an API.
BEGIN PROGRAM.
import spss
spss.Submit("GET FILE='SPSSDIR/employee data.sav'.")
varcount = spss.GetVariableCount()
casecount = spss.GetCaseCount()
print "The number of variables is " + str(varcount) + \
    " and the number of cases is " + str(casecount)
# help() prints the documentation itself, so no print statement is needed
help(spss.GetVariableCount)
END PROGRAM.

There are API's in the spss module to get variable dictionary
information.

help function prints short API documentation in Viewer.
Example 3a: Data-Directed Analysis
* Summarize variables according to measurement level.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/employee data.sav")
# make variable dictionaries by measurement level
catVars = spssaux.VariableDict(variableLevel=['nominal', 'ordinal'])
scaleVars = spssaux.VariableDict(variableLevel=['scale'])
print "Categorical Variables\n"
for var in catVars:
    print var, var.VariableName, "\t", var.VariableLabel
# summarize variables based on measurement level
if catVars:
    spss.Submit("FREQ " + " ".join(catVars.variables))
if scaleVars:
    spss.Submit("DESC " + " ".join(scaleVars.variables))
# create a macro listing scale variables
spss.SetMacroValue("!scaleVars", " ".join(scaleVars.variables))
END PROGRAM.
DESC !scaleVars.
" ".join(['x', 'y', 'z']) produces
'x y z'
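
The command-building step can be tried without SPSS. The sketch below is a stand-alone approximation: the function summary_commands and the example_levels dictionary are hypothetical stand-ins for what spssaux.VariableDict reads from the active dataset, and the commands are only printed, not submitted.

```python
def summary_commands(levels):
    """Return FREQUENCIES/DESCRIPTIVES commands chosen by measurement level.

    `levels` maps variable name -> 'nominal', 'ordinal', or 'scale'
    (a hypothetical stand-in for the SPSS variable dictionary).
    """
    cat_vars = [v for v in sorted(levels) if levels[v] in ("nominal", "ordinal")]
    scale_vars = [v for v in sorted(levels) if levels[v] == "scale"]
    commands = []
    if cat_vars:      # an empty list is false, so no command is generated
        commands.append("FREQUENCIES " + " ".join(cat_vars) + ".")
    if scale_vars:
        commands.append("DESCRIPTIVES " + " ".join(scale_vars) + ".")
    return commands

# Illustrative levels only; not read from employee data.sav.
example_levels = {"gender": "nominal", "jobcat": "ordinal", "salary": "scale"}
commands = summary_commands(example_levels)
for command in commands:
    print(command)
```

Because empty lists test false, a dataset with no scale variables simply produces no DESCRIPTIVES command instead of a syntax error.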
Example 5: Handling Errors
* Handle an error.
BEGIN PROGRAM.
import sys
try:
    spss.Submit("foo.")
except:
    print "That command did not work! ", sys.exc_info()[0]
END PROGRAM.

import sys uses another standard Python module

Errors generate exceptions
Makes it easy to check whether a long syntax job worked
Hundreds of standard modules and many others available from
SPSS and third parties
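
To show how exceptions let a long job report where it stopped, here is a minimal sketch that runs outside SPSS. fake_submit is a made-up stand-in for spss.Submit that rejects anything other than DESCRIPTIVES, and run_job is a hypothetical helper, not part of any SPSS module.

```python
import sys

def fake_submit(command):
    """Stand-in for spss.Submit: rejects anything except DESCRIPTIVES."""
    if not command.upper().startswith("DESCRIPTIVES"):
        raise RuntimeError("command failed: " + command)

def run_job(commands, submit=fake_submit):
    """Run commands in order; return (number completed, error type or None)."""
    done = 0
    try:
        for command in commands:
            submit(command)
            done += 1
    except Exception:
        # sys.exc_info()[0] is the class of the exception being handled
        return done, sys.exc_info()[0]
    return done, None

completed, error_type = run_job(
    ["DESCRIPTIVES salary.", "foo.", "DESCRIPTIVES educ."])
print("completed %d command(s); error type: %s" % (completed, error_type))
```

The job stops at the first failing command, so the count tells you how far it got.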
Example 8: Create Basis Variables
* Create set of dummy variables for a categorical
variable and a macro name for them.
BEGIN PROGRAM.
import spss, spssaux, spssaux2
mydict = spssaux.VariableDict()
spssaux2.CreateBasisVariables(mydict["educ"],
    "EducDummy", macroname="!EducBasis")
spss.Submit("REGRESSION /STATISTICS=COEF /DEP=salary"
    + " /ENTER=jobtime prevexp !EducBasis.")
END PROGRAM.

Discovers educ values from the data and generates
appropriate transformation commands.

Creates macro !EducBasis
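
The idea of generating transformation commands from the values found in the data can be sketched in plain Python. This is not the spssaux2 code: basis_commands is a hypothetical helper, the naming scheme for the new variables is invented, and dropping the last category as the reference level is an assumption made for the illustration.

```python
def basis_commands(varname, values, root):
    """Return (commands, new variable names) for indicator variables.

    One COMPUTE per distinct value except the last, which is assumed to be
    the reference category. Illustrative only; not the spssaux2 code.
    """
    distinct = sorted(set(values))
    commands, names = [], []
    for value in distinct[:-1]:
        name = "%s_%s" % (root, value)
        # A logical expression evaluates to 1 or 0 in SPSS transformations.
        commands.append("COMPUTE %s = (%s = %s)." % (name, varname, value))
        names.append(name)
    return commands, names

# Made-up education values standing in for case data.
commands, names = basis_commands("educ", [12, 16, 12, 8, 16], "EducDummy")
macro_value = " ".join(names)   # what a macro such as !EducBasis could hold
print("\n".join(commands))
print(macro_value)
```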
Example 9: Merge Directory
Contents
* Automatically add cases from all SAV files in a directory.
BEGIN PROGRAM.
import spss, glob
savlist = glob.glob("c:/temp/parts/*.sav")
if savlist:
    # parentheses let the expression continue across lines
    cmd = (["ADD FILES "] +
        ["/FILE='" + fn + "'" for fn in savlist] +
        [".", "EXECUTE."])
    spss.Submit(cmd)
    print "Files merged:\n", "\n".join(savlist)
else:
    print "No files found to merge"
END PROGRAM.

The glob module resolves file-system wildcards

The test "if savlist" checks whether there are any matching files.
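
The wildcard and command-building logic can be exercised without SPSS. The sketch below creates a few empty placeholder files in a temporary directory (they are not real SAV files) and only builds the command lines. add_files_command is a hypothetical helper name.

```python
import glob
import os
import tempfile

def add_files_command(paths):
    """Return the ADD FILES command lines for the given paths, or [] if none."""
    if not paths:
        return []
    return (["ADD FILES"] +
            ["  /FILE='" + path + "'" for path in paths] +
            ["."])

# Create empty placeholder files so the wildcard has something to match.
workdir = tempfile.mkdtemp()
for name in ("part1.sav", "part2.sav", "notes.txt"):
    open(os.path.join(workdir, name), "w").close()

savlist = sorted(glob.glob(os.path.join(workdir, "*.sav")))
command = add_files_command(savlist)
print("\n".join(command) or "No files found to merge")
```

Sorting the glob result makes the merge order predictable, since glob does not promise any particular order.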
Example 10: Use Parts of Output XML
* Run regression; get selected statistics, but do not display the
regular Regression output. Use OMS and XPath wrapper functions.
BEGIN PROGRAM.
import spss, spssaux
spssaux.OpenDataFile("SPSSDIR/CARS.SAV")
try:
    handle, failcode = spssaux.CreateXMLOutput(
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        visible=False)
    horseCoef = spssaux.GetValuesFromXMLWorkspace(
        handle, "Coefficients", rowCategory="Horsepower",
        colCategory="B", cellAttrib="number")
    print "The effect of horsepower on acceleration is: ", horseCoef
    Rsq = spssaux.GetValuesFromXMLWorkspace(
        handle, "Model Summary", colCategory="R Square",
        cellAttrib="text")
    print "The R square is: ", Rsq
    spss.DeleteXPathHandle(handle)
except:
    print "*** Regression command failed. No results available."
    raise
END PROGRAM.
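
The wrapper functions hide the XPath details. To give a feel for what selecting a cell by row and column category involves, here is a stand-alone sketch using Python's xml.etree.ElementTree. The XML fragment is a simplified, made-up layout with invented numbers, not the real OMS output schema, and get_cell is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up XML with invented values; the real OMS schema differs.
SAMPLE = """
<pivotTable subType="Coefficients">
  <row text="Horsepower">
    <cell column="B" number="-0.04" text="-.040"/>
    <cell column="Std. Error" number="0.01" text=".010"/>
  </row>
  <row text="Vehicle Weight">
    <cell column="B" number="-0.0003" text="-.000"/>
  </row>
</pivotTable>
"""

def get_cell(xml_text, row, column, attrib="number"):
    """Return one attribute of the cell at (row, column), or None if absent."""
    root = ET.fromstring(xml_text.strip())
    path = ".//row[@text='%s']/cell[@column='%s']" % (row, column)
    cell = root.find(path)
    if cell is None:
        return None
    return cell.get(attrib)

horse_coef = get_cell(SAMPLE, "Horsepower", "B")
print("The effect of horsepower is: %s" % horse_coef)
```

Attribute values come back as strings, so convert with float() before doing arithmetic on them.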
Example 11: Transformations in Python Syntax
BEGIN PROGRAM.
import spss, spssaux, Transform
spssaux.OpenDataFile('SPSSDIR/employee data.sav')
newvar = Transform.Compute(varname="average_increase",
    varlabel="Salary increase per month of experience if at least a year",
    varmeaslvl="Scale",
    varmissval=[999,998,997],
    varformat="F8.4")
newvar.expression = "(salary-salbegin)/jobtime"
newvar.condition = "jobtime > 12"
newvar.retransformable = True
newvar.generate()   # raises an exception if the compute fails
Transform.timestamp("average_increase")
spss.Submit("DISPLAY DICT /VAR=average_increase.")
spss.Submit("DESC average_increase.")
END PROGRAM.
Example 11A: Repeat Transform
BEGIN PROGRAM.
import spss, Transform
try:
    Transform.retransform("average_increase")
    Transform.timestamp("average_increase")
except:
    print "Could not update average_increase."
else:
    spss.Submit("display dictionary" +
        " /variable=average_increase.")
END PROGRAM.

Transformation saved using Attributes
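
How a variable can "remember" its formula is easy to sketch. In the stand-alone code below, a plain dictionary stands in for the attributes stored in the SAV file. The function names and the generated IF/COMPUTE text are assumptions for illustration and are not the Transform module's implementation.

```python
# A plain dict stands in for the variable attributes saved with the data.
attributes = {}

def build_syntax(varname):
    """Rebuild the transformation command from the stored formula."""
    saved = attributes[varname]   # KeyError if no formula was recorded
    if saved["condition"]:
        return "IF (%s) %s = %s." % (
            saved["condition"], varname, saved["expression"])
    return "COMPUTE %s = %s." % (varname, saved["expression"])

def define_compute(varname, expression, condition=None):
    """Record the formula under the variable name and return the command."""
    attributes[varname] = {"expression": expression, "condition": condition}
    return build_syntax(varname)

first = define_compute("average_increase",
                       "(salary-salbegin)/jobtime", "jobtime > 12")
# Later, possibly in another session, the command is rebuilt from the
# stored formula alone.
again = build_syntax("average_increase")
print(again)
```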
Example 12: Controlling the Viewer Using Automation
BEGIN PROGRAM.
import spss, viewer
spss.Submit("DESCRIPTIVES ALL.")
spssapp = viewer.spssapp()
outfile = "c:/temp/myoutput.spo"
try:
    actualName = spssapp.SaveDesignatedOutput(outfile)
except:
    # actualName is not assigned if the save fails, so report the requested name
    print "Save failed. Name:", outfile
else:
    spssapp.ExportDesignatedOutput(
        "c:/temp/myoutput.doc", format="Word")
    spssapp.CloseDesignatedOutput()
END PROGRAM.
Example 13: A New Procedure
Poisson Regression
BEGIN PROGRAM.
import spss, spssaux
from poisson_regression import *
spssaux.OpenDataFile(\
'SPSSDIR/Tutorial/Sample_Files/autoaccidents.sav')
poisson_regression("accident", covariates=["age"],
factors=["gender"])
END PROGRAM.

The Poisson regression module is built from the SPSS CNLR command and transformation commands.

Programs can get case data and apply other Python modules or code to it.
Example 14: Using Case Data
* Mean salary by education level.
BEGIN PROGRAM.
import spssdata
data = spssdata.Spssdata(indexes=('salary', 'educ'))
Counts = {}; Salaries = {}
for case in data:
    cat = int(case.educ)
    Counts[cat] = Counts.get(cat, 0) + 1
    Salaries[cat] = Salaries.get(cat, 0) + case.salary
print "educ mean salary\n"
for cat in sorted(Counts):
    print " %2d     $%6.0f" % (cat, Salaries[cat]/Counts[cat])
del data
END PROGRAM.
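
The accumulation pattern can be tried on its own. In this sketch a list of (educ, salary) tuples with made-up values stands in for the case data cursor. mean_by_group is a hypothetical helper, and skipping None values is an assumption about how missing data would arrive.

```python
def mean_by_group(cases):
    """Return {group: mean value} from an iterable of (group, value) pairs."""
    counts, totals = {}, {}
    for group, value in cases:
        if value is None:   # assumption: missing values arrive as None
            continue
        counts[group] = counts.get(group, 0) + 1
        totals[group] = totals.get(group, 0.0) + value
    means = {}
    for group in counts:
        means[group] = totals[group] / counts[group]
    return means

# Made-up cases standing in for rows of employee data.
cases = [(12, 30000.0), (16, 50000.0), (12, 34000.0), (16, None)]
means = mean_by_group(cases)
for group in sorted(means):
    print(" %2d     $%6.0f" % (group, means[group]))
```

A single pass with two dictionaries keeps memory use proportional to the number of groups rather than the number of cases.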
Example 14a: Output As a Pivot Table
BEGIN PROGRAM.
import viewer
# <accumulate Counts and Salaries as in Example 14>
desViewer = viewer.spssapp().GetDesignatedOutput()
rowcats = []; cells = []
for cat in sorted(Counts):
    rowcats.append(int(cat))
    cells.append(Salaries[cat]/Counts[cat])
ptable = viewer.PivotTable("a Python table",
    tabletitle="Effect of Education on Salary",
    caption="Data from employee data.sav",
    rowdim="Years of Education",
    rowlabels=rowcats,
    collabels=["Mean Salary"],
    cells=cells,
    tablelook="c:/data/goodlook.tlo")
ptable.insert(desViewer)
END PROGRAM.
Exploring OMS Dataset Output
get file='c:/spss14/cars.sav'.
DATASET NAME maindata.
DATASET DECLARE regcoef.
DATASET DECLARE regfit.
OMS /IF SUBTYPE=["coefficients"]
/DESTINATION FORMAT = sav OUTFILE=regcoef.
OMS /IF SUBTYPE=["Model Summary"]
/DESTINATION FORMAT = sav OUTFILE=regfit.
REGRESSION /DEPENDENT accel /METHOD=ENTER
weight horse year.
OMSEND.
Use OMS directly to figure out what to retrieve programmatically
Example 10a: Use Bits of Output Datasets
BEGIN PROGRAM.
import spss, spssaux, spssdata
try:
    coefhandle, rsqhandle, failcode = \
        spssaux.CreateDatasetOutput(
        "REGRESSION /DEPENDENT accel /METHOD=ENTER weight horse year.",
        subtype=["coefficients", "Model Summary"])
    cursor = spssdata.Spssdata(indexes=["Var2", "B"],
        dataset=coefhandle)
    for case in cursor:
        if case.Var2.startswith("Horsepower"):
            print "The effect of horsepower on acceleration is: ", case.B
    cursor.close()
    cursor = spssdata.Spssdata(indexes=["RSquare"],
        dataset=rsqhandle)
    row = cursor.fetchone()
    print "The R Squared is: ", row.RSquare
    cursor.close()
except:
    print "*** Regression command failed. No results available."
    raise
spssdata.Dataset("maindata").activate()
spssdata.Dataset(coefhandle).close()
spssdata.Dataset(rsqhandle).close()
END PROGRAM.
What We Saw

Variable Dictionary access

Procedures selected based on variable properties

Actions based on environment

Automatic construction of transformations

Error handling

Variables that remember their formulas

Management of the SPSS Viewer

New statistical procedure

Access to case data
Externally Controlling SPSS

SPSS Processor (backend) can be embedded and controlled by Python or other processes

Build applications using SPSS functionality invisibly

  Application supplies user interface
  No SPSS Viewer

Allows use of Python IDE to build programs

  PythonWin or many others
PythonWin IDE Controlling SPSS
What Are the Programmability
Benefits?

Extend SPSS functionality

Write more general and flexible jobs

Handle errors

React to results and metadata

Implement new features

Write simpler, clearer, more efficient code

Greater productivity

Automate repetitive tasks

Build SPSS functionality into other applications
Getting Started

SPSS 14 (14.0.1 for data access and IDE)

Python (visit Python.org)

  Installation
  Tutorial
  Many other resources

SPSS® Programming and Data Management, 3rd Edition: A Guide for SPSS® and SAS® Users (new)

SPSS Developer Central

  Python Plug-In (14.0.1 version covers 14.0.2)
  Example modules

Python and Plug-In are on the CD in SPSS 15

Dive Into Python (diveintopython.org) book or PDF

Practical Python by Magnus Lie Hetland

Python Cookbook, 2nd ed. by Martelli, Ravenscroft, & Ascher
Recap

Five power features of SPSS 14

Examples of programmability using Python

How to get started: materials and resources
Questions?
In Closing
Working together, these new features give you a dramatically more powerful SPSS.
SPSS becomes a platform that enables you to build your own statistical applications.
1. Programmability
2. Multiple datasets
3. XML Workspace and OMS enhancements
4. Attributes
5. External driver application
Contact
Jon Peck can now be reached at:
[email protected]