Presentations


10 SAS Skills for Grad Student Survival: A Grad Student 'How-To' Paper --- Elisa L. Priest

Grad students learn the basics of SAS programming in class or on their own. Classroom lessons focus on statistical procedures and the datasets are usually ready for analysis.  Students may learn basic coding such as IF-THEN statements, variable creation, and functions.  However, real world research projects are usually complex and may require a variety of different SAS tools and techniques for data exploration and analysis.  Most of these tools and techniques students have to learn on their own.

Throughout the many, many years that I was in graduate school, I spent a lot of time learning SAS on my own.  This paper is a culmination of the SAS challenges I overcame and the SAS skills that I learned outside of the classroom.  These 10 SAS skills helped me to survive graduate school and successfully write a complex simulation analysis in SAS for my dissertation. This simulation was one of the most challenging SAS programs that I ever wrote.

This paper is the sixth in the "Grad Student How-To" series and provides graduate students with a list of 10 essential SAS skills plus references to find more detailed information on each of the topics.

A Dose of SAS to Brighten your Health Care Data Blues --- Gabriela Cantu and Christopher Klekar

The development and adoption of electronic health records (EHR) is creating exciting times in health care research.  Today, copious amounts of EHR and secondary data lie at the researcher's fingertips, and robust data management techniques are essential in transforming the data to maximize its potential. Data derived from the EHR is critical for grant applications and pilot studies, and is becoming the main source of data for research projects.

Data managers encounter roadblocks when dealing with health care data, and SAS® tips and tricks can transform bulky, cumbersome datasets into clean, simple ones that are analysis ready.  Dealing with longitudinal data containing multiple observations per patient is the livelihood of researchers; however, poor restructuring and misidentification of the necessary data points can frustrate the analysis process.

PROC TRANSPOSE is often the first SAS procedure that comes to mind when a data manager is trying to restructure data for analytics. In this paper, we will first show two examples of how the limits of PROC TRANSPOSE are overcome with some key maneuvers in the DATA step, FIRST. variables, and the MERGE statement.  Secondly, we will demonstrate how PROC SQL can be used to eliminate unwanted records within a longitudinal dataset before analysis. Lastly, this paper will answer the age-old question of 'What is a researcher to do, when needing to look down longitudinal data and manipulate it?' Have no fear: using PROC SQL and a LEFT JOIN in novel ways will save the day, and your sanity.

A Survey of Some of the Most Useful SAS® Functions --- Ron Cody

SAS Functions provide amazing power to your DATA step programming. Some of these functions are essential, some of them save you writing volumes of unnecessary code. This paper covers some of the most useful SAS functions. Some of these functions may be new to you and they will change the way you program and approach common programming tasks.

Best Time to Call Model --- Jingjing Qu

It is believed that customers of different demographic segments respond differently to telemarketing solicitations by time of day and day of week.  Making telemarketing solicitation calls to customers when they are least likely to be available can reduce contacts and sales. Best-time-to-call models can identify the best time to reach people, helping to reduce telemarketing hours and increase sales. Logistic regression, multinomial logistic regression, and SAS Marketing Optimization are used to develop the model and assign the best time slot to call a customer.

Clinical Data Dashboards for Centralized Monitoring Using SAS® and Microsoft® SharePoint® --- Jagan Mohan Achi and Randy Rak

In recent guidance on risk-based monitoring practices, the Food and Drug Administration identified options for monitoring the data quality of a clinical investigation.  Centralized monitoring is a practice in which clinical data management, statisticians, or clinical monitors¹ review the data on an ongoing basis and can easily identify the accuracy, completeness, and integrity of data.  In this paper, we would like to describe the methods of developing an in-house tool using SAS® BASE (SAS® Institute, Cary, NC), GRAPH procedures (SAS® Institute, Cary, NC), and Microsoft® SharePoint® (Microsoft® Corporation, Redmond, WA) to achieve this goal.  Centralized monitoring may decrease the cost of a clinical investigation and ensure identification of problems with the trial early on by maximizing the value of electronic data capture systems.  Clinical dashboards created using SAS® BASE and GRAPH in conjunction with Microsoft® SharePoint® may provide cost savings, because most pharmaceutical/medical device companies frequently utilize these systems and no additional technology or human resources are required.

1. US Food and Drug Administration. Guidance for Industry Oversight of Clinical Investigations – A Risk-Based Approach to Monitoring (Aug 2011)

Converting Social Security Numbers to Pseudos --- Leon Kattengell
SAS 2013 Forum Presentation October 2013

Managing and handling Personal Identification Numbers with confidentiality, as sensitive data and classified information, is not only a moral and ethical issue but also a legal responsibility for the IT, programming, SQL, and SAS communities, in order to reduce and prevent fraud and identity theft as well as violations of the Health Insurance Portability and Accountability Act (HIPAA).

PERSONAL IDENTIFICATION NUMBERS

  • Social Security
  • Credit Cards
  • Checking Accounts
  • Savings Accounts
  • Life Insurance Policies
  • Driving Licenses
  • Passports
  • Medical Records

MAIN AFFECTED PLAYERS

  • All US Citizens
  • All Human Resources
  • Credit Cards Companies
  • Financial Institutions
  • Insurance Companies
  • Hospitals
  • Health Care Clinics
  • Health Care Providers, Doctor, Dentist, Etc.
  • National Security
  • Defense Department
  • US Immigration and Naturalization
  • Social Security Administration
  • Rest of Government Agencies

SOME STATISTICS

  • 10 Million Americans Were Victims of Identity Theft Last Year
  • 48 Billion Lost On Cost
  • Your Chances of Being a Victim are 1 in 4
  • 84,907 HIPAA Privacy Complaints Received Since 2003

SOME RECOMMENDATIONS

  • Identify Personal Identification Data and Information
  • Conduct an Analysis of Threats and Vulnerabilities
  • Perform Risk Assessment (Risk = Probability of Events × Impact of Consequences of Events)
  • Apply Appropriate Countermeasures, such as managing and handling Social Security Numbers with confidentiality, as sensitive data and classified information

Demystifying PROC SQL Join Algorithms --- Kirk Lafler

When it comes to performing PROC SQL joins, users supply the names of the tables for joining along with the join conditions, and the PROC SQL optimizer determines which of the available join algorithms to use for performing the join operation. Attendees learn about the different types of join algorithms and explore nested loop (brute-force), sort-merge, index, and hash join methods along with selected options to control processing.

Finding Duplicates with First. and Last. --- Clarence Wm. Jackson

Duplicated records can waste resources and cause other issues. This short paper describes how to use the FIRST. and LAST. variables to find duplicated records.
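
The idea behind FIRST. and LAST. can be sketched outside SAS. The following is a minimal Python analogue, not the paper's code: it sorts records by a key, marks the first and last record of each key group, and treats any record that is not both first and last as part of a duplicated group. The record layout and the names flag_first_last and find_duplicates are hypothetical, chosen only for illustration.

```python
from itertools import groupby


def flag_first_last(rows, key):
    """Sort rows by key and attach FIRST./LAST.-style flags to each row."""
    flagged = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        members = list(group)
        for i, row in enumerate(members):
            first = i == 0                    # analogue of FIRST.key
            last = i == len(members) - 1      # analogue of LAST.key
            flagged.append((row, first, last))
    return flagged


def find_duplicates(rows, key):
    """A row belongs to a duplicated group unless it is both first and last."""
    return [row for row, first, last in flag_first_last(rows, key)
            if not (first and last)]


# Hypothetical sample data: id 101 appears twice.
records = [
    {"id": 101, "name": "Ann"},
    {"id": 102, "name": "Bob"},
    {"id": 101, "name": "Ann"},
    {"id": 103, "name": "Cy"},
]
duplicates = find_duplicates(records, key=lambda r: r["id"])
print(duplicates)
```

As in a SAS BY-group, the flags are only meaningful once the data are ordered by the key, which is why the sketch sorts before grouping.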

Getting Connected - The Power of Using Hash Objects --- Philip Burtch

The Hash Object and the Hash Iterator Object, two DATA step component objects, offer a simple and powerful solution for many complex tasks. This paper will describe the hash object and show how to merge a number of data sets using hash objects, base SAS, and object dot notation. It will also illustrate how the program data vector (PDV) enables the interaction of data set variables and data variables within the objects.  The hash object is an in-memory solution allowing direct access via keys during program execution. It is fast and can contain an unlimited number of key items and data items. The only limitation is available memory. As memory size increases and the constraints of available memory lessen, the usefulness of hash objects will only increase.
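
The keyed, in-memory lookup described above can be illustrated with a rough Python analogue, in which a dictionary stands in for the hash object. This is a sketch of the general idea only, not the paper's SAS code; the data sets, field names, and the helpers build_lookup and hash_merge are hypothetical.

```python
def build_lookup(rows, key):
    """Load a small data set into memory, keyed for direct access."""
    return {row[key]: row for row in rows}


def hash_merge(base_rows, lookups):
    """Merge several keyed lookup tables onto a base data set in one pass.

    lookups is a list of (key_name, lookup_dict) pairs. Base rows with no
    match in a lookup are kept unchanged, like a left join.
    """
    merged = []
    for row in base_rows:
        out = dict(row)
        for key_name, table in lookups:
            match = table.get(row.get(key_name))  # direct access by key
            if match is not None:
                out.update(match)
        merged.append(out)
    return merged


# Hypothetical data: visits are enriched from two lookup tables.
sites = [{"site_id": "S1", "city": "Dallas"}, {"site_id": "S2", "city": "Austin"}]
drugs = [{"drug_code": "D9", "drug_name": "Lisinopril"}]
visits = [
    {"visit": 1, "site_id": "S2", "drug_code": "D9"},
    {"visit": 2, "site_id": "S1", "drug_code": "D0"},  # D0 has no match
]

merged = hash_merge(visits, [
    ("site_id", build_lookup(sites, "site_id")),
    ("drug_code", build_lookup(drugs, "drug_code")),
])
print(merged)
```

Neither the base data nor the lookup tables need to be sorted, which is the main practical difference from a sort-then-merge approach; the cost is that every lookup table must fit in memory.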

Hypertension Management in Hypertensive Adults with Diabetes in Primary Care setting --- Rane Pratik P. MBA., Bhansali Archita, Sawant Ruta, Dr. Rajender R. Aparasu, MPharm, PhD, FAPhA

Objective: This study examined hypertension management in the primary care setting, in a nationally representative sample of hypertensive diabetic adults.

Methods: A retrospective cross-sectional study was conducted using the 2010 National Ambulatory Medical Care Survey (NAMCS) to examine hypertension management by internists, family practitioners, and general medicine practitioners in hypertensive diabetic adults (>20 years). Pharmacotherapy was defined as prescription of the first-line treatment (angiotensin-converting enzyme inhibitors (ACEIs) or angiotensin receptor blockers (ARBs)). Multivariate logistic regression analysis was performed to evaluate blood pressure management (<130/80 mm Hg). Survey procedures such as PROC SURVEYFREQ and PROC SURVEYLOGISTIC were used in SAS 9.3 software to adjust for the complex survey design of NAMCS.

Results: An estimated 21 million patient visits were made by hypertensive diabetic adults in 2010. Blood pressure was controlled in 44.78% of the patient visits by hypertensive diabetics. Nearly 32.95% of these visits involved diet/nutrition counseling, 22.34% involved exercise counseling, and 19.85% involved weight reduction counseling. In 50.5% of the visits, ACE inhibitors or ARBs were prescribed. Patients visiting an internist had an 83.8% higher likelihood (p<0.0471) of controlled blood pressure as compared to those visiting other primary care physicians. Age, gender, race/ethnicity, and health insurance status were not associated with hypertension control in office visits.

Conclusion: The study found that hypertension management in hypertensive diabetic patients was better by internists than by family practitioners or general medicine practitioners (FP/GMP). A targeted effort should be made to improve hypertension management in diabetic patients by other primary care physicians.

Introduction to SAS Hash Objects --- Chris Schacherer

Integrating data from multiple datasets is a common programming requirement, and the approach taken to fulfilling this requirement can significantly impact the execution-time of your SAS program.  SAS hash objects provide a very fast way to look up values from another dataset within the context of a DATA step, reducing the time spent sorting, merging, and creating iterative datasets. The current work provides an overview of SAS hash objects and examples of their use.

Leveraging Metadata with the SAS Macro Language --- Keith G. Curtis

Do you spend too much time typing variable names in your programs?  Do you want SAS to write this code for you?  By leveraging metadata from Proc Contents in tandem with Call Symputx, the need to manually enter variables is eliminated.

Macro utility to compare multiple SAS datasets --- Krish Krishnan

This utility can be used anywhere there is a need to use PROC COMPARE. The macro compares one or several lists of permanent SAS datasets supplied by the user. SAS ODS is used to generate a customized RTF report with traffic-lighting that summarizes, in one line, all the comparison information for each compare. The report identifies 16 comparison errors generated by PROC COMPARE, and the utility also flags cases where the production dataset date is after the QC dataset date. If needed, the variable ordering (VARNUM) between the base and compare datasets can be compared (an option is available within the macro). If specific characters (e.g. blank spaces) need to be ignored when comparing, the utility will ignore a specific character or a list of characters. Characters may need to be ignored when a table or listing output is validated by creating a dataset image of the output: the production and QC programmers, for example, may not have generated all the indenting spaces in exactly the same way, and the validation process can then focus on the values rather than the spaces. The final report indicates in one line for each comparison whether the compare passes (in green) or fails (in red). If a compare fails, the report identifies in red the reason(s) why it failed (all in one line).

One problem - Multiple solutions; various ways of removing duplicates from dataset using SAS® --- Jaya Dhillon

In the real world, we seldom come across data that is in a ready-to-use format, so data cleaning is one of the most important tasks for an analyst, and one of the major problems faced is removing duplicate records. For instance, we may want only unique Employee IDs in our dataset, or only unique transactions from a list of bank transactions. This paper covers the various options available for treating duplicates in different ways. Questions answered here include how to use options such as NODUPKEY in combination with DUPOUT, and how NOUNIQUEKEY, NODUPRECS, and NODUPKEY differ from each other.

Practical Array Usage with String Search --- Charles Basham

This presentation will demonstrate practical applications of single-dimension arrays, iterative string searching of array contents, and building macro variables and SQL statements from arrays. PROC TRANSPOSE, the DATA step, and PROC SQL will be utilized.

Preparing Interaction Variables for Logistic Regression --- Bruce Lund

Interactions between two (or more) variables often add predictive power to a binary logistic regression model beyond what the original variables offer alone. In the simplest case, if X1 and X2 are zero-one valued variables, then their interaction variable is X1_X2 = X1*X2. However, X1_X2, in combination with X1 and X2, uses 3 degrees of freedom. A nominal variable XC with four values can be defined from X1 and X2 with values XC = compress(X1 || X2). Perhaps a collapsing of the four levels of XC to three values (having 2 d.f.) would provide nearly as much predictive power as the saturated model X1, X2, X1_X2 while providing more predictive power than X1, X2 alone. In this paper this question is answered for interactions of nominal or numeric X1 and X2, each with 2 or more levels. First, the user creates XC. Then a best-collapse algorithm optimally collapses the levels of XC until a stopping point is reached that provides a trade-off between degrees of freedom and predictive power. All data processing was performed using Base SAS®.
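
The two constructions named above, the product X1_X2 = X1*X2 and the concatenated nominal variable XC, can be shown on the four possible combinations of two zero-one variables. This is a small Python sketch for illustration only; it does not attempt the paper's best-collapse algorithm, and the function name add_interactions and the row layout are hypothetical.

```python
def add_interactions(rows):
    """Add the product interaction and the concatenated nominal variable."""
    out = []
    for row in rows:
        x1, x2 = row["x1"], row["x2"]
        out.append({
            **row,
            "x1_x2": x1 * x2,      # product interaction for 0/1 variables
            "xc": f"{x1}{x2}",     # analogue of compress(X1 || X2)
        })
    return out


# All four combinations of two zero-one variables.
rows = [{"x1": a, "x2": b} for a in (0, 1) for b in (0, 1)]
with_interactions = add_interactions(rows)

levels = sorted({r["xc"] for r in with_interactions})
print(levels)                      # the four levels of XC
print(len(levels) - 1)             # degrees of freedom used by XC as a nominal variable
```

With four levels, XC uses three degrees of freedom, the same as X1, X2, and X1_X2 together; collapsing two of its levels into one would reduce that to two.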

SAS Macros - Advanced Topics --- Dari Mazloom

A macro is an efficient method to accomplish the following tasks:

  • Modularize a program into smaller and more manageable pieces.
  • Encapsulate repetitive tasks into re-usable code.

The presentation will show how to:

  • Use the CALL EXECUTE routine to call a macro within a DATA step.
  • Use a macro to read a SAS dataset using the OPEN, ATTRN, and CLOSE functions.
  • Use a macro to read a subset of a SAS dataset.
  • Create macro variables within a macro.
  • Default and override macro parameters.

SAS Macros - An Introduction --- Dari Mazloom

A macro is an efficient method to accomplish the following tasks:

  • Modularize a program into smaller and more manageable pieces.
  • Encapsulate repetitive tasks into re-usable code.

The presentation will show how a macro is created and called.

  • It will explain the different ways of passing parameters into a macro.
  • It will demonstrate the difference between keyword parameters and positional parameters.

Seamless Dynamic Web (and Smart Device!) Reporting with SAS® --- Donald L. Penix, Jr.

The SAS® Business Intelligence platform provides a wide variety of reporting interfaces and capabilities through a suite of bundled components. SAS® Enterprise Guide®, SAS® Web Report Studio, SAS® Add-In for Microsoft Office, and SAS® Information Delivery Portal all provide a means to help organizations create and deliver sophisticated analysis to their information consumers. However, businesses often struggle to easily and efficiently create and deploy these reports to the web and smart devices. When it is done, it is usually at the expense of giving up dynamic ad-hoc reporting capabilities in return for static output or possibly limited parameter-driven customization.

The obstacles facing organizations that prevent them from delivering robust ad-hoc reporting capabilities on the web are numerous. More often than not, it is due to the lack of IT resources and/or project budget. Other failures may be attributed to breakdowns during the reporting requirements development process. If the business unit(s) and the developers cannot come to a consensus on report layout, critical calculations, or even what specific data points should make up the report, projects will often come to a grinding halt.

This paper will discuss a solution that enables organizations to quickly and efficiently produce SAS reports on the web and your mobile device – in less than 10 minutes! It will also show that by providing self-service functionality to the end users, most of the reporting requirement development process can be eliminated, thus accelerating production-ready reports and reducing overall maintenance costs of the application. Finally, this paper will also explore how the other tools on the SAS Business Intelligence platform can be leveraged within an organization.

Shortcuts to Save Time While Working with SAS --- Ruth Marisol Rivera Barragan

Have you ever asked yourself questions like these?

  • How can I review the log faster in an interactive session?
  • Is there a button that can do it for me?
  • Can I clear the log and output windows with a shortcut?

If you answered Yes to at least one of these questions, continue reading.

This paper will give you a list of shortcuts that will make working with SAS faster, for example: how to customize buttons to check your log interactively; how to clear your log and output windows before you run your program, just by hitting an F key; and how to find a specific value in a dataset without scrolling through all the available variables, just by using the correct operator in the WHERE expression. It also covers some other interesting topics that will really help you save time!

Strategies and Techniques for Debugging SAS® Program Errors and Warnings --- Kirk Lafler

As a SAS® user, you've probably experienced first-hand more than your share of program code bugs, and realize that debugging SAS program errors and warnings can, at times, be a daunting task.  This presentation explores the world of SAS errors and warnings, provides important information about syntax errors, input and output data sources, system-related default specifications, and logic scenarios specified in program code. Attendees learn how to apply effective techniques to better understand, identify, and fix errors and warnings, enabling program code to work as intended.

Structural Time Series Models with Time-Varying Effects - Short-Term Demand Forecasts for Retail Utility Industry --- Fujiang Wen

Short-term demand in the retail utility industry can be affected by numerous internal and external factors; therefore, forecasting short-term demand is much more challenging than forecasting long-term demand.  While utilities may not be able to control these factors, they can better understand and predict demand trends and patterns, and thus optimize their business operations.  This presentation illustrates a structural time series model, the Unobserved Component Model (UCM), used for short-term demand forecasts of daily consumption in an urban water retail system.  The model was used to quantify the effects of different components on daily consumption, such as the trend, seasonal, and cyclical components and external regression effects.  The SPLINEREG statement provided a useful tool to quantify regression effects that have a nonlinear relationship with the response series.  An intervention analysis was also used to estimate the impact of the restrictions on daily demand using time-series models, which include a transfer function in ARIMA models and time-varying regression effects in the RANDOMREG statement. The study indicated that the UCM is a convenient way of additively decomposing a time series into different trends and was used to evaluate whether the restrictions have been effective in reducing daily average and peak demand.

The Complexities of an Address --- Barry Mullins

Addresses can come in many forms that are messy, unorganized, and not standardized. Matching people by their address manually could take days, but with SAS it can be done quickly. However, without standardizing the addresses, SAS can miss matches or even give you false positives.

This paper will discuss the steps to standardizing and fuzzy matching addresses. I will show examples of standardizing addresses and parsing out the pieces using PRXCHANGE and TRANWRD.  Also, I will discuss the importance of removing common words, like street, to help increase the accuracy of matching addresses to one another using COMPGED. The examples are depicted in SAS EG 4.2.

The Concepts and Practice of Analysis with SAS® Enterprise Guide® --- Chris Schacherer

SAS Enterprise Guide continues to gain acceptance among SAS programmers as an enterprise solution for a wide range of reporting and analytic applications. However, in order for organizations to realize maximum benefit from their investment in Enterprise Guide, subject-matter experts and a new generation of analysts who have not been trained as SAS programmers need to be trained in the use of this tool. The current work provides a framework for this training by explaining the relationship between Enterprise Guide and traditional SAS programming, introducing the basic Enterprise Guide concepts necessary to function in this environment, and presenting examples of common tasks that will help these users become immediately productive.

The Good, The Bad, and The Ugly --- Toby Dunn and Kirk Lafler

The SAS® System has all the tools users need to read data from a variety of external sources. This has been one of the most important and powerful features since the mid-1970s. The cornerstone of this power begins with the INFILE and INPUT statements, the use of single- and double-trailing @ signs, and the ability to read data using a predictable form or pattern. This paper provides insights into the INFILE statement, the various styles of INPUT statements, and illustrates examples of how data can be read into SAS using the DATA step.

Type, Point and Click - A Practical Guide to SAS Enterprise Guide® Prompts --- Patricia Hettinger

Your SAS Enterprise Guide project is set up.  Processes have been created, linking code in the desired order.  Now if only you didn't have to keep changing your code every time your input file name changed or your date range or ...

This paper gives an overview of the prompts available in SAS Enterprise Guide 4.3 and ways to automate your project. We will also touch on the general nature of macro variables and the various quoting and unquoting functions.

Using PROC STANDARD and PROC SCORE to impute missing multivariate values --- Paul Montagna

One of the most frustrating aspects of multivariate analysis is that one missing value in one column will cause the entire sample row to be deleted.  This means all the other values of the sample are lost, and one is not able to classify that sample.  There are at least two ways to deal with this problem, what I call the "pre-analysis method" and the "post-analysis method."  There are many approaches to imputing a missing value prior to analysis so the samples are retained.  One simple method is to impute the missing value "before" the multivariate analysis, using PROC STANDARD to replace missing values based on a statistical characteristic or PROC REG to use a predictive approach.  Alternatively, one could impute the missing value "after" the multivariate analysis using the multivariate coefficients as input to PROC SCORE, thus being able to classify the sample.  Each approach has its strengths and weaknesses.

Using SAS for Analyzing Bank's Customers' Behavior and Profitability --- Hassan Marzoughi

The objective of this paper is to discover the behavior of a typical bank's customers by using SAS software. For instance, this paper tries to answer questions concerning the profitability of different channels (online banking versus using the bank's branches) and services in order to make proper decisions about customers that use these channels and services. In this regard, this paper shows how SAS can be used to build models that find the relationship between tenure and customer satisfaction or identify the factors that impact average customer profitability, including customers' age, income, and satisfaction. Also, it discusses using SAS for comparing the profitability of different branches and/or different channels. Revealing differences between old and new customers, regarding their profitability and the type of services that they use, is another part of this paper. The average profit, customer profitability based on different bank districts, profitability of different channels and services, the relationship between customers' age and their use of different channels, the characteristics of customers that left the bank, customer loyalty, and the churn and return rate of the bank's customers are discussed in this paper. Results of such analysis can help bank managers make proper decisions, including but not limited to developing or closing a branch, expanding or reducing services for different branches, increasing fees or using rebates and other incentives to increase a service's profitability, and keeping old customers or acquiring new customers.

Using SAS Software to Determine the Mixed Rotterdam Demand Model Parameters and Test Statistical References --- Maryam Tabarestani

Although SAS, as strong statistical software, can be used to estimate a mixed demand model, so far all studies have applied TSP and other software to analyze these types of models. This paper demonstrates, for the first time, how SAS can be used to estimate the parameters of a mixed demand model. A mixed demand system, a third class of demand model, is an alternative demand specification in consumer behavior theory. It was introduced by Samuelson in 1965. This system provides a more flexible functional form for the many agricultural commodities that are prepared and marketed in several forms. Samuelson's mixed demand model provides a theoretical basis for consumer demand studies that apply time series data. Although the mixed demand model has many features, its applied literature is sparse. The Rotterdam parameterization is one of the differential approaches, introduced by Theil (1965) and Barten (1966). The Rotterdam demand model has some special attractions based on consumer behavior theory. This specification is able to satisfy all mixed demand assumptions and restrictions. A mixed Rotterdam demand specification requires a broad knowledge of mathematics and an appropriate statistical software package that is able to process such a system and produce the desired outputs according to the model's theory and requirements. This paper illustrates how SAS statements can be used to calculate the mixed Rotterdam demand variables, to convert the model's equations to SAS code, to run the model to estimate parameters, and to test the accuracy of the model and parameters.

Using SAS to Build Customer Level Datasets for Predictive Modeling --- Scott M. Shockley

If you are using operational data to build datasets at the customer level, you're faced with the challenge of condensing a plethora of raw transactions into a dataset that summarizes each customer, one row per customer. You will probably have to use multiple tables with different levels of granularity. Some of the data will change over time, and some of it won't. If the focus of your research is to predict events like customer defection, then time will be a major consideration and make this process even more difficult.

The goal of this paper is to guide readers through the process of transforming raw data into a dataset for predictive modeling that accurately represents a customer and all the factors that could possibly impact the outcome being predicted. The paper will use specific examples like how to calculate derived variables based on complex conditions related to time. For example, how to calculate an average billing amount over time, but only during the most recent uninterrupted period of customer tenure.

The discussion will center on the technical details, but also the business and mathematical logic behind the decisions being made.
Some of the topics covered are:

  • Using PROC MEANS to aggregate customer transactions
  • Using PROC TRANSPOSE to transform data from long to wide
  • Using character functions to summarize the results of many variables into one
  • Using do loops and arrays to calculate derived variables conditional

Using SAS® Enterprise BI for Integrated Bank Reporting --- James Beaver and Tobin Scroggins

This was presented at the 2013 Global Forum

This paper shows how Base SAS, Enterprise Guide, ETS and SAS BI are used to provide a comprehensive view of bank performance.  Data is extracted from the G/L, loan, deposit and application systems; real-time data is accessed to provide up-to-the-minute results on loan activity; and system reports are read in to provide additional information. PROC COMPUTAB is used to create financial statements, OLAP cubes are used to provide reports on bank balance sheet components and budget comparisons on non-interest income and expense items by department, and dashboards are used to provide real-time reports on loan originations. The reports are presented using SAS BI through the SAS data portal to provide real-time, trend and historical reports on the bank's performance.

Writing Macro Do Loops with Dates from Then to When --- Ronald J. Fehd

Dates are handled as numbers with formats in SAS® software. The SAS macro language is a text-handling language. Macro %do statements require integers for their start and stop values.

This article examines the issues of converting dates into integers for use in macro %do loops. Two macros are provided: A template to modify for reports and a generic calling macro function which contains a macro %do loop that can return items in the associative array of dates.

Example programs are provided which illustrate unit testing and calculations to produce reports for simple and complex date intervals.
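
The underlying idea, turning dates into integers so that a counting loop can step through them, can be sketched in Python. This is an analogue for illustration rather than the article's macros: it relies only on the fact that a SAS date value is the number of days since January 1, 1960, and the function names and the month-index scheme are assumptions made for the example.

```python
from datetime import date, timedelta

SAS_EPOCH = date(1960, 1, 1)  # SAS date values count days from this date


def to_sas_date(d):
    """Convert a calendar date to its integer SAS-style date value."""
    return (d - SAS_EPOCH).days


def from_sas_date(n):
    """Convert an integer SAS-style date value back to a calendar date."""
    return SAS_EPOCH + timedelta(days=n)


def daily_range(start, stop):
    """Loop over integers from start to stop, returning one date per day."""
    return [from_sas_date(n)
            for n in range(to_sas_date(start), to_sas_date(stop) + 1)]


def month_starts(start, stop):
    """Loop over an integer month index, returning the first day of each month."""
    first = start.year * 12 + (start.month - 1)
    last = stop.year * 12 + (stop.month - 1)
    return [date(index // 12, index % 12 + 1, 1)
            for index in range(first, last + 1)]


days = daily_range(date(2013, 10, 30), date(2013, 11, 2))
months = month_starts(date(2013, 11, 15), date(2014, 2, 1))
print(days)
print(months)
```

A daily interval maps directly onto consecutive integers. For a monthly interval the date values themselves are not evenly spaced, so the sketch loops over a month counter instead and converts each counter value back to a date.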

Zero my Hero! --- Anna Vincent

Have you ever needed to have ALL the values of a variable show up in a report or a proc tabulate but one or two of the values have nothing but zeroes in them and refuse to show themselves in the final tables?

Here are some quick tips on how to get 'zero out' values to show up for proc report and proc tabulate!
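
The problem can be pictured with a small Python analogue (not the presenter's SAS tips): a plain frequency count only reports the values that actually occur, so the full list of expected levels has to be supplied for the zero rows to appear. The level names and the function tabulate_with_zeros are hypothetical.

```python
from collections import Counter


def tabulate_with_zeros(values, all_levels):
    """Count values, reporting every expected level even when its count is zero."""
    counts = Counter(values)
    return {level: counts.get(level, 0) for level in all_levels}


all_levels = ["Low", "Medium", "High"]   # every level the report should show
observed = ["Low", "High", "Low"]        # "Medium" never occurs in the data

table = tabulate_with_zeros(observed, all_levels)
for level, count in table.items():
    print(f"{level:<8}{count}")
```

The key design choice is that the list of levels comes from outside the data, so a level with no observations still gets a row.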

Note that additional presentations, including half day seminars, stat tutorial and HOWs, are being given by SAS Presenters.

 
