Presentations

2016 SCSUG Educational Forum

San Antonio, TX

%EventChart: A Macro to Visualize Data with Multiple Timed Events --- Peng, A.
Data with multiple events at several time points are common in clinical trials. For example, oncology clinical trial data can include the dates of randomization, first treatment, tumor response, recurrence, and going off study. An event chart is one tool for visualizing such timeline data with multiple events. A typical event chart plots one horizontal line for each subject, with the x-axis showing dates or time intervals from a specific event. Markers with different attributes are placed on the line for different types of events. When more than one group is present, different line patterns can be used to distinguish the groups.

This paper first introduces some basic aspects of plotting event charts, including data structure and programming tips. The author then presents a SAS macro, %EventChart, which can produce event charts with several user-friendly options. By using the new STYLEATTRS statement in the SGPLOT procedure in SAS 9.4, the colors and markers for different events and the line patterns for different groups can be easily specified. An application of the macro to an oncology study is provided.
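
The data-structure step can be sketched in Python: one row per subject with a date for each event is flattened into one record per subject and event, with the day offset from a reference event giving the marker position on that subject's line. This is a hypothetical illustration, not the %EventChart macro; the field names (`id`, `group`, `dates`) and the choice of randomization as the reference event are assumptions.

```python
from datetime import date

def to_event_records(subjects, reference="randomization"):
    """Flatten one-row-per-subject event dates into (id, group, event, day) records.

    `day` is the number of days from the subject's reference event, i.e. the
    x-axis position of that event's marker on the subject's horizontal line.
    The field names used here are hypothetical.
    """
    records = []
    for subj in subjects:
        ref = subj["dates"][reference]
        for event, when in subj["dates"].items():
            if when is None:  # event not observed for this subject: no marker
                continue
            records.append((subj["id"], subj["group"], event, (when - ref).days))
    return records

subjects = [
    {"id": "001", "group": "A",
     "dates": {"randomization": date(2016, 1, 4), "first_treatment": date(2016, 1, 11),
               "recurrence": None, "off_study": date(2016, 6, 30)}},
]
for rec in to_event_records(subjects):
    print(rec)
```

The group value travels with every record so that a plotting step can map it to a line pattern, while the event name maps to a marker color and symbol.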

An Ad Hoc Method for Computing Pseudo Effect Size for Mixed Models --- Tippey, K. & Longnecker, M.
Effect size is increasingly reported in journals across multiple domains. Many fields that use complex experimental designs, such as psychology and engineering, must analyze those designs using mixed models. Because of the intrinsic nature of the computation, none of the packages (R, SPSS, SAS®, STATA, etc.) provide effect sizes for mixed models. Hence, when analyzing models with fixed and random factors, such as split-plot and repeated measures designs, an alternative method must be used to compute effect size from the output of procedures capable of appropriately analyzing mixed models. This paper presents an ad hoc method of computing effect size by working backward from the F-values output by SAS® PROC MIXED to produce eta-squared, partial eta-squared, omega-squared, and Cohen's f pseudo estimates. A SAS® program will be provided to perform the calculations.
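
As a rough illustration of "working backward" from an F statistic, the standard conversions for partial eta-squared and Cohen's f need only F and its numerator and denominator degrees of freedom. This Python sketch is not the authors' SAS program; eta-squared and omega-squared need additional quantities (such as total sums of squares or sample size) and are not shown. Degrees of freedom are typed as floats because mixed-model denominator df can be non-integer.

```python
import math

def partial_eta_squared(f_value: float, df_num: float, df_den: float) -> float:
    """Partial eta-squared implied by an F statistic: F*df_num / (F*df_num + df_den)."""
    return (f_value * df_num) / (f_value * df_num + df_den)

def cohens_f(f_value: float, df_num: float, df_den: float) -> float:
    """Cohen's f from partial eta-squared: sqrt(eta_p^2 / (1 - eta_p^2))."""
    eta_p = partial_eta_squared(f_value, df_num, df_den)
    return math.sqrt(eta_p / (1.0 - eta_p))

# Example: a fixed effect reported with F = 4.0 on 2 and 20 degrees of freedom.
print(round(partial_eta_squared(4.0, 2, 20), 4))  # 0.2857
print(round(cohens_f(4.0, 2, 20), 4))             # 0.6325
```

These are "pseudo" estimates in the sense the abstract describes: in a mixed model the denominator of each F test depends on which error term the procedure used, so the values are not directly comparable to effect sizes from a purely fixed-effects ANOVA.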

An Analysis of the Repetitiveness of Lyrics in Predicting a Song's Popularity --- Doyle, D.
To understand whether there is a correlation between the repetitiveness of a song's lyrics and its popularity, the top ten songs from the year-end Billboard Hot 100 Songs chart from 2002 to 2015 were collected. The lyrics of these songs were then assessed to determine the count of the top ten words used. These word counts were then used to predict the number of weeks each song was on the chart. The prediction model was analyzed to determine its quality and whether word count is a significant predictor of a song's popularity. To investigate whether song lyrics are becoming more simplistic over time, several tests were performed to see whether the average word counts have been changing over the years. All analysis was completed in SAS® using various PROCs.
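
One possible reading of "the count of the top ten words used" is the total number of occurrences of a song's ten most frequent words, so that more repetitive lyrics score higher. The Python sketch below implements that hypothetical metric; the function name, the tokenizing regular expression, and the metric itself are assumptions, not the author's SAS procedure.

```python
import re
from collections import Counter

def top_word_count(lyrics: str, n: int = 10) -> int:
    """Total occurrences of the n most frequent words in a lyric (hypothetical metric)."""
    # Lowercase and keep runs of letters/apostrophes as words.
    words = re.findall(r"[a-z']+", lyrics.lower())
    return sum(count for _, count in Counter(words).most_common(n))

print(top_word_count("La la la, hey hey, you", n=2))  # 5
```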

An Introduction to SAS® Hash Programming Techniques --- Lafler, K
Beginning in Version 9, SAS software supports a DATA step programming technique known as hash that enables faster table lookup, search, merge/join, and sort operations. This presentation introduces what a hash object is, how it works, and the syntax required.  Essential programming techniques are illustrated to define a simple key, sort data, search memory-resident data using a simple key, match-merge (or join) two data sets, handle and resolve collision scenarios where two distinct pieces of data have the same hash value, as well as more complex programming techniques that use a composite key to search for multiple values.
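
SAS hash objects are declared inside a DATA step, so the sketch below is not SAS code; it is a Python analogue that uses a dict to show the same idea: load one table into memory keyed on a simple or composite key, then look up each row of the other table to perform a match-merge. Python's dict resolves hash collisions internally, so the collision handling the abstract mentions is not visible here. The function name `hash_join` is hypothetical.

```python
def hash_join(left, right, key):
    """Inner join of two lists of dicts using an in-memory hash table built from `right`.

    `key` is a tuple of column names; a multi-column tuple acts as a composite key.
    """
    table = {}
    for row in right:
        # Keep the first row seen for each key value; later duplicates are ignored.
        table.setdefault(tuple(row[k] for k in key), row)
    joined = []
    for row in left:
        # One average-case O(1) lookup per row instead of sorting both tables.
        match = table.get(tuple(row[k] for k in key))
        if match is not None:
            joined.append({**row, **match})
    return joined

patients = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
visits = [{"id": 2, "dx": "X"}, {"id": 3, "dx": "Y"}]
print(hash_join(visits, patients, key=("id",)))  # [{'id': 2, 'dx': 'X', 'name': 'B'}]
```

As with a DATA step merge, the design choice that matters is which table is loaded into memory: the smaller lookup table is the natural candidate, while the larger table is streamed through once.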

Analysis of degradation patterns in the clinical efficacies of anti-microbial drug usage --- Arya, S. & Yallabandi, B.
It's scary how bacteria and other pathogens have evolved over time and increased their resistance to drugs and other medical advancements. Studies have revealed that the harmful effects of antimicrobial drug resistance, and patients' susceptibility to infection, are growing at an alarming rate. It is projected that by 2050 this trend could not only reduce GDP by 2 to 3.5% every year but also lead to the deaths of over 10 million people. According to the Centers for Disease Control and Prevention (CDC), antibiotic-resistant infections in the U.S. alone cost in excess of $20 billion in health care costs and $35 billion in societal costs every year. We extracted sample data for 5 prevailing anti-microbial drugs from the Cerner database, which contains data on the health of a population both broadly and down to the level of the individual. It holds comprehensive medical records for over 50 million patients that are periodically refreshed with the latest information. Our variables include patient encounters, medication details, and anti-microbial drug usage and susceptibility results, which will be used to determine the effectiveness of drug prescriptions. We plan to analyze how the success rate of each drug across patient encounters has changed over the past 10 years and then suggest appropriate drug replacement measures, such as generic substitution, as a prospective cost management opportunity that reduces direct costs in the healthcare industry.

Anatomy of a Merge Gone Wrong --- Horstman, J.
The merge is one of the SAS programmer’s most commonly used tools. However, it can be fraught with pitfalls to the unwary user. In this paper, we look under the hood of the data step and examine how the program data vector works. We see what's really happening when datasets are merged and how to avoid subtle problems.

Bet on the Underdogs --- Chakraborty, A. & Sharma, J.
Ultimate Fighting Championship (UFC) is an incredibly entertaining and technical sport to watch, and one of the fastest-growing sports in the world. Sun Tzu famously said, "Every battle is won before it is ever fought." Everyone in the UFC has talent; everyone in the UFC can win. But not everyone will win, which raises the question: what determines a fighter's victory or loss? The fighter who is better prepared usually wins, but how do we define preparedness? Are fighters adequately prepared because they are physically fit, because of their training regimen, or perhaps because of a psychological advantage, their experience, or their age? There is no end to these questions, so our study concentrates on answering them. Using data analytics, we can quantify preparedness and measure differences in preparedness among UFC fighters to predict a fighter's chances of victory or loss.

Some factors that might influence a fighter's chances of winning or losing are:

  • Physical attributes such as height, weight, age at the time of the match, etc.
  • Dominance, psychological advantage, winning streak etc.
  • Proven track records of winning, experience in terms of number of matches played.
  • Audience support, training regime, affiliation to club etc.

Implications of the project are:

  • Knowing the odds of winning for any fighter would help sponsors choose the best fighter to sponsor.
  • Our study can help in the world of betting by enabling bettors to make educated bets with higher odds of winning.

Beyond IF THEN ELSE: Techniques for Conditional Execution of SAS Code --- Horstman, J.
Nearly every SAS program includes logic that causes certain code to be executed only when specific conditions are met. This is commonly done using the IF THEN ELSE syntax. In this paper, we will explore various ways to construct conditional SAS logic, including some that may provide advantages over the IF statement. Topics will include the SELECT statement, the IFC and IFN functions, the WHICH and CHOOSE function families, as well as some more esoteric methods, and we’ll make sure we understand the difference between a regular IF and the %IF statement in the macro language.

Big Data Epistemology --- Davis, C. & Dykman, C.
The Greek philosopher Plato is remembered as a rationalist. He viewed all data collected through the senses as flawed. He saw the world as consisting of imperfect physical representations of abstract, perfect theoretical forms. For him, collecting and analyzing data was less important than understanding the ideal forms and the relationships between them. Plato viewed decision-making as essentially model building from intuition reinforced by observation. Aristotle, on the other hand, held an empiricist’s view of decision-making. He called it “posterior analytics,” by which he meant collecting data first and then analyzing it based on logic. This implies a view of decision-making that emphasizes data collection and analytics. In modern terms, Plato’s decision-making would be more humanistic and Aristotle’s more mechanistic. It might be argued that the modern emphasis on big data in corporate decision-making reflects a fundamental philosophical shift from an essentially Platonic view of decision-making to an essentially Aristotelian one. As businesses embrace this transformation, how prepared are we to fully accept its implications? If more mechanistic decision-making becomes the norm, then there is no reason not to fully automate it, and there are plenty of economic incentives to do just that. However, if management accepts this, what is the role of human intuition and judgment in a world dominated by mechanistic decision-making?

Building a Better Dashboard Using SAS® Base Software --- Lafler, K., Horstman, J. & Muller, R.
Organizations around the world develop business intelligence dashboards to display the current status of “point-in-time” metrics and key performance indicators. Effectively designed dashboards often extract real-time data from multiple sources to highlight important information, numbers, tables, statistics, metrics, and other content on a single screen. This presentation introduces basic rules for “good” dashboard design and the metrics frequently used in dashboards, and shows how to build a simple drill-down dashboard using the DATA step, PROC FORMAT, PROC PRINT, PROC MEANS, ODS, ODS Statistical Graphics, PROC SGPLOT, and PROC SGPANEL in SAS® Base software.

Categorization of Fitbit's Customer Complaints on Twitter --- Bhoir, S. & Arora, J.
All companies are trying to be more customer-centric and are implementing new measures to enhance the consumer experience. One such measure, recently adopted by many companies, is “Social Media Customer Service”. According to J.D. Power, 67% of consumers have used a company’s social media support page for a customer service issue. It has been reported that consumers can expect a reply from the support team within a couple of minutes.

One such company is Fitbit, which has grown quite popular recently and has encouraged its customers to generate buzz on social media by posting reviews, discussing new product launches, and describing how they use Fitbit in their day-to-day lives. On the flip side, however, Fitbit’s Twitter support page is flooded with issues that consumers face while using its products. At least one tweet to @FitbitSupport appears every minute, either from a user or from the support team responding to a user’s complaint. The primary objective of this research is to categorize these complaints and identify the major issues, such as whether they relate to activity tracking, design, technical specifications, or application interactivity. Because the tweets are model-specific, we will compare whether the issues were resolved between two generations of the product.

Create Excel reports using SAS ODS --- Mazloom, D.
The SAS Output Delivery System (ODS) provides a set of techniques to create and format Excel reports directly from your SAS code. This presentation explains how to create Excel reports using SAS ODS and Base SAS:

  1. Using Excel formatting options available through ODS, multiple Excel worksheets can be created inside a single Excel workbook.
  2. The output from SAS ODS is an XML file that opens as an Excel workbook.
  3. All of the above is accomplished in your SAS code using the ODS ExcelXP tagset.

Custom Excel Toolbox --- McCarthy, M.
A stored process that joins two user-selected Excel tables is presented.

Determining the functionality of Water Pumps in Tanzania Using SAS® EM and VA --- Dowdavarapu, I. & Damodaran, V. (OSU)
Access to clean and hygienic drinking water is a basic necessity every human being deserves. In Tanzania, 23 million people do not have access to safe water and are forced to walk miles to fetch water for daily needs. The problem is largely the result of poor maintenance and inefficient functioning of existing infrastructure such as hand pumps. To solve the current water crisis and ensure access to safe water, we need to locate pumps that are non-functional, or functional but in need of repair, so that they can be repaired or replaced. Manually inspecting the functionality of more than 74,000 water points in a country like Tanzania is costly and impractical. The objective of this study is to build a model that predicts which pumps are functional, which need repair, and which do not work at all, using data from the Tanzania Ministry of Water. After pre-processing, the final data set consists of 39 variables and 74,251 observations. We used SAS Bridge for ESRI and SAS VA to illustrate the spatial variation of functional water points at the regional level in Tanzania, along with other socioeconomic variables. Among the decision tree, neural network, logistic regression, and HP random forest models, the random forest model performed best. Classifying water pumps with the champion model will expedite maintenance of water points and help ensure clean, accessible water across Tanzania at low cost and in a short period of time.

Differentiate Yourself --- Lafler, K.
Today's employment marketplace is highly competitive. As a result, all SAS® professionals must learn how to differentiate themselves by acquiring the technical skills necessary to compete, and excel, in the global marketplace. This presentation illustrates how SAS professionals can begin enhancing their skills by accessing valuable and “free” SAS-related content. With a web browser and the Internet, anyone can access published PDF “white” papers, Word documents, PowerPoint presentations, comprehensive student notes, instructor lesson plans, hands-on exercises, webinars, audios, videos, SAS Institute’s comprehensive technical support website, and more to differentiate themselves and stand out from the competition.

Downloading, Configuring, and Using the Free SAS® University Edition Software --- Lafler, R., Lafler, K., & Shipp, C.
The announcement of SAS Institute’s free SAS University Edition is an exciting development for SAS users and learners around the world!  The software bundle includes Base SAS, SAS/STAT, SAS/IML, Designer Studio (user interface), and SAS/ACCESS for Windows, with all the popular features found in the licensed SAS versions.  This is an incredible opportunity for users, statisticians, data analysts, scientists, programmers, students, and academics everywhere to use (and learn) for career opportunities and advancement.  Capabilities include data manipulation, data management, comprehensive programming language, powerful analytics, high quality graphics, world-renowned statistical analysis capabilities, and many other exciting features.

This presentation discusses and illustrates the process of downloading and configuring the SAS University Edition.  Additional topics include the process of downloading the required applications, ‘key’ configuration strategies to run the SAS University Edition on your computer, where to find the best how-to videos to learn valuable techniques, and the demonstration of a few powerful features found in this exciting software bundle.  We conclude with a summary of tips for success in downloading, configuring and using the SAS University Edition.

Effective method for curing heart disease --- Varadarajan, K. & Ayyalasomayajula, S.
Heart disease and heart attacks are on the rise in the United States and are especially prevalent in Oklahoma. Oklahoma had the third-highest death rate for heart disease in the nation, with around 9,000 deaths in 2012.

There are three treatment options for heart attacks:

  • Lifestyle changes addressing smoking, nutrition, blood pressure, and stress
  • Medication
  • Surgical intervention

The aim of this project is to identify which of the above three methods is most effective for curing heart disease. The best treatment can be identified from factors such as the minimum recovery time and the lowest chance of a second attack. This comparison has to be performed among people of the same gender, place of residence, smoking habits, etc.

The variables to be considered are:

  • Age
  • Gender
  • BMI
  • Blood pressure
  • Smoking habit
  • Residence

Our main source of heart disease data is CHSI. We obtained the data with due authorization after completing HIPAA training.

We will analyze the data using time series methods and Auto Forecaster to identify future trends. We also intend to use survival analysis to calculate the readmission rate and identify factors that may reduce it.

Enabling news trading by automatic categorization of news articles --- Kotekal, P. & Bhaskara, V.
Traders making decisions based on news developments is nothing new. Major company announcements, such as annual and quarterly earnings, dividends, acquisitions, mergers, tender offers, stock splits, or major management changes, are known to have a direct impact on the company’s stock price, and news traders have always been keen to exploit and act on this information. However, in an age where news is novel for only a few minutes, traders must identify its underlying message, assess its possible impact on stock trends, and make on-the-spot decisions before the market has had time to adjust to the news. In this paper we aim to predict the direction of stock price changes immediately after a news article is published. Our goal is to build automatic text categorization models, using SAS Content Categorization Studio and Text Rule Builder, that assign any published news article to custom topics indicating the direction of the stock price (positive, negative, or neutral). To do this, the initial data set was created by scraping news articles about Apple Inc., along with their time stamps, from TheEconomist.com, NYTimes, the Wall Street Journal, and Reuters.com using a Python web crawler. We also collected Apple Inc.’s stock price at the time stamps corresponding to article publication. The final data set consists of 968 articles published since April 2014, each with its title, change in stock price, time stamp, and news content. We will pre-label the articles in this data set as positive, negative, or neutral based on the change in stock price immediately after publication. To improve a trader’s decision-making process, a web tool will be developed that automatically classifies news articles and suggests whether the stock price will increase or decrease.

Fast Dashboards: Producing real-time dashboards using SAS® Event Stream Processing --- Rierson, D.
This presentation will cover the various issues found when creating dashboards for the web utilizing live streaming data.  Technologies covered are SAS® Event Stream Processing, RESTful web services, D3 graphics and Freeboard, an open-source dashboard.

Getting the Most Out of Your SAS Conference --- Kuligowski, A.

Good Programming Practice when Working Across PC SAS and Unix SAS --- Zhao, J.
Many SAS users work across multiple platforms, commonly combining Windows and Unix environments, and SAS code often works on one platform but not on another. Portability is not just about working across platforms; it also means making programs easier to use across projects, companies, clients, and vendors. This paper examines some good programming practices that address common issues across PC SAS and Unix SAS. They include:

  • Avoid explicitly defining file paths in LIBNAME, FILENAME, and %INCLUDE statements that require platform-specific syntax, such as forward slashes (Unix) or backslashes (PC SAS).
  • Avoid using X commands in SAS code to execute operating system statements that work only on Windows and not on Unix.
  • Use an appropriate SAS rounding function for numeric variables to avoid different results on 64-bit and 32-bit systems. The difference between rounding before and after calculations and derivations is discussed.
  • Develop portable SAS code to import or export Excel spreadsheets across PC SAS and Unix SAS, especially when dealing with multiple worksheets within one Excel file.
  • Use SAS Enterprise Guide to access and run PC SAS programs in Unix effectively.
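
The point about rounding before versus after calculations can be illustrated in a few lines of Python. This sketch does not reproduce any 32-bit versus 64-bit platform difference; it only shows two general facts: binary floating point cannot represent values like 0.1 exactly, so unrounded equality tests can fail, and the order of rounding changes results. The function names are hypothetical.

```python
def sum_rounded_first(values, ndigits=0):
    """Round each value, then add."""
    return sum(round(v, ndigits) for v in values)

def sum_rounded_last(values, ndigits=0):
    """Add the raw values, then round once."""
    return round(sum(values), ndigits)

# 0.1 and 0.2 have no exact binary representation, so this comparison fails...
print(0.1 + 0.2 == 0.3)                        # False
# ...while comparing after rounding to a fixed precision succeeds.
print(round(0.1 + 0.2, 10) == round(0.3, 10))  # True

# Rounding before versus after a derivation gives different answers.
print(sum_rounded_first([0.4, 0.4]))           # 0.0
print(sum_rounded_last([0.4, 0.4]))            # 1.0
```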

Hybrid recommendation system to provide suggestions based on user reviews --- Gnanasekar, S. & Subramanian, R.
If you have ever used Amazon, Pandora, or Netflix, you have probably seen recommendation systems in action. These systems analyze the historical buying behavior of their customers and make real-time recommendations to them. The back end of these systems contains data mining models that predict which products are relevant to you. We plan to build a similar recommender system to suggest restaurants to users. All recommender systems on the market suffer from some inaccuracy, so we intend to combine content from Yelp reviews with user profiles and restaurant profiles to improve the accuracy of the recommender system. To implement our idea, we downloaded 2.2M reviews and 591K tips by 552,000 users from the Yelp® website. The dataset for 77,000 restaurants contains information such as user profile information. Traditional systems use only users' ratings to recommend new restaurants. However, the system we propose will use both the content of users' reviews and their ratings to provide recommendations. The collaborative component is modeled through clustering, grouping each user with peer users based on the ratings they provided for restaurants.

Key Reasons for SAS® Data Set Size Difference by SAS® Grid Migration --- Singh, P. & Sangwan, P.
When SAS users move from a traditional departmental server to a grid environment, they face many challenges with the new SAS® Grid Platform settings. This paper describes one such issue: the difference in SAS data set size between a non-grid SAS server and the SAS® Grid Platform. It explains several key factors, such as SAS encoding, 32-bit vs. 64-bit SAS, BUFSIZE, and database encoding, that affect data set size when SAS jobs are executed on the SAS Grid Platform. It also explains how these factors affect data set size and how to address them so that data sets match their original size. This will help users and SAS administrators adopt the new SAS Grid Platform easily and efficiently.

Let the CAT Out of the Bag: String Concatenation in SAS 9 --- Horstman, J.
Are you still using TRIM, LEFT, and vertical bar operators to concatenate strings? It's time to modernize and streamline that clumsy code by using the string concatenation functions introduced in SAS 9. This paper is an overview of the CAT, CATS, CATT, and CATX functions introduced in SAS 9, and the new CATQ function added in SAS 9.2. In addition to making your code more compact and readable, this family of functions also offers some new tricks for accomplishing previously cumbersome tasks.
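
To make the differences among the functions concrete, the Python sketch below emulates CAT, CATT, CATS, and CATX on character arguments only. It is an illustration, not SAS code: numeric-to-character conversion, CATQ, and SAS length and missing-value rules are not modeled, and the delimiter is used exactly as given (an assumption of this sketch). The key behavior shown is that CATX strips each argument and drops blank ones without inserting an extra delimiter.

```python
def cat(*args: str) -> str:
    """CAT: concatenate as-is, keeping all leading and trailing blanks."""
    return "".join(args)

def catt(*args: str) -> str:
    """CATT: remove trailing blanks from each argument, then concatenate."""
    return "".join(a.rstrip(" ") for a in args)

def cats(*args: str) -> str:
    """CATS: remove leading and trailing blanks from each argument, then concatenate."""
    return "".join(a.strip(" ") for a in args)

def catx(delimiter: str, *args: str) -> str:
    """CATX: strip each argument, drop blank ones, and join the rest with the delimiter."""
    parts = (a.strip(" ") for a in args)
    return delimiter.join(p for p in parts if p)

# SAS character variables are blank-padded to their stored length,
# which is why the old TRIM/LEFT/|| idiom was needed.
last, first, middle = "Smith     ", "  John    ", "          "
print(cats(last, first))                # SmithJohn
print(catx(", ", last, first, middle))  # Smith, John
```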

Longitudinal Data Techniques: Looking Across Observations --- Cody, R.

Loss Given Defaults --- Liu, Wensui
In credit risk, LGD (Loss Given Default) is defined as the ratio of NCO (net charge-off) to GCO (gross charge-off), a fractional outcome between 0 and 1. For many practitioners, OLS (Ordinary Least Squares) regression with a Gaussian distributional assumption has been the top choice for modeling LGD. However, it is conceptually flawed to assume a Gaussian distribution for a response variable in the (0, 1) range. In this presentation, several modeling techniques for fractional outcomes and their implementations in SAS will be discussed through a data analysis exercise. The purpose is to provide a relatively comprehensive survey of how to model Loss Given Default for the SAS user community and banking practitioners.
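
The definition above, together with one widely used alternative to OLS for a bounded response, the fractional logit (quasi-likelihood) model, can be written out as follows. This is offered only as an example of the kind of technique the abstract refers to; it is not necessarily one of the methods covered in the presentation.

```latex
\mathrm{LGD}_i = \frac{\mathrm{NCO}_i}{\mathrm{GCO}_i}, \qquad 0 \le \mathrm{LGD}_i \le 1

\mu_i = E[\mathrm{LGD}_i \mid x_i] = \frac{1}{1 + \exp(-x_i^{\top}\beta)}

\hat{\beta} = \arg\max_{\beta} \sum_{i=1}^{n}
  \left[ \mathrm{LGD}_i \log \mu_i + (1 - \mathrm{LGD}_i) \log (1 - \mu_i) \right]
```

The logistic mean function keeps every fitted value strictly inside (0, 1), whereas the OLS fitted value $x_i^{\top}\beta$ has no such bound; the Bernoulli quasi-log-likelihood remains well defined even when an observed LGD is exactly 0 or 1.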

Modeling Practices of Operational Loss Forecasting --- Liu, Wensui
Since CCAR 2012, modeling approaches for credit loss estimation have been converging. However, operational loss forecasting in stress testing remains challenging due to the lack of established modeling practices. Two modeling approaches are widely employed in the financial industry to estimate operational losses. The simulation-based Loss Distribution Approach (LDA) suffers from regulatory criticism due to its unstable outcomes and the difficulty of establishing a direct relationship with economic assumptions, while the regression-based method estimates the functional relationship between historical operational losses and macroeconomic indicators. In our presentation, the full spectrum of quantitative analysis commonly used in operational loss estimation will be introduced, with a focus on regression models for operational loss frequency and severity and their SAS implementations. The purpose is to provide general guidance on operational loss modeling to interested audiences with a banking background.

Optimization of Refining Crude Distillation Process Unit using Process Simulation and Statistical Modeling Methods --- Bird, J.
In this paper we implemented a methodology to optimize the operation of a refining crude distillation unit using a combination of process simulation and statistical modeling methods. The primary objective was to estimate a set of operating targets for column pumparound and bottoms stripping steam flows that maximize unit profitability over a typical range of crude rate and crude quality operating conditions. The crude unit has an advanced process control application that maximizes product draw rates but does not optimize the above variables. We used process simulation to evaluate crude distillation unit performance over a feasible range of pumparound and bottoms stripping flows, because existing operating data did not cover that range sufficiently. Crude quality and crude feed rate were sampled randomly from actual operating data to account for their inherent process variability. To develop a robust set of operating targets that would perform well under varying market conditions, alternate market scenarios, in which gasoline margins exceeded diesel margins and vice versa, were considered when calculating the unit profit function. Several statistical modeling methods were used to build 3D profit response surfaces as a function of the operating targets to determine the economic optimum. The estimated optimum operating targets for pumparound and bottoms stripping steam flows are being implemented as part of the refinery’s advanced process control system.

Optimize the Domestic Air Traffic System of United States using SAS Enterprise Miner --- Piyush, G., Dwivedi, A., & Agrawal, A.
About 25% of flights are delayed and 1.5% are cancelled every year. Flight delays cause unexpected air traffic congestion and billions of dollars in losses to the aviation industry.

However, in this data-driven world, with sophisticated statistical analysis tools and techniques available, data mining can be an answer to this quandary. Using flight status data for September 2015 to March 2016 from the open data center of the U.S. Bureau of Transportation Statistics, we can identify the factors that lead to flight delays and provide the aviation industry with actionable insights to optimize the air traffic system.

Using data mining tools such as SAS Enterprise Miner and SAS Visual Analytics, we can turn this idea into action. Factors that may cause flight delays include weather delays, security delays, the type of air carrier, the distance between origin and destination, and many more.

This is an attempt to help the aviation industry with a sophisticated model and to make air travel better.

Panel Discussion: How to get involved with Regional SAS Users Groups (RUGs) and SAS Global Forum --- Kuligowski, A., Mendez, L., Alonzo, A., Jackson, C., Gengo, G.

Parsing Useful Data Out of Unusual Formats Using SAS® --- Kuligowski, A.
Most “Introduction to Programming” courses will include a section on reading external data; the first assumption they make will be that the data are stored in some sort of documented and consistent format.  Fortunately, in the “real world”, a lot of the data we deal with has the same basic assumption of occurring in a documented, consistent format – a lot of it, but not all of it.

This presentation will address techniques for handling data that are not cleanly formatted, where the data we want are in a less-than-ideal format, perhaps intermingled with or seemingly buried in unnecessary clutter. It will discuss the principles of using SAS® to parse a file and extract useful data from an otherwise unusable source, citing examples of unusual data sources and the SAS code used to parse them.

Point-and-Click Programming Using SAS® Enterprise Guide® --- Lafler, R., Lafler, K., & Shapiro, M.
SAS® Enterprise Guide® (EG) empowers organizations with all the capabilities that SAS has to offer. Programmers, business analysts, statisticians, and end users have a powerful graphical user interface (GUI) with built-in wizards to perform reporting and analytical tasks, access multi-platform enterprise data sources, deliver data and results to a variety of mediums and outlets, construct data manipulations without the need to learn complex coding constructs, and support data management and documentation requirements. Attendees learn how to use the GUI to access tab-delimited and Excel input files; subset and summarize data; join two or more tables; flexibly export results to HTML, PDF, and Excel; and visually manage projects using flowcharts and diagrams.

PRESTO! CHANGEO! Making Data Transformations Look Like Magic --- Hettinger, P.
If you need to perform data transformations, such as numeric to character or character to numeric, this paper discusses some useful SAS functions. It also addresses automation methods, including table-driven transformations.

Proc SQL Self-Join for Aggregation --- Fleming, S.
Have you ever needed to aggregate data within a table up to a certain record? For example, you might need to count how many payments a person has received prior to the current payment record. A PROC SQL self-join can accomplish this task in a single step. This approach will be compared to DATA step code, and the advantages and disadvantages of each approach will be discussed.
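
The self-join idea can be sketched with Python's built-in sqlite3 module so that it runs without SAS. The table and column names (`payments`, `person_id`, `pay_date`, `amount`) are hypothetical, and the query assumes each person has at most one payment per date. The same join pattern should carry over to PROC SQL with little change, though that has not been verified here.

```python
import sqlite3

def payments_before(rows):
    """For each payment, count the same person's payments on earlier dates."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payments (person_id INTEGER, pay_date TEXT, amount REAL)")
    con.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    # Join the table to itself: alias a is the current record and alias b
    # supplies that person's earlier records. The LEFT JOIN keeps first
    # payments, where COUNT over the NULL matches gives 0.
    query = """
        SELECT a.person_id, a.pay_date, COUNT(b.pay_date) AS prior_payments
        FROM payments AS a
        LEFT JOIN payments AS b
          ON b.person_id = a.person_id
         AND b.pay_date < a.pay_date
        GROUP BY a.person_id, a.pay_date
        ORDER BY a.person_id, a.pay_date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

rows = [(1, "2016-01-05", 100.0), (1, "2016-02-05", 100.0),
        (1, "2016-03-05", 100.0), (2, "2016-01-10", 50.0)]
for r in payments_before(rows):
    print(r)
```

ISO-format date strings compare correctly as text, which is why `b.pay_date < a.pay_date` works here. A DATA step alternative would typically sort by person and date and keep a running counter with BY-group processing.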

Regression Analysis of the Levels of Chlorine in the Public Water Supply in Orange County, FL --- Doyle, D.
Public water supplies can contain disease-causing microorganisms in the water or transport ducts. To kill these pathogens, a disinfectant such as chlorine is added to the water. Chlorine is the most widely used disinfectant in U.S. water treatment facilities and is known to be one of the most powerful disinfectants for keeping harmful pathogens from reaching the consumer. This paper analyzes a set of water samples randomly collected from locations in Orange County, Florida. Thirty water samples were collected, and their chlorine level, temperature, and pH were recorded. A linear regression analysis was performed on the data with several qualitative and quantitative variables. Water storage time, temperature, time of day, location, pH, and dissolved oxygen level were designated as the independent variables collected from each water sample. All data collected were analyzed through various Statistical Analysis System (SAS®) procedures. A partial residual plot was used for each variable to determine possible relationships between the chlorine level and the independent variables. Stepwise selection was used to eliminate potentially insignificant predictors. From there, several candidate models for the data were selected, and F tests were conducted to determine which model appears to be the most useful. The residual plot, jackknife residuals, leverage values, Cook's D, PRESS statistic, and normal probability plot of the residuals were also analyzed. Possible outliers were investigated, and the critical values for flagged observations were stated, along with the problems the flagged values indicate.

SAS Macros - Advanced Topics --- Dari Mazloom
A macro is an efficient way to accomplish the following tasks:

  1. Modularize a program into smaller, more manageable sections.
  2. Encapsulate repetitive tasks into reusable code.
  3. Make it possible to test sections of code independently.

This presentation will cover the following subjects: retrieving the value of a macro variable; handling special characters; using the CALL EXECUTE routine to invoke a macro; and using %RETURN to return a variable from a macro.

SAS® Enterprise Guide for LASR Server --- Singh, P. & Sangwan, P.
It would be very helpful for SAS® Enterprise Guide users if they could upload data to the LASR server (SAS® Visual Analytics) directly from a process flow. To achieve this, users need proper metadata access to load the table. For a broader user base, access becomes easier to manage if an Access Control Template (ACT) is used in SAS Management Console. An ACT can be created with appropriate privileges and applied to specific groups. This paper explains how a SAS administrator can manage the SAS® Visual Analytics environment for SAS® Enterprise Guide users so that they can load data to LASR and run SAS programs.

SASGSUB for Job Workflow and Log Files --- Singh, P. & Sangwan, P.
SAS® Grid Manager Client Utility (SASGSUB) is one of the key clients for using the SAS® Grid platform. SASGSUB is a standard utility and might not meet every use case or business need of SAS users. This paper describes a few techniques for enhancing SASGSUB capabilities with shell scripting. It explains how to meet business needs such as creating job workflows, in which SAS jobs run after a defined set of running jobs finishes. It also covers copying SAS log files from GRIDWORK to a specific location without deleting them from the default location (deleting GRIDWORK files causes SASGSUB to fail when it checks a job's status).

Sentiment Analysis of Opinions about Self-driving cars --- Kawitkar, N. & Deshpande, S
Self-driving cars are no longer a futuristic dream. In the recent past, Google launched a prototype of a self-driving car, and Apple is also developing its own self-driving car. Companies like Tesla have just introduced an Autopilot feature in their newer electric cars, which has created quite a buzz in the car market.

This technology is said to enable elderly or disabled people to get around without depending on anyone, and it might also affect the rate of accidents caused by human error. However, many people are still skeptical about the idea of self-driving cars, and that skepticism is our area of interest.

In this project, we plan to perform sentiment analysis on opinions that people have voiced on the Internet about self-driving cars. We obtained the data, which contain these reviews about self-driving cars, from http://www.crowdflower.com/data-for-everyone. Our data set contains 7,156 observations and 9 variables. We plan to perform a descriptive analysis of the reviews to identify key topics and then apply supervised sentiment analysis. We also plan to track and report how the topics and sentiments change over time.

Spawning SAS Sleeper Cells and Calling Them into Action: Implementing Distributed Parallel Processing in SAS Studio SAS University Edition to Maximize Performance --- Hughes, T.
With the 2014 launch of the SAS® University Edition, the reach of SAS expanded to researchers, educators, and students, who for the first time could use a full version of SAS for free. As a student at the time (with no other access to SAS), I was able to continue exploring SAS software because of this release. As a direct result, I published eight white papers in 2014, each of which demonstrated code written in the SAS University Edition. However, because the SAS University Edition allows a maximum of two CPUs, multithreaded and parallel processing are enabled but significantly curtailed. Thus, performance of SAS software in the SAS University Edition typically is degraded compared to the SAS windowing environment or SAS Enterprise Guide when each application is run on the same computer. And, while parallel processing and distributed computing solutions are becoming more common topics at SAS user conferences, no previously published solution functions within SAS Studio. This text breaks that barrier, introducing and demonstrating parallel and distributed methods that can harness one or multiple computers to substantially increase the performance of the SAS University Edition.

Strategies and Techniques for Getting the Most from Your Antivirus Software for SAS® Users --- Lafler, R. & Lafler, K
Malware, short for malicious software, represents software threats engineered to damage computer systems without the knowledge of the system's owner. SAS® users are increasingly prone to malware attacks and need a set of guidelines to help them get the most out of their antivirus software. This presentation highlights the many different types of computer threats, classification approaches, detection strategies, and removal methods. Attendees learn the following:
  • what malware is;
  • the types of malware, including viruses, Trojans, rootkits, zombies, worms, spyware, adware, scareware, spam email, and denial of service (DoS) attacks;
  • password protection and management strategies;
  • software to detect threats and protect computer systems;
  • how to detect and protect against software vulnerabilities;
  • techniques for removing malicious software;
  • strategies for protecting your computer and data assets.

Take a SPA Day with the SAS® Performance Assessment (SPA): Baselining Software Performance across Diverse Environments To Elucidate Performance Placement and Performance Drivers --- Hughes, T.
Software performance often is measured through program execution time, with higher-performing software executing more rapidly than lower-performing software. SAS FULLSTIMER performance metrics moreover differentiate execution time into real time, user CPU time, and system CPU time, providing a more comprehensive view of performance. Intrinsic factors affecting software performance include efficient coding, use of best practices, and the data being processed. Extrinsic factors affecting performance can include SAS application options, configuration settings, third-party software, network, hardware, and other infrastructure. Thus, the role of extrinsic factors in SAS performance can be better identified and isolated by stabilizing intrinsic factors, executing the same SAS software on the same SAS data sets, and comparing FULLSTIMER metrics across execution environments. The SAS Performance Assessment (SPA) project, introduced at SAS Global Forum 2016 and described at http://www.datmesis.com/spa.html, examined FULLSTIMER performance metrics, SAS system options and configuration, and user descriptions of SAS environments from dozens of users in various organizations. By executing standardized code that created and operated on standardized data sets, the project identified performance differences between organizations and correlated them with environmental factors. The tests were designed to complete in 15 hours, so that code could be executed at the end of the day and finish by the following morning, decreasing impact on actual workload.

Text mining and sentiment analysis on video game user reviews using SAS® Enterprise Miner® and SAS® Sentiment Analysis Studio --- Batti, N. & Singh, B
Digital gaming has a history of more than 50 years. The industry began in the late 1960s, and early titles and systems such as Pong, Centipede, and Odyssey brought video games to consumer markets. Digital gaming is now a widespread phenomenon, and at least 70% of households in the US and Europe say that they play video games on platforms such as PC, Xbox, PS4, and Nintendo consoles. The industry's total revenue in 2011 was reported to be about 17 billion USD. Each game is reviewed and rated on the internet by users who played it, and the reviews often contrast with one another depending on the sentiments the users express. Analyzing those reviews and ratings to describe the positive and negative factors of a specific game could help consumers make a more informed decision about the game.

In this paper, we will analyze 10,000 reviews and ratings (on a scale of 1-10) of 200 games culled from two sites: metacritic.com and gamespot.com. We will then build predictive models to classify the reviews as positive, negative, or mixed based on the users' sentiments. We will also develop a score that describes the overall performance of each game so that users get all the information they need about a game before purchasing a copy.

Time Saver for All: A Macro Toolbox --- Jou, P.
Macro functions are a very useful tool for any SAS programmer. They enable a user to easily repeat frequently used code. To make life easier for myself and my fellow Baylor colleagues, I developed a series of macro functions for tasks we commonly perform in our day-to-day operations. The macros I have created include obtaining the number of observations in a data set, checking the existence of a directory or variable, and suppressing output to the log, to name a few. All of these save time and effort, and anyone should be able to use these macros out of the box.

Link to Macro Toolbox: https://communities.sas.com/t5/SUGIR-Community/SAS-Macro-Toolbox/gpm-p/257166#M93

Using Fuzzy Logic to Match a Street Address --- McCarthy, M.
This paper demonstrates the use of "fuzzy" logic to search an address field in a Base SAS data set. The search macro presented, called "Fuzzy", relies on a user-selected threshold to match words in an address field. Users interact with the macro as a stored process accessible within the Microsoft Add-In tab.
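A thresholded word match can be sketched in a few lines. This example is illustrative only and is not the paper's SAS macro. It scores word similarity with SequenceMatcher.ratio() from Python's standard difflib module, which is an assumed stand-in for whatever measure the Fuzzy macro uses. The functions word_similarity and fuzzy_address_match and the sample addresses are hypothetical. An address matches when every word of the query has a similarity at or above the threshold with some word in the address.

```python
from difflib import SequenceMatcher


def word_similarity(a, b):
    """Similarity in [0, 1] between two words, ignoring case."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def fuzzy_address_match(query, addresses, threshold=0.8):
    """Return the addresses in which every query word matches some
    address word with a similarity at or above the threshold."""
    query_words = query.split()
    matches = []
    for address in addresses:
        address_words = address.split()
        if all(
            any(word_similarity(q, w) >= threshold for w in address_words)
            for q in query_words
        ):
            matches.append(address)
    return matches


addresses = ["123 MAIN STREET", "456 OAK AVENUE", "789 MAPLE STREET"]
# "STRET" is a typo for "STREET"; a threshold of 0.8 tolerates it.
print(fuzzy_address_match("MAIN STRET", addresses))
```

Raising the threshold toward 1.0 demands closer spellings. With threshold=1.0, the misspelled query above no longer matches any address.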

Using INFILE and INPUT Statements to Introduce External Data into the SAS® System  --- Kuligowski, A
The SAS® System has numerous capabilities to store, analyze, report, and present data. However, those features are useless unless the data are stored in, or can be accessed by, the SAS System. This presentation reviews the INFILE and INPUT statements. It is set up as a series of examples, each building on the previous one, rather than a mere recitation of the options as documented in the manual. The examples cover various data sources, including DATALINES, sequential files, and CSV files.

What’s Hot – Skills for SAS® Professionals --- Lafler, K.

Working with real-time data interactively using SAS® Event Stream Processing and SAS® Visual Statistics --- Rierson, D.
SAS® Event Stream Processing captures the real-time value of data before it is lost. SAS® Visual Statistics provides an easy-to-use interface for exploring the relationships found in data. Combining the in-memory capabilities of these two tools significantly reduces the time lag between analysis and action. This presentation will show how to model and surface important events in live, real-time data. A basic architecture will also be covered.

http://www.sas.com/en_us/software/data-management/event-stream-processing.html

http://www.sas.com/en_us/software/analytics/visual-statistics.html

Your Local Fire Engine Has an Apparatus Inventory Sheet and So Should Your Software: Automatically Generating Software Use and Reuse Libraries and Catalogs from Standardized SAS® Code --- Hughes, T.
Fire and rescue apparatuses are required to maintain inventory sheets that describe the specific tools, devices, and other equipment located on every apparatus. From the location of every portable fire extinguisher to the make, model, and location of every power tool, inventory sheets ensure that firefighters and other rescue personnel know exactly where to find equipment when necessary, when restocking an apparatus, or when auditing an apparatus’ inventory. At the departmental level, inventory sheets also can facilitate immediate identification of equipment in the event of a recall or a need to upgrade to newer equipment. Software also should be controlled within a production environment, first and foremost to organize code so that it can be readily located when needed. When code is reused throughout an organization, a reuse library and reuse catalog also should be established to show where code reuse occurs. Doing so ensures that reuse relies on the most recent tested and validated version. It also helps avoid potential fallout from code modification, because it shows which software products reuse specific code modules (including programs and macros). This text introduces Python code that automatically parses a directory structure, locates and parses all SAS program files (including those created or linked inside SAS Enterprise Guide project files), and automatically builds an HTML-linked reuse library and reuse catalog that depicts where specific SAS programs and macros are defined and reused. Because this is the first publication of software that can automatically examine the contents of SAS Enterprise Guide project files, a separate section identifies other noteworthy uses of this software beyond the construction of code reuse libraries and catalogs.
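The core of such a catalog, finding where each macro is defined and where it is invoked, can be sketched briefly in Python. This is a minimal sketch under stated assumptions, not the author's software. It scans only plain .sas files (not SAS Enterprise Guide project files) and produces dictionaries rather than HTML. Its simple regular expressions assume that macros are defined with a %macro name statement and invoked as %name. The function build_reuse_catalog and the sample files are hypothetical.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumed patterns: a definition is "%macro name"; a use is any "%name" token.
# Commented-out code and dynamically generated macro calls are not handled.
DEFINITION_RE = re.compile(r"%macro\s+([A-Za-z_]\w*)", re.IGNORECASE)
TOKEN_RE = re.compile(r"%([A-Za-z_]\w*)")


def build_reuse_catalog(root):
    """Map each macro name to the .sas files that define it and that use it."""
    root = Path(root)
    texts = {
        path.relative_to(root).as_posix(): path.read_text(errors="ignore")
        for path in root.rglob("*.sas")
    }
    definitions = defaultdict(set)
    for name, text in texts.items():
        for macro in DEFINITION_RE.findall(text):
            definitions[macro.lower()].add(name)
    uses = defaultdict(set)
    for name, text in texts.items():
        for token in TOKEN_RE.findall(text):
            # Only tokens naming a macro defined under root count as reuse,
            # which also skips macro statements such as %let or %mend.
            if token.lower() in definitions:
                uses[token.lower()].add(name)
    return (
        {macro: sorted(files) for macro, files in definitions.items()},
        {macro: sorted(files) for macro, files in uses.items()},
    )


# Demonstration on a throwaway directory with hypothetical files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "macros").mkdir()
    (Path(tmp) / "macros" / "nobs.sas").write_text(
        "%macro nobs(ds);\n  /* body omitted */\n%mend nobs;\n"
    )
    (Path(tmp) / "report.sas").write_text(
        "%nobs(sashelp.class)\nproc print data=sashelp.class; run;\n"
    )
    (Path(tmp) / "other.sas").write_text("data work.x; set sashelp.class; run;\n")
    definitions, uses = build_reuse_catalog(tmp)

print(definitions)
print(uses)
```

Restricting uses to names that are defined somewhere in the scanned tree keeps built-in macro statements and functions out of the catalog without maintaining a list of reserved words.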


SAS PRESENTERS

Bringing the US Department of Defense from PC to the Enterprise! (SAS) --- Stevens, E.
A United States Department of Defense agency with over USD 40 billion in sales and revenue, 25 thousand employees, and 5.3 million parts to source partnered with SAS® to turn its disparate PC-based analytic environment into a modern SAS® Grid Computing server-based architecture. This presentation discusses the challenges of underpowered desktops, data sprawl, outdated software, difficult upgrades, and inefficient compute processing. It also covers the solution crafted to enable the agency to run as the Fortune 50 company that its balance sheet (and our nation’s security) demands. In the modern architecture, rolling upgrades, high availability, centralized data set storage, and improved performance enable better forecasting, getting our troops the supplies they need, when and where they need them.

Find What You Are Looking For And More in SAS® Enterprise Guide® (SAS) --- Smith, C.
Are you looking to track changes to your SAS® programs? Do you wish you could easily find errors, warnings, and notes in your SAS logs? Looking for a convenient way to find point-and-click tasks? Want to search your SAS® Enterprise Guide® project? How about a point-and-click way to view SAS system options and SAS macro variables? Or perhaps you want to upload data to the SAS® LASR™ Analytics Server, view SAS® Visual Analytics reports, or run SAS® Studio tasks, all from within SAS® Enterprise Guide®? You can find these capabilities and more in SAS® Enterprise Guide®. Knowing what tools are at your disposal and how to use them will put you a step ahead of the rest. Come learn about some of the newer features in SAS® Enterprise Guide® 7.1 and how you can leverage them in your work.

Highly Customized Graphs Using ODS Graphics (SAS) --- Kuhfeld, W.
You can use annotation, modify templates, and change dynamic variables to customize graphs in SAS®. Standard graph customization methods include template modification (which most people use to modify graphs that analytical procedures produce) and SG annotation (which most people use to modify graphs that procedures such as PROC SGPLOT produce). However, you can also use SG annotation to modify graphs that analytical procedures produce. You begin by using an analytical procedure, ODS Graphics, and the ODS OUTPUT statement to capture the data that go into the graph. You use the ODS document to capture the values that the procedure sets for the dynamic variables, which control many of the details of how the graph is created. You can modify the values of the dynamic variables, and you can modify graph and style templates. Then you can use PROC SGRENDER along with the ODS output data set, the captured or modified dynamic variables, the modified templates, and SG annotation to create highly customized graphs. This paper shows you how and provides examples.

Show Me the Money! Text Analytics for Decision-Making in Government Spending (SAS) --- Sabo, T.
Understanding organizational trends in spending can help overseeing government agencies make appropriate modifications in spending to best serve the organization and the citizenry. However, given millions of line items for organizations annually, including free-form text, it is unrealistic for these overseeing agencies to succeed by utilizing only a manual approach to this textual data. Using a publicly available dataset, this paper explores how business users can apply text analytics using SAS® Contextual Analysis to assess trends in spending for particular agencies, apply subject matter expertise to refine these trends into a taxonomy, and ultimately, categorize the spending for organizations in a flexible, user-friendly manner. SAS® Visual Analytics enables dynamic exploration, including modeling results from SAS® Visual Statistics, in order to assess areas of potentially extraneous spending, providing actionable information to the decision makers.

Ten Tips for Simulating Data with SAS® (SAS) --- Wicklin, R.
Data simulation is a fundamental tool for statistical programmers. SAS® software provides many techniques for simulating data from a variety of statistical models. However, not all techniques are equally efficient or equally effective. An efficient simulation can run in seconds, whereas an inefficient simulation might require days to run. An effective method produces an accurate statistical result, whereas an ineffective technique does not. This paper presents 10 techniques that will make your SAS simulations both efficient and effective.


STAT TUTORIAL

Designing and Analyzing Surveys with SAS/STAT Software (2 hr Tutorial) (SAS) --- Brown, G.
Designing probability-based sample surveys usually requires the use of strategies such as stratification, clustering, and unequal weighting. Analyzing the resulting data requires specialized techniques that take these strategies into account in order to produce statistically valid inferences. This requires specialized software. This tutorial shows you how to use the SAS/STAT software specifically designed for selecting and analyzing probability samples for survey data. You will learn how to:

  • Select probability samples according to various designs with the SURVEYSELECT procedure.
  • Impute missing values in your sample with the SURVEYIMPUTE procedure.
  • Produce descriptive statistics with the SURVEYMEANS and SURVEYFREQ procedures.
  • Build statistical models with the SURVEYREG, SURVEYLOGISTIC, and SURVEYPHREG procedures.

The tutorial also discusses the characteristics of different variance estimation techniques, including both Taylor series and replication methods.

The course is intended for a broad audience of statisticians who are interested in analyzing sample survey data. Familiarity with basic statistics, including regression analysis, is strongly recommended.


SUPER DEMOS

Calling R from SAS/IML Software --- Rick Wicklin.

Create Highly Customized Graphs Using ODS Graphics --- Warren Kuhfeld.
Learn how to modify templates, capture and change both the underlying data object and the dynamic variables, and use SG annotation to customize the graphs that analytical procedures produce.

Customizing the Kaplan-Meier Survival Plot --- Warren Kuhfeld.
You will learn about options and macros for customizing the Kaplan-Meier survival plot in PROC LIFETEST.

SAS Viya --- Amy Peters

What’s New in SAS Enterprise Guide? --- Casey Smith.
SAS Enterprise Guide continues to add easy-to-use features which allow you to work more efficiently. For example, you can now debug your DATA step code with a DATA step debugger tool, upload data to SAS Viya with a point-and-click task, control process flow execution behavior when an error occurs, output results to Excel and PowerPoint destinations with the click of a button, zoom views, filter the data grid with your own WHERE clause, easily define case-insensitive filters, and automatically get the latest product updates. Come see these and more of the new features and enhancements in SAS Enterprise Guide 7.1, 7.11, 7.12, and a sneak peek at 7.13.

Writing Packages: A New Way to Distribute and Use SAS/IML® Programs --- Rick Wicklin.
SAS/IML® 14.1 enables you to author, install, and call packages. A package consists of SAS/IML source code, documentation, data sets, and sample programs. Packages provide a simple way to share SAS/IML functions. An expert who writes a statistical analysis in SAS/IML can create a package and upload it to the SAS/IML File Exchange. A nonexpert can download the package, install it, and immediately start using it. Packages provide a standard and uniform mechanism for sharing programs, which benefits both experts and nonexperts. This paper describes how SAS/IML programmers can construct, upload, download, and install packages.  They’re not wrapped in brown paper or tied up with strings, but they’ll soon be a few of your favorite things!

 


Note that additional presentations are being given in the Student Symposium.

*** Subject to change. ***

 
