Abstract— Data engineering and analytics is not a
buzzword; it is a new fact of business life. A variety of corporations rely on
data analytics to make business decisions. With the increasing popularity of
data analytics, more and more technology tools have appeared on the market. In
terms of user friendliness, data analysis capability and popularity, the top
data analysis packages are SAS, SPSS and Statistica Data Miner.
Choosing a data analytics tool is a tradeoff between costs and
benefits. However, people can be as partisan about their favorite statistical
package as they are about their favorite soccer team. These emotional
attachments can cause serious trouble. Each statistical package has its own
strengths and weaknesses; the choice of data analytics tool should be based on
sound reasoning rather than individual preference.
Selecting a data analytics tool is not just about picking a
single package and sticking with it, but a process of learning and using a
toolkit of packages. An analytics tool should not only satisfy current needs
but also address the strategic objectives of the whole project. Knowing a single
package is not enough; a combination of several tools is always the best
approach.
Keywords: SPSS, Statistica Data Miner, Data Engineering, Data Packages.
SAS is a software system developed by the SAS
Institute for data management, business intelligence and advanced analytics.
It can mine, alter, manage and retrieve data from a variety of sources and
perform analysis on it, chiefly statistical analysis. SAS also provides a
graphical point-and-click interface that is friendly to non-technical
users [1].
SAS programs have PROC steps and DATA steps.
PROC steps analyze the data and DATA steps retrieve and manipulate data. Each
step consists of a series of statements. SAS analytics is a data analytics tool
that is widely used in machine learning, data science and business intelligence
applications.
DATA steps: compilation and execution
are the two phases of this step. A DATA step contains executable statements,
which result in the software taking an action, and declarative statements,
which provide instructions to read or alter the data. All executable
statements are executed sequentially. PROC steps: the statements in these steps
are PROC statements, which call a named function or procedure.
Fig 1: SAS Enterprise Miner Diagram [5]
These procedures then analyze the data
and produce results, graphics and statistics. Pieces of code that are used
repeatedly in a program are called SAS macros.
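The division of labor between DATA steps, PROC steps and macros can be
illustrated with a minimal sketch. It is written in Python rather than SAS, so
it is only an analogy and not SAS syntax; the function names (data_step,
proc_means, summarize) and the sample observations are hypothetical and were
made up for illustration.

```python
import statistics

# Hypothetical raw observations (made-up values for illustration only).
RAW_ROWS = [
    {"city": "Oslo", "temp_c": 4.0},
    {"city": "Cairo", "temp_c": 30.0},
    {"city": "Lima", "temp_c": 19.0},
]


def data_step(rows):
    """DATA step analog: read each row in order and derive a new column."""
    out = []
    for row in rows:  # statements run sequentially, one observation at a time
        new_row = dict(row)  # copy so the input data set is left unchanged
        new_row["temp_f"] = row["temp_c"] * 9 / 5 + 32
        out.append(new_row)
    return out


def proc_means(rows, column):
    """PROC step analog: analyze a prepared data set and return statistics."""
    values = [row[column] for row in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def summarize(rows, columns):
    """Macro analog: a reusable wrapper that repeats the same analysis."""
    return {col: proc_means(rows, col) for col in columns}


if __name__ == "__main__":
    prepared = data_step(RAW_ROWS)
    for col, stats in summarize(prepared, ["temp_c", "temp_f"]).items():
        print(col, stats)
```

The sketch keeps data preparation and analysis in separate functions, which
mirrors how a SAS program alternates between steps that build a data set and
steps that analyze it.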
Fig 2: Tree Plot of SAS EM [5]
SAS equips an organization with the tools needed to monitor BI
metrics. It produces powerful analytics and comprehensive reports that help
decision makers take sound decisions. A number of SAS modules for web, social
media and marketing analytics can be used to profile customers and prospects,
predict customer behavior, and optimize communications. SAS Enterprise Risk
Management is a product set designed primarily for banks and financial
services organizations. Data and information in SAS can be published in PDF,
Excel, HTML and various other formats with the help of the Output Delivery
System, which was first introduced in 2007. The SAS point-and-click
interface is provided by SAS Enterprise Guide. It generates code to
manipulate data or perform analysis automatically and does not require
experience in SAS programming [2].
Statistica Data Miner:
Statistica is a software package for advanced analytics
that was developed by StatSoft and later acquired by Dell. It derives from a
set of software packages and add-ons that StatSoft developed in the mid-1980s.
The software provides data management, data mining and data visualization
procedures, along with an array of data analysis techniques that include
exploratory techniques, clustering, predictive modeling and classification.
Additional techniques are available through integration with the free, open
source R programming environment. Different packages of analytical techniques
are available in six product lines. There are many reasons to choose
Statistica Data Miner; a few are listed below.
It is highly scalable, using trees and leading-edge algorithms to process the
data.
Statistica Data Miner can connect to different file formats and databases.
It offers advanced data management and preparation capabilities.
Multithreading in Statistica Data Miner leverages multicore CPU architectures,
which results in high performance even on inexpensive hardware.
Thanks to its open architecture, Statistica Data Miner can easily adapt to meet
demanding and very specific analysis requirements.
It also provides wizard-style Data Miner Recipes interfaces that guide the user
through building a model with stepwise, plain-English prompts, or that build a
model in a single step by bypassing all prompts and using intelligent defaults
chosen by the program.
It has access to different formats of distributed file systems.
Statistica Data Miner is very powerful and provides a wide selection of data
mining solutions with a highly user-friendly interface and deployment engine.
Statistica Data Miner is not
treated as very serious software by business management, and most upper
management has decided to stay with SAS. StatSoft, however, continues to focus
on selling top down so that it is viewed by management in a more traditional
business way. This approach simulated influences on forests such as climate and
fire. The two approaches could not be integrated directly because the known
data was chaotic and much data remained unknown [3].
The bottom-up and top-down approaches were therefore connected
using simple assumptions, which helped create a very successful forest growth
model.
SPSS stands for Statistical Package
for the Social Sciences. It was launched in 1968 and acquired by IBM in 2009.
All sorts of data can be edited and analyzed using SPSS. The input data may
come from sources such as a customer database, Google Analytics, scientific
research, or the server log files of a website. SPSS can open many kinds of
file formats, including relational databases, plain text files, and Stata and
SAS files [4].
This software is very user friendly and
easy to use. Its scalable and flexible design makes IBM SPSS approachable to
users of varying skill levels and suits projects of different sizes and
complexity, helping users and organizations find new opportunities and
minimize risk.
Fig 3: Histogram and Web Graph in SPSS (Clementine) [5]
SPSS is also used by data miners, health
researchers, government, market researchers, survey companies, education
researchers, marketing organizations and others. Its features are accessible
via pull-down menus or through a command syntax language. Command syntax
programming has the benefits of reproducible output, simpler repetitive tasks,
and the ability to handle complex data manipulations and analyses. The
pull-down menus also generate command syntax, which can be displayed in the
output. Programs can be run unattended using the Production Job Facility. A
comparison between SAS, SPSS and Statistica Data Miner is given in Table 1.
| Feature | SAS | SPSS | Statistica Data Miner |
|---|---|---|---|
| Interface | Command Line and GUI | Command Line and GUI | Statistica Visual Basic (SVB) |
| Support for Analysis of Variance Methods | Supports all the variants | Supports all the variants | [...] Model is not supported |
| Support for Various Regression Methods | Supports all the regression methods | Supports all the regression methods | [...] is not supported |
| Support for Time Series Analysis Methods | Supports ARIMA, GARCH, Unit Root test, Cointegration Test, VAR, [...] | | |
| Support for Various Charts and Diagrams | Supports all kinds of charts like bar, box plot etc. | | |
| Ease of Learning | Difficult to learn because of complex command line structure | Easier to learn than SAS | Easiest of all three because of its GUI |
| Supported Platforms | Windows, Linux, Unix | Windows, Mac OS, Linux, Unix, Cloud | |

Table 1: Statistical Comparison [2]
Tool For Research
The software package I have selected
for my data mining problem is SAS.
SAS is a software
suite that can mine, alter, manage and retrieve data from a variety of sources
and perform statistical analysis on it. SAS provides a graphical
point-and-click user interface for non-technical users and more advanced options
through the SAS language.
SAS programs have
DATA steps, which retrieve and manipulate data, and PROC steps, which analyze
the data. Each step consists of a series of statements.
The DATA step has
executable statements that result in the software taking an action, and
declarative statements that provide instructions to read a data set or alter
the data’s appearance. The DATA step has two phases: compilation and execution.
In the compilation phase, declarative statements are processed and syntax errors
are identified. This 1/0 approach is in harmony with
my data mining problem of weather prediction.
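The two DATA step phases can be loosely mirrored in a short sketch. It is
written in Python, not SAS, and is only an analogy: the statements are first
checked as a whole (compilation), and only then run once per observation
(execution). The names STATEMENTS, compile_phase and execution_phase, and the
weather observations used below, are hypothetical.

```python
# Hypothetical statements for illustration; this is Python, not SAS syntax.
STATEMENTS = """
temp_f = temp_c * 9 / 5 + 32
is_freezing = temp_c <= 0
"""


def compile_phase(source):
    """Check the whole step for syntax errors before any row is processed."""
    return compile(source, "<data-step>", "exec")


def execution_phase(code, rows):
    """Run the compiled statements once per observation, in order."""
    results = []
    for row in rows:
        variables = dict(row)      # per-row working copy of the variables
        # exec is used only to make the two phases visible in this sketch;
        # it should not be used on untrusted input.
        exec(code, {}, variables)  # statements read and update the variables
        results.append(variables)
    return results


if __name__ == "__main__":
    observations = [
        {"day": 1, "temp_c": -2.0},
        {"day": 2, "temp_c": 5.0},
    ]
    compiled = compile_phase(STATEMENTS)  # a syntax error would surface here
    for result in execution_phase(compiled, observations):
        print(result)
```

Because the syntax check happens before any data is read, a malformed
statement stops the step up front instead of failing partway through the
data set.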