Abstract— Data Engineering and analytics is not a
buzzword; it is a new fact of business life. A variety of corporations rely on
data analytics to make business decisions. With the increasing popularity of
data analytics, more and more technology tools have appeared on the market. The
top data analysis packages are SAS, SPSS and Statistica Data Miner in terms of
user friendliness, data analysis capability and popularity.

Choosing a data analytics tool is a trade-off between costs and
benefits. However, people can sometimes be as partisan about their favorite
statistical package as they are about their favorite soccer team. These
emotional attachments can cause serious trouble. Each statistical package has
its own strengths and weaknesses; the choice of a data analytics tool should be
based on rational criteria rather than individual preference.

Selecting a data analytics tool is not just about picking a
single package and sticking with it, but a process of learning and using a
toolkit of packages. An analytics tool should not only satisfy current needs,
but also address the strategic objectives of the whole project. Knowing a single
package is not enough; a combination of several tools is usually the best way to
solve problems.

Keywords— SAS,
SPSS, Statistica Data Miner, Data Engineering, Data Packages.

I.    Introduction



SAS is a software system developed by the SAS Institute for data
management, business intelligence and advanced analytics. The software can
alter, mine, retrieve and manage data from a variety of sources and perform
analysis on it, primarily statistical analysis. SAS supports a graphical user
interface, providing a user-friendly point-and-click interface for
non-technical users [1].

SAS programs have PROC steps and DATA steps. PROC steps analyze the data, and
DATA steps retrieve and manipulate data. Each step consists of a series of
statements. SAS Analytics is a data analytics tool that is widely used in
machine learning, data science and business intelligence applications.

DATA steps: compilation and execution are the two phases of this step. A DATA
step contains executable statements, which result in an action being taken by
the software, and declarative statements, which provide instructions to read or
alter the data. All executable statements are executed sequentially. PROC
steps: the statements in these steps are called PROC statements, and they call
a function or procedure.

Fig 1: SAS Enterprise Miner Diagram [5]

These procedures then analyze the data and produce the results of the analysis,
together with graphics and statistics. Pieces of code that are used repeatedly
in a program are called SAS macros.
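The division of labor between DATA steps and PROC steps can be mimicked outside SAS. The following is a minimal Python sketch and not SAS code: the function names `data_step` and `proc_means` and the sample temperature readings are hypothetical and chosen only for illustration. The first function plays the role of a DATA step (read and transform rows one at a time), and the second plays the role of a PROC step (analyze the finished data set).

```python
from statistics import mean

# Hypothetical raw input: (city, temperature in degrees Celsius).
RAW_ROWS = [("Paris", 18.0), ("Lyon", 21.0), ("Lille", 15.0)]


def data_step(rows):
    """DATA-step analogue: read each row and derive a new variable."""
    dataset = []
    for city, temp_c in rows:
        # The statements below run in order for every input row.
        temp_f = temp_c * 9 / 5 + 32
        dataset.append({"city": city, "temp_c": temp_c, "temp_f": temp_f})
    return dataset


def proc_means(dataset, variable):
    """PROC-step analogue: summarize one variable of a finished data set."""
    values = [row[variable] for row in dataset]
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


weather = data_step(RAW_ROWS)
summary = proc_means(weather, "temp_c")
print(summary)
```

Calling `proc_means(weather, "temp_f")` reuses the same analysis on another variable, which is loosely comparable to packaging repeated code in a macro.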

Fig 2: Tree Plot of SAS EM [5]

SAS equips an organization with the tools needed to monitor BI metrics, and it
produces powerful analytics and comprehensive reports that help decision makers
make sound decisions. A number of SAS modules for web, marketing and social
media analytics can be used to build profiles of customers and prospects. They
also predict customer behavior and, at the same time, optimize communications.
SAS Enterprise Risk Management is a set of products designed primarily for
banks and financial services organizations. Data and information in SAS can be
published in PDF, Excel, HTML and various other formats with the help of the
Output Delivery System, which was first introduced in 2007. The SAS
point-and-click interface is provided by SAS Enterprise Guide, which generates
code to manipulate data or perform analysis automatically and does not require
experience in SAS programming [2].


Statistica Data Miner:

Statistica Data Miner is a software package for advanced analytics that was
developed by StatSoft, which was later acquired by Dell. It derives from a
number of software packages and add-ons that StatSoft developed from the
mid-1980s onward. The software provides data management, data mining
procedures, data visualization and a range of data analysis procedures, along
with a variety of exploratory, clustering, predictive modeling and
classification techniques. Additional techniques are available through
integration with the free, open-source R programming environment. The
analytical techniques are packaged in six product lines. There are many reasons
to choose Statistica Data Miner; a few of them are listed below [3].

1.     It is highly scalable, using trees and leading-edge algorithms.

2.     It can connect to many different file formats and databases.

3.     It has strong data management and data preparation capabilities.

4.     Its multithreading optimizations leverage multicore CPU architectures,
which results in high performance.

5.     Its open architecture allows it to be adapted easily to demanding and
very specific analysis requirements.

6.     It provides wizard-style Data Miner Recipes interfaces that guide the
user through building a model with step-by-step prompts in plain English, or
that build a model in a single step by bypassing all prompts and using
intelligent defaults selected by the program.

7.     It can access various distributed file system formats as well as
Hadoop.

8.     Statistica Data Miner is very powerful and provides a wide selection of
data mining solutions with a highly user-friendly interface and a deployment
engine.

Statistica Data Miner is not regarded as serious software by business
management, and most upper management has decided to stay with SAS. StatSoft,
however, continues to focus on selling top down so that management views it in
a more traditional business way. This top-down approach simulated influences on
a forest, such as climate and fire. The top-down and bottom-up approaches could
not be integrated directly, because the known data were chaotic and much data
remained unknown [3].

The bottom-up and top-down approaches were therefore connected using simple,
straightforward assumptions, which helped create a very successful forest
growth model.



SPSS stands for Statistical Package for the Social Sciences. Launched in 1968,
it was acquired by IBM in 2009. All sorts of data can be analyzed and edited
using SPSS. The input data may come from sources such as a customer database,
Google Analytics, scientific research, or the server log files of a website.
SPSS can open many file formats, including relational databases, plain text
files, and Stata and SAS files [4].

This software is very user friendly and easy to use. Its scalability and
flexibility make IBM SPSS approachable to users of all skill levels, and it
fits projects of many different sizes and levels of complexity, helping users
and organizations identify new opportunities and minimize risk.
Fig 3: Histogram and Web Graph in SPSS (Clementine) [5]


SPSS is also used by data miners, health researchers, government, market
researchers, survey companies, education researchers, marketing organizations
and others. The many features of SPSS are accessible through a command syntax
language and also through pull-down menus. The benefits of command syntax
programming are reproducible output, simpler handling of repetitive tasks, and
support for complex data manipulations and analyses. The pull-down menus also
generate command syntax, which can be displayed in the output. Programs can be
run interactively or unattended by using the Production Job Facility [4].

II.   Comparison


A comparison between SAS, SPSS and Statistica Data Miner is given in Table 1.






Feature | SAS | SPSS | Statistica Miner
 | Programming oriented | Interactive Windows | Interactive Windows
 | SAS Institute | |
 | Command Line and GUI | Command Line and GUI |
Written in | | |
Scripting Language | SAS Language | Python, SaxBasic | Statistica Visual Basic (SVB)
Support for Analysis of Variance Method | Support all the variants | Support all the variants | Model is not supported
Support for Various Regression Method | Supports all the regression methods | Supports all the regression methods | is not supported
Support for Time Series Analysis Method | Supports ARIMA, GARCH, Unit Root test, Cointegration Test, VAR, Multivariate GARCH | Only supports | Only support
Support for various Charts and Diagrams | Supports all kinds of charts like bar, box plot etc. | all charts | all charts
Software Type | Standalone Executable | Standalone Executable | Standalone Executable
Software Source | | |
Ease to Learn | Difficult to learn because of complex command line structure | Easier to learn than SAS | Easiest of all three because of its GUI
System Support | Windows, Linux, Unix | Windows, Mac OS, Linux, Unix, Cloud |


Table 1: Statistical Comparison [2]
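One rational way to act on a comparison such as Table 1 is a weighted scoring matrix. The Python sketch below is illustrative only: the criteria, the weights and the 1-to-5 scores are hypothetical placeholders rather than measurements from this paper, and they should be replaced with values that reflect the actual project.

```python
# Hypothetical criterion weights, chosen to sum to 1.
WEIGHTS = {
    "analysis_coverage": 0.7,
    "ease_of_learning": 0.2,
    "platform_support": 0.1,
}

# Hypothetical 1-5 scores per tool; these numbers are NOT taken from Table 1.
SCORES = {
    "SAS": {"analysis_coverage": 5, "ease_of_learning": 2, "platform_support": 4},
    "SPSS": {"analysis_coverage": 4, "ease_of_learning": 4, "platform_support": 5},
    "Statistica Miner": {
        "analysis_coverage": 3,
        "ease_of_learning": 5,
        "platform_support": 2,
    },
}


def weighted_score(scores, weights):
    """Return the weighted sum of one tool's criterion scores."""
    return sum(weights[name] * scores[name] for name in weights)


def rank_tools(all_scores, weights):
    """Return (tool, score) pairs sorted from best to worst."""
    totals = {tool: weighted_score(s, weights) for tool, s in all_scores.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


ranking = rank_tools(SCORES, WEIGHTS)
for tool, score in ranking:
    print(f"{tool}: {score:.2f}")
```

With these placeholder weights, analysis coverage dominates the result. A project that weights ease of learning more heavily may rank the tools differently, which is why the weights should be agreed on before the tools are scored.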


III.   Selected Tool For Research


The software package I have selected for my data mining problem is SAS.

SAS is a software
suite that can mine, alter, manage and retrieve data from a variety of sources
and perform statistical analysis on it. SAS provides a graphical
point-and-click user interface for non-technical users and more advanced options
through the SAS language.

SAS programs have
DATA steps, which retrieve and manipulate data, and PROC steps, which analyze
the data. Each step consists of a series of statements.

The DATA step has
executable statements that result in the software taking an action, and
declarative statements that provide instructions to read a data set or alter
the data’s appearance. The DATA step has two phases: compilation and execution.

In the compilation phase, declarative statements are processed and syntax errors
are identified. This 1/0 approach is in harmony with
my data mining problem of weather prediction.
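The two phases can be illustrated with a toy interpreter. This is a hedged Python sketch of the idea only; it does not reproduce how SAS is implemented, and the mini statement format (`("declare", name)` and `("assign", name, function)`) as well as the `humidity` and `is_humid` variables are invented for the example. The compile pass checks every statement before any data is read, and the execution pass then runs the executable statements in order for each row.

```python
def compile_step(statements):
    """Compile phase: process declarations and reject bad statements early."""
    declared = set()
    executable = []
    for stmt in statements:
        kind = stmt[0]
        if kind == "declare":
            # Declarative statement: records a variable, performs no action.
            declared.add(stmt[1])
        elif kind == "assign":
            if stmt[1] not in declared:
                raise SyntaxError(f"variable {stmt[1]!r} was never declared")
            executable.append(stmt)
        else:
            raise SyntaxError(f"unknown statement kind {kind!r}")
    return executable


def execute_step(executable, rows):
    """Execution phase: run the executable statements in order for each row."""
    output = []
    for row in rows:
        current = dict(row)  # work on a copy so the input rows stay unchanged
        for _, name, func in executable:
            current[name] = func(current)
        output.append(current)
    return output


program = [
    ("declare", "is_humid"),
    ("assign", "is_humid", lambda r: r["humidity"] > 80),
]
rows = [{"humidity": 91}, {"humidity": 40}]
result = execute_step(compile_step(program), rows)
print(result)
```

Because `compile_step` raises an error before `execute_step` is ever called, a faulty program is rejected without touching the data, which mirrors the idea that syntax errors are identified in the compilation phase.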