Abstract— Data Engineering and analytics is not a
buzzword; it is a new fact of business life. A variety of corporations rely on
data analytics to make business decisions. With the increasing popularity of data
analytics, more and more technology tools have appeared on the market. The top data
analysis packages in terms of user friendliness, data analysis capability and
popularity are SAS, SPSS and Statistica Data Miner.

Choosing a data analytics tool is a tradeoff between costs and
benefits. However, people can be as partisan about their favorite statistical
package as they are about their favorite soccer team. Such emotional
attachments can cause serious trouble. Each statistical package has its own
strengths and weaknesses; the choice of a data analytics tool should be based on
rational grounds rather than individual preference.

Selecting a data analytics tool is not just about picking a
single package and sticking with it, but a process of learning and using a
toolkit of packages. An analytics tool should not only satisfy current needs but
also address the strategic objectives of the whole project. Knowing a single
package is not enough; a combination of several tools is usually the best way to
solve problems.

Keywords— SAS,
SPSS, Statistica Data Miner, Data Engineering, Data Packages.

                                                                                                                                                            
I.    Introduction

 

SAS:

SAS is a software system developed by the SAS Institute for data
management, business intelligence and advanced analytics. It can alter, mine,
retrieve and manage data from a variety of sources and perform analysis on it,
primarily statistical analysis. SAS supports a graphical user interface,
providing a user-friendly point-and-click interface for non-technical
people [1].

SAS programs consist of DATA steps and PROC steps. DATA steps retrieve and
manipulate data, and PROC steps analyze it. Each step consists of a series of
statements. SAS Analytics is widely used in machine learning, data science and
business intelligence applications.

DATA steps: a DATA step has two phases, compilation and execution. It
contains executable statements, which result in the software taking an action,
and declarative statements, which provide instructions to read or alter the
data. Executable statements are executed sequentially. PROC steps: the
statements in a PROC step are called PROC statements, and each one calls the
procedure it names.
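
The division of labor between DATA steps and PROC steps can be illustrated with a rough analogy in Python. This is not SAS code, only a minimal sketch: the function names `data_step` and `proc_means`, and the sample temperature records, are hypothetical and chosen purely for illustration.

```python
from statistics import mean

# Hypothetical input records; in SAS these would be read from a data set.
raw_rows = [
    {"city": "A", "temp_f": 68.0},
    {"city": "B", "temp_f": 77.0},
    {"city": "C", "temp_f": 50.0},
]


def data_step(rows):
    """Analogue of a DATA step: read each row in order and derive a new column."""
    out = []
    for row in rows:  # the statements run sequentially for every observation
        new_row = dict(row)
        new_row["temp_c"] = round((row["temp_f"] - 32) * 5 / 9, 1)
        out.append(new_row)
    return out


def proc_means(rows, column):
    """Analogue of a PROC step: a named procedure that analyzes prepared data."""
    values = [row[column] for row in rows]
    return {
        "n": len(values),
        "mean": mean(values),
        "min": min(values),
        "max": max(values),
    }


prepared = data_step(raw_rows)            # retrieve and manipulate
summary = proc_means(prepared, "temp_c")  # analyze
print(summary)
```

The point of the sketch is only the separation of concerns: data preparation happens in one step, and analysis is delegated to a named procedure in another.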

Fig 1: SAS Enterprise Miner Diagram [5]

These procedures then analyze the data and produce results, graphics and
statistics. Pieces of code that are used repeatedly in a program are called
SAS macros.

Fig 2: Tree Plot of SAS EM [5]

SAS equips an organization with the tools needed to monitor BI metrics, and
it produces powerful analytics and comprehensive reports that help decision
makers make sound decisions. A number of SAS modules for web, marketing
analytics and social media can be used to build profiles of customers and
prospects, predict customer behavior and optimize communications. SAS
Enterprise Risk Management is a product set designed primarily for banks and
financial services organizations. Data and information in SAS can be published
in PDF, Excel, HTML and various other formats with the help of the Output
Delivery System, which was first introduced in 2007. The point-and-click
interface is provided by SAS Enterprise Guide, which generates code to
manipulate data or perform analysis automatically and does not require SAS
programming experience [2].

 

Statistica Data Miner:

Statistica Data Miner is a software package for advanced analysis that was
developed by StatSoft, which was later acquired by Dell. It derives from a
number of software packages and add-ons that StatSoft developed in the
mid-1980s. The software provides data management, data mining procedures, data
visualization and an array of data analysis methods, along with a variety of
exploratory techniques, clustering, predictive modeling and classification.
Integration with the free, open source R programming environment is also
available. The analytical techniques are packaged in six product lines. There
are many reasons to choose Statistica Data Miner; a few are listed below [3].

1.     It is highly scalable and uses tree-based and other leading-edge
algorithms to process results.

2.     It can connect to a variety of file formats and databases.

3.     It offers strong data management and data preparation capabilities.

4.     Multithreading optimization leverages multicore CPU architectures,
which yields high performance even on inexpensive hardware.

5.     Its open architecture makes it highly customizable, so it can easily be
adapted to demanding and very specific analysis requirements.

6.     It provides wizard-style Data Miner Recipes interfaces that guide the
user through building a model with stepwise, plain-English prompts, or
alternatively build a model in a single step by bypassing all prompts and using
intelligent defaults chosen by the program.

7.     It can access various distributed file system formats, including Hadoop.

8.     Statistica Data Miner is very powerful and provides a wide selection of
data mining solutions with a highly user-friendly interface and a deployment
engine.

Statistica Data Miner is not regarded as very serious software by business
management, and most upper management has decided to stay with SAS. StatSoft,
however, continues to focus on selling top down so that management views it in
a more traditional business way. This approach simulated influences on the
forest such as climate and fire. The two approaches could not be integrated
directly, because the known data was chaotic and much data remained
unknown [3].

The bottom-up and top-down approaches, both of which used simple assumptions,
were therefore connected, which helped create a very successful forest growth
model.

 

SPSS:

SPSS stands for Statistical Package for the Social Sciences. Launched in
1968, it was acquired by IBM in 2009. SPSS is used to edit and analyze all
sorts of data.

The input data may come from sources such as a customer database, Google
Analytics, scientific research, or the server log files of a website. SPSS can
open many kinds of file formats, including relational databases, plain text
files, and Stata and SAS files [4].

This software is very user friendly and easy to use. Its scalability and
flexibility make IBM SPSS approachable to users of all skill levels and
suitable for projects of different sizes and complexity, helping users and
their organizations find new opportunities, minimize risk and improve
efficiency.

Fig 3: Histogram and Web Graph in SPSS (Clementine) [5]

 

SPSS is also used by data miners, health researchers, government, market
researchers, survey companies, education researchers, marketing organizations
and others. Its many features are accessible through pull-down menus and
through a command syntax language. The benefits of command syntax programming
include reproducible output, simpler repetitive tasks, and the handling of
complex data manipulations and analyses.

The pull-down menus also generate command syntax, which can be displayed in
the output. Programs can be run interactively, or unattended by using the
Production Job Facility [4].

                                                                                                                                                              
II.   Comparison

 

Table 1 gives a statistical comparison of SAS, SPSS and Statistica Data Miner.

Criterion | SAS | SPSS | Statistica Data Miner
----------|-----|------|----------------------
Division | Programming oriented | Interactive Windows | Interactive Windows
Developer | SAS Institute | IBM | Dell
Interface | Command line and GUI | Command line and GUI | GUI
Written in | C | Java | C++
Scripting language | SAS Language | R, Python, SaxBasic | R, Statistica Visual Basic (SVB)
Support for analysis of variance methods | Supports all variants | Supports all variants | Mixed model is not supported
Support for regression methods | Supports all regression methods | Supports all regression methods | Quantile regression is not supported
Support for time series analysis methods | Supports ARIMA, GARCH, Unit Root test, Cointegration Test, VAR, Multivariate GARCH | Only supports ARIMA, GARCH | Only supports ARIMA
Support for charts and diagrams | Supports all kinds of charts (bar, box plot, etc.) | Supports all charts | Supports all charts
Software type | Standalone executable | Standalone executable | Standalone executable
Software source | Freeware | Proprietary software | Proprietary software
Ease of learning | Difficult to learn because of complex command line structure | Easier to learn than SAS | Easiest of the three because of its GUI
Operating system support | Windows, Linux, Unix | Windows, Mac OS, Linux, Unix, Cloud | Windows

Table 1: Statistical Comparison [2]
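
A comparison like Table 1 can be turned into a simple, repeatable selection check, which supports choosing a tool on rational grounds rather than preference. The following is a minimal sketch in Python that encodes only the time series row of Table 1; the names `TIME_SERIES_SUPPORT` and `tools_supporting` are hypothetical and introduced purely for illustration.

```python
# Time series methods per tool, transcribed from the time series row of Table 1.
TIME_SERIES_SUPPORT = {
    "SAS": {
        "ARIMA", "GARCH", "Unit Root test",
        "Cointegration Test", "VAR", "Multivariate GARCH",
    },
    "SPSS": {"ARIMA", "GARCH"},
    "Statistica Data Miner": {"ARIMA"},
}


def tools_supporting(required):
    """Return, in sorted order, the tools that support every required method."""
    needed = set(required)
    return sorted(
        tool
        for tool, methods in TIME_SERIES_SUPPORT.items()
        if needed <= methods  # subset test: all required methods are available
    )


print(tools_supporting(["ARIMA"]))
print(tools_supporting(["ARIMA", "GARCH"]))
print(tools_supporting(["ARIMA", "VAR"]))
```

According to the table, a project that needs only ARIMA can use any of the three packages, while one that also needs VAR is limited to SAS.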

 

                                                                                                                                          
III.   Selected Tool For Research

 

The software package I have selected
for my data mining problem is SAS.

SAS is a software
suite that can mine, alter, manage and retrieve data from a variety of sources
and perform statistical analysis on it. SAS provides a graphical
point-and-click user interface for non-technical users and more advanced options
through the SAS language.

SAS programs have
DATA steps, which retrieve and manipulate data, and PROC steps, which analyze
the data. Each step consists of a series of statements.

The DATA step has
executable statements that result in the software taking an action, and
declarative statements that provide instructions to read a data set or alter
the data’s appearance. The DATA step has two phases: compilation and execution.

In the compilation phase, declarative statements are processed and syntax errors
are identified. This 1/0 approach is in harmony with
my data mining problem of weather prediction.