The Process Of Data Mining Biology Essay

In recent times, our capabilities for generating and collecting data have been increasing rapidly. The widespread use of bar codes for most commercial products, the computerization of many business and government transactions and the advances in data collection tools have provided us with a large amount of data. Millions of databases have been used in business management, government administration and many other kinds of applications. The number of such databases is growing rapidly because of the availability of powerful and affordable database systems. This explosive growth in data and databases has generated an urgent need for new techniques and tools that can readily transform the data into useful information and knowledge.

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data.

With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

For example, Blockbuster Entertainment mines its video rental history database to recommend rentals to individual customers. American Express can suggest products to its cardholders based on analysis of their monthly expenditures.

WalMart is pioneering massive data mining to transform its supplier relationships. WalMart captures point-of-sale transactions from over 2,900 stores in 6 countries and continuously transmits this data to its massive 7.5 terabyte Teradata data warehouse. WalMart allows more than 3,500 suppliers to access data on their products and perform data analyses. These suppliers use this data to identify customer buying patterns at the store display level. They use this information to manage local store inventory and identify new merchandising opportunities. In 1995, WalMart computers processed over 1 million complex data queries.

The National Basketball Association (NBA) is exploring a data mining application that can be used in conjunction with image recordings of basketball games. The Advanced Scout software analyzes the movements of players to help coaches orchestrate plays and strategies. For example, an analysis of the play-by-play sheet of the game played between the New York Knicks and the Cleveland Cavaliers on January 6, 1995 reveals that when Mark Price played the Guard position, John Williams attempted four jump shots and made each one. Advanced Scout not only finds this pattern, but explains that it is interesting because it differs considerably from the average shooting percentage of 49.30% for the Cavaliers during that game.

By using the NBA universal clock, a coach can automatically bring up the video clips showing each of the jump shots attempted by Williams with Price on the floor, without needing to comb through hours of video footage. Those clips show a very successful pick-and-roll play in which Price draws the Knicks' defense and then finds Williams for an open jump shot.

The most commonly used techniques in data mining are:

Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.

Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).

Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.

Nearest neighbour method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). Sometimes called the k-nearest neighbour technique.

Rule induction: The extraction of useful if-then rules from data based on statistical significance.
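The nearest neighbour method in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `knn_classify`, the use of Euclidean distance, the simple majority vote and the toy customer records are all made up for the example and are not taken from any particular data mining package.

```python
from collections import Counter
import math


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among its k nearest records.

    `history` is a list of (features, label) pairs, where features is a
    tuple of numbers. Euclidean distance is an assumption of this sketch.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    # Rank the historical records by distance to the query point.
    ranked = sorted(history, key=lambda rec: math.dist(rec[0], query))
    # Count the class labels of the k closest records.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (age, income in thousands) -> customer segment.
history = [
    ((25, 30), "budget"),
    ((27, 35), "budget"),
    ((30, 32), "budget"),
    ((45, 90), "premium"),
    ((50, 95), "premium"),
    ((48, 88), "premium"),
]

print(knn_classify(history, (28, 33), k=3))
print(knn_classify(history, (47, 91), k=3))
```

In practice the features would usually be rescaled first, since an unscaled distance lets the attribute with the largest numeric range dominate the vote.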

Q: 1

Using the laws of set algebra, simplify the following:

a) $A \cap (\bar{A} \cup B)$

Sol:

$A \cap (\bar{A} \cup B)$

$= (A \cap \bar{A}) \cup (A \cap B)$ (distributive law)

$= \emptyset \cup (A \cap B)$ (complement law)

$= A \cap B$ (identity law)

b) $(\bar{A} \cup \bar{B}) \cap (A \cap B)$

Sol:

$(\bar{A} \cup \bar{B}) \cap (A \cap B)$

$= \overline{(A \cap B)} \cap (A \cap B)$ (de Morgan's law)

$= \emptyset$ (complement law)

c) $(A \cup B) \cap (A \cup \bar{B})$

Sol:

$(A \cup B) \cap (A \cup \bar{B})$

$= A \cup (B \cap \bar{B})$ (distributive law)

$= A \cup \emptyset$ (complement law)

$= A$ (identity law)

d) $(\bar{A} \cap \bar{B}) \cup (A \cup B)$

Sol:

$(\bar{A} \cap \bar{B}) \cup (A \cup B)$

$= \overline{(A \cup B)} \cup (A \cup B)$ (de Morgan's law)

$= U$ (complement law)

e) $(A \cup B \cup C) \cap (A \cup B \cup \bar{C}) \cap (A \cup \bar{B})$

Sol:

$(A \cup B \cup C) \cap (A \cup B \cup \bar{C}) \cap (A \cup \bar{B})$

$= ((A \cup B) \cup (C \cap \bar{C})) \cap (A \cup \bar{B})$ (distributive law)

$= ((A \cup B) \cup \emptyset) \cap (A \cup \bar{B})$ (complement law)

$= (A \cup B) \cap (A \cup \bar{B})$ (identity law)

$= A \cup (B \cap \bar{B})$ (distributive law)

$= A \cup \emptyset$ (complement law)

$= A$ (identity law)

f) $(A \cup B \cup C) \cap (A \cup (B \cap C))$

Sol:

$(A \cup B \cup C) \cap (A \cup (B \cap C))$

$= (A \cup B \cup C) \cap ((A \cup B) \cap (A \cup C))$ (distributive law)

$= ((A \cup B \cup C) \cap (A \cup B)) \cap (A \cup C)$ (associative law)

$= ((A \cup B) \cup (C \cap \emptyset)) \cap (A \cup C)$ (distributive law, writing $A \cup B$ as $(A \cup B) \cup \emptyset$)

$= ((A \cup B) \cup \emptyset) \cap (A \cup C)$ (since $C \cap \emptyset = \emptyset$)

$= (A \cup B) \cap (A \cup C)$ (identity law)

$= A \cup (B \cap C)$ (distributive law)

g )

(

And the last step relies on the identity:
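Simplifications like those in Q1 can be sanity-checked by brute force over a small universal set. The sketch below assumes the complements sit where the quoted laws require them (for example, a) is read as A ∩ (A-complement ∪ B)); the four-element universe and the helper names `subsets`, `comp` and `holds_for_all` are choices made for this illustration.

```python
from itertools import product

U = frozenset(range(4))  # small universal set, an assumption of the sketch


def subsets(universe):
    """Yield every subset of `universe` as a frozenset."""
    items = sorted(universe)
    for mask in product([False, True], repeat=len(items)):
        yield frozenset(x for x, keep in zip(items, mask) if keep)


def comp(s):
    """Complement of s relative to U."""
    return U - s


ALL_SUBSETS = list(subsets(U))


def holds_for_all(identity, arity):
    """True if identity(*sets) holds for every choice of subsets of U."""
    return all(identity(*combo) for combo in product(ALL_SUBSETS, repeat=arity))


checks = {
    "a": holds_for_all(lambda A, B: A & (comp(A) | B) == A & B, 2),
    "b": holds_for_all(lambda A, B: (comp(A) | comp(B)) & (A & B) == frozenset(), 2),
    "c": holds_for_all(lambda A, B: (A | B) & (A | comp(B)) == A, 2),
    "d": holds_for_all(lambda A, B: (comp(A) & comp(B)) | (A | B) == U, 2),
    "e": holds_for_all(
        lambda A, B, C: (A | B | C) & (A | B | comp(C)) & (A | comp(B)) == A, 3
    ),
    "f": holds_for_all(
        lambda A, B, C: (A | B | C) & (A | (B & C)) == A | (B & C), 3
    ),
}
print(checks)
```

An exhaustive check on one small universe is not a proof, but a single `False` would show that a simplification had gone wrong.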

Q: 2

Two sets A and B belong to the same universal set U. The difference A - B between the two sets is a third set whose elements are those elements of A that are not in B.

Satisfy yourself that $A - B = A \cap \bar{B}$, then verify the following using Venn diagrams:

$U - A = \bar{A}$

$(A - B) \cup B = A \cup B$

$C \cap (A - B) = (C \cap A) - (C \cap B)$

$(A \cup B) \cup (B - A) = A \cup B$

Sol:

Let x be an arbitrary element of the set $U - A$.

$x \in U - A$

$\Rightarrow x \in U$ and $x \notin A$

$\Rightarrow x \in \bar{A}$ ............ (1)

Let y be an arbitrary element of the set $\bar{A}$.

$y \in \bar{A}$

$\Rightarrow y \in U$ and $y \notin A$

$\Rightarrow y \in U - A$ ............ (2)

From (1) and (2), each set is contained in the other, so

$U - A = \bar{A}$

[Venn diagrams: $U - A$ and $\bar{A}$]

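$(A - B) \cup B = A \cup B$

Sol:

Let x be an arbitrary element of the set $(A - B) \cup B$.

$x \in (A - B) \cup B$

$\Rightarrow$ ($x \in A$ and $x \notin B$) or $x \in B$

$\Rightarrow$ ($x \in A$ or $x \in B$) and ($x \notin B$ or $x \in B$)

$\Rightarrow$ ($x \in A$ or $x \in B$) and $x \in (\bar{B} \cup B)$

$\Rightarrow$ ($x \in A$ or $x \in B$) and $x \in U$

$\Rightarrow x \in A \cup B$ ............ (1)

Let y be an arbitrary element of the set $A \cup B$.

$y \in A \cup B$

$\Rightarrow y \in A$ or $y \in B$

If $y \in B$ then $y \in (A - B) \cup B$. Otherwise $y \in A$ and $y \notin B$, so $y \in A - B$.

$\Rightarrow y \in (A - B) \cup B$ ............ (2)

From (1) and (2) we get $(A - B) \cup B = A \cup B$.

[Venn diagrams: $A - B$, $(A - B) \cup B$ and $A \cup B$]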

$(A \cup B) \cup (B - A) = A \cup B$

Let x be an arbitrary element of the set $(A \cup B) \cup (B - A)$.

$x \in (A \cup B) \cup (B - A)$

$\Rightarrow x \in A$ or $x \in B$ or ($x \in B$ and $x \notin A$)

$\Rightarrow x \in A$ or $x \in B$ (the last alternative already implies $x \in B$)

$\Rightarrow x \in A \cup B$ ............ (1)

Let y be an arbitrary element of the set $A \cup B$.

$y \in A \cup B$

$\Rightarrow y \in A$ or $y \in B$

$\Rightarrow y \in (A \cup B) \cup (B - A)$ ............ (2)

From (1) and (2) we get $(A \cup B) \cup (B - A) = A \cup B$.

[Venn diagrams: $A \cup B$, $B - A$ and $(A \cup B) \cup (B - A)$]

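$C \cap (A - B) = (C \cap A) - (C \cap B)$

Let x be an arbitrary element of the set $C \cap (A - B)$.

$x \in C \cap (A - B)$

$\Rightarrow x \in C$ and ($x \in A$ and $x \notin B$)

$\Rightarrow$ ($x \in C$ and $x \in A$) and not ($x \in C$ and $x \in B$)

$\Rightarrow x \in C \cap A$ and $x \notin C \cap B$

$\Rightarrow x \in (C \cap A) - (C \cap B)$ ............ (1)

Conversely, let y be an arbitrary element of $(C \cap A) - (C \cap B)$. Then $y \in C$, $y \in A$ and $y \notin C \cap B$. Since $y \in C$, this forces $y \notin B$, so

$y \in C \cap (A - B)$ ............ (2)

From (1) and (2) we get $C \cap (A - B) = (C \cap A) - (C \cap B)$.

[Venn diagrams: $A - B$, $C \cap (A - B)$, $C \cap A$ and $(C \cap A) - (C \cap B)$]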
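The four identities of Q2 can also be spot-checked numerically with Python's built-in set type, where `-` is set difference, `|` is union and `&` is intersection. The universe of 20 integers, the fixed random seed and the helper names below are assumptions of this sketch; random testing supports the Venn-diagram argument but does not replace it.

```python
import random

random.seed(0)          # fixed seed so the check is repeatable
U = set(range(20))      # hypothetical universal set


def random_subset(universe):
    """Pick each element independently with probability 1/2."""
    return {x for x in universe if random.random() < 0.5}


def difference_identities(A, B, C):
    """Return the truth value of each Q2 identity for the given sets."""
    return [
        A - B == A & (U - B),                    # A - B = A ∩ complement(B)
        U - A == {x for x in U if x not in A},   # U - A = complement(A)
        (A - B) | B == A | B,
        C & (A - B) == (C & A) - (C & B),
        (A | B) | (B - A) == A | B,
    ]


all_ok = all(
    all(difference_identities(random_subset(U), random_subset(U), random_subset(U)))
    for _ in range(1000)
)
print(all_ok)
```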

Q: 3

In carrying out a survey of the efficiency of the lights, brakes and steering of motor vehicles, 100 vehicles were found to be defective as follows:

35 had defective lights

40 had defective brakes

41 had defective steering

8 had defective lights and brakes

7 had defective lights and steering

6 had defective brakes and steering

Use a Venn diagram to determine

How many vehicles had defective lights, brakes and steering?

How many vehicles had defective lights only?

Sol:

Let L, B and S represent lights, brakes and steering respectively.

$n(L \cup S \cup B) = n(L) + n(S) + n(B) - n(L \cap S) - n(S \cap B) - n(L \cap B) + n(L \cap S \cap B)$

$100 = 35 + 40 + 41 - 8 - 7 - 6 + n(L \cap S \cap B)$

$n(L \cap S \cap B) = 5$

The number of vehicles having defective lights, brakes and steering is 5.

Defective lights only:

$= n(L) - n(L \cap B) - n(L \cap S) + n(L \cap S \cap B)$

$= 35 - 8 - 7 + 5$

$= 25$

The number of vehicles having defective lights only is 25.

Q: 4

In a survey of 1000 households, each house had at least one of the appliances washing machine, vacuum cleaner or refrigerator. 400 had no refrigerator, 380 had no vacuum cleaner and 542 no washing machine. 294 had both a vacuum cleaner and a washing machine, 277 both a refrigerator and a vacuum cleaner, 190 both a refrigerator and a washing machine. How many households had all three appliances? How many had only a vacuum cleaner?

Sol:

Let R denote refrigerator, V denote vacuum cleaner and W denote washing machine.

Given that

$n(R \cup V \cup W) = 1000$

$n(\bar{R}) = 400$

$n(\bar{V}) = 380$

$n(\bar{W}) = 542$ ............ (1)

$n(V \cap W) = 294$

$n(R \cap V) = 277$

$n(R \cap W) = 190$

To find:

$n(R \cap V \cap W)$

How many households have only a vacuum cleaner.

Sol: We know that

$n(R) = 1000 - n(\bar{R})$

$n(V) = 1000 - n(\bar{V})$ ............ (2)

$n(W) = 1000 - n(\bar{W})$

From equation (1) we have

$n(R) = 1000 - 400 = 600$

$n(V) = 1000 - 380 = 620$

$n(W) = 1000 - 542 = 458$

Also,

$n(R \cup V \cup W) = n(R) + n(V) + n(W) - n(R \cap V) - n(R \cap W) - n(V \cap W) + n(R \cap V \cap W)$

$1000 = 600 + 620 + 458 - 277 - 190 - 294 + n(R \cap V \cap W)$

$n(R \cap V \cap W) = 1000 - 600 - 620 - 458 + 277 + 190 + 294$

$n(R \cap V \cap W) = 1761 - 1678$

$n(R \cap V \cap W) = 83$

83 households had all three appliances.

(b)

We know that V splits into four disjoint regions: vacuum cleaner only, vacuum cleaner and refrigerator only, vacuum cleaner and washing machine only, and all three. Hence

$n(V) = n(\text{only } V) + [n(R \cap V) - n(R \cap V \cap W)] + [n(V \cap W) - n(R \cap V \cap W)] + n(R \cap V \cap W)$

Substituting the values gives the number of households with only a vacuum cleaner:

$n(\text{only } V) = n(V) - n(R \cap V) - n(V \cap W) + n(R \cap V \cap W)$

$= 620 - 277 - 294 + 83$

$= 132$

The result shows that 132 households have only a vacuum cleaner.
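The inclusion-exclusion arithmetic used in Q3 and Q4 is the same calculation with different numbers, so it can be captured in two small helper functions. The names `triple_intersection` and `only_first` are invented for this sketch.

```python
def triple_intersection(total, singles, pairs):
    """Solve n(A∪B∪C) = sum(singles) - sum(pairs) + n(A∩B∩C) for n(A∩B∩C)."""
    return total - sum(singles) + sum(pairs)


def only_first(n_first, pair_with_second, pair_with_third, triple):
    """Count elements in the first set and in neither of the other two."""
    return n_first - pair_with_second - pair_with_third + triple


# Q3: defective lights, brakes and steering among 100 vehicles.
vehicles_all_three = triple_intersection(100, singles=(35, 40, 41), pairs=(8, 7, 6))
lights_only = only_first(35, 8, 7, vehicles_all_three)

# Q4: refrigerator, vacuum cleaner and washing machine in 1000 households.
# The single-set counts are 1000 minus the "had no ..." figures.
n_r, n_v, n_w = 1000 - 400, 1000 - 380, 1000 - 542
homes_all_three = triple_intersection(1000, singles=(n_r, n_v, n_w), pairs=(277, 190, 294))
vacuum_only = only_first(n_v, 277, 294, homes_all_three)

print(vehicles_all_three, lights_only, homes_all_three, vacuum_only)
```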

Q: 5

Verify, using Venn diagrams, de Morgan's laws.

[Figure 1: Venn diagram, regions labelled $(A \cup B)^c$ and $(A \cap B)$]

Sol:

Here we have to prove that

$(A \cup B)^c = A^c \cap B^c$

[Figure 2: $A^c$]

[Figure 3: $B^c$]

[Figure 4: $A^c \cap B^c$]

From figures 1, 2, 3 and 4 we have the following shadings:

$(A \cup B)^c$ is shaded =

$A^c$ is shaded ||

$B^c$ is shaded //

$A^c \cap B^c$ is shaded ////

From the figures we conclude that

$(A \cup B)^c = A^c \cap B^c$

This is because in figure 1 $(A \cup B)^c$ is the shaded region, and the //// region of figure 4 is the same as the shaded region of figure 1.

Similarly,

$(A \cap B)^c = A^c \cup B^c$

In general,

$\left( \bigcup_{i=1}^{n} A_i \right)^c = \bigcap_{i=1}^{n} A_i^c$

and

$\left( \bigcap_{i=1}^{n} A_i \right)^c = \bigcup_{i=1}^{n} A_i^c$

$A \cup A = A$ (idempotent law)

$A \cap A = A$ (idempotent law)

All four conditions are verified from the Venn diagrams, hence de Morgan's law is proved.

Q: 6 of page 375

Formulae have been used to generate the tally.

(c) Relative frequency is defined as the ratio of the number of successful trials to the total number of trials. Relative frequency is a very important concept and can be used in probability, particularly when predictions cannot be made just by looking at the situation. By using relative frequency, previous results can be used to make predictions.

(d) Frequency chart

To find the relative frequencies, divide each frequency by the total number of data in the sample, in this case 50. Relative frequencies can be written as fractions, percentages, or decimals.

The relative frequency is obtained by using the formula

relative frequency = class frequency / total number of observations

(e) Relative frequency chart

(f)

The distribution of results can be shown as a relative frequency curve. From the graph the class ranges and their corresponding relative frequencies can be seen.

[Figure: frequency distribution]

Q: 7

The efficiency of a new computer operating system is being tested on a mainframe computer. A total of 40 runs are carried out, and in each run the same number of jobs, each chosen to be representative of the particular computer environment, is submitted as a batch to the machine. For each run the throughput rate, measured in jobs per minute, is determined. The results of the 40 runs are as follows:

3.22  3.18  3.25  3.24  3.28  3.21  3.26  3.19  3.30
3.23  3.14  3.22  3.35  3.23  3.27  3.23  3.26  3.37
3.24  3.25  3.34  3.19  3.27  3.28  3.28  3.26  3.18
3.29  3.31  3.30  3.17  3.23  3.25  3.20  3.29  3.22

By constructing a tally chart, group the data into the classes 3.12-3.16, 3.16-3.20, 3.20-3.24, etc.

Calculate the mean and standard deviation of the grouped data, using the coding method.

Draw a cumulative frequency polygon and use it to estimate the median and semi-interquartile range of the data.

Estimate the percentage of runs with throughput rates which lie outside the interval which extends from one standard deviation below the mean to one standard deviation above the mean.

Sol:

N= 40

Min. Value = 3.14

Max. Value = 3.37

No     Class L  Class U  Frequency  Class mid-point  Class interval  xf      x*x*f   Cum. freq.
1      3.12     3.16     2          3.14             0.04            6.28    19.72   2
2      3.16     3.20     5          3.18             0.04            15.90   50.56   7
3      3.20     3.24     10         3.22             0.04            32.20   103.68  17
4      3.24     3.28     11         3.26             0.04            35.86   116.90  28
5      3.28     3.32     8          3.30             0.04            26.40   87.12   36
6      3.32     3.36     3          3.34             0.04            10.02   33.47   39
7      3.36     3.40     1          3.38             0.04            3.38    11.42   40
8      3.40     3.44     0          3.42             0.04            0.00    0.00    40
Total                    40                                          130.04  422.88

Mean = 3.251

Variance = 0.002999

Standard deviation = 0.054763126

The median is that value corresponding to a cumulative frequency of N/2 = 20.

The median class is the class containing the median, and is therefore the lowest class whose cumulative frequency exceeds N/2, i.e. class 4.

Fc = cumulative frequency of the class below the median class = 17
fm = frequency of the median class = 11
L = upper class boundary of the class immediately below the median class = 3.24
c = class interval = 0.04

Median Q2 = L + ((N/2 - Fc)/fm) c = 3.25091

The lower quartile Q1 is that value corresponding to a cumulative frequency of N/4 = 10.

The lower quartile class is the class containing the lower quartile, and is therefore the lowest class whose cumulative frequency exceeds N/4, i.e. class 3.

Fc1 = cumulative frequency of the class below the lower quartile class = 7
fq1 = frequency of the lower quartile class = 10
L1 = upper class boundary of the class immediately below the lower quartile class = 3.20
c = class interval = 0.04

Lower quartile Q1 = L1 + ((N/4 - Fc1)/fq1) c = 3.212

The upper quartile Q3 is that value corresponding to a cumulative frequency of 3N/4 = 30.

The upper quartile class is the class containing the upper quartile, and is therefore the lowest class whose cumulative frequency exceeds 3N/4, i.e. class 5.

Fc3 = cumulative frequency of the class below the upper quartile class = 28
fq3 = frequency of the upper quartile class = 8
L3 = upper class boundary of the class immediately below the upper quartile class = 3.28
c = class interval = 0.04

Upper quartile Q3 = L3 + ((3N/4 - Fc3)/fq3) c = 3.29

Measures of spread

Interquartile range = 0.078 (range within which the middle 50% of readings lie)
Semi-interquartile range = 0.039

Mean - 2s = 3.141473747
Mean - s = 3.196236874
Mean = 3.251
Mean + s = 3.305763126
Mean + 2s = 3.360526253
Mean + 3s = 3.415289379
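The grouped-data calculations above (mean, variance, median and quartiles by linear interpolation inside a class) can be reproduced with a short script. The class boundaries and frequencies are copied from the table; the function name `grouped_quantile` and the list layout are assumptions of this sketch.

```python
import math

# Class lower boundaries and frequencies from the table; the class width is 0.04.
lowers = [3.12, 3.16, 3.20, 3.24, 3.28, 3.32, 3.36]
freqs = [2, 5, 10, 11, 8, 3, 1]
width = 0.04

n = sum(freqs)
mids = [lo + width / 2 for lo in lowers]
mean = sum(f * x for f, x in zip(freqs, mids)) / n
# Population variance: mean of squares minus square of the mean.
variance = sum(f * x * x for f, x in zip(freqs, mids)) / n - mean ** 2
sd = math.sqrt(variance)


def grouped_quantile(p):
    """Interpolate the value at cumulative frequency p * n within its class."""
    target = p * n
    cum = 0
    for lo, f in zip(lowers, freqs):
        if cum + f > target:  # lowest class whose cumulative frequency exceeds target
            return lo + (target - cum) / f * width
        cum += f
    return lowers[-1] + width


q1, median, q3 = (grouped_quantile(p) for p in (0.25, 0.5, 0.75))
semi_iqr = (q3 - q1) / 2

print(round(mean, 3), round(sd, 4))
print(round(q1, 3), round(median, 5), round(q3, 2), round(semi_iqr, 3))
```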

Q: 8

A new hybrid apple is developed with the aim of producing larger apples than a particular previous hybrid. In a sample of 1000 apples the distribution of weights of the apples was as follows.

Weight (g)   Frequency
0-50         20
50-100       42
100-150      106
150-200      227
200-250      205
250-300      241
300-350      106
350-400      53

Apples can only be sold to a particular retail outlet with a weight greater than 218g. What proportion of the new hybrid would be rejected by this retail outlet?

How many grams above this weight of 218g is the mean weight of the apples?

What is the difference in weights in units of the standard deviation of apple weights?

Sol:

Weight (g)   Frequency f   Cumulative frequency cf   Class mid-point x   f x
0-50         20            20                        25                  500
50-100       42            62                        75                  3150
100-150      106           168                       125                 13250
150-200      227           395                       175                 39725
200-250      205           600                       225                 46125
250-300      241           841                       275                 66275
300-350      106           947                       325                 34450
350-400      53            1000                      375                 19875

Σ f x = 223350

Σ f = 1000

Mean = 223350/1000 = 223.35

(i) From the table above we can deduce that the number of apples weighing less than 150g is 168.

Therefore, the proportion of apples that cannot be sold to the retail outlet is

168/1000 = 0.168 = 16.8%

(ii) The mean weight of the apples is 223.35g.

(iii) Standard deviation = 78.9g

The difference between 150g and the mean weight is 223.35 - 150 = 73.35g, i.e. 73.35/78.9 = 0.93 of a standard deviation.

Q:9

The pH level in a river is monitored five times a day. The following 20 sets of five readings were obtained on 20 consecutive days.

6.7  6.3  6.2  6.1  7.0  7.0  7.1  6.9  6.8  6.2
6.3  6.4  6.3  6.2  7.1  7.0  7.0  6.8  6.9  6.3
6.6  6.2  6.1  6.4  7.1  6.8  7.3  6.8  7.0  6.4
6.9  6.1  6.0  6.4  6.8  6.5  7.1  6.3  7.1  6.4
6.6  6.6  6.5  6.8  6.7  6.9  7.0  6.4  6.3  6.4
6.3  6.4  6.5  6.4  6.3  6.1  5.9  5.9  5.8  6.1
6.3  6.3  6.5  6.5  6.2  7.0  5.9  5.9  5.9  6.1
6.2  6.6  6.7  6.6  6.2  6.9  6.0  6.1  5.9  6.0
6.1  6.6  6.8  6.3  6.6  6.6  6.0  6.3  5.8  6.2
6.4  6.3  6.3  6.1  7.0  6.4  5.9  6.1  6.1  6.3

Using your calculator obtain the mean and standard deviation of these 100 readings.

Obtain the mean pH levels for each of the 20 consecutive days and plot them on a chart showing pH as a function of day.

A warning should be flagged if a mean pH level on any given day lies outside the range $m \pm 1.96 s_m$, where m is the mean of the 100 readings and $s_m = s/\sqrt{5}$, where s is the standard deviation of the 100 readings. Identify those days on which a warning would be flagged.

Sol:

Class interval   x      Frequency f   u    fu    (x - x̄)²   f (x - x̄)²
5.5-6.0          5.75   9             -1   -9    0.53        4.77
6.0-6.5          6.25   49            0    0     0.052       2.54
6.5-7.0          6.75   28            1    28    0.072       2.01
7.0-7.5          7.25   14            2    28    0.77        10.1

Σ f = 100

Σ fu = 47.0

Σ f (x - x̄)² = 19.42

Here u is the coded value of x, measured in class widths i = 0.5 from the assumed mean A = 6.25.

Mean = A + (Σ fu / Σ f) × i

= 6.25 + (47/100) × 0.5

= 6.25 + 0.23

Mean = 6.48

Standard deviation = $\sqrt{\sigma^2}$

$\sigma^2 = \frac{1}{N} \sum f_i (x_i - \bar{x})^2$

= 1/100 × (19.42)

= 0.19

Standard deviation = $\sqrt{0.19}$

= 0.43

Means of the 20 consecutive days are:

X1 = sum of first two rows / number of observations

= (6.7+6.3+6.2+6.1+7.0+7.0+7.1+6.9+6.8+6.2+6.3+6.4+6.3+6.2+7.1+7.0+7.0+6.8+6.9+6.3)/20

= 132.60/20 = 6.63

X2 = sum of 2nd and 3rd rows / number of observations = 133.0/20 = 6.65

X3 = sum of 3rd and 4th rows / number of observations = 132.30/20 = 6.61

X4 = sum of 4th and 5th rows / number of observations = 131.8/20 = 6.59

X5 = sum of 5th and 6th rows / number of observations = 127.9/20 = 6.39

X6 = sum of 6th and 7th rows / number of observations = 124.30/20 = 6.21

X7 = sum of 7th and 8th rows / number of observations = 125.8/20 = 6.29

X8 = sum of 8th and 9th rows / number of observations = 126.5/20 = 6.32

X9 = sum of 9th and 10th rows / number of observations = 126.2/20 = 6.31

Range

$m - 1.96 s_m \le X \le m + 1.96 s_m$

$6.48 - (1.96 \times 0.43/\sqrt{5}) \le X \le 6.48 + (1.96 \times 0.43/\sqrt{5})$

$6.48 - 0.376 \le X \le 6.48 + 0.376$

$6.10 \le X \le 6.85$

The days whose means lie outside this interval are the days on which a warning would be flagged.
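The warning rule of Q9 can be sketched as a small function. It assumes that each consecutive block of five readings belongs to one day, which the layout of the data in the source does not confirm, and it uses the population standard deviation (divide by N) as in the solution above. The function name `flag_days` and the sample readings are hypothetical and are not the river data.

```python
import statistics


def flag_days(readings, per_day=5, z=1.96):
    """Return (low, high, flagged) for the rule m ± z * s / sqrt(per_day).

    `flagged` lists the 1-based day numbers whose mean lies outside the limits.
    """
    m = statistics.fmean(readings)
    s = statistics.pstdev(readings)   # population form, dividing by N
    s_m = s / per_day ** 0.5
    low, high = m - z * s_m, m + z * s_m
    day_means = [
        statistics.fmean(readings[i:i + per_day])
        for i in range(0, len(readings), per_day)
    ]
    flagged = [d for d, mean in enumerate(day_means, start=1) if not low <= mean <= high]
    return low, high, flagged


# Hypothetical readings for four days, five per day.
sample = [6.0] * 15 + [7.0] * 5
low, high, flagged = flag_days(sample)
print(round(low, 3), round(high, 3), flagged)
```

Passing the 100 river readings in day order would give the limits and flagged days for the actual data, once the grouping of readings into days is known.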

Q: 10

Standard amounts of five different insecticides are found to kill 30%, 45%, 65%, 85% and 90% respectively of a fixed size of insect population. If one of the insecticides is chosen at random, what is the probability that it will kill:

At least 65% of the insect population?

At most 45% of the insect population?

Between 40% and 80% of the insect population?

If two of the insecticides are chosen at random, what is the probability that any one of the pair chosen will kill at least 85% of the insect population?

Sol:

Let A, B, C, D and E denote the events that the insecticide killing 30%, 45%, 65%, 85% and 90% respectively is chosen, so the probability of selecting any one insecticide is 1/5.

At least 65% of the insect population:

P(C) + P(D) + P(E)

= 1/5 + 1/5 + 1/5

= 3/5

At most 45% of the insect population:

P(A) + P(B)

= 1/5 + 1/5

= 2/5

Between 40% and 80% of the insect population:

P(B) + P(C)

= 1/5 + 1/5

= 2/5

If two of the insecticides are chosen at random then the probability that any one of the pair chosen will kill at least 85% of the insect population is

= 1 - P(A ∪ B)

= 1 - [P(A) + P(B)]

= 1 - [1/5 + 1/5]

= 1 - 2/5

= 3/5

= 0.6
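The probabilities in Q10 can be checked by direct enumeration. For the last part the wording "any one of the pair" is ambiguous, so the sketch below computes both readings over the ten equally likely unordered pairs: exactly one member of the pair kills at least 85%, and at least one member does. Treating the pair as drawn without replacement is an assumption, as are the variable names.

```python
from fractions import Fraction
from itertools import combinations

kill_rates = [30, 45, 65, 85, 90]  # percent killed by each insecticide


def prob(event):
    """Probability that one insecticide drawn uniformly satisfies `event`."""
    return Fraction(sum(1 for r in kill_rates if event(r)), len(kill_rates))


at_least_65 = prob(lambda r: r >= 65)
at_most_45 = prob(lambda r: r <= 45)
between_40_80 = prob(lambda r: 40 < r < 80)

pairs = list(combinations(kill_rates, 2))  # 10 equally likely unordered pairs
exactly_one = Fraction(sum(1 for a, b in pairs if (a >= 85) != (b >= 85)), len(pairs))
at_least_one = Fraction(sum(1 for a, b in pairs if a >= 85 or b >= 85), len(pairs))

print(at_least_65, at_most_45, between_40_80)
print(exactly_one, at_least_one)
```

Under the "exactly one" reading the enumeration gives 3/5, which matches the value 0.6 stated above; under the "at least one" reading it gives 7/10.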

Q: 11

As part of the safety procedure at an oil refinery a fractionating column is monitored by three independent warning systems, A, B and C. The probabilities that on any given day the warning systems will fail are 0.1, 0.01 and 0.05 for A, B and C respectively. If P(n) denotes the probability that n warning systems fail on a given day, obtain P(0), P(1), P(2) and P(3). Hence obtain the expected number of warning systems which fail on any given day.

Sol:

Let D = the warning system fails, and $\bar{D}$ = the warning system does not fail.

E1 = warning system A is selected.

E2 = warning system B is selected.

E3 = warning system C is selected.

P(D|E1) = 0.1

P(D|E2) = 0.01

P(D|E3) = 0.05

$P(\bar{D}|E1) = 0.9$

$P(\bar{D}|E2) = 0.99$

$P(\bar{D}|E3) = 0.95$

P(n) = probability that n warning systems fail.

P(0) = probability that no warning system fails

$= P(\bar{D}|E1) \, P(\bar{D}|E2) \, P(\bar{D}|E3)$

= 0.9 × 0.99 × 0.95

= 0.84645

P(1) = probability that exactly 1 warning system fails

$= P(\bar{D}|E1) \, P(D|E2) \, P(\bar{D}|E3) + P(D|E1) \, P(\bar{D}|E2) \, P(\bar{D}|E3) + P(\bar{D}|E1) \, P(\bar{D}|E2) \, P(D|E3)$

= 0.9 × 0.01 × 0.95 + 0.1 × 0.99 × 0.95 + 0.9 × 0.99 × 0.05

= 0.14715

P(2) = probability that exactly 2 warning systems fail

$= P(\bar{D}|E1) \, P(D|E2) \, P(D|E3) + P(D|E1) \, P(\bar{D}|E2) \, P(D|E3) + P(D|E1) \, P(D|E2) \, P(\bar{D}|E3)$

= 0.9 × 0.01 × 0.05 + 0.1 × 0.99 × 0.05 + 0.1 × 0.01 × 0.95

= 0.00635

P(3) = probability that all 3 warning systems fail

= P(D|E1) P(D|E2) P(D|E3)

= 0.1 × 0.01 × 0.05

= 0.00005

So,

X    P(X)      X P(X)
0    0.84645   0
1    0.14715   0.14715
2    0.00635   0.01270
3    0.00005   0.00015

Expectation E(X) = Σ X P(X) = 1 × P(1) + 2 × P(2) + 3 × P(3)

= 0.1600
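The distribution P(0) to P(3) in Q11 can be rebuilt by enumerating all eight fail/no-fail combinations of the three independent systems. This is a sketch for checking the arithmetic; the dictionary `fail_probs` and the list `dist` are names chosen for the example.

```python
from itertools import product

fail_probs = {"A": 0.1, "B": 0.01, "C": 0.05}

# dist[n] accumulates the probability that exactly n systems fail.
dist = [0.0] * (len(fail_probs) + 1)
for outcome in product([True, False], repeat=len(fail_probs)):
    p = 1.0
    for failed, q in zip(outcome, fail_probs.values()):
        p *= q if failed else 1 - q   # independence: multiply the factors
    dist[sum(outcome)] += p

expected = sum(n * p for n, p in enumerate(dist))
print([round(p, 5) for p in dist], round(expected, 4))
```

Because expectation is linear, the expected number of failures also equals 0.1 + 0.01 + 0.05 = 0.16, which gives a quick cross-check on the table.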

Q: 12

A particular scientific monitoring device is powered by three batteries of the same type. The device is designed so that it will continue to work provided at least two of the three batteries function normally. The three batteries are renewed at a particular time. The probability that a battery will fail within the first 50 hours of operation is 0.25. The probability that a battery will fail within the first 100 hours of operation is 0.70. Determine the probability that:

The device fails due to battery failure within the first 50 hours of operation.

The batteries allow the equipment to run for longer than 100 hours.

The device fails due to battery failure within 50-100 hr of operation.

Sol:

Fails within first 50 hours = 0.25

Does not fail within first 50 hours = 0.75

The device fails due to battery failure within the first 50 hours of operation.

The device fails when at least two of the three batteries have failed.

P(The device fails due to battery failure within the first 50 hours of operation)

= (0.25 × 0.25 × 0.25) + 3 {(0.25 × 0.25) × (0.75)}

= {0.015625 + 3 (0.0625 × 0.75)}

= 0.015625 + 3 (0.046875)

= 0.015625 + 0.140625

= 0.15625

The batteries allow the equipment to run for longer than 100 hours.

Fails within first 100 hours = 0.7

Does not fail within 100 hours = 0.3

The equipment is still running if at least two of the three batteries have not failed.

P(The batteries allow the equipment to run for longer than 100 hours)

= (0.3 × 0.3 × 0.3) + 3 {(0.3) × (0.3) × (0.7)}

= {0.027 + 3 (0.09 × 0.7)}

= 0.027 + 3 (0.063)

= 0.027 + 0.189

= 0.216

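The device fails due to battery failure within 50-100 hr of operation.

The device has failed by 100 hours if at least two batteries have failed by then:

P(The device fails due to battery failure within the first 100 hours)

= (0.7 × 0.7 × 0.7) + 3 {(0.7) × (0.7) × (0.3)}

= {0.343 + 3 (0.49 × 0.3)}

= 0.343 + 3 (0.147)

= 0.343 + 0.441

= 0.784

This figure includes failures during the first 50 hours, so

P(The device fails due to battery failure within 50-100 hr of operation)

= 0.784 - 0.15625

= 0.62775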
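All three parts of Q12 reduce to "at least two of three independent batteries" under a binomial model, which the sketch below computes. It assumes the batteries fail independently; the helper name `at_least` and the variable names are chosen for this example. The probability of failing between 50 and 100 hours is taken as the probability of having failed by 100 hours minus the probability of having failed by 50 hours.

```python
from math import comb


def at_least(k, n, p):
    """P(at least k successes in n independent trials, each with probability p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


fail_by_50 = at_least(2, 3, 0.25)    # two or more batteries dead by 50 h
fail_by_100 = at_least(2, 3, 0.70)   # two or more batteries dead by 100 h
run_past_100 = 1 - fail_by_100       # at least two batteries still working
fail_between = fail_by_100 - fail_by_50

print(round(fail_by_50, 5), round(run_past_100, 3), round(fail_between, 5))
```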

SAS ANSWERS

Answer of Q.13

Screen Shot of the SAS Program before executing

Screen Shot of the SAS Program After Execution

Number of Observation: 15

Number of Variable: 3

After clearing the log by using the Ctrl + E command

Answer of Q. 14

4 variables

10 observations

Answers of Q.15

/* Bar chart of total mail weight for each date */
proc gchart data=dfw;
   title 'Total Pounds of Mail by Date';
   vbar date / sumvar=mail;
run;

/* Summary statistics for mail weight, two decimal places */
proc means data=dfw mean min max maxdec=2;
   title 'Average, Minimum & Maximum Pounds of Mail';
   var mail;
run;
quit;

Answers of Q.16

b)

How many datasets are in the library? = 1

How many libraries have been assigned in this SAS session? = 5

Answers of Q.17.

Active data libraries are IA, Maps, Sashelp, Sasuser and Work.

To print the contents we used

/* List the members of library IA without the per-dataset details */
proc contents data=ia._all_ nods;
run;

Answers of Q.18.

proc contents data=ia.payroll2;
run;

The number of observations is not displayed because it is a data view.

There are 6 variables in the view.

There are 148 observations in the view.

Answers of Q.19.

data oct_dates;
   set ia.october;
   /* Build a SAS date from the separate month, day and year variables */
   date=mdy(month, day, year);
   /* Days elapsed between that date and today */
   days_gone_by=today()-date;
run;

proc print data=oct_dates;
run;

Answers of Q.20.

data oct_seats;
   set oct_dates;
   /* Seats not taken by paying or non-revenue passengers */
   empty_seats=capacity-(boarded+nonrev);
   percent_full=100-(empty_seats/capacity*100);
run;

proc print data=oct_seats;
run;

Answers of Q.21.

data bonus;
   set ia.fltattnd;
   /* Bonus is 8% of salary; anniversary month comes from the hire date */
   BonusAmt=8/100*salary;
   AnnivMo=month(HireDate);
   keep empid BonusAmt AnnivMo;
run;

proc print data=bonus;
run;

Answer of Q.22.

data lowrev;
   set ia.lonpar;
   /* Keep only flights with revenue below 180000 */
   where revenue lt 180000;
   keep date dest delay revenue;
run;

proc print data=lowrev;
run;

Answers of Q.23.

data mechs;
   set ia.payroll;
   length Manager $ 15;
   /* Manager, raise and pension rise depend on the mechanic grade */
   if upcase(jobcode)='ME1' then
      do;
         Manager='Miss Pearce';
         Raise=5/100*salary;
         PenRise=3.5/100*salary;
         Total=Raise+PenRise;
      end;
   else if upcase(jobcode)='ME2' then
      do;
         Manager='Mr Holt';
         Raise=7.5/100*salary;
         PenRise=5/100*salary;
         Total=Raise+PenRise;
      end;
   else if upcase(jobcode)='ME3' then
      do;
         Manager='Mr Fitz-William';
         Raise=10/100*salary;
         PenRise=8/100*salary;
         Total=Raise+PenRise;
      end;
   keep Jobcode Salary Manager Raise PenRise Total;
run;

proc print data=mechs;
run;

Q:24

A pair of fair dice is thrown. Find the probability P that the sum is 10 or greater if (i) a 5 appears on the first die, (ii) a 5 appears on at least one of the dice.

Sol:

Sample space = { (1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6), (6,1), (6,2), (6,3), (6,4), (6,5), (6,6) }

(i) The reduced sample space is { (5,1), (5,2), (5,3), (5,4), (5,5), (5,6) }, and the favourable outcomes are E = { (5,5), (5,6) }.

n(E) = 2

So reduced sample space = 6

P(E) = 2/6

= 1/3

(ii) The reduced sample space consists of the 11 outcomes containing at least one 5, and the favourable outcomes are E = { (5,5), (5,6), (6,5) }.

n(E) = 3

So reduced sample space = 11

P(E) = 3/11
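Conditional probabilities on a reduced sample space, as in Q24, are easy to confirm by listing the 36 outcomes. The helper `conditional` and the variable names are invented for this sketch.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # all 36 ordered outcomes


def conditional(event, given):
    """P(event | given) by counting within the reduced sample space."""
    reduced = [r for r in rolls if given(r)]
    return Fraction(sum(1 for r in reduced if event(r)), len(reduced))


def sum_at_least_10(r):
    return r[0] + r[1] >= 10


p_first_is_five = conditional(sum_at_least_10, lambda r: r[0] == 5)
p_any_five = conditional(sum_at_least_10, lambda r: 5 in r)
print(p_first_is_five, p_any_five)
```

The second reduced sample space has 11 outcomes rather than 12 because (5,5) is counted once.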

Q: 25

The table shows the probabilities of a hard disc crash using a Brand X drive within one year.

Table: Probabilities of hard disc crashes

               Brand X   Not Brand X   Total
Crash C        0.6       0.1           0.7
No crash C'    0.2       0.1           0.3
Total          0.8       0.2           1.0

Using the information in the table, state:

The probability of a crash for both Brand X and all other types of disc.

Sol:

P(crash for both Brand X and others)

= 0.7/1.0

= 7/10

The probability of no crash.

Sol:

P(no crash)

= 0.3/1.0

= 3/10

The probability of using a Brand X disc.

Sol:

P(using Brand X)

= 0.8/1.0

= 8/10

The probability of not using Brand X.

P(not using Brand X)

= 0.2/1.0

= 2/10

= 1/5

The probability of a crash and using Brand X.

Sol:

P(a crash and using Brand X)

= 0.6, read directly from the Crash row and Brand X column of the table.

The probability of a crash, given that Brand X is used.

= P(crash and Brand X) / P(Brand X)

= 0.6/0.8

= 3/4

The probability of a crash, given that Brand X is not used.

= P(crash and not Brand X) / P(not Brand X)

= 0.1/0.2

= 1/2

Find the probability of the disc being Brand X given that it crashed.

P(Brand X and crash) = 0.6

P(crash) = 0.7

P(Brand X | crash) = P(Brand X and crash) / P(crash)

= 0.6/0.7

= 6/7

Find the probability of the disc being Brand X given that it did not crash.

P(Brand X and no crash) = 0.2

P(no crash) = 0.3

P(Brand X | no crash) = P(Brand X and no crash) / P(no crash)

= 0.2/0.3

= 2/3

Find the probability of the disc not being Brand X given that it crashed.

P(not Brand X and crash) = 0.1

P(crash) = 0.7

P(not Brand X | crash) = P(not Brand X and crash) / P(crash)

= 0.1/0.7

= 1/7

Find the probability of the disc not being Brand X given that it did not crash.

P(not Brand X and no crash) = 0.1

P(no crash) = 0.3

P(not Brand X | no crash) = P(not Brand X and no crash) / P(no crash)

= 0.1/0.3

= 1/3
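Every answer in Q25 follows from the four joint probabilities in the table, using P(X | Y) = P(X and Y) / P(Y). The sketch below stores the joint table in a dictionary and derives the marginal and conditional values with exact fractions; the key labels and function names are assumptions made for the example.

```python
from fractions import Fraction as F

# Joint probabilities from the table, keyed by (outcome, brand).
joint = {
    ("crash", "X"): F(6, 10),
    ("crash", "other"): F(1, 10),
    ("ok", "X"): F(2, 10),
    ("ok", "other"): F(1, 10),
}


def marginal(index, value):
    """Sum the joint probabilities whose key matches `value` at position `index`."""
    return sum(p for key, p in joint.items() if key[index] == value)


def outcome_given_brand(outcome, brand):
    return joint[(outcome, brand)] / marginal(1, brand)


def brand_given_outcome(brand, outcome):
    return joint[(outcome, brand)] / marginal(0, outcome)


print(marginal(0, "crash"), marginal(1, "X"))
print(outcome_given_brand("crash", "X"), outcome_given_brand("crash", "other"))
print(brand_given_outcome("X", "crash"), brand_given_outcome("X", "ok"))
print(brand_given_outcome("other", "crash"), brand_given_outcome("other", "ok"))
```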

Q: 26

Show a determination tree from some sphere of involvement to you, depicting how it was constructed and its usage for determination devising and categorization.

Sol:

In data mining, a decision tree is a predictive model which can be used to represent both classifiers and regression models. In operations research, on the other hand, decision trees refer to a hierarchical model of decisions and their consequences. The decision maker employs decision trees to identify the strategy most likely to reach her goal.

When a decision tree is used for classification tasks, it is more appropriately referred to as a classification tree. When it is used for regression tasks, it is called a regression tree.

Here, we concentrate mainly on classification trees. Classification trees are used to classify an object or an instance (such as an insurant) into a predefined set of classes (such as risky/non-risky) based on its attribute values (such as age or gender).

Classification trees are frequently used in applied fields such as finance, marketing, engineering and medicine. The classification tree is useful as an exploratory technique. However, it does not attempt to replace existing traditional statistical methods, and there are many other techniques that can be used to classify or predict the membership of instances in a predefined set of classes, such as artificial neural networks or support vector machines.

Figure 1.1 presents a typical decision tree classifier. This decision tree is used to facilitate the underwriting process of mortgage applications of a certain bank. As part of this process the applicant fills in an application form that includes the following data: number of dependents (DEPEND), loan-to-value ratio (LTV), marital status (MARST), payment-to-income ratio (PAYINC), interest rate (RATE), years at current address (YRSADD), and years at current job (YRSJOB).

Based on the above information, the underwriter will decide if the application should be approved for a mortgage. More specifically, this decision tree classifies mortgage applications into one of the following three classes:

Approved (denoted as "A"): The application should be approved.

Denied (denoted as "D"): The application should be denied.

Manual underwriting (denoted as "M"): An underwriter should manually examine the application and decide if it should be approved (in some cases after requesting additional information from the applicant).

The decision tree is based on the fields that appear in the mortgage application forms.

The above example illustrates how a decision tree can be used to represent a classification model. In fact it can be seen as an expert system, which partially automates the underwriting process and which was built manually by a knowledge engineer after interviewing an experienced underwriter in the company. This kind of expert interrogation is called knowledge elicitation, namely obtaining knowledge from a human expert (or human experts) for use by an intelligent system. Knowledge elicitation is usually difficult because it is not easy to find an available expert who is able, has the time and is willing to provide the knowledge engineer with the information he needs to create a reliable expert system. In fact, the difficulty inherent in the process is one of the main reasons why companies avoid intelligent systems. This phenomenon is known as the knowledge elicitation bottleneck.

[Fig 1.1 Underwriting decision tree. Test nodes: YRSJOB, LTV, MARST, DEPEND, YRSADD. Branch labels: <2, ≥2; <75%, ≥75%; Divorced, Married, Single; <1.5, ≥1.5; >0, =0. Leaves: A, D, M.]

A decision tree can also be used to analyze the payment ethics of customers who received a mortgage. In this case there are two classes:

Paid (denoted as "P"): the recipient has fully paid off his or her mortgage.

Not Paid (denoted as "N"): the recipient has not fully paid off his or her mortgage.

This new decision tree can be used to improve the underwriting decision model presented in Figure 1.1. It shows that relatively many customers pass the underwriting process but have not yet fully paid back the loan. Note that, as opposed to the manually built underwriting decision tree, this decision tree is constructed according to data that was accumulated in the database. Therefore, there is no need to manually elicit knowledge. In fact the tree can be grown automatically. This kind of knowledge acquisition is referred to as knowledge discovery from databases.
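As a rough illustration of how a tree can be grown from accumulated data, the sketch below finds the best single threshold split on one numeric attribute. The text does not name a splitting criterion, so Gini impurity is used here purely as an assumption; the records, the `paid` field and the function names are hypothetical. A full tree-growing procedure would apply such a split search recursively to each resulting subset.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the list is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_threshold(records, attribute, target):
    """Return (threshold, weighted impurity) of the best 'value < threshold' split."""
    values = sorted({r[attribute] for r in records})
    best = (None, float("inf"))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        t = (low + high) / 2
        left = [r[target] for r in records if r[attribute] < t]
        right = [r[target] for r in records if r[attribute] >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(records)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical, made-up records; not data from the text.
records = [
    {"YRSJOB": 1, "paid": "N"},
    {"YRSJOB": 2, "paid": "N"},
    {"YRSJOB": 3, "paid": "N"},
    {"YRSJOB": 4, "paid": "P"},
    {"YRSJOB": 6, "paid": "P"},
    {"YRSJOB": 8, "paid": "P"},
]

print(best_threshold(records, "YRSJOB", "paid"))  # (3.5, 0.0)
```

On this toy data the split at 3.5 separates the two classes perfectly, so the weighted impurity is zero.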

The use of a decision tree is a very popular technique in data mining. In the opinion of many researchers, decision trees are popular due to their simplicity and transparency. Decision trees are self-explanatory; there is no need to be a data mining expert in order to follow a certain decision tree. Classification trees are usually represented graphically as hierarchical structures, making them easier to interpret than other techniques. If the classification tree becomes complicated (i.e. has many nodes) then its straightforward, graphical representation becomes useless. For complex trees, other graphical procedures should be developed to simplify interpretation.

[Figure: Actual behavior of client. Decision tree with test nodes YRSJOB, I.RATE, PAYINC and DEPEND, and leaves N (Not Paid) and P (Paid).]

Features of Decision Trees

A decision tree is a classifier expressed as a recursive partition of the instance space. The decision tree consists of nodes that form a rooted tree, meaning it is a directed tree with a node called a "root" that has no incoming edges. All other nodes have exactly one incoming edge. A node with outgoing edges is referred to as an "internal" or "test" node. All other nodes are called "leaves" (also known as "terminal" or "decision" nodes).

In the decision tree, each internal node splits the instance space into two or more sub-spaces according to a certain discrete function of the input attribute values. In the simplest and most frequent case, each test considers a single attribute, such that the instance space is partitioned according to the attribute's value. In the case of numeric attributes, the condition refers to a range.

Each leaf is assigned to one class representing the most appropriate target value. Alternatively, the leaf may hold a probability vector (affinity vector) indicating the probability of the target attribute having a certain value. The figure describes another example of a decision tree that reasons whether or not a potential customer will respond to a direct mailing.
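The two kinds of leaf content can be sketched as follows. The sketch assumes that a leaf is summarized from the class labels of the training instances that reached it, which is one common convention rather than something the text specifies; `leaf_summary` and the sample labels are hypothetical.

```python
from collections import Counter

def leaf_summary(labels):
    """Return (majority class, probability vector) for a non-empty list of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    # Relative frequency of each class among the instances at this leaf.
    probabilities = {cls: n / total for cls, n in counts.items()}
    majority = max(counts, key=counts.get)
    return majority, probabilities

# Hypothetical leaf reached by eight training instances.
majority, probs = leaf_summary(["P"] * 6 + ["N"] * 2)
print(majority)  # P
print(probs)     # {'P': 0.75, 'N': 0.25}
```

Keeping the probability vector instead of only the majority class lets the analyst see how confident the leaf's prediction is.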

Internal nodes are represented as circles, whereas leaves are denoted as triangles. Two or more branches may grow from each internal node (i.e. not a leaf). Each node corresponds with a certain characteristic and the branches correspond with a range of values. These ranges of values must give a partition of the set of values of the given characteristic.

Instances are classified by navigating them from the root of the tree down to a leaf, according to the outcome of the tests along the path.

Specifically, we start with the root of a tree; we consider the characteristic that corresponds to the root; and we determine to which branch the observed value of the given characteristic corresponds. Then we consider the node at which the given branch ends. We repeat the same operations for this node, and so on, until we reach a leaf. Note that this decision tree incorporates both nominal and numeric attributes. Given this classifier, the analyst can predict the response of a potential customer (by sorting it down the tree), and understand the behavioral characteristics of the entire potential customer population regarding direct mailing. Each node is labeled with the attribute it tests, and its branches are labeled with its corresponding values.
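The root-to-leaf procedure just described can be sketched in a few lines. The tree below is hypothetical: the attribute names, the threshold and the class labels are made up for illustration and are not taken from the figure. Nodes are plain dictionaries; a numeric test uses a threshold with two branches, and a nominal test has one branch per value.

```python
def classify(node, instance):
    """Walk from the root to a leaf and return the leaf's class label."""
    while "label" not in node:                 # internal (test) node
        value = instance[node["attribute"]]
        if "threshold" in node:                # numeric attribute: range test
            branch = "<" if value < node["threshold"] else ">="
        else:                                  # nominal attribute: branch per value
            branch = value
        node = node["branches"][branch]
    return node["label"]

# Hypothetical tree mixing one numeric and one nominal attribute.
tree = {
    "attribute": "age", "threshold": 30,
    "branches": {
        "<": {"label": "no"},
        ">=": {
            "attribute": "gender",
            "branches": {
                "male": {"label": "yes"},
                "female": {"label": "no"},
            },
        },
    },
}

print(classify(tree, {"age": 25, "gender": "male"}))  # no
print(classify(tree, {"age": 42, "gender": "male"}))  # yes
```

Because each test sends an instance down exactly one branch, every instance ends in exactly one leaf.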

In the case of numeric attributes, decision trees can be geometrically interpreted as a collection of hyperplanes, each orthogonal to one of the axes.
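To make the geometric reading concrete, the sketch below lists the region of attribute space that belongs to each leaf of a small tree. The tree, the attribute names `x1` and `x2` and the thresholds are hypothetical. Since every test compares a single attribute with a threshold, each region is an axis-aligned box.

```python
import math

def regions(node, bounds):
    """Return [(bounds, label)] per leaf; bounds maps attribute -> (low, high)."""
    if "label" in node:
        return [(bounds, node["label"])]
    attr, t = node["attribute"], node["threshold"]
    low, high = bounds[attr]
    # The test 'attr < t' cuts the current box with a hyperplane orthogonal to attr.
    left = dict(bounds, **{attr: (low, min(high, t))})
    right = dict(bounds, **{attr: (max(low, t), high)})
    return regions(node["left"], left) + regions(node["right"], right)

# Hypothetical tree over two numeric attributes.
tree = {
    "attribute": "x1", "threshold": 2.0,
    "left": {"label": "A"},
    "right": {
        "attribute": "x2", "threshold": 5.0,
        "left": {"label": "B"},
        "right": {"label": "A"},
    },
}

whole_space = {"x1": (-math.inf, math.inf), "x2": (-math.inf, math.inf)}
for box, label in regions(tree, whole_space):
    print(label, box)
```

Each printed box gives, per attribute, the half-open interval [low, high) covered by that leaf.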