Predict Precipitation Using Gain Ratio Biology Essay
The population of the world has been increasing steadily. Densely populated countries like India are seriously lagging behind in supplying the basic needs of their people. Food is one of the basic needs that any country has to fulfil. Agriculture is one of the major sectors, and one third of the Indian population depends on it.
In irrigation-based countries like India, water is the basic resource that drives plant growth. The main resource for irrigation is rainfall, which is scientifically a liquid form of precipitation. Rain clouds in the atmosphere are responsible for this precipitation. Prediction of precipitation is necessary, as it has to be considered during the financial planning of a country. The meteorological departments of every country are very keen on recording precipitation datasets, which are huge in content. Hence, data mining is found to be an apt tool for extracting the relation between the datasets and their attributes.
Supervised Learning In Quest (SLIQ) is one such data mining algorithm; it is ultimately a decision tree used to predict precipitation based on historical data. The SLIQ decision tree using gain ratio is a statistical analysis for establishing the relation between the attribute set and precipitation, and it furnishes predictions with an accuracy of 77.78%.
Keywords- Data Mining; Decision Tree; Meteorology; Precipitation; Prediction; Rainfall; SLIQ
The growth of population is one of the major factors that adversely affect the growth of a country's economy. It is essential to ensure adequate infrastructure for providing the basic needs of the growing population. The agricultural sector provides most of the raw materials required for the products that meet those basic needs.
It is obvious that agricultural productivity depends on water availability, and precipitation is the primary source of water. Precipitation is due to the thick layers of clouds in the atmosphere, which would have attained the melting point [26]. The prediction of precipitation forms a basis for planning the economy with improved accuracy. Hence, there is a need to propose models for improving the accuracy of precipitation prediction.
A mathematical model is an abstract representation of a real-life problem situation. Many mathematical models which represent real-life problem situations are complex. Hence, solving such complex models involves performing a large number of arithmetic and logical operations on related data. The invention of the computer improved accuracy and minimized the time needed to perform those operations. The prediction of precipitation is a complex and uncertain phenomenon that results in complex mathematical models.
Most prediction models employ huge historical data. Here, data mining can be used for predicting precipitation more accurately. Data mining tools that can be employed in the field of prediction include artificial neural networks, genetic algorithms, rule-based induction, the nearest neighbour method, memory-based reasoning, linear discriminant analysis and decision trees. The success rate for the prediction of precipitation by applying different data mining tools reported in the literature is 43.6% [29]. Recently, Prasad et al. proposed to use the Supervised Learning In Quest (SLIQ) decision tree with the Gini index for the prediction of precipitation, which resulted in an accuracy of 72.3% [2].
This paper proposes to use the SLIQ decision tree with gain ratio, which improves the accuracy from 72.3% to 77.78%. The rest of the paper is organized as follows: Section II describes relevant work. Section III provides information about decision trees. Section IV gives a brief description of the SLIQ decision tree algorithm. Section V describes the rules for the decision tree. Section VI describes the experimental results. Section VII presents the conclusions and, finally, Section VIII illustrates the future enhancements.
Research is a continuous process. Anyone who imagines that the research in a field is complete has to rethink that claim; research continues beyond that point. In the literature, many research findings are reported for predicting precipitation as accurately as possible. Some of them used the traditional methods of artificial neural networks for the prediction, while others include more recent developments like image processing, linear regression, fuzzy logic and so on.
Frank Silvio Marzano, Giancarlo Rivolta, Erika Coppola, Barbara Tomassetti and Marco Verdecchia used a fully neural network approach to rainfall field nowcasting from infrared and microwave passive-sensor imagery [6]. K. Richards and G.D. Sullivan combined the features of a Bayesian scheme for texture analysis of cloud images taken from the ground [7]. C. Jareanpon, W. Pensuwon, R.J. Frank and N. Davey formed a radial basis function neural network with a specially designed genetic algorithm [8]. K. Ochiai, H. Suzuki, S. Suzuki, N. Sonehara and Y. Tokunaga stated that the computational time for learning with an acceleration algorithm can be reduced by about 10 percent by introducing a pruning algorithm [9]. I.F. Grimes, E. Coppola, M. Verdecchia and G. Visconti presented an approach in which cold cloud duration imagery derived from Meteosat thermal infrared imagery is used in conjunction with numerical weather model analysis data as input to an ANN [10]. Thiago N. de Castro, Francisco Souza, Jose M.B. Alves, Ricardo S.T. Ponss, Mosefran B.M. Firmino and Thiago M. de Pereria forecasted seasonal rainfall using a neo-fuzzy neuron model [11]. Tuan Zea Tan, Gary Kee Khoon Lee, Shie-Yui Liong, Tian Kuay Lim, Jiawei Chu and Terence Hung treated the series of rainfall as a continuous time series [12]. Jiansheng Wu integrated linear regression with an ANN; the linear regression extracts the linear characteristics of the rainfall [13]. Hui Qi, Ming Zhang and Roderick A. Scofield developed a Multi-Polynomial High Order Neural Network (M-PHONN) [14]. Wint Thida Zaw and Thinn Thu Naing stated that multi-variable polynomial regression (MPR) is one of the statistical regression methods used to describe complex nonlinear input and output relationships [15].
C. Kidd and V. Levizzani stated that rainfall is spatially and temporally highly variable [16]. Sanjay D. Sawaitul, Prof. K.P. Wagh and Dr. P.N. Chatur used weather parameters like wind direction, wind speed, humidity, rainfall and temperature for the classification and prediction of future weather using the back propagation algorithm [17]. Soroosh Sorooshian, Kuo-lin Hsu, Bisher Imam and Yang Hong made global precipitation estimates from satellite images using artificial neural networks [18]. Kesheng Lu and Lingzhi Wang used a bagging sampling technique to generate the training sets for a combination model based on support vector machines for rainfall prediction [19]. Grant W. Petty and Witold F. Krajewski discussed methods based on infrared, visible and passive microwave radiation measurements [20].
A decision tree is an advanced knowledge discovery process with minimal time complexity that is easy to implement. It establishes relationships between various datasets by discovering the hidden patterns among datasets which are huge and complex [3, 4], [26, 27].
It is a known fact that "The only way to get more accuracy is to do more research", which indicates that more and more research has to be done to gain more accurate results. However, the research should be carried out keeping the cost factor in mind. Hence, scientists have been improving decision tree algorithms. The use of decision trees has grown from ordinary statistical analysis to an effective tool in data mining, text mining, information retrieval, pattern recognition and so on.
The attributes referred to in Table I are humidity, temperature, pressure, wind speed and dew point. Humidity is the amount of water vapour in the air, which is invisible in nature. Temperature is the degree of hotness or coldness of a body or environment, and it is measured in degrees Celsius (°C). Atmospheric pressure is the force per unit area exerted against the ground surface by the weight of the air above it, and it is measured in bars. The speed at which wind is flowing is referred to as the wind speed, which is measured in metres per second by an anemometer. Pressure gradient, Rossby waves, jet streams and local weather conditions mainly affect the wind speed, which can lead to destruction. Dew point is the temperature at which the air in the atmosphere can no longer hold all of the water vapour mixed with it, so that some of the water vapour must condense into liquid water.
It is an established fact that precipitation generally depends on various attributes like humidity, temperature, pressure, wind speed and so on. Let us consider a dataset with similar attributes, namely humidity, temperature, pressure, wind speed and dew point, which influence rainfall, together with the class label, as given in Table I.
A decision tree is constructed as shown in Fig. 1 for the data given in Table I. Table I shows 30 days of data for humidity, temperature, pressure, wind speed and dew point along with the class label. This is a part of the data from the Indian Meteorological Department (IMD) for 15 years.
The decision tree is an upside-down tree whose root node represents the full dataset, which is partitioned into various branches. The leaves of the branches represent class labels, as shown in Fig. 1.
TABLE I
S.No  Humidity (H)  Temperature (T)  Pressure (P)  Wind speed (W)  Dew point (D)  Class
1     97            24               1005          14              21             Rain
2     85            26               1004          16              21             No Rain
3     91            27               1004          14              21             Rain
4     82            27               1006          16              20             Rain
5     81            26               1007          18              19             No Rain
6     95            26               1007          18              20             Rain
7     95            26               1007          16              20             Rain
8     93            26               1008          18              21             Rain
9     87            24               1005          13              21             Rain
10    88            24               1005          11              21             Rain
11    80            26               1005          14              21             Rain
12    89            26               1005          14              21             Rain
13    86            27               1006          14              21             No Rain
14    86            28               1007          10              22             Rain
15    94            27               1006          14              21             Rain
16    88            26               1004          13              21             No Rain
17    92            27               1005          13              21             Rain
18    86            27               1007          11              21             Rain
19    82            27               1006          11              21             Rain
20    76            27               1007          14              19             No Rain
21    79            27               1008          11              20             No Rain
22    75            27               1008          13              20             No Rain
23    84            27               1007          13              20             No Rain
24    88            26               1006          11              21             Rain
25    86            25               1005          16              19             Rain
26    78            28               1006          13              21             No Rain
27    79            27               1008          13              19             No Rain
28    80            28               1008          8               20             No Rain
29    84            29               1009          6               21             No Rain
30    76            27               1009          6               22             Rain

TABLE II. Notations Used in Showing SLIQ Algorithm
Notation       Meaning
D              Set of training tuples with associated class labels
Dj             The set of data tuples in D satisfying outcome j
|D|            The number of training tuples in D
C              The class label
Entropy(D)     The information needed to classify a tuple in D
Splitinfo(V)   Normalization to information gain
Split point    Midpoint of Vi and Vi+1
V              An attribute list
Vi             Set of values in attribute V
Vi+1           Changed class value in attribute V
Pi             The probability that a tuple in D belongs to class Ci
Di             Values which are greater than or equal to the split point
Dj             Values which are less than the split point

The criterion for partitioning the dataset at a level is explained in the following section. Decision trees can be used for a dataset whether it is continuous or discrete. The category of the dataset that is taken into account is called the class label. One of the attributes becomes the root node of the decision tree, whereas the class label is the leaf node, as shown in Fig. 1. Knowledge-based mining is not so effective in establishing temporal attribute relationships.
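The tree structure described above (attribute tests at internal nodes, class labels at the leaves) can be sketched in a few lines of Python. This is only an illustrative sketch: the names `Leaf`, `Split` and `classify` are hypothetical, and the small tree built at the end is a toy example, not the tree of Fig. 1.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label, e.g. "Rain" or "No Rain"


@dataclass
class Split:
    attribute: str                       # attribute tested at this node
    threshold: float                     # split point for that attribute
    below: Union["Split", Leaf]          # branch taken when value < threshold
    at_or_above: Union["Split", Leaf]    # branch taken when value >= threshold


def classify(node, record):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(node, Split):
        if record[node.attribute] < node.threshold:
            node = node.below
        else:
            node = node.at_or_above
    return node.label


# Toy tree for illustration only (assumed thresholds, not the paper's full tree).
toy_tree = Split(
    "humidity", 86.0,
    below=Leaf("No Rain"),
    at_or_above=Split("pressure", 1007.0,
                      below=Leaf("No Rain"),
                      at_or_above=Leaf("Rain")),
)

print(classify(toy_tree, {"humidity": 90, "pressure": 1008}))
```

Keeping leaves and internal nodes as separate types makes the traversal a simple loop, and each root-to-leaf path corresponds to one if-then rule.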
SLIQ Decision Tree Algorithm
The decision tree classifier SLIQ [1] can handle numeric as well as categorical attributes. It employs a pre-sorting technique for reducing the cost of evaluating numeric attributes during the tree-growth phase. Further, SLIQ employs a tree pruning algorithm based on the Minimum Description Length (MDL) principle.
It is reported that the SLIQ algorithm is inexpensive and results in compact and accurate trees [1]. SLIQ ensures scalability in classifying large datasets consisting of a large number of classes and attributes. In the construction of the decision tree, gain ratio is evaluated at every successive midpoint of the attribute values.
However, the efficiency of the SLIQ decision tree algorithm can be improved by evaluating gain ratio only at the midpoints of the attribute values where the class information changes. The algorithm for the construction of the SLIQ decision tree for the prediction of precipitation is presented below. The notations used are given in Table II.
Overview of SLIQ Decision tree growth and split points
1. Read the dataset into the root node of the SLIQ decision tree.
2. Generate an attribute list for each attribute of the dataset.
3. Sort the attribute lists on attribute value in non-decreasing order.
4. Calculate the entropy for the root node:
   Entropy(D) = -\sum_{i} P_i \log_2 P_i    (1)
5. Calculate the Info of attribute list 'V':
   Info(V) = \sum_{k \in \{i, j\}} \frac{|D_k|}{|D|} Entropy(D_k)    (2)
6. Calculate the Gain for each attribute list:
   Gain(V) = Entropy(D) - Info(V)    (3)
7. Compute the split information for the sets of attribute values 'Di' and 'Dj':
   Splitinfo(V) = -\sum_{k \in \{i, j\}} \frac{|D_k|}{|D|} \log_2 \frac{|D_k|}{|D|}    (4)
8. Determine the Gain Ratio for the attribute values in attribute list 'V':
   Gain Ratio(V) = Gain(V) / Splitinfo(V)    (5)
9. Determine the maximum gain ratio from among the gain ratios, which becomes the basis for the best split as shown in Table III:
   Best Split = Max. Gain Ratio value of attribute    (6)
10. Partition the root node into leaf nodes based on the best split point.
11. Repeat steps 5 through 10, treating each leaf node as the root node, until all leaf nodes contain the same class labels.
The primary metric for evaluating the prediction of precipitation is accuracy: the accuracy of a predictor refers to how well it predicts the value of the predicted attribute for new or previously unseen data.
   Accuracy = Correct predictions / Total predictions    (7)
The ideal goal is to produce compact, accurate trees in a short time with scalability. The SLIQ decision tree algorithm used for the prediction of precipitation takes N input attributes and N class labels as input and produces the decision tree along with the rules.
The simulated tree shown in Fig. 1 consists of 13 leaf nodes, 7 of which depict rain while the remaining 6 depict no rain. In the decision tree shown in Fig. 1, NR indicates no rain and R indicates rain.
Fig. 1. Gain Ratio based Decision Tree
TABLE III. Gain Ratio Based Split Value for various attributes
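The entropy, gain and gain ratio steps of the algorithm above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function names `entropy`, `gain_ratio` and `best_split` are hypothetical, the split is binary (values below the split point versus values at or above it, as in Table II), and a split that leaves one side empty is simply given a gain ratio of 0.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy(D) = -sum(P_i * log2(P_i)) over the class labels in D."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels, split_point):
    """Gain ratio of a binary split of one attribute at split_point."""
    below = [c for v, c in zip(values, labels) if v < split_point]
    at_or_above = [c for v, c in zip(values, labels) if v >= split_point]
    if not below or not at_or_above:
        return 0.0  # assumption: a split with an empty side is useless
    total = len(labels)
    parts = (below, at_or_above)
    # Info(V): entropy of each partition weighted by its relative size
    info = sum(len(p) / total * entropy(p) for p in parts)
    gain = entropy(labels) - info
    # Splitinfo(V): entropy of the partition sizes themselves
    split_info = -sum(len(p) / total * math.log2(len(p) / total)
                      for p in parts)
    return gain / split_info


def best_split(values, labels, candidates):
    """Return the candidate split point with the maximum gain ratio."""
    return max(candidates, key=lambda s: gain_ratio(values, labels, s))


# Class distribution of Table I: 18 "Rain" and 12 "No Rain" records.
table1_labels = ["Rain"] * 18 + ["No Rain"] * 12
print(round(entropy(table1_labels), 3))
```

For the 18 Rain and 12 No Rain records of Table I, the root entropy works out to about 0.971. Restricting `candidates` to the midpoints where the class label changes gives the cheaper evaluation described in the text.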
Rules for Decision Tree
Once the decision tree is constructed, it may be too large to understand easily. Hence, rules are generated to simplify the understanding of a large decision tree.
Rule 1: If [ (humidity < 86.0) and (pressure < 1007.0) and (temperature < 27.5) and (humidity < 83.0) ] Then (Prediction = Rain)
Rule 2: If [ (humidity < 86.0) and (pressure < 1007.0) and (temperature < 27.5) and (humidity >= 83.0) ] Then (Prediction = NoRain)
Rule 3: If [ (humidity < 86.0) and (pressure < 1007.0) and (temperature >= 27.5) ] Then (Prediction = NoRain)
Rule 4: If [ (humidity < 86.0) and (pressure >= 1007.0) and (dew-point < 20.5) ] Then (Prediction = NoRain)
Rule 5: If [ (humidity < 86.0) and (pressure >= 1007.0) and (dew-point >= 20.5) and (temperature < 27.5) ] Then (Prediction = Rain)
Rule 6: If [ (humidity < 86.0) and (pressure >= 1007.0) and (dew-point >= 20.5) and (temperature >= 27.5) ] Then (Prediction = NoRain)
Rule 7: If [ (humidity >= 86.0) and (pressure < 1007.0) and (temperature < 25.5) ] Then (Prediction = Rain)
Rule 8: If [ (humidity >= 86.0) and (pressure < 1007.0) and (temperature >= 25.5) and (humidity < 88.0) ] Then (Prediction = NoRain)
Rule 9: If [ (humidity >= 86.0) and (pressure < 1007.0) and (temperature >= 25.5) and (humidity >= 88.0) and (wind-speed < 13.5) and (wind-speed < 13.0) ] Then (Prediction = Rain)
Rule 10: If [ (humidity >= 86.0) and (pressure < 1007.0) and (temperature >= 25.5) and (humidity >= 88.0) and (wind-speed < 13.5) and (wind-speed >= 13.0) and (temperature < 27.0) ] Then (Prediction = NoRain)
Rule 11: If [ (humidity >= 86.0) and (pressure < 1007.0) and (temperature >= 25.5) and (humidity >= 88.0) and (wind-speed < 13.5) and (wind-speed >= 13.0) and (temperature >= 27.0) ] Then (Prediction = Rain)
Rule 12: If [ (humidity >= 86.0) and (pressure < 1007.0) and (temperature >= 25.5) and (humidity >= 88.0) and (wind-speed >= 13.5) ] Then (Prediction = Rain)
Rule 13: If [ (humidity >= 86.0) and (pressure >= 1007.0) ] Then (Prediction = Rain)
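The thirteen rules above form a single nested decision procedure, so they can be written as one function. The sketch below is a direct transcription of the rules into Python; the function name `predict` and its argument names are assumptions made for illustration, and the rule numbers in the comments refer to the list above.

```python
def predict(humidity, temperature, pressure, wind_speed, dew_point):
    """Apply Rules 1-13 and return "Rain" or "NoRain"."""
    if humidity < 86.0:
        if pressure < 1007.0:
            if temperature < 27.5:
                # Rule 1 (humidity < 83.0) and Rule 2 (humidity >= 83.0)
                return "Rain" if humidity < 83.0 else "NoRain"
            return "NoRain"                      # Rule 3
        if dew_point < 20.5:
            return "NoRain"                      # Rule 4
        # Rule 5 (temperature < 27.5) and Rule 6 (temperature >= 27.5)
        return "Rain" if temperature < 27.5 else "NoRain"
    if pressure >= 1007.0:
        return "Rain"                            # Rule 13
    if temperature < 25.5:
        return "Rain"                            # Rule 7
    if humidity < 88.0:
        return "NoRain"                          # Rule 8
    if wind_speed >= 13.5:
        return "Rain"                            # Rule 12
    if wind_speed < 13.0:
        return "Rain"                            # Rule 9
    # Rule 10 (temperature < 27.0) and Rule 11 (temperature >= 27.0)
    return "NoRain" if temperature < 27.0 else "Rain"


# Record 1 of Table I: humidity 97, temperature 24, pressure 1005,
# wind speed 14, dew point 21 (class label "Rain").
print(predict(97, 24, 1005, 14, 21))
```

Applied to record 1 of Table I, the function follows Rule 7 and returns "Rain", which matches the class label in the table.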
The data taken for training needs to be sorted during the initial stage of the tree growth phase of decision tree construction [3]. As per the training data, humidity is the first attribute. Take the humidity attribute and its corresponding class label as a pair, and identify a split point wherever there is a change in the class label. The best split point needs to be found to increase the accuracy of prediction. For every split point identified, find the midpoint of the values at which the class label changes, and continue until the end of the data is reached, as shown in Table IV.
From Table IV it is clearly visible that the class label changes for the first time at the third position. Mark it as a split point and take the midpoint of the second and third values, i.e. midpoint(76, 76) = 76. Similarly, the second split point occurs at the fourth position. Mark it as a split point and take the midpoint of the third and fourth values, i.e. midpoint(76, 78) = 77. Continuing in this order, there are nine split points, as the class label changes at nine places.
Repeat the procedure to find the split points for the attribute temperature shown in Table V, the attribute pressure shown in Table VI, the attribute wind speed shown in Table VII and the attribute dew point shown in Table VIII.
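The split-point identification described above can be sketched as follows. The function name `candidate_split_points` is hypothetical, and one detail is an assumption: records with equal attribute values are ordered by class label, so "No Rain" comes before "Rain", which is the ordering the sorted humidity data appears to follow. The humidity values and labels are those of Table I, in record order.

```python
def candidate_split_points(values, labels):
    """Midpoints between consecutive sorted values whose class labels differ."""
    # Sorting the pairs orders ties by label: "No Rain" before "Rain".
    pairs = sorted(zip(values, labels))
    points = []
    for (v_prev, c_prev), (v_next, c_next) in zip(pairs, pairs[1:]):
        if c_prev != c_next:
            points.append((v_prev + v_next) / 2)
    return points


R, NR = "Rain", "No Rain"
humidity = [97, 85, 91, 82, 81, 95, 95, 93, 87, 88,
            80, 89, 86, 86, 94, 88, 92, 86, 82, 76,
            79, 75, 84, 88, 86, 78, 79, 80, 84, 76]
labels = [R, NR, R, R, NR, R, R, R, R, R,
          R, R, NR, R, R, NR, R, R, R, NR,
          NR, NR, NR, R, R, NR, NR, NR, NR, R]

print(candidate_split_points(humidity, labels))
# [76.0, 77.0, 80.0, 80.5, 81.5, 83.0, 86.0, 87.5, 88.0]
```

The nine values agree with the nine split points counted in the text for the humidity attribute.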
TABLE IV. Dataset sorting on humidity
Humidity  Split Point  Class
75                     No Rain
76                     No Rain
76        76           Rain
78        77           No Rain
79                     No Rain
79                     No Rain
80                     No Rain
80        80.0         Rain
81        80.5         No Rain
82        81.5         Rain
82                     Rain
84        83.0         No Rain
84                     No Rain
85                     No Rain
86                     No Rain
86        86.0         Rain
86                     Rain
86                     Rain
87                     Rain
88        87.5         No Rain
88        88.0         Rain
88                     Rain
89                     Rain
91                     Rain
92                     Rain
93                     Rain
94                     Rain
95                     Rain
95                     Rain
97                     Rain

TABLE V. Dataset sorting on temperature
Temperature  Split Point  Class
24                        Rain
24                        Rain
24                        Rain
25                        Rain
26           25.5         No Rain
26                        No Rain
26                        No Rain
26           26           Rain
26                        Rain
26                        Rain
26                        Rain
26                        Rain
26                        Rain
27           26.5         No Rain
27                        No Rain
27                        No Rain
27                        No Rain
27                        No Rain
27                        No Rain
27           27           Rain
27                        Rain
27                        Rain
27                        Rain
27                        Rain
27                        Rain
27                        Rain
28           27.5         No Rain
28                        No Rain
28           28           Rain
29           28.5         No Rain

TABLE VI. Dataset sorting on pressure
Pressure  Split Point  Class
1004                   No Rain
1004                   No Rain
1004      1004         Rain
1005                   Rain
1005                   Rain
1005                   Rain
1005                   Rain
1005                   Rain
1005                   Rain
1005                   Rain
1006      1005.5       No Rain
1006                   No Rain
1006      1006         Rain
1006                   Rain
1006                   Rain
1006                   Rain
1007      1006.5       No Rain
1007                   No Rain
1007                   No Rain
1007      1007         Rain
1007                   Rain
1007                   Rain
1007                   Rain
1008      1007.5       No Rain
1008                   No Rain
1008                   No Rain
1008                   No Rain
1008      1008         Rain
1009      1008.5       No Rain
1009      1009         Rain

TABLE VII. Dataset sorting on wind speed
Wind speed  Split Point  Class
6                        No Rain
6           6            Rain
8           7            No Rain
10          9            Rain
11          10.5         No Rain
11          11           Rain
11                       Rain
11                       Rain
11                       Rain
13          12           No Rain
13                       No Rain
13                       No Rain
13                       No Rain
13                       No Rain
13          13           Rain
13                       Rain
14          13.5         No Rain
14                       No Rain
14          14           Rain
14                       Rain
14                       Rain
14                       Rain
14                       Rain
16          15           No Rain
16          16           Rain
16                       Rain
16                       Rain
18          17           No Rain
18          18           Rain
18                       Rain

TABLE VIII. Dataset sorting on dew point
Dew point  Split Point  Class
19                      No Rain
19                      No Rain
19                      No Rain
19         19           Rain
20         19.5         No Rain
20                      No Rain
20                      No Rain
20                      No Rain
20         20           Rain
20                      Rain
20                      Rain
21         20.5         No Rain
21                      No Rain
21                      No Rain
21                      No Rain
21                      No Rain
21         21           Rain
21                      Rain
21                      Rain
21                      Rain
21                      Rain
21                      Rain
21                      Rain
21                      Rain
21                      Rain
21                      Rain
21                      Rain
21                      Rain
22                      Rain
22                      Rain

Now, compare the gain ratio values of all the split points; the split point with the maximum value is the best split point for that attribute, as shown in Table III. The gain value obtained for the attribute is divided by the split info value in order to obtain the gain ratio value for that attribute, as shown in equation (9).
   Gain Ratio(V) = Gain(V) / Splitinfo(V)    (9)
Repeat the above procedure by taking the temperature attribute along with the class label, the pressure attribute along with the class label, the wind speed attribute along with the class label and finally the dew point attribute along with the class label to get the best split points. Choose the maximum gain ratio value; the corresponding attribute becomes the root node. The tree is generated based on the threshold value of the root node. Repeat the procedure until it terminates with a unique class label.
The gain ratio is generally used to measure the inequalities among statistical data and their frequencies. So far, its use has been limited to the analysis of the wealth and income of countries' economies. Due to the inequalities present in the probabilities, there may be some error. But irrespective of this limitation, it has a wide variety of applications in statistical analysis.
The gain ratio is used here for the construction of the decision tree, where the roots and sub-roots are classified. The use of the gain ratio for rainfall analysis is quite apt because of the irregularities present in the statistical data of precipitation. The precipitation data used does not follow an order, in other words a straight path. This may be due to the inequalities of the present attribute with the former attribute, which may change to a greater or lesser extent depending on nature.
Some experiments have been conducted on real data to analyse the accuracy of the tree. We have used the dataset from accuweather.com of the Indian Meteorological Department. The goal is to predict precipitation in the form of rainfall. The dataset consists of 15 years of data, from 1998 to 2012, containing 5230 examples.
Out of the 15 years of data, 9 years of data are used as the training dataset and the remaining 6 years of data are used as the test dataset.
Table IX shows the relation between the success rate of prediction and time. It can also be observed that the maximum efficiency obtained is 74.1% on the one-year dataset, 77.47% for the two-year dataset, 77.38% for the three-year dataset, 77.17% for the four-year dataset, 77.39% for the five-year dataset and 77.78% for the six-year dataset. The average efficiency has been found to be 77.78%.
Though this is a decent efficiency or success rate, the other methods, namely back propagation neural networks [7,8], [12-15], linear discriminant statistical analysis [16] and J48, are analysed to select the best performing method for the prediction of precipitation. The published results for this dataset are: 64.3% accuracy for backpropagation, 58% for a linear discriminant and 68.6% for J48. Using the same training and test datasets, the average accuracy using SLIQ with gain ratio is 77.78%, as shown in Fig. 4. Hence, SLIQ using gain ratio can be considered the best performing method for the prediction of precipitation.
TABLE IX. Result showing the Accuracy and Time of Response
No. of records | Correct predictions | Incorrect predictions | Accuracy (%) | Time (Sec)
175441619125336677.392461981154144077.77847
Fig 2. No. of Records Vs Correct Predictions
Fig 3. No. of Records Vs Incorrect Predictions
Fig 4. No. of Records Vs Accuracy (%)
Fig 5. No. of Records Vs Time (Sec)
The variation of correct predictions with the dataset is shown in Fig. 2. This indicates that there is a linear relationship between correct predictions and the number of records in the dataset.
The variation of incorrect predictions with the dataset is shown in Fig. 3. This indicates that there is a non-linear relationship between incorrect predictions and the number of records in the dataset. From the graph, the number of incorrect predictions follows a decreasing trend up to 600 records and thereafter increases non-linearly. The variation of accuracy with the dataset is plotted in Fig. 4. The variation of response time with the dataset is plotted in Fig. 5.
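The evaluation set-up described in this section (9 years for training, 6 years for testing, accuracy as correct predictions over total predictions) can be sketched as below. This is a hedged illustration only: the helper names `split_by_year` and `accuracy`, the `"year"` field of each record, and the choice of the first nine years (1998 to 2006) as the training period are all assumptions, since the text does not say which years were used for training.

```python
def split_by_year(records, last_train_year):
    """Split records chronologically into training and test sets."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def accuracy(predicted, actual):
    """Accuracy (%) = 100 * correct predictions / total predictions."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need two non-empty label lists of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# One placeholder record per year, 1998 to 2012 (15 years in total).
records = [{"year": y} for y in range(1998, 2013)]
train, test = split_by_year(records, last_train_year=2006)
print(len(train), len(test))  # 9 6

# Four hypothetical predictions, three of which match the true labels.
print(accuracy(["Rain", "Rain", "No Rain", "Rain"],
               ["Rain", "No Rain", "No Rain", "Rain"]))  # 75.0
```

A chronological split keeps every test record later than every training record, which fits the forecasting setting better than a random shuffle.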
The economy of a country depends on agricultural productivity, which is the basis for formulating economic policy. Agricultural productivity depends on the availability of water.
Precipitation is the major source of water, and it depends on various attributes like humidity, pressure, temperature, wind speed, dew point and so on. Hence, the prediction of precipitation becomes a difficult task, as it has to consider many parameters. Many techniques used for the prediction of precipitation, such as neural networks and artificial intelligence, have lower accuracy. So far, the maximum accuracy reported is 72.3%. This study employed the SLIQ decision tree using gain ratio as the splitting criterion.
To evaluate the effectiveness of this model, the historical data obtained from IMD is applied. It is found that the method proposed in this paper gives higher accuracy when compared to the other models.
In this paper, we highlighted the gain ratio based SLIQ decision tree algorithm, which gives maximum accuracy. For future implementation, various other decision tree algorithms like CART, SPRINT, ELEGANT and EC4.5 can be developed with additional parameters. A decision tree must be developed for dynamic data rather than static data.