# Testing Hypothesis: Linear Regression Analysis Essay

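Data given: 60 accounts, described by the following variables.

- A: Account balance in dollars
- B: Number of ATM transactions in the month
- C: Number of other bank services used
- D: Has a debit card (0 = no, 1 = yes)
- E: Receives interest on the account (0 = no, 1 = yes)
- F: City where banking is done

| Balance (A) | ATM (B) | Services (C) | Debit (D) | Interest (E) | City (F) |
|---:|---:|---:|---:|---:|---:|
| 1756 | 13 | 4 | 0 | 1 | 2 |
| 748 | 9 | 2 | 1 | 0 | 1 |
| 1501 | 10 | 1 | 0 | 0 | 1 |
| 1831 | 10 | 4 | 0 | 1 | 3 |
| 1622 | 14 | 6 | 0 | 1 | 4 |
| 1886 | 17 | 3 | 0 | 1 | 1 |
| 740 | 6 | 3 | 0 | 0 | 3 |
| 1593 | 10 | 8 | 1 | 0 | 1 |
| 1169 | 6 | 4 | 0 | 0 | 4 |
| 2125 | 18 | 6 | 0 | 0 | 2 |
| 1554 | 12 | 6 | 1 | 0 | 3 |
| 1474 | 12 | 7 | 1 | 0 | 1 |
| 1913 | 6 | 5 | 0 | 0 | 1 |
| 1218 | 10 | 3 | 1 | 0 | 1 |
| 1006 | 12 | 4 | 0 | 0 | 1 |
| 2215 | 20 | 3 | 1 | 0 | 4 |
| 137 | 7 | 2 | 0 | 0 | 3 |
| 167 | 5 | 4 | 0 | 0 | 4 |
| 343 | 7 | 2 | 0 | 0 | 1 |
| 2557 | 20 | 7 | 1 | 0 | 4 |
| 2276 | 15 | 4 | 1 | 0 | 3 |
| 1494 | 11 | 2 | 0 | 1 | 1 |
| 2144 | 17 | 3 | 0 | 0 | 3 |
| 1995 | 10 | 7 | 0 | 0 | 2 |
| 1053 | 8 | 4 | 1 | 0 | 3 |
| 1526 | 8 | 4 | 0 | 1 | 2 |
| 1120 | 8 | 6 | 1 | 0 | 3 |
| 1838 | 7 | 5 | 1 | 1 | 3 |
| 1746 | 11 | 2 | 0 | 0 | 2 |
| 1616 | 10 | 4 | 1 | 1 | 2 |
| 1958 | 6 | 2 | 1 | 0 | 2 |
| 634 | 2 | 7 | 1 | 0 | 4 |
| 580 | 4 | 1 | 0 | 0 | 1 |
| 1320 | 4 | 5 | 1 | 0 | 1 |
| 1675 | 6 | 7 | 1 | 0 | 2 |
| 789 | 8 | 4 | 0 | 0 | 4 |
| 1735 | 12 | 7 | 0 | 1 | 3 |
| 1784 | 11 | 5 | 0 | 0 | 1 |
| 1326 | 16 | 8 | 0 | 0 | 3 |
| 2051 | 14 | 4 | 1 | 0 | 4 |
| 1044 | 7 | 5 | 1 | 0 | 1 |
| 1885 | 10 | 6 | 1 | 1 | 2 |
| 1790 | 11 | 4 | 0 | 1 | 3 |
| 765 | 4 | 3 | 0 | 0 | 4 |
| 1645 | 6 | 9 | 0 | 1 | 4 |
| 32 | 2 | 0 | 0 | 0 | 3 |
| 1266 | 11 | 7 | 0 | 0 | 4 |
| 890 | 7 | 1 | 0 | 1 | 1 |
| 2204 | 14 | 5 | 0 | 0 | 2 |
| 2409 | 16 | 8 | 0 | 0 | 2 |
| 1338 | 14 | 4 | 1 | 0 | 2 |
| 2076 | 12 | 5 | 1 | 0 | 2 |
| 1708 | 13 | 3 | 1 | 0 | 1 |
| 2138 | 18 | 5 | 0 | 1 | 4 |
| 2375 | 12 | 4 | 0 | 0 | 2 |
| 1455 | 9 | 5 | 1 | 1 | 3 |
| 1487 | 8 | 4 | 1 | 0 | 4 |
| 1125 | 6 | 4 | 1 | 0 | 2 |
| 1989 | 12 | 3 | 0 | 1 | 2 |
| 2156 | 14 | 5 | 1 | 0 | 2 |

Hypothesis: The account balance of customers is positively related to the number of ATM transactions, the number of bank services used and the receipt of interest, and is negatively related to the customer's possession of a debit card. The city where banking is done has little effect on the account balance.

Under this hypothesis there is one dependent variable, the account balance (A). The predictor variables include the number of ATM transactions in the month (B), the number of other bank services used (C) and possession of a debit card by the customer (D).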

Receipt of interest on the account is denoted by E, and the city where banking is done is denoted by F. Using all of these variables, the regression procedure estimates a linear equation of the form:

$$A = a + b_1 B + b_2 C + b_3 D + b_4 E + b_5 F$$

Using several predictors, as in this case, can improve how accurately the criterion (A) is predicted, provided the predictors are genuinely related to it. In the equation, $a$ is the intercept: the predicted account balance when all the predictor variables are zero. The coefficients $b_1, \dots, b_5$ are the regression coefficients. They describe the degree and direction of each predictor's relationship with the criterion. Each one gives the expected change in A for a one-unit increase in its predictor, with the other predictors held constant. Each coefficient is multiplied by its corresponding predictor variable (B, C, D and so on).
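The estimation step can be sketched in plain Python. This is a minimal illustration, not the software's own implementation. It solves the normal equations $(X^\top X)\,b = X^\top y$ by Gauss-Jordan elimination, whereas statistical packages generally use numerically stabler methods such as a QR decomposition. The data are transcribed from the Data given listing. The trend column `t = 1, ..., 60` is our assumption about the `t` term that appears in the software's estimated equation. To fit the five-predictor form exactly as written above, remove that column from `X`. The names `fit_ols`, `predict`, `DATA` and `accounts` are hypothetical.

```python
# Minimal OLS sketch in pure Python (no NumPy assumed). It solves the normal
# equations (X'X) b = X'y by Gauss-Jordan elimination with partial pivoting.


def fit_ols(X, y):
    """Return [a, b1, ..., bk] for the model y = a + b1*x1 + ... + bk*xk."""
    design = [[1.0] + [float(v) for v in x] for x in X]  # leading 1 estimates a
    p = len(design[0])
    # Augmented matrix [X'X | X'y]; one row per unknown coefficient.
    aug = [[sum(r[i] * r[j] for r in design) for j in range(p)]
           + [sum(r[i] * yi for r, yi in zip(design, y))]
           for i in range(p)]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical safety.
        pivot = max(range(col, p), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        if abs(aug[col][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        # Eliminate this column from every other row.
        for i in range(p):
            if i != col:
                factor = aug[i][col] / aug[col][col]
                for j in range(col, p + 1):
                    aug[i][j] -= factor * aug[col][j]
    return [aug[i][p] / aug[i][i] for i in range(p)]


def predict(coef, x):
    """Predicted criterion: a + b1*x1 + ... + bk*xk."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


# The 60 accounts: Balance A, ATM B, Services C, Debit D, Interest E, City F.
DATA = """
1756 13 4 0 1 2; 748 9 2 1 0 1; 1501 10 1 0 0 1; 1831 10 4 0 1 3; 1622 14 6 0 1 4; 1886 17 3 0 1 1
740 6 3 0 0 3; 1593 10 8 1 0 1; 1169 6 4 0 0 4; 2125 18 6 0 0 2; 1554 12 6 1 0 3; 1474 12 7 1 0 1
1913 6 5 0 0 1; 1218 10 3 1 0 1; 1006 12 4 0 0 1; 2215 20 3 1 0 4; 137 7 2 0 0 3; 167 5 4 0 0 4
343 7 2 0 0 1; 2557 20 7 1 0 4; 2276 15 4 1 0 3; 1494 11 2 0 1 1; 2144 17 3 0 0 3; 1995 10 7 0 0 2
1053 8 4 1 0 3; 1526 8 4 0 1 2; 1120 8 6 1 0 3; 1838 7 5 1 1 3; 1746 11 2 0 0 2; 1616 10 4 1 1 2
1958 6 2 1 0 2; 634 2 7 1 0 4; 580 4 1 0 0 1; 1320 4 5 1 0 1; 1675 6 7 1 0 2; 789 8 4 0 0 4
1735 12 7 0 1 3; 1784 11 5 0 0 1; 1326 16 8 0 0 3; 2051 14 4 1 0 4; 1044 7 5 1 0 1; 1885 10 6 1 1 2
1790 11 4 0 1 3; 765 4 3 0 0 4; 1645 6 9 0 1 4; 32 2 0 0 0 3; 1266 11 7 0 0 4; 890 7 1 0 1 1
2204 14 5 0 0 2; 2409 16 8 0 0 2; 1338 14 4 1 0 2; 2076 12 5 1 0 2; 1708 13 3 1 0 1; 2138 18 5 0 1 4
2375 12 4 0 0 2; 1455 9 5 1 1 3; 1487 8 4 1 0 4; 1125 6 4 1 0 2; 1989 12 3 0 1 2; 2156 14 5 1 0 2
"""
accounts = [[int(v) for v in rec.split()]
            for rec in DATA.replace("\n", ";").split(";") if rec.strip()]

A = [r[0] for r in accounts]
# Predictors B..F plus a trend t = 1..60 (assumed definition of the software's t).
X = [r[1:] + [t] for t, r in enumerate(accounts, start=1)]

if __name__ == "__main__":
    coef = fit_ols(X, A)
    for name, value in zip(["a", "B", "C", "D", "E", "F", "t"], coef):
        print(f"{name:>2}: {value:10.2f}")
    print("predicted balance, first account:", round(predict(coef, X[0]), 2))
```

`fit_ols` returns the intercept first, followed by one coefficient per predictor column. `predict` performs the substitution described in the text: it multiplies each predictor by its coefficient and adds the products to the intercept.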

Calculating the regression coefficients of a multiple regression by hand is complex. The multiple regression equation describes the degree and nature of each predictor's relationship with the criterion, taking into account the interdependence among all the predictors in the equation. The estimated multiple linear regression equation, obtained with Wessa's (2008) free statistics software, is:

$$A = 161.46 + 91.75\,B + 65.99\,C + 128.45\,D + 210.42\,E - 65.56\,F + 4.83\,t + e_t$$

The first value, 161.46, is the constant (intercept): the predicted balance when all the predictor variables are zero. The values 91.75, 65.99, 128.45, 210.42 and -65.56 are the regression coefficients of B, C, D, E and F respectively. The fitted equation also contains a linear trend term $t$ (the position of each account in the data), with a coefficient of 4.83, and $e_t$ is the residual. A customer's balance can be predicted by substituting that customer's values of the predictors into the equation, multiplying each by its coefficient and adding the products to the intercept.

The tables below show the ordinary least squares output. S.D. is the standard error of each parameter, and each T-STAT tests the null hypothesis H0: parameter = 0.

Multiple Linear Regression: Ordinary Least Squares

| Variable | Parameter | S.D. | T-STAT | 2-tail p-value | 1-tail p-value |
|---|---:|---:|---:|---:|---:|
| (Intercept) | 161.4632028 | 199.589568 | 0.809 | 0.42215 | 0.21107 |
| ATM | 91.75251402 | 12.257454 | 7.4854 | 0 | 0 |
| Services | 65.99139702 | 27.871136 | 2.3677 | 0.02158 | 0.01079 |
| Debit | 128.454394 | 107.654615 | 1.1932 | 0.2381 | 0.11905 |
| Interest | 210.4210424 | 118.052912 | 1.7824 | 0.08041 | 0.0402 |
| City | -65.56180102 | 47.948424 | -1.3673 | 0.17729 | 0.08865 |
| t | 4.834853762 | 2.977257 | 1.6239 | 0.11032 | 0.05516 |

Multiple Linear Regression: Regression Statistics

| Statistic | Value |
|---|---:|
| Multiple R | 0.783809774 |
| R-squared | 0.614357762 |
| Adjusted R-squared | 0.57070015 |
| F-test (value) | 14.07217975 |
| F-test (DF numerator) | 6 |
| F-test (DF denominator) | 53 |
| p-value | 1.67373470638665e-09 |

Multiple Linear Regression: Residual Statistics

| Statistic | Value |
|---|---:|
| Residual standard deviation | 391.0979332 |
| Sum squared residuals | 8106752.448 |

The R-squared of 0.614 means that the model accounts for about 61% of the variation in account balances. The very small p-value of the F-test (about $1.7 \times 10^{-9}$) indicates that the predictors are jointly significant.

A higher residual sum of squares (SSres) means that the data points are more widely dispersed about the fitted regression. Here SSres is 8,106,752.448 (in squared dollars), which indicates considerable dispersion. SSres is measured in squared units and grows with the number of observations, so it is easier to read through the residual standard deviation of about 391 dollars: predicted balances typically differ from actual balances by a few hundred dollars. A larger SSres means more fluctuation around the regression and therefore less accurate predictions, so smaller SSres values are preferable.
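The summary figures in this output are linked by standard OLS identities:

- residual df = n - k - 1
- adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1)
- F = (R²/k) / ((1 - R²)/(n - k - 1))
- residual SD = √(SSres/(n - k - 1))
- each T-STAT = parameter / S.D.

The sketch below recomputes these quantities from the reported numbers as a consistency check. Setting k = 6 (B to F plus the trend t) is our reading of the model, taken from the F-test's numerator degrees of freedom. The variable names are illustrative.

```python
import math

# Figures copied from the regression output.
n = 60            # accounts
k = 6             # slope terms: B, C, D, E, F and the trend t
r_squared = 0.614357762
ss_res = 8106752.448

df_res = n - k - 1                                       # residual degrees of freedom
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_res
f_value = (r_squared / k) / ((1 - r_squared) / df_res)   # overall F-test statistic
resid_sd = math.sqrt(ss_res / df_res)                    # residual standard deviation
multiple_r = math.sqrt(r_squared)

# (parameter, S.D., reported T-STAT); T-STAT should equal parameter / S.D.
table = {
    "(Intercept)": (161.4632028, 199.589568, 0.809),
    "ATM": (91.75251402, 12.257454, 7.4854),
    "Services": (65.99139702, 27.871136, 2.3677),
    "Debit": (128.454394, 107.654615, 1.1932),
    "Interest": (210.4210424, 118.052912, 1.7824),
    "City": (-65.56180102, 47.948424, -1.3673),
    "t": (4.834853762, 2.977257, 1.6239),
}
t_stats = {name: est / se for name, (est, se, _) in table.items()}

print(f"residual df = {df_res}")
print(f"adjusted R-squared = {adj_r_squared:.8f}")
print(f"F = {f_value:.4f}")
print(f"residual SD = {resid_sd:.4f}")
print(f"multiple R = {multiple_r:.6f}")
for name, (_, _, reported) in table.items():
    print(f"{name:12s} t = {t_stats[name]:8.4f} (reported {reported})")
```

Running it gives 53 residual degrees of freedom, an adjusted R-squared of 0.57070015, F ≈ 14.0722 and a residual SD ≈ 391.0979. These match the reported output, which supports reading the model as having six slope terms.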

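Conclusion: The estimated regression model is broadly consistent with the hypothesis. The intercept for A (the account balance) is 161.46. The coefficient of B (the number of ATM transactions in the month) is positive at 91.75: with the other predictors held constant, each additional ATM transaction is associated with a balance about 91.75 dollars higher. The coefficients of C (the number of other bank services used) and E (receipt of interest) are also positive, at 65.99 and 210.42 respectively, as hypothesized. The coefficient of D (possession of a debit card) is positive as well, at 128.45. This contradicts the hypothesis, which related debit card possession negatively to the account balance. Lastly, F (the city where banking is done) has a negative coefficient of -65.56.

Judged by the 2-tail p-values, only B (p ≈ 0) and C (p = 0.022) are significant at the 5% level. E is significant only at the 10% level (p = 0.080), while D (p = 0.238) and F (p = 0.177) are not significant. The result for F is consistent with the hypothesis that the city has little effect on the balance.

References

Wessa, P. (2008). Free Statistics Software, Office for Research Development and Education, version 1.1.23-r2. URL http://www.wessa.net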