By: Ram on Jun 18, 2020
When do you use Multiple Regression?
Based on the scale of measurement, variables can be classified as binary, ordinal, nominal, or continuous (ratio and interval scale). When the decision (or target/dependent) variable is continuous, one of the statistical methods available for building the model is multiple regression. Problems of this kind are classified as regression problems.
Some scenarios and ideas are listed below. These examples span functional areas and business verticals.
Multiple Regression Applications across Industries.
Industry Vertical | Scenario | Scenario Description |
Human Resources | Salary Estimate | Predicting or estimating the salary of a person based on a set of attributes such as years of experience, level of education, industry of work, previous job salary, etc. |
Human Resources | Months of Stickiness | Given a high level of employee churn, a multiple regression-based model to estimate months of stickiness (or time in the job with a new employer) at the time of recruitment, based on candidate attributes. |
Human Resources | Resource Demand | Causal forecasting to estimate demand for each technical skill. At most big IT services providers, the level of bench is an important lever for winning and delivering projects, but it also adds to the cost. An accurate estimate of demand by skill could be an important measure for managing requirements at the right cost. |
Real Estate | House Price Prediction | Predicting house prices considering house, locality, and builder characteristics. |
Real Estate | House Demand Forecast | Developing a forecasting model to find the volume of houses on sale in a month given economic factors, seasonality, and other dimensions. |
Banking/Financial Services | Customer Value Estimation | Estimating customer value from customer-level attributes. |
Banking/Financial Services | Spend Value at a Customer | Spend on a credit card is a strong indicator of customer engagement with the card and of whether it is a front-of-the-wallet card. Predicting the spend value of cardholders could help the product and marketing teams engage customers with an appropriate treatment strategy. |
Banking/Financial Services | Balance Inflow into Transaction or Savings Account | Predicting the balance expected to be deposited into customers' transaction and savings accounts using customer-level characteristics. |
Banking/Financial Services | Drivers of Account Open Volume | Building a marketing or media mix model to find the economic, advertisement spend (across media or channels), competitor, and offer-related variables that impact new account open volume in a week. |
Banking/Financial Services | Portfolio Loss Forecasting | In portfolio risk estimation, the Loan Over Line Equivalent concept is estimated using a multiple regression framework. Account variables such as account line and outstanding balance at observation, together with economic factors, are used as independent variables. Reference: https://www.philadelphiafed.org/-/media/research-and-data/publications/working-papers/2014/wp14-10.pdf |
Insurance/Financial Services | Claim Amount Estimation | Insurance providers charge a premium based on the estimated claim amount for the target group of customers, along with other factors. The claim could be against a motor, home, or pet policy. The estimated claim amount could also be used for operational cash reserve calculations. Reference: https://www.casact.org/pubs/proceed/proceed87/87354.pdf |
Healthcare/Insurance | Healthcare Cost | Estimating the healthcare cost of an individual to the healthcare insurer using previous claim history, demographic data, and other data available about the individual. |
Retailer/CPG | Sales Volume and Return on Investment Modeling | Finding the drivers of retail product sales as a function of spend across media channels, economic factors, and competitor actions. |
Bank | Revenue Regression Model | Predicting the revenue of customers and identifying parameters that are linked to increased customer revenue. This helps business bankers realign their priorities and focus. |
Overall Approach of Regression Model Development
Multiple Regression Algorithm: Concepts
A multiple regression problem is formulated as follows:
Y = B0 + B1 * X1 + B2 * X2 + B3 * X3 + … + Error
Y is the target or dependent variable.
X1, X2, X3, etc. are the independent variables or features.
B0 is the intercept, and B1, B2, and B3 are the coefficients of the respective independent variables.
The main aim of the model is to find the values of these parameters. The method used for estimating them is Ordinary Least Squares (OLS), which finds the parameter values such that the sum of squared errors of the model is minimized.
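As a rough illustration of OLS estimation, the sketch below fits the coefficients with NumPy's least-squares solver. The function name `fit_ols` and the toy data are assumptions made for this example only; they are not part of the original article.

```python
import numpy as np

def fit_ols(X, y):
    """Return [B0, B1, ..., Bk] that minimize the sum of squared errors."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so that the first coefficient is the intercept B0.
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Hypothetical toy data generated without noise from Y = 1 + 2*X1 + 3*X2.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]

coef = fit_ols(X, y)
print(np.round(coef, 6))  # intercept first, then the coefficients of X1 and X2
```

Because the toy data contain no error term, the recovered values should be very close to 1, 2, and 3. With real data the estimates would only approximate the underlying relationship.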
The simplest special case of multiple regression is simple linear regression, in which only one independent variable is considered and the form is
Y = B0 + B1 * X1 + Error
We now explain the detailed steps to find the values of the intercept (B0) and B1, the coefficient of the X1 variable.
Parameter Estimation in Simple Regression
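For simple regression, the least-squares estimates have a well-known closed form: B1 is the sum of cross-deviations of X1 and Y divided by the sum of squared deviations of X1, and B0 is the mean of Y minus B1 times the mean of X1. A minimal sketch, assuming hypothetical experience and salary figures chosen only for illustration:

```python
def fit_simple_regression(x, y):
    """Closed-form OLS estimates (B0, B1) for Y = B0 + B1 * X1 + Error."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sum of cross-deviations of x and y, and sum of squared deviations of x.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical data: years of experience vs. salary (in thousands),
# constructed so that salary = 30 + 5 * experience exactly.
experience = [1, 2, 3, 4, 5]
salary = [35, 40, 45, 50, 55]

b0, b1 = fit_simple_regression(experience, salary)
print(b0, b1)  # 30.0 5.0
```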
Review Multiple Regression Output
Most analytical tools (such as Python, SAS, R, and SPSS) give similar output for a regression model.
A regression model output typically has three parts.
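One of the key performance statistics for a regression model is R2, which indicates the percentage of variance explained by the model. However, R2 never decreases when more variables are added, even if they carry no real information. Adjusted R2, which penalizes the number of variables, is therefore evaluated to avoid selecting an overly complicated model.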
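The two statistics can be computed directly from actual and predicted values, as in the sketch below. The function names and the sample values are assumptions made for illustration; the adjustment uses the standard formula 1 - (1 - R2) * (n - 1) / (n - k - 1), where n is the number of observations and k the number of independent variables.

```python
def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    y_mean = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - y_mean) ** 2 for yt in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_features):
    """Penalize R2 for the number of independent variables in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

# Hypothetical actual and predicted values.
y_true = [3.0, 5.0, 7.0, 9.0, 11.0]
y_pred = [2.8, 5.3, 6.9, 9.2, 10.8]

r2 = r_squared(y_true, y_pred)
adj_one = adjusted_r_squared(r2, n_obs=5, n_features=1)
adj_three = adjusted_r_squared(r2, n_obs=5, n_features=3)
print(r2, adj_one, adj_three)
```

For the same R2, the adjusted value drops as the assumed number of variables grows from one to three, which is the penalty that discourages unnecessarily complex models.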
The main objective of modeling is to find parameter estimates. The significance of each variable is evaluated based on its t-statistic and p-value. In the regression model, the null hypothesis is "the beta coefficient or parameter estimate for a variable is zero". The p-value is the probability of observing a t-statistic at least as extreme as the one obtained if the null hypothesis were true, so a small p-value is evidence against the null hypothesis. A low p-value (commonly below 0.05) therefore indicates that the variable can be kept in the model.
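To make the link between parameter estimates, t-statistics, and p-values concrete, here is a sketch that computes them with NumPy and SciPy. The helper `coefficient_tests` and the synthetic data are assumptions for this example; statistical packages report the same quantities in their parameter estimate tables.

```python
import numpy as np
from scipy import stats

def coefficient_tests(X, y):
    """Return OLS coefficients with their t-statistics and two-sided p-values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # intercept column first
    n, p = design.shape
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    sigma2 = residuals @ residuals / (n - p)  # unbiased error variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)
    std_err = np.sqrt(np.diag(cov))
    t_stats = coef / std_err
    # Two-sided p-value under the null hypothesis "coefficient is zero".
    p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - p)
    return coef, t_stats, p_values

# Synthetic data (assumed for illustration): y depends on X1 but not on X2.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 1.0 + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=200)

coef, t_stats, p_values = coefficient_tests(X, y)
for name, b, t, pv in zip(["B0", "B1", "B2"], coef, t_stats, p_values):
    print(f"{name}: estimate={b:.3f}, t={t:.2f}, p={pv:.4f}")
```

In this setup the p-value for B1 should be very small, since X1 truly drives y. X2 has no real effect, so its p-value will usually, though not always, be large.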
Model Selection