I recently read this article, courtesy of a fellow analytics enthusiast. The paper discusses a popular statistical modelling myth commonly seen in Singapore's private sector: stepwise regression.
Based on my observation, most modellers in Singapore use stepwise regression when building linear models. The idea appeals to a lot of statisticians here because it moves variables in and out of the model, tries many combinations of variables, and settles on the best one.
But after reading this paper, it seems that stepwise regression has certain pitfalls, and one of them is the exaggeration of p-values: the p-values it reports look more significant than they really are. Now this is very scary, given that the p-value is precisely what we use to decide whether a variable goes into the model.
It would be interesting to figure out why that is so, given the mechanism by which the variables are selected.
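One way to get a feel for it is a small simulation. The sketch below is my own illustration, not something taken from the paper: it generates a response and 20 candidate predictors that are all pure noise, then mimics only the first step of forward selection by keeping the candidate most correlated with the response. The function names, the sample sizes, and the hard-coded critical value (approximately the two-sided 5% cut-off for a t distribution with 98 degrees of freedom) are all assumptions made for the example.

```python
import math
import random


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def false_positive_rate(n_obs=100, n_candidates=20, n_sims=300, seed=0):
    """Share of simulations where the best of several pure-noise
    predictors looks 'significant' at the nominal 5% level."""
    rng = random.Random(seed)
    # Assumption: approximate two-sided 5% critical value of the
    # t distribution with n_obs - 2 = 98 degrees of freedom.
    t_crit = 1.984
    hits = 0
    for _ in range(n_sims):
        y = [rng.gauss(0, 1) for _ in range(n_obs)]
        best_r = 0.0
        for _ in range(n_candidates):
            x = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated to y
            r = pearson_r(x, y)
            if abs(r) > abs(best_r):
                best_r = r  # keep the strongest candidate, as selection would
        # t statistic for the selected predictor in a one-variable regression
        t = abs(best_r) * math.sqrt((n_obs - 2) / (1 - best_r ** 2))
        if t > t_crit:
            hits += 1
    return hits / n_sims


rate = false_positive_rate()
print(f"Selected noise variable looks significant in {rate:.0%} of runs")
```

If the p-value of the selected variable could be taken at face value, this share would be about 5%. With 20 independent noise candidates, the chance that at least one clears the 5% bar is 1 - 0.95^20, roughly 64%, so picking the best candidate and then reading its p-value as if it had been chosen in advance overstates the evidence.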
On another note, the paper describes an interesting modelling technique: LARS (least angle regression). It chooses variables by picking, at each step, the variable most correlated with the current error term (the residual). It would definitely take more time to model this way, but it makes a lot of sense to me. I do feel that a simple correlation table should still be done for all the variables to check for multicollinearity, though my take is that the chance of such a problem is very low.
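To make the "most correlated with the error term" idea concrete, here is a rough sketch. Note that this is not the full LARS algorithm: it is incremental forward stagewise regression, a simpler relative that follows the same idea by nudging the coefficient of whichever predictor is currently most correlated with the residual. The data, the function names, the step size, and the number of steps are all made up for illustration.

```python
import random


def standardize(col):
    """Centre a column and scale it to unit standard deviation."""
    n = len(col)
    m = sum(col) / n
    sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5
    return [(v - m) / sd for v in col]


def forward_stagewise(columns, y, step=0.01, n_steps=1500):
    """Repeatedly nudge the coefficient of the predictor that is most
    correlated with the current residual. `columns` must be standardized."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]
    coefs = [0.0] * len(columns)
    entry_order = []  # order in which predictors first get picked
    for _ in range(n_steps):
        # With standardized columns, this inner product is proportional
        # to the correlation between each predictor and the residual.
        scores = [sum(x * r for x, r in zip(col, residual)) / n
                  for col in columns]
        j = max(range(len(columns)), key=lambda k: abs(scores[k]))
        if j not in entry_order:
            entry_order.append(j)
        delta = step if scores[j] > 0 else -step
        coefs[j] += delta
        residual = [r - delta * x for r, x in zip(residual, columns[j])]
    return coefs, entry_order


# Hypothetical data: y depends on x0 and x2, while x1 is pure noise.
rng = random.Random(42)
n_obs = 200
raw = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(3)]
y = [4 * raw[0][i] + 2 * raw[2][i] + rng.gauss(0, 0.5) for i in range(n_obs)]
columns = [standardize(col) for col in raw]

coefs, entry_order = forward_stagewise(columns, y)
print("entry order:", entry_order)
print("coefficients:", [round(c, 2) for c in coefs])
```

The order in which variables enter reflects how strongly each one is related to what the model has not yet explained, which is the part of the idea that appealed to me. Real LARS reaches a similar path in far fewer steps by moving the coefficients of all active variables together.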
For those interested in reading the paper, please refer to the link below.
www.nesug.org/proceedings/nesug07/sa/sa07.pdf