Mathematics and Statistics Vol. 13(5), pp. 354 - 364
DOI: 10.13189/ms.2025.130511
A Mixed Integer Optimization Framework for Significant Variable Selection in Linear Regression with Multicollinearity Control
Samah Abdellatif El-Danasoury 1,*, Mahmoud Rashwan 2, Nadia Makary Girgis 2
1 Institute of National Planning, Nasr City, Cairo, Egypt
2 Faculty of Economics and Political Science, Cairo University, Egypt
ABSTRACT
Variable selection in linear regression identifies the variables that contribute most to building an efficient model. Its main aims are to improve model performance, reduce complexity, and enhance interpretability. The methods proposed for variable selection fall into two groups: classical methods and mathematical programming methods. The latter are preferred because several constructive objectives can be targeted within the same model. This paper proposes, develops, tests, and evaluates a mathematical programming model that selects a subset of explanatory variables in order to obtain a statistically significant linear regression model (LRM) with non-collinear variables. For an LRM to be efficient and valid for interpretation, its assumptions have to be satisfied. The proposed model minimizes the Sum of Squares of Errors (SSE) subject to constraints that enforce the significance of the overall LRM and the absence of multicollinearity, together with the assumptions already treated as mathematical constraints in Chung's model (linearity and the significance of individual variables), with the aim of enhancing the mathematical programming model. The proposed model is compared with Chung's mathematical programming model and with the classical stepwise method for variable selection. Applying the three methods to simulated data shows that, for the overall model significance constraint, the suggested model is more suitable when the number of variables is small, especially for small and moderate sample sizes, whereas adding the no-multicollinearity constraint improves the model's performance in selecting the appropriate variables regardless of the number of variables and across different sample sizes.
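To make the idea of constrained subset selection concrete, the following is a minimal Python sketch and not the authors' mixed integer formulation. It replaces the optimization solver with brute-force enumeration, which is only practical for a handful of variables. The names `select_subset`, `max_corr`, and `f_crit` are hypothetical: `max_corr` is an assumed pairwise-correlation threshold standing in for the no-multicollinearity constraint, and `f_crit` is a user-supplied critical value standing in for the overall significance test. The individual-variable significance constraints mentioned above are omitted.

```python
from itertools import combinations
from math import sqrt, inf


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def sse_of_subset(X, y, cols):
    """Fit OLS with an intercept on the chosen columns and return the SSE."""
    rows = [[1.0] + [row[j] for j in cols] for row in X]
    p = len(cols) + 1
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(p)] for a in range(p)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * rj for bj, rj in zip(beta, r)) for r in rows]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def corr(u, v):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv) if suu > 0 and svv > 0 else 0.0


def select_subset(X, y, max_corr=0.8, f_crit=4.0):
    """Return (sse, cols) for the feasible subset with the smallest SSE.

    Feasible means: no pair of chosen columns has |correlation| > max_corr
    (assumed proxy for "no multicollinearity"), and the overall F statistic
    is at least f_crit (assumed proxy for overall model significance).
    Returns None if no subset is feasible.
    """
    n, p = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(p)]
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    best = None
    for k in range(1, p + 1):
        df_res = n - k - 1
        if df_res <= 0:
            break
        for cols in combinations(range(p), k):
            if any(abs(corr(columns[a], columns[b])) > max_corr
                   for a, b in combinations(cols, 2)):
                continue  # violates the collinearity constraint
            sse = sse_of_subset(X, y, cols)
            f_stat = inf if sse == 0 else ((sst - sse) / k) / (sse / df_res)
            if f_stat < f_crit:
                continue  # overall model not significant
            if best is None or sse < best[0]:
                best = (sse, cols)
    return best
```

Because SSE never increases when a variable is added, the constraints rather than the objective are what prevent the full set of variables from being selected. In this sketch only the correlation and F-statistic checks play that role.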
KEYWORDS
Multiple Linear Regression, Subset Selection, Significance of Linear Regression Model, No Multicollinearity, Mathematical Programming
Cite This Paper in IEEE or APA Citation Styles
(a). IEEE Format:
[1] Samah Abdellatif El-Danasoury, Mahmoud Rashwan, Nadia Makary Girgis, "A Mixed Integer Optimization Framework for Significant Variable Selection in Linear Regression with Multicollinearity Control," Mathematics and Statistics, Vol. 13, No. 5, pp. 354 - 364, 2025. DOI: 10.13189/ms.2025.130511.
(b). APA Format:
Samah Abdellatif El-Danasoury, Mahmoud Rashwan, Nadia Makary Girgis (2025). A Mixed Integer Optimization Framework for Significant Variable Selection in Linear Regression with Multicollinearity Control. Mathematics and Statistics, 13(5), 354 - 364. DOI: 10.13189/ms.2025.130511.