Saturday, January 25, 2025

The Benefits of Using Economically Significant Factors in Financial Data Science

Factor selection is one of the most important considerations when building financial models. So, as machine learning (ML) and data science become more integrated into the world of finance, which factors should we include in our ML-driven investment models, and how should we choose among them?

These are open and important questions. After all, ML models can help not only with factor processing but also with factor discovery and creation.

Factors in Traditional Statistical and ML Models: The (Very) Basics

In machine learning, factor selection is known as "feature selection." Factors and features help explain the behavior of a target variable, while investment factor models describe the primary drivers of portfolio behavior.

Perhaps the simplest of the many factor-model construction methods is ordinary least squares (OLS) regression, in which portfolio return is the dependent variable and the risk factors are the independent variables. Provided the independent variables have sufficiently low correlation, different models are statistically valid and explain portfolio behavior to varying degrees. They show how much of the portfolio's behavior each factor is responsible for and how sensitive the portfolio's return is to each factor's behavior, as expressed by the beta coefficient assigned to that factor.
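
A minimal sketch of such a factor regression, using synthetic data and made-up factor names and betas purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly data: 120 periods, three risk factors.
# Factor values and "true" betas are invented for this example.
n = 120
factors = rng.normal(0, 0.04, size=(n, 3))
true_betas = np.array([1.1, 0.4, -0.2])
portfolio = factors @ true_betas + 0.002 + rng.normal(0, 0.01, n)

# OLS: regress portfolio returns on the factors (plus an intercept).
X = np.column_stack([np.ones(n), factors])
coef, *_ = np.linalg.lstsq(X, portfolio, rcond=None)
alpha, betas = coef[0], coef[1:]

# R^2: the share of portfolio variance the factor model explains.
resid = portfolio - X @ coef
r2 = 1 - resid.var() / portfolio.var()
print(betas.round(2), round(r2, 2))
```

The estimated betas recover the sensitivities to each factor, and the R-squared summarizes how much of the portfolio's behavior the model accounts for.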

Like their traditional statistical counterparts, ML regression models also describe the sensitivity of a target variable to one or more explanatory variables. However, ML models can often account for nonlinear behavior and interaction effects better than their non-ML counterparts, and they generally do not provide direct analogs of OLS regression output, such as beta coefficients.

Why Factors Should Make Economic Sense

Although synthetic factors are popular, economically intuitive and empirically validated factors have advantages over such "statistical" factors, high-frequency trading (HFT) and other special cases aside. As researchers, most of us prefer the simplest possible model, so we often start with OLS regression or something similar, obtain convincing results, and then perhaps move on to a more sophisticated ML model.

In traditional regressions, however, the factors must be sufficiently distinct, or not highly correlated, to avoid the problem of multicollinearity, which can invalidate a traditional regression. Multicollinearity means that one or more of a model's explanatory factors are too similar to yield interpretable results. So in a traditional regression, lower factor correlation – that is, avoiding multicollinearity – means the factors are more likely to be economically distinct.

But multicollinearity often does not apply to ML model construction the way it does to OLS regression. That is because, unlike OLS regression models, ML model estimation does not require inverting a covariance matrix. Moreover, many ML models make no strict parametric assumptions and do not rely on homoscedasticity (constant error variance), independence of errors, or other time-series assumptions.
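
A short illustration of why near-duplicate factors break OLS numerically, using entirely synthetic data: the condition number of the matrix OLS must invert blows up when one factor is almost a copy of another.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two independent factors plus a near-duplicate of the first
# (think: two very similar value metrics). Synthetic data only.
f1 = rng.normal(size=200)
f2 = rng.normal(size=200)
f3 = f1 + rng.normal(0, 1e-6, size=200)  # almost perfectly collinear with f1

X_ok = np.column_stack([f1, f2])
X_bad = np.column_stack([f1, f2, f3])

# OLS solves (X'X)^{-1} X'y, so a near-singular X'X makes the betas
# numerically unstable. The condition number measures how close to
# singular that matrix is.
cond_ok = np.linalg.cond(X_ok.T @ X_ok)
cond_bad = np.linalg.cond(X_bad.T @ X_bad)
print(f"{cond_ok:.1e} vs {cond_bad:.1e}")
```

A tree-based ML model fit on `X_bad` would train without complaint, since no matrix inversion is involved, but the redundant factor would still muddy any interpretation of the results.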

Although ML models are relatively rule-free, a significant amount of pre-modeling work may be required to ensure that a given model's inputs have both investment relevance and economic coherence, and are distinct enough to produce practical results without explanatory redundancies.

Although factor selection matters for any factor model, it is especially important for ML-based methods. One way to select distinct yet economically intuitive factors in the pre-model phase is the least absolute shrinkage and selection operator (LASSO) technique. LASSO lets modelers collapse a large set of factors into a smaller one while preserving significant explanatory power and maximizing independence among the factors.
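
A sketch of LASSO-based factor selection on synthetic data, assuming scikit-learn is available; the factor count, penalty strength, and "true" drivers are all invented for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)

# Ten candidate factors, but only three actually drive returns.
n, k = 500, 10
X = rng.normal(size=(n, k))
true = np.zeros(k)
true[[0, 3, 7]] = [0.8, -0.5, 0.3]
y = X @ true + rng.normal(0, 0.5, n)

# LASSO's L1 penalty shrinks small coefficients exactly to zero,
# collapsing the candidate set to the factors with real explanatory power.
model = Lasso(alpha=0.1).fit(X, y)
selected = np.flatnonzero(model.coef_)
print(selected)
```

In practice the penalty strength (`alpha`) would be chosen by cross-validation, and the surviving factors should still be sanity-checked for economic intuition rather than accepted mechanically.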

Another key reason to use economically significant factors: they have decades of research and empirical validation behind them. The usefulness of the Fama-French-Carhart factors, for example, is well documented, and researchers have studied them in OLS regressions and other models, so their application in ML-driven models is intuitive. Indeed, in what may be the first research paper to apply ML to stock factors, Chenwei Wu, Daniel Itano, Vyshaal Narayana, and I showed that the Fama-French-Carhart factors, coupled with two well-known ML frameworks – Random Forests and Association Rule Learning – can indeed help explain asset returns and fashion successful investment trading models.

Finally, by using economically meaningful factors, we can better understand certain kinds of ML results. Random forests and other ML models, for example, provide so-called relative feature importance values. These scores and rankings describe how much explanatory power each factor offers compared with the other factors in a model, and they are easier to interpret when the economic relationships among the model's factors are clearly laid out.
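
A minimal sketch of feature importances from a random forest, assuming scikit-learn is available; the factor labels and data are synthetic stand-ins, not real return series:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)

# Three economically motivated factor labels (illustrative only).
names = ["market", "size", "value"]
X = rng.normal(size=(400, 3))
y = 1.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.2, 400)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Relative feature importances: how much each factor contributes to the
# model's explanatory power. They sum to 1 and are easiest to interpret
# when the factors themselves are economically meaningful.
for name, imp in sorted(zip(names, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.2f}")
```

Because the labels correspond to familiar economic factors rather than anonymous statistical constructs, a ranking like this reads as an economic statement, not just a model diagnostic.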

Much of the appeal of ML models lies in their relatively rule-free nature and in how well they accommodate different inputs and heuristics. Nevertheless, some rules of the road should guide us when applying these models. By relying on economically meaningful factors, we can make our ML-driven investment frameworks more understandable and ensure that only the most complete and insightful models inform our investment process.

If you enjoyed this post, remember to subscribe.


Photo credit: ©Getty Images / PashaIgnatov

