Despite its efficacy, machine learning in the health sciences remains limited in addiction prediction by the difficulty of integrating diverse data sources, addressing biases, and interpreting complex models. These limitations may reduce the effectiveness of predictive models in identifying at-risk individuals and informing intervention strategies. The current challenge lies in identifying the optimal number of features for model training and determining the influential factors for alcohol addiction. This paper therefore proposes an enhanced feature engineering algorithm which not only ranks features by importance but also automatically extracts the optimal features for the prediction model, which in turn improves the predictive power of kernel-based models. Using a feature aggregation approach, the features identified by different Relief-based algorithms (such as Relief, ReliefF and RReliefF) were merged into a single ranked feature list, and the Relief-based algorithms were integrated with the XGBoost boosting algorithm to implement an automated feature selection process. The proposed method yielded 11 influential features to be included as n_features in the predictive model. Three families of classifiers, namely Linear, Ensemble-based and Kernel-based classifiers, were analysed in combination with the enhanced Relief-based algorithm to evaluate how each responds to the proposed method. In this context, the enhanced RReliefF algorithm improved the Kernel-based model by 7.47% in terms of discriminative power and by 12.69% in terms of predictive power, compared with the baseline model. These findings help to resolve the limitations of manual optimal feature selection typical of current feature engineering methods, thereby opening a new research avenue for automatic feature engineering in a low-code context.
Overall, the proposed enhanced algorithm remains technically sound while making effective use of the ReliefF algorithm's feature rankings to improve the performance of kernel-based models such as the Support Vector Machine (SVM), making these models more accessible and actionable for clinicians and healthcare professionals working in alcohol addiction prevention and intervention.
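The aggregation and automatic selection steps described above can be illustrated with a minimal sketch. It is not the implementation used in the paper: the helper names (`relief_scores`, `aggregate_rankings`, `select_top_k`), the mean-rank aggregation rule, the toy data and the `toy_evaluate` scoring function are all assumptions made for illustration, and a simplified binary Relief-style scorer stands in for the Relief, ReliefF and RReliefF implementations. In practice, the scoring function would be a cross-validated model score, for instance one obtained with XGBoost.

```python
from typing import Callable, List, Sequence


def relief_scores(X: Sequence[Sequence[float]], y: Sequence[int], k: int = 1) -> List[float]:
    """Simplified Relief-style feature weights for a binary target.

    k=1 mimics the original Relief; k>1 averages over the k nearest hits
    and misses, in the spirit of ReliefF. Every sample is visited once.
    """
    n, d = len(X), len(X[0])
    # Per-feature ranges, used to scale differences into [0, 1].
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    weights = [0.0] * d
    for i in range(n):
        others = sorted((m for m in range(n) if m != i), key=lambda m: dist(X[i], X[m]))
        hits = [m for m in others if y[m] == y[i]][:k]      # nearest same-class samples
        misses = [m for m in others if y[m] != y[i]][:k]    # nearest other-class samples
        if not hits or not misses:
            continue
        for j in range(d):
            miss_diff = sum(diff(X[i], X[m], j) for m in misses) / len(misses)
            hit_diff = sum(diff(X[i], X[m], j) for m in hits) / len(hits)
            weights[j] += (miss_diff - hit_diff) / n
    return weights


def aggregate_rankings(score_lists: Sequence[Sequence[float]]) -> List[int]:
    """Merge several score vectors into one ranked feature list by mean rank (assumed rule)."""
    d = len(score_lists[0])
    mean_rank = [0.0] * d
    for scores in score_lists:
        order = sorted(range(d), key=lambda j: -scores[j])
        for position, j in enumerate(order, start=1):
            mean_rank[j] += position / len(score_lists)
    return sorted(range(d), key=lambda j: (mean_rank[j], j))


def select_top_k(ranked: Sequence[int], evaluate: Callable[[List[int]], float]) -> List[int]:
    """Return the ranked prefix with the highest score from `evaluate`."""
    best, best_score = [ranked[0]], evaluate([ranked[0]])
    for k in range(2, len(ranked) + 1):
        score = evaluate(list(ranked[:k]))
        if score > best_score:
            best, best_score = list(ranked[:k]), score
    return best


# Toy data (hypothetical): feature 0 encodes the class, features 1 and 2 do not.
y = [i % 2 for i in range(20)]
X = [[float(y[i]), (i // 2) / 9.0, ((i * 7) % 20) / 19.0] for i in range(20)]


def toy_evaluate(features: List[int]) -> float:
    # Hypothetical stand-in for a cross-validated model score.
    return (1.0 if 0 in features else 0.0) - 0.01 * len(features)


ranked = aggregate_rankings([relief_scores(X, y, k=1), relief_scores(X, y, k=3)])
selected = select_top_k(ranked, toy_evaluate)
print("ranked features:", ranked)
print("selected features:", selected)
```

Because the size of the selected subset is determined by the scoring function rather than fixed in advance, the number of features passed to the downstream classifier does not have to be chosen manually.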
Keywords: Feature aggregation, Automatic feature selection, Influential features, Kernel-based model, RReliefF.
Myat Noe WIN, Sri Devi RAVANA, Tutut HERAWAN, Liyana SHUIB, "Enhancing Kernel-based Model Predictive Power Through Enhanced Relief-based Algorithm for the Early Detection of Alcohol Use Disorder Among Secondary Students", Studies in Informatics and Control, ISSN 1220-1766, vol. 33(4), pp. 59-72, 2024. https://doi.org/10.24846/v33i4y202406