Historical Isolated Forest for detecting and adaptation concept drifts in nonstationary data streaming

Document Type : Original Article

Authors

1 Information System, faculty of computer and information, Menoufia university, Shebin Elkom, Menofia, Egypt

2 Information systems dept., Faculty of computers and information, Menofia university

3 Information Systems Department, Faculty of Computers and Information, Menoufia University, Egypt

Abstract

Concept drift refers to sudden changes in the underlying structure of a streaming data distribution over time. The core objective of concept drift research is to develop techniques and strategies to detect, understand, and adapt to drifts in data streams. Research has shown that, if concept drift is not handled properly, machine learning in such an environment yields subpar learning outcomes. In this paper, a Historical Isolated Forest (HIF) is presented. It builds on a decision tree that splits the data stream into chunks, with each chunk treated as a region in the tree. HIF is employed to detect concept drifts and to adapt the affected region to the current changes. HIF stores previously generated models and, for each drifted distribution, employs the most similar stored model as the current model until the best-performing model has been generated. HIF does not stop the main system model while a new model is being retrained, because it is divided into three primary parallel blocks: a detection block, a similarity block (online block), and a retraining block (offline block). The proposed algorithm was evaluated and compared on three real data sets, with accuracy, execution speed, and memory usage specifically assessed. The experimental results demonstrate that our modifications use fewer resources and achieve detection accuracy comparable to or greater than that of the original IForestASD.
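
The model-reuse idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the drift test, so the per-feature mean signature, the Euclidean distance, the threshold, and the names `chunk_signature`, `HistoricalPool`, and `drift_detected` are all assumptions made for illustration. The offline retraining block, which would run in parallel, is omitted.

```python
import math
from statistics import fmean


def chunk_signature(chunk):
    """Summarise a chunk (rows of equal length) by its per-feature mean.

    Assumption: a mean vector stands in for whatever distribution
    summary the actual method uses.
    """
    return [fmean(column) for column in zip(*chunk)]


def drift_detected(reference_sig, current_sig, threshold):
    """Hypothetical drift test: flag drift when the signatures are far apart."""
    return math.dist(reference_sig, current_sig) > threshold


class HistoricalPool:
    """Stores (signature, model) pairs for previously generated models."""

    def __init__(self):
        self.entries = []

    def add(self, signature, model):
        self.entries.append((signature, model))

    def most_similar(self, signature):
        """Return the stored model whose signature is closest, or None if empty."""
        if not self.entries:
            return None
        closest = min(self.entries, key=lambda entry: math.dist(entry[0], signature))
        return closest[1]


# Placeholder strings stand in for trained models.
pool = HistoricalPool()
pool.add(chunk_signature([[0.0, 0.0], [1.0, 1.0]]), "model_A")
pool.add(chunk_signature([[10.0, 10.0], [11.0, 11.0]]), "model_B")

reference = chunk_signature([[0.0, 0.0], [1.0, 1.0]])
incoming = chunk_signature([[9.0, 9.0], [10.0, 12.0]])

if drift_detected(reference, incoming, threshold=3.0):
    # Online block: reuse the closest historical model while retraining
    # would proceed offline.
    current_model = pool.most_similar(incoming)
else:
    current_model = "model_A"
```

In this sketch the incoming chunk is closest to the second stored signature, so `current_model` becomes `"model_B"`. The point of the design is that a stored model can serve immediately after a drift is flagged, so the main model never has to pause while a replacement is trained.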

Keywords