Purpose – New opportunities for the use of predictive models have emerged with Big Data and Big Data Analytics (BDA). Using these models to support decision-making at different levels of an organization extends their life cycle, and the data to which they are applied progressively depart from those used in their construction and validation. It is therefore important to understand what has been discussed about the validity of models used over long periods. How can one assess whether improvements in the models, or in the data collection and transmission processes, yield a significant improvement in performance?
Design/methodology/approach – To answer these questions, the work was restricted to binary classification models, owing to their simplicity and wide use. The method was based on a systematic literature review with content analysis, complemented by numerical methods.
Findings – Evidence was found that, across different areas of knowledge, validation techniques follow the established practice of each area. Many authors stress the importance of a continuous validation process, but without detailing the criteria for this practice.
Practical implications – The indices used in validation are estimates of their true values. Future or external data may change, altering the performance of the chosen model. In an environment increasingly shaped by predictive algorithms, the quality assurance of these results must be uninterrupted.
Originality/value – The quantity and size of samples are fundamental parameters for the application of Statistical Process Monitoring (SPM). In the context of continuous validation, this work discusses alternatives for defining these elements, an aspect not yet properly addressed in the literature.
Paper type: General review.
Keywords: Kappa, Statistical Process Monitoring, DS
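As a minimal illustration of the abstract's theme (not taken from the paper itself; all function names, baseline values, and thresholds below are assumptions for the sketch), the performance index of a deployed binary classifier, such as Cohen's kappa, can be computed per scoring batch and tracked against SPM-style control limits estimated from a baseline period:

```python
# Illustrative sketch: monitoring Cohen's kappa over successive scoring
# batches with simple Shewhart-style control limits.
# Baseline values and the 3-sigma rule below are assumptions, not the
# paper's prescribed procedure.

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels coded 0/1."""
    n = len(y_true)
    po = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    p_true1 = sum(y_true) / n
    p_pred1 = sum(y_pred) / n
    # chance agreement: both say 1, plus both say 0
    pe = p_true1 * p_pred1 + (1 - p_true1) * (1 - p_pred1)
    return (po - pe) / (1 - pe) if pe != 1 else 1.0

def control_limits(baseline_kappas, k=3):
    """Mean +/- k standard deviations, estimated from baseline batches."""
    m = sum(baseline_kappas) / len(baseline_kappas)
    var = sum((x - m) ** 2 for x in baseline_kappas) / (len(baseline_kappas) - 1)
    return m - k * var ** 0.5, m + k * var ** 0.5

# Usage: flag a new batch whose kappa falls outside the baseline limits
baseline = [0.78, 0.80, 0.79, 0.81, 0.77]        # hypothetical validation batches
lo, hi = control_limits(baseline)
new_kappa = cohen_kappa([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 0])
drift_signal = not (lo <= new_kappa <= hi)        # True -> revalidate the model
```

The batch size and the number of baseline batches used to set the limits are exactly the parameters whose definition the paper discusses; the 3-sigma rule here is only a conventional default.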