SEMMA Data Framework

SEMMA is a systematic process for data mining and predictive modeling developed by SAS Institute. The name is an acronym for its five stages: Sample, Explore, Modify, Model, and Assess. The process involves drawing a representative sample of the data, exploring its characteristics and relationships, modifying the data to prepare it for modeling, building predictive models, and finally assessing how well those models perform.
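The five stages can be sketched as a small pipeline. The following is a minimal illustration in plain Python, not SAS's implementation: the synthetic data set, the function names (`sample`, `explore`, `modify`, `fit_line`, `assess`, `run_semma`), the mean-imputation step, and the choice of a one-variable linear model are all assumptions made for the example.

```python
import random
import statistics


def make_population(n=1000, seed=0):
    """Synthetic data (an assumption): y is roughly 2*x + 1, ~5% of x missing."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        y = 2.0 * x + 1.0 + rng.gauss(0, 1)
        rows.append((None if rng.random() < 0.05 else x, y))
    return rows


def sample(rows, n, seed=0):
    """Sample: draw n rows at random, without replacement."""
    return random.Random(seed).sample(rows, n)


def explore(rows):
    """Explore: basic summary statistics and a missing-value count."""
    xs = [x for x, _ in rows if x is not None]
    ys = [y for _, y in rows]
    return {
        "n": len(rows),
        "missing_x": len(rows) - len(xs),
        "mean_x": statistics.mean(xs),
        "mean_y": statistics.mean(ys),
    }


def modify(rows, fill_value):
    """Modify: replace missing x values with fill_value (simple imputation)."""
    return [(fill_value if x is None else x, y) for x, y in rows]


def fit_line(rows):
    """Model: ordinary least squares fit of y = a + b * x."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    b = sxy / sxx
    return my - b * mx, b


def assess(model, rows):
    """Assess: mean squared error on rows that were not used for fitting."""
    a, b = model
    return statistics.mean([(y - (a + b * x)) ** 2 for x, y in rows])


def run_semma():
    drawn = sample(make_population(), 200)
    train, test = drawn[:150], drawn[150:]
    summary = explore(train)
    # Impute with the training mean so no test information leaks into the fit.
    train = modify(train, summary["mean_x"])
    test = modify(test, summary["mean_x"])
    model = fit_line(train)
    return summary, model, assess(model, test)


if __name__ == "__main__":
    summary, (a, b), mse = run_semma()
    print(summary)
    print(f"intercept={a:.2f} slope={b:.2f} test MSE={mse:.2f}")
```

Each function corresponds to one stage, so a stage can be swapped out (for example, a different imputation rule in `modify`) without touching the others.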

Goals

To have a representative sample of the data to work with.
To understand the characteristics and relationships in the data.
To prepare the data for modeling by cleaning, transforming, and feature engineering.
To build and evaluate predictive models with good performance.
To identify areas for improvement in the models and data preparation process.

Best practices

Start with a large enough sample of the data to ensure representativeness. Take the time to thoroughly explore the data to gain insights and identify potential issues. Clean, transform, and engineer features in a way that maximizes the predictive power of the data. Use a variety of modeling techniques to find the best-performing model. Regularly assess the performance of the models to identify areas for improvement and make necessary adjustments.
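As one way to act on the advice about trying several modeling techniques, the sketch below fits a few candidate models on a training split and ranks them by error on a held-out validation split. The synthetic data, the three candidates (a mean baseline, a least-squares line, and a 5-nearest-neighbor average), and all function names are assumptions chosen for illustration.

```python
import random
import statistics


def make_data(n, seed=0):
    """Synthetic data (an assumption): y is roughly 3*x plus Gaussian noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 3.0 * x + rng.gauss(0, 1)))
    return rows


def fit_mean(train):
    """Baseline: always predict the training mean of y."""
    my = statistics.mean(y for _, y in train)
    return lambda x: my


def fit_linear(train):
    """Ordinary least squares fit of y = a + b * x."""
    mx = statistics.mean(x for x, _ in train)
    my = statistics.mean(y for _, y in train)
    b = sum((x - mx) * (y - my) for x, y in train) / sum(
        (x - mx) ** 2 for x, _ in train
    )
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(train, k=5):
    """Predict the average y of the k training rows whose x is closest."""
    def predict(x):
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        return statistics.mean(y for _, y in nearest)
    return predict


def mse(predict, rows):
    """Mean squared error of a prediction function on the given rows."""
    return statistics.mean([(y - predict(x)) ** 2 for x, y in rows])


def compare(train, valid):
    """Fit every candidate on train and score each one on valid."""
    candidates = {"mean_baseline": fit_mean, "linear": fit_linear, "knn_5": fit_knn}
    scores = {name: mse(fit(train), valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    rows = make_data(300)
    best, scores = compare(rows[:200], rows[200:])
    for name, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{name}: validation MSE = {score:.2f}")
    print("selected:", best)
```

Keeping a trivial baseline among the candidates gives a reference point: a model that cannot beat it on held-out data points to a problem in the earlier stages rather than in the model itself.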
