CRISP-DM (Cross-Industry Standard Process for Data Mining) is a methodology for organizing ML projects. It was developed in the 1990s by an industry consortium and later promoted by IBM. Its steps are:
- Business understanding: An important question is whether we need ML for the project at all. The goal of the project has to be measurable.
- Data understanding: Analyze the available data sources and decide whether more data is required.
- Data preparation: Clean the data and remove noise by applying pipelines. The data should be converted to a tabular format so that it can be fed into an ML model.
- Modeling: Train different models and choose the best one. Based on the results of this step, decide whether new features need to be added or data issues need to be fixed.
- Evaluation: Measure how well the model performs and whether it solves the business problem.
- Deployment: Roll the model out to production for all users. Evaluation and deployment often happen together in the form of online evaluation, where the model is evaluated on live users.
It is important to consider how maintainable the project is.
In general, ML projects require many iterations.
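The data preparation, modeling, and evaluation steps can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not part of CRISP-DM itself: the records, the `prepare`, `majority_model`, `threshold_model`, and `select_best` helpers, and the `TARGET_ACCURACY` goal are all hypothetical names and made-up values.

```python
# Hypothetical raw records, made up for illustration only.
raw_records = [
    {"visits": "12", "churned": "no"},
    {"visits": "3", "churned": "yes"},
    {"visits": None, "churned": "no"},  # noisy row: missing value
    {"visits": "1", "churned": "yes"},
    {"visits": "9", "churned": "no"},
    {"visits": "2", "churned": "yes"},
    {"visits": "15", "churned": "no"},
    {"visits": "4", "churned": "yes"},
    {"visits": "7", "churned": "no"},
    {"visits": "5", "churned": "no"},
]


def prepare(records):
    """Data preparation: drop noisy rows and build a table of (feature, label)."""
    table = []
    for record in records:
        if record["visits"] is None:
            continue
        label = 1 if record["churned"] == "yes" else 0
        table.append((int(record["visits"]), label))
    return table


def accuracy(model, rows):
    """Evaluation metric: share of rows where the prediction matches the label."""
    hits = [1 if model(x) == y else 0 for x, y in rows]
    return sum(hits) / len(hits)


def majority_model(train):
    """Baseline: always predict the most common label in the training data."""
    positives = sum(y for _, y in train)
    majority = 1 if positives * 2 >= len(train) else 0
    return lambda x: majority


def threshold_model(train):
    """Predict churn when visits are at or below the best training threshold."""
    thresholds = sorted({x for x, _ in train})
    best_t = max(
        thresholds,
        key=lambda t: accuracy(lambda x: 1 if x <= t else 0, train),
    )
    return lambda x: 1 if x <= best_t else 0


def select_best(candidates, train, val):
    """Modeling: train every candidate and pick the best one on validation data."""
    scores = {name: accuracy(build(train), val) for name, build in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


TARGET_ACCURACY = 0.8  # hypothetical measurable goal from business understanding

table = prepare(raw_records)
train, val = table[:6], table[6:]
best_name, scores = select_best(
    {"majority": majority_model, "threshold": threshold_model}, train, val
)
meets_goal = scores[best_name] >= TARGET_ACCURACY

print(f"best model: {best_name}, validation accuracy: {scores[best_name]:.2f}")
print("goal met" if meets_goal else "goal not met: iterate on data or features")
```

With this toy data the best candidate does not reach the target, which is the typical signal to go back to an earlier step (more data, new features) and iterate.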
The notes are written by the community. If you see an error here, please create a PR with a fix.