Automate the model lifecycle

As with any modern software development process, we eliminate manual steps where possible to reduce the likelihood of errors. For ML solutions, we make sure there is a defined process for moving a model into production and refreshing it as needed. (Note that we do not apply this automation to the initial development and prototyping of the algorithms, as this is usually an exploratory and creative activity.)
For an algorithm that has been prototyped and accepted into production, the lifecycle is:
  • Ingest the latest data.
  • Create training and test sets.
  • Run the training.
  • Check performance meets the required standard.
  • Version and redeploy the model.
In a fully automated lifecycle, this process is repeated with no manual steps, either on a schedule or when triggered by the arrival of more recent data.
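The five steps above can be sketched as a single function that a scheduler or a data-arrival trigger would call. This is a minimal illustration rather than a recommended implementation: the names run_lifecycle, ModelRegistry and max_error are hypothetical, the "model" is a one-variable least-squares fit chosen only to keep the example self-contained, and the registry is an in-memory stand-in for a real versioned model store.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Row = Tuple[float, float]    # (feature, target)
Model = Tuple[float, float]  # (slope, intercept)


@dataclass
class ModelRegistry:
    """Hypothetical in-memory stand-in for a versioned model store."""
    versions: List[Model] = field(default_factory=list)

    def publish(self, model: Model) -> int:
        self.versions.append(model)
        return len(self.versions)  # version numbers start at 1


def split(rows: List[Row], test_fraction: float = 0.2, seed: int = 0):
    """Shuffle deterministically, then split into training and test sets."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to train and test")
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows: List[Row]) -> Model:
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = cov_xy / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x


def mean_abs_error(model: Model, rows: List[Row]) -> float:
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


def run_lifecycle(
    ingest: Callable[[], List[Row]],
    registry: ModelRegistry,
    max_error: float,
) -> Optional[int]:
    """Run one pass of the lifecycle; return the new version, or None if rejected."""
    rows = ingest()                           # 1. ingest the latest data
    train_rows, test_rows = split(rows)       # 2. create training and test sets
    model = train(train_rows)                 # 3. run the training
    error = mean_abs_error(model, test_rows)  # 4. check performance
    if error > max_error:
        return None                           # keep serving the previous version
    return registry.publish(model)            # 5. version and redeploy
```

The performance check acts as a gate: a model that misses the threshold is never published, so the previously deployed version stays in place until a later run passes.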
There are a variety of tools and techniques to help with this. Some of the tools we have found useful include:
  • MLflow
  • Amazon SageMaker
  • Google Cloud Vertex AI

Experience report

When creating a pricing estimation service for one of our clients, we were working from a blank canvas in terms of ML architecture. We knew that the model was going to be consumed by a web application, so we could deploy it as a microservice, and that data came in weekly batches with no real-time need for training.

We took a lean approach using standard AWS services to create a platform able to ingest new data, retrain the model and serve the model as an API endpoint.

We used S3 with versioning as our model store, and a combination of S3 event notifications, Lambda functions, Fargate and Elastic Load Balancing to automate the data ingest, provisioning and update of two models, with CloudWatch logging the operations. The process is fully automated and triggered by the arrival of a weekly data drop into an S3 bucket.

Shaun McGee
Product & delivery
Equal Experts, USA
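
As an illustration of the trigger described in this report, the following sketch shows how a Lambda function might react to an S3 event notification. It is not the code used on the project. The names extract_data_drops, make_handler and start_training are hypothetical, and the retraining step is injected as a callable so the example runs without AWS credentials; in a real deployment that callable might wrap a call that starts a Fargate task.

```python
from typing import Callable, Dict, List
from urllib.parse import unquote_plus


def extract_data_drops(event: Dict) -> List[Dict[str, str]]:
    """Return bucket/key pairs for objects created in an S3 event notification."""
    drops = []
    for record in event.get("Records", []):
        # Ignore deletions and other non-creation events.
        if not record.get("eventName", "").startswith("ObjectCreated"):
            continue
        s3 = record["s3"]
        drops.append({
            "bucket": s3["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 event notifications.
            "key": unquote_plus(s3["object"]["key"]),
        })
    return drops


def make_handler(start_training: Callable[[str, str], None]):
    """Build a Lambda-style handler around a hypothetical retraining callable."""
    def handler(event, context):
        drops = extract_data_drops(event)
        for drop in drops:
            start_training(drop["bucket"], drop["key"])
        return {"started": len(drops)}
    return handler
```

Injecting start_training keeps the event parsing testable on its own, separate from whatever service actually runs the retraining.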