Serving machine learning models
Survey of options for web services with machine learning models in Python.
It’s challenging to operate and maintain machine learning models in a production system. This post surveys the options for machine learning web services in Python. By the end of this post, you should understand your options and why you might pick one over another.
The main challenge is keeping the code and the model data compatible. For example, adding a new feature to your model requires changes to both the code and the saved model data; new code may crash when loading older models, or vice versa.
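One lightweight guard is to save a schema version alongside the model and check it at load time, so an incompatibility fails fast at deploy instead of crashing on a live request. A minimal sketch (the version constant and payload layout are illustrative, not a standard):

```python
import pickle

# Illustrative constant: bump it whenever the feature set changes.
MODEL_SCHEMA_VERSION = 3

def save_model(model, path):
    with open(path, "wb") as f:
        pickle.dump({"schema_version": MODEL_SCHEMA_VERSION, "model": model}, f)

def load_model(path):
    with open(path, "rb") as f:
        payload = pickle.load(f)
    # Fail fast rather than crashing later on a live request.
    if payload["schema_version"] != MODEL_SCHEMA_VERSION:
        raise RuntimeError(
            f"model schema {payload['schema_version']} != code schema {MODEL_SCHEMA_VERSION}"
        )
    return payload["model"]
```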
Goals
- Continuous deployment – We shouldn’t do much manual work to deploy an update.
- Continuous integration – The model and code should be tested together before deployment. There should be no risk of model-code incompatibility on deployment.
- Risk-free reverting – When we detect a bug and roll back to an older version, there’s no risk of model-code incompatibility.
Assumptions
- git for code versioning
- Jenkins for CI/CD (or similar)
- You have tests for your model endpoints that are run in Jenkins
- You already have a file format to save and load your model
- You’re deploying using Docker or something similar
Options
git
You can simply commit the model to your repo in git. We started this way, and it works for a while, but checking out, branching, and updating become very slow because git downloads every version of every file.
Worse, if you commit large models this way and switch approaches later, they remain in your git history and will slow things down forever, unless you rewrite the history or start the repo over.
git-lfs (large file storage)
git-lfs stores large files outside the repository and replaces them with small pointer files, which greatly reduces the time to checkout, branch, and update compared to plain git. You update models with familiar commands like git add, git push, and git pull. It’s a good option for a small startup.
Notes on LFS providers
- If you use GitHub, there’s an account-level limit on LFS storage, and you need to buy data packs when you run out of space. You’ll need to be proactive to ensure that your developers aren’t blocked on pushing commits.
- You can use different storage providers for git and git-lfs. The LFS provider can be run through services like Artifactory, or you can create an AWS Lambda to use S3 for storage. However, configuring a secondary provider for LFS can be complex, especially in a CI/CD pipeline like Jenkins – commands like checkout scm don’t work anymore.
Everyone on your team will need to install git-lfs on their machine and configure git to use it. Your repo needs to be configured as well, and the resulting .gitattributes file should be tracked in regular git.
lazydata
This is a Python module for data storage and it lazy-loads files from whatever provider you configure, like S3.
Advantages
- You can modify the code without downloading all of the data if necessary.
- S3 is fast and cheap.
- Less setup than other methods.
Disadvantages
- You have to implement the loading yourself, either at boot or on first request, which adds a little latency.
- You need to run additional commands like lazydata push after running git push to ensure the files are pushed to S3.
- If you have a pipeline with dev/staging/prod, lazydata uses the same S3 bucket for all three environments, which violates devops best practices.
- Python only
If you’re adding lazydata to a repo, you install it via pip, run lazydata init, configure the remote (S3), make sure the config file is added to git, and then push. To add new files, you either write them from Python with track or add them manually via the command line. Whenever you modify anything tracked by lazydata, you run lazydata push again.
Teammates who just want to read the files don’t need to do anything beyond installing lazydata, which is easy if you’re using requirements.txt or a Pipfile.
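The read path is a one-liner. A minimal sketch, assuming lazydata’s track API and a hypothetical file path:

```python
from joblib import load
from lazydata import track

# track() returns a local path, downloading the file from the configured
# remote (e.g. S3) on first access if it isn't already cached.
model = load(track("models/classifier.joblib"))  # hypothetical path
```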
DVC (data version control)
Like lazydata, this can use S3 for versioning and storing your large files and keeping them in sync with git. DVC works more like a secondary git command though.
It’s also designed for a much more general machine learning workflow that includes training data and experimentation. That won’t help with deployment but may be important to you.
Advantages
- Language-agnostic
Disadvantages
- You need to run additional commands like dvc add, dvc push, and dvc pull. It’s slightly more work than lazydata.
The process for adding DVC to a new repo is similar to lazydata: install it via pip, run dvc init, set up the remote, and commit the files it generates. To track a file you run dvc add, and whenever you push via git you also run dvc push.
When teammates want to read the files they need to install it via requirements.txt/Pipfile and then run dvc pull after cloning the repo.
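Newer versions of DVC also expose a small Python API if you’d rather fetch a file programmatically than run dvc pull. A sketch (the repo URL and file path are hypothetical):

```python
import io

import dvc.api
from joblib import load

# Read the model bytes straight from the remote backing the repo.
model_bytes = dvc.api.read(
    "models/classifier.joblib",                 # hypothetical path
    repo="https://github.com/example/my-repo",  # hypothetical repo
    mode="rb",
)
model = load(io.BytesIO(model_bytes))
```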
Amazon Sagemaker
Sagemaker works very differently: rather than standing up your existing Flask service, you adapt your model training and serving process to fit Sagemaker. It manages model versions in S3 for you and hosts the models as well.
Azure and Google Cloud have similar services.
Advantages
- Very easy if the provided model types are suitable for your problem
- Autoscales your model endpoint based on demand
Disadvantages
- AWS authentication for the model endpoint may not suit your needs
- You’ll need to understand the AWS ecosystem much more, especially for security
- It makes custom model serving a bit harder
The setup for this is more extensive – see Sagemaker docs.
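To give a rough sense of the flow, here’s a sketch of hosting an already-trained scikit-learn model with the Sagemaker Python SDK. The S3 path, IAM role, and entry-point script are placeholders; see the docs for the real setup:

```python
from sagemaker.sklearn.model import SKLearnModel

model = SKLearnModel(
    model_data="s3://my-bucket/model.tar.gz",             # placeholder artifact
    role="arn:aws:iam::123456789012:role/SagemakerRole",  # placeholder role
    entry_point="inference.py",  # your model_fn / predict_fn live here
    framework_version="0.23-1",
)

# Stands up a managed HTTPS endpoint; autoscaling can be configured on top.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
print(predictor.predict([[1.0, 2.0, 3.0]]))
```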
Briefly mentioned
Pickle files
Pickle is the easiest way to serialize machine learning models in Python, and popular libraries like scikit-learn and Keras are pickle-compatible. Pin the versions of your dependencies, because the serialized format is version-specific for libraries like joblib.
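A minimal example with joblib (the filename is arbitrary):

```python
from joblib import dump, load
from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit([[0.0], [1.0]], [0, 1])

dump(model, "model.joblib")      # serialize to disk
restored = load("model.joblib")  # must load under the same library versions
print(restored.predict([[0.5]]))
```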
scikit-learn pipelines
If you’re using scikit-learn in any way, I recommend using pipelines. They bundle all preprocessing together with the model itself, so you can save everything as a single file. This eliminates a common source of error: preprocessing that differs between training and testing/serving.
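For example, a text classifier can carry its vectorizer with it, so serving applies exactly the preprocessing used in training (a minimal sketch):

```python
from joblib import dump
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),   # preprocessing travels with the model
    ("clf", LogisticRegression()),
])
pipeline.fit(
    ["good movie", "awful movie", "great film", "terrible film"],
    [1, 0, 1, 0],
)

dump(pipeline, "pipeline.joblib")  # one artifact: preprocessing + model
print(pipeline.predict(["pretty good film"]))
```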
ONNX
ONNX is an open source serialization format for neural networks, and there are several efficient runtimes for serving ONNX models. I looked into it earlier this year but found only minimal support for text processing – it’s really mainly for image models.
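Serving itself is simple with onnxruntime (a sketch; the model path is a placeholder, and the input name and shape depend on how your model was exported):

```python
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")  # placeholder path

# The exported graph defines the input names and shapes.
input_name = session.get_inputs()[0].name
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # typical image input

outputs = session.run(None, {input_name: x})
print(outputs[0].shape)
```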
Weights database
You could serialize your model to a database but then you need to manage the format and versions yourself. The only advantage I’ve seen is that you can put word embeddings in a database and only load the few words in the input, rather than keeping the entire vocabulary in memory.
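A sketch of the embedding case, where only the vectors for the words in the request are fetched (the table layout and names are hypothetical):

```python
import sqlite3

import numpy as np

# Hypothetical table: embeddings(word TEXT PRIMARY KEY, vector BLOB)
conn = sqlite3.connect("embeddings.db")

def lookup(words):
    """Fetch vectors only for the words in this request."""
    placeholders = ",".join("?" * len(words))
    rows = conn.execute(
        f"SELECT word, vector FROM embeddings WHERE word IN ({placeholders})",
        words,
    ).fetchall()
    return {word: np.frombuffer(blob, dtype=np.float32) for word, blob in rows}

vectors = lookup(["model", "serving"])
```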
AWS Lambda
Lambda is a great way to achieve scalability and reduce costs for your web service, but you need to package all non-standard Python dependencies into a small deployment package, and you probably won’t have room left for the model itself. Luckily, you can download your model from S3 on demand; lazydata is a good solution for this.
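The usual pattern is to download to /tmp on the first invocation and cache the model in a module-level variable so warm invocations skip the download. A sketch (the bucket, key, and model format are assumptions):

```python
import boto3
from joblib import load

s3 = boto3.client("s3")
_model = None  # survives across warm invocations of the same container

def handler(event, context):
    global _model
    if _model is None:
        # /tmp is the only writable filesystem in Lambda.
        s3.download_file("my-model-bucket", "model.joblib", "/tmp/model.joblib")
        _model = load("/tmp/model.joblib")
    return {"prediction": _model.predict([event["features"]]).tolist()}
```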
Recommendations
If Sagemaker meets your needs, that’s what I’d recommend because you get model versioning and autoscaling of your service.
If Sagemaker doesn’t meet your needs, I’d recommend dvc with S3 for storage.
If you’re in a rush and have GitHub admin access, git-lfs is reasonable.