# Chapter 3.4 - Build and publish the model with BentoML and Docker in the CI/CD pipeline

## Introduction
In this chapter, you will containerize the model and push it to the container registry with the help of the CI/CD pipeline. You will use BentoML and Docker to containerize and publish the model, and the pipeline to trigger the publishing.
The steps will be similar to the last chapter, but we will use the pipeline to automate the process.
In this chapter, you will learn how to:
- Grant access to the container registry on the cloud provider for the CI/CD pipeline
- Store the container registry credentials in the CI/CD configuration
- Create the CI/CD pipeline for publishing the model to the container registry
The following diagram illustrates the control flow of the experiment at the end of this chapter:
```mermaid
flowchart TB
dot_dvc[(.dvc)] <-->|dvc pull
dvc push| s3_storage[(S3 Storage)]
dot_git[(.git)] <-->|git pull
git push| repository[(Repository)]
workspaceGraph <-....-> dot_git
data[data/raw]
subgraph cacheGraph[CACHE]
dot_dvc
dot_git
end
subgraph workspaceGraph[WORKSPACE]
data --> code[*.py]
subgraph dvcGraph["dvc.yaml"]
code
end
params[params.yaml] -.- code
code <--> bento_model[classifier.bentomodel]
subgraph bentoGraph[bentofile.yaml]
bento_model
serve[serve.py] <--> bento_model
end
bento_model <-.-> dot_dvc
end
subgraph remoteGraph[REMOTE]
s3_storage
subgraph gitGraph[Git Remote]
repository <--> |...|action[Action]
end
registry[(Container
registry)]
action --> |bentoml build
bentoml containerize
docker push|registry
end
style workspaceGraph opacity:0.4,color:#7f7f7f80
style dvcGraph opacity:0.4,color:#7f7f7f80
style cacheGraph opacity:0.4,color:#7f7f7f80
style data opacity:0.4,color:#7f7f7f80
style dot_git opacity:0.4,color:#7f7f7f80
style dot_dvc opacity:0.4,color:#7f7f7f80
style code opacity:0.4,color:#7f7f7f80
style bentoGraph opacity:0.4,color:#7f7f7f80
style serve opacity:0.4,color:#7f7f7f80
style bento_model opacity:0.4,color:#7f7f7f80
style params opacity:0.4,color:#7f7f7f80
style s3_storage opacity:0.4,color:#7f7f7f80
style remoteGraph opacity:0.4,color:#7f7f7f80
style gitGraph opacity:0.4,color:#7f7f7f80
style repository opacity:0.4,color:#7f7f7f80
linkStyle 0 opacity:0.4,color:#7f7f7f80
linkStyle 1 opacity:0.4,color:#7f7f7f80
linkStyle 2 opacity:0.4,color:#7f7f7f80
linkStyle 3 opacity:0.4,color:#7f7f7f80
linkStyle 4 opacity:0.4,color:#7f7f7f80
linkStyle 5 opacity:0.4,color:#7f7f7f80
linkStyle 6 opacity:0.4,color:#7f7f7f80
linkStyle 7 opacity:0.4,color:#7f7f7f80
linkStyle 8 opacity:0.4,color:#7f7f7f80
```
## Steps

### Set up access to the container registry of the cloud provider
The CI/CD pipeline needs access to the container registry in order to push the Docker image.
This is the same process you did for DVC as described in Chapter 8 - Reproduce the ML experiment in a CI/CD pipeline, but this time for the container registry.
Update the Google Service Account and its associated Google Service Account Key so that Google Cloud can be accessed from the CI/CD pipeline without using your own credentials.
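In practice, this usually means granting the service account permission to push images to the registry. The command below is a sketch: the project ID and service account e-mail are examples, so adapt them to the names you used earlier in the guide:

```sh
# Allow the service account to push images to the Artifact Registry
# (project ID and service account e-mail are examples - adapt them to your setup)
gcloud projects add-iam-policy-binding mlops-surname-project \
    --member="serviceAccount:mlops-surname-service-account@mlops-surname-project.iam.gserviceaccount.com" \
    --role="roles/artifactregistry.writer"
```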
Tip
There is no need to update the value in the CI/CD pipeline configuration.
All changes are made at the Google Cloud level and the key file is not changed.
This guide has been written with Google Cloud in mind. We are open to contributions to add support for other cloud providers such as Amazon Web Services, Exoscale, Microsoft Azure or Self-hosted Kubernetes but we might not officially support them.
If you want to contribute, please open an issue or a pull request on the GitHub repository. Your help is greatly appreciated!
### Add container registry CI/CD secrets
Add the container registry secret to access the container registry from the CI/CD pipeline. Depending on the CI/CD platform you are using, the process will be different:
Create the following new variables by going to the Settings section from the top header of your GitHub repository. Select Secrets and variables > Actions and select New repository secret:
- `GCP_CONTAINER_REGISTRY_HOST`: The host of the container registry (ex: `europe-west6-docker.pkg.dev/mlops-surname-project/mlops-surname-registry`, from the variable `GCP_CONTAINER_REGISTRY_HOST` in the previous chapter)
Save the variables by selecting Add secret.
Create the following new variables by going to Settings > CI/CD from the left sidebar of your GitLab project. Select Variables and select Add variable:
- `GCP_CONTAINER_REGISTRY_HOST`: The host of the container registry (ex: `europe-west6-docker.pkg.dev/mlops-surname-project/mlops-surname-registry`, from the variable `GCP_CONTAINER_REGISTRY_HOST` in the previous chapter)
    - Protect variable: Unchecked
    - Mask variable: Checked
    - Expand variable reference: Unchecked

Save the variables by selecting Add variable.
This guide has been written with Google Cloud in mind. We are open to contributions to add support for other cloud providers such as Amazon Web Services, Exoscale, Microsoft Azure or Self-hosted Kubernetes but we might not officially support them.
If you want to contribute, please open an issue or a pull request on the GitHub repository. Your help is greatly appreciated!
### Update the CI/CD pipeline configuration file

You will adjust the pipeline to build and push the Docker image to the container registry. The following steps will be performed:
- Detect a new commit on the `main` branch
- Authenticate to the cloud provider
- Build the Docker image
- Push the Docker image to the container registry
Update the `.github/workflows/mlops.yaml` file to build and publish the model to the container registry.
Take some time to understand the publish job and its steps.
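As an illustration, the publishing job could look like the following sketch. The job name, secret names (`GOOGLE_SERVICE_ACCOUNT_KEY`), Python version, Bento tag (`celestial_bodies_classifier`) and paths are assumptions, so adapt them to your own project:

```yaml
# Minimal sketch of a publishing job - not the guide's complete workflow file.
publish:
  runs-on: ubuntu-latest
  # Only publish when a commit lands on the main branch
  if: github.ref == 'refs/heads/main' && github.event_name == 'push'
  steps:
    - name: Checkout repository
      uses: actions/checkout@v4
    - name: Login to Google Cloud
      uses: google-github-actions/auth@v2
      with:
        credentials_json: ${{ secrets.GOOGLE_SERVICE_ACCOUNT_KEY }}
    - name: Set up the gcloud CLI
      uses: google-github-actions/setup-gcloud@v2
    - name: Set up Python
      uses: actions/setup-python@v5
      with:
        python-version: '3.12'
    - name: Install dependencies
      run: pip install --requirement requirements-freeze.txt
    - name: Retrieve the model
      # Assumes the trained model is tracked with DVC, as shown in the diagram above
      run: dvc pull
    - name: Build and containerize the model
      run: |
        # Build the Bento described by bentofile.yaml (assumed at the repository root),
        # then turn it into a Docker image
        bentoml build .
        bentoml containerize celestial_bodies_classifier:latest \
          --image-tag ${{ secrets.GCP_CONTAINER_REGISTRY_HOST }}/celestial-bodies-classifier:latest
    - name: Push the image to the container registry
      run: |
        # Authenticate Docker against the registry host
        # (the part of GCP_CONTAINER_REGISTRY_HOST before the first slash)
        gcloud auth configure-docker europe-west6-docker.pkg.dev --quiet
        docker push ${{ secrets.GCP_CONTAINER_REGISTRY_HOST }}/celestial-bodies-classifier:latest
```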
Check the differences with Git to validate the changes.
Execute the following command(s) in a terminal:
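A scoped `git diff` is enough to review the workflow changes:

```sh
# Show the changes made to the GitHub Actions workflow file
git diff .github/workflows/mlops.yaml
```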
The output should show the changes you made to the workflow file.
Update the `.gitlab-ci.yml` file to add a new stage that publishes the model to the container registry.
Take some time to understand the publish job and its steps.
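As an illustration, the publishing job could resemble the sketch below. The stage and job names, variable names, base image, Bento tag (`celestial_bodies_classifier`) and paths are assumptions, and the Docker-in-Docker service requires a runner that allows privileged containers:

```yaml
# Minimal sketch of a publishing job - not the guide's complete .gitlab-ci.yml.
# (Assumes a 'publish' stage is declared in the top-level stages list.)
publish:
  stage: publish
  image: python:3.12
  rules:
    # Only publish when a commit lands on the main branch
    - if: $CI_COMMIT_BRANCH == "main"
  services:
    # Docker-in-Docker service so the job can build and push images
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_TLS_CERTDIR: ""
    # Path where the service account key is written for dvc and docker
    GOOGLE_APPLICATION_CREDENTIALS: "${CI_PROJECT_DIR}/google-service-account-key.json"
  before_script:
    # Install the Docker CLI and the project dependencies
    - apt-get update && apt-get install --yes docker.io
    - pip install --requirement requirements-freeze.txt
    # Write the service account key to a file (assumed to be stored as plain JSON)
    - echo "$GOOGLE_SERVICE_ACCOUNT_KEY" > "$GOOGLE_APPLICATION_CREDENTIALS"
  script:
    # Authenticate Docker against the registry host
    # (the part of GCP_CONTAINER_REGISTRY_HOST before the first slash)
    - docker login --username _json_key --password-stdin "https://${GCP_CONTAINER_REGISTRY_HOST%%/*}" < "$GOOGLE_APPLICATION_CREDENTIALS"
    # Retrieve the DVC-tracked model, build the Bento and containerize it
    - dvc pull
    - bentoml build .
    - bentoml containerize celestial_bodies_classifier:latest --image-tag "${GCP_CONTAINER_REGISTRY_HOST}/celestial-bodies-classifier:latest"
    # Push the image to the container registry
    - docker push "${GCP_CONTAINER_REGISTRY_HOST}/celestial-bodies-classifier:latest"
```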
Check the differences with Git to validate the changes.
Execute the following command(s) in a terminal:
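A scoped `git diff` is enough to review the pipeline changes:

```sh
# Show the changes made to the GitLab CI configuration file
git diff .gitlab-ci.yml
```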
The output should show the changes you made to the pipeline configuration file.
### Check the changes
Check the changes with Git to ensure that all the necessary files are tracked:
Execute the following command(s) in a terminal:
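A typical sequence stages the files and reviews what will be committed:

```sh
# Add the files to the staging area and display the state of the working tree
git add .
git status
```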
The output should show the modified CI/CD configuration file ready to be committed.
### Commit the changes to Git
Commit the changes to Git.
Execute the following command(s) in a terminal:
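For example (the commit message is only a suggestion):

```sh
# Commit the changes and push them to trigger the CI/CD pipeline
git commit -m "A pipeline will now publish the model to the container registry"
git push
```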
## Summary
Congratulations! You have successfully prepared the model for automated deployment in a production environment with the CI/CD pipeline!
New versions of the model will be published to the artifact registry automatically as soon as changes are pushed to the main branch.
In this chapter, you have successfully:
- Automated the containerization and publication of the BentoML model artifact to the container registry
## State of the MLOps process
- Notebook has been transformed into scripts for production
- Codebase and dataset are versioned
- Steps used to create the model are documented and can be re-executed
- Changes done to a model can be visualized with parameters, metrics and plots to identify differences between iterations
- Codebase can be shared and improved by multiple developers
- Dataset can be shared among the developers and is placed in the right directory in order to run the experiment
- Experiment can be executed on a clean machine with the help of a CI/CD pipeline
- CI/CD pipeline is triggered on pull requests and reports the results of the experiment
- Changes to model can be thoroughly reviewed and discussed before integrating them into the codebase
- Model can be saved and loaded with all required artifacts for future usage
- Model can be easily used outside of the experiment context
- Model publication to the artifact registry is automated
- Model is accessible from the Internet and can be used anywhere
- Model requires manual deployment on the cluster
- Model cannot be trained on hardware other than the local machine
- Model cannot be trained on custom hardware for specific use-cases
You will address these issues in the next chapters for improved efficiency and collaboration. Continue the guide to learn how.
## Sources
Highly inspired by: