Chapter 3.5: Deploy and access the model on Kubernetes¶
Introduction¶
In this chapter, you will learn how to deploy the model on Kubernetes using the Docker image created in the previous chapter, and how to access it from a Kubernetes pod.
This will make the model available to other applications and services through a public endpoint, accessible from anywhere.
In this chapter, you will learn how to:
- Create the Kubernetes cluster
- Validate kubectl can access the Kubernetes cluster
- Create the Kubernetes configuration files
- Deploy the Docker image on Kubernetes
- Access the model
Danger
The following steps will create resources on the cloud provider. These resources will be deleted at the end of the guide, but you might be charged for them. Kubernetes clusters are not free on most cloud providers and can be expensive. Make sure to delete the resources at the end of the guide.
The following diagram illustrates the control flow of the experiment at the end of this chapter:
flowchart TB
dot_dvc[(.dvc)] <-->|dvc pull
dvc push| s3_storage[(S3 Storage)]
dot_git[(.git)] <-->|git pull
git push| repository[(Repository)]
workspaceGraph <-....-> dot_git
data[data/raw]
subgraph cacheGraph[CACHE]
dot_dvc
dot_git
end
subgraph workspaceGraph[WORKSPACE]
data --> code[*.py]
subgraph dvcGraph["dvc.yaml"]
code
end
params[params.yaml] -.- code
code <--> bento_model[classifier.bentomodel]
subgraph bentoGraph[bentofile.yaml]
bento_model
serve[serve.py] <--> bento_model
end
bento_model <-.-> dot_dvc
end
subgraph remoteGraph[REMOTE]
s3_storage
subgraph gitGraph[Git Remote]
repository <--> |...|action[Action]
end
registry[(Container
registry)]
action --> |bentoml build
bentoml containerize
docker push|registry
subgraph clusterGraph[Kubernetes]
bento_service_cluster[classifier.bentomodel] --> k8s_fastapi[FastAPI]
end
registry --> |kubectl apply|bento_service_cluster
end
subgraph browserGraph[BROWSER]
k8s_fastapi <--> publicURL["public URL"]
end
style workspaceGraph opacity:0.4,color:#7f7f7f80
style dvcGraph opacity:0.4,color:#7f7f7f80
style cacheGraph opacity:0.4,color:#7f7f7f80
style data opacity:0.4,color:#7f7f7f80
style dot_git opacity:0.4,color:#7f7f7f80
style dot_dvc opacity:0.4,color:#7f7f7f80
style code opacity:0.4,color:#7f7f7f80
style bentoGraph opacity:0.4,color:#7f7f7f80
style serve opacity:0.4,color:#7f7f7f80
style bento_model opacity:0.4,color:#7f7f7f80
style params opacity:0.4,color:#7f7f7f80
style s3_storage opacity:0.4,color:#7f7f7f80
style remoteGraph opacity:0.4,color:#7f7f7f80
style gitGraph opacity:0.4,color:#7f7f7f80
style repository opacity:0.4,color:#7f7f7f80
style action opacity:0.4,color:#7f7f7f80
linkStyle 0 opacity:0.4,color:#7f7f7f80
linkStyle 1 opacity:0.4,color:#7f7f7f80
linkStyle 2 opacity:0.4,color:#7f7f7f80
linkStyle 3 opacity:0.4,color:#7f7f7f80
linkStyle 4 opacity:0.4,color:#7f7f7f80
linkStyle 5 opacity:0.4,color:#7f7f7f80
linkStyle 6 opacity:0.4,color:#7f7f7f80
linkStyle 7 opacity:0.4,color:#7f7f7f80
linkStyle 8 opacity:0.4,color:#7f7f7f80
linkStyle 9 opacity:0.4,color:#7f7f7f80
Steps¶
Install the Kubernetes CLI¶
Install the Kubernetes CLI (kubectl) on your machine.
Install kubectl with the Google Cloud CLI. You might need to follow the instructions in the terminal and the related documentation.
Execute the following command(s) in a terminal:
As per the instructions, you will also need to install the gke-gcloud-auth-plugin authentication plugin.
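The original command block is not reproduced here; below is a minimal sketch, assuming the Google Cloud CLI is already installed and authenticated. The guard simply skips the call on machines without gcloud (if gcloud came from an OS package manager, install the equivalent OS packages instead):

```shell
# Component names as used by the Google Cloud CLI component manager
COMPONENTS="kubectl gke-gcloud-auth-plugin"

# Install them through gcloud (skipped when gcloud is absent)
if command -v gcloud >/dev/null 2>&1; then
  gcloud components install $COMPONENTS
fi
```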
This guide has been written with Google Cloud in mind. We are open to contributions to add support for other cloud providers such as Amazon Web Services, Exoscale, Microsoft Azure or Self-hosted Kubernetes but we might not officially support them.
If you want to contribute, please open an issue or a pull request on the GitHub repository. Your help is greatly appreciated!
Create the Kubernetes cluster¶
In order to deploy the model on Kubernetes, you will need a Kubernetes cluster.
Follow the steps below to create one.
Enable the Google Kubernetes Engine API
You must enable the Google Kubernetes Engine API to create Kubernetes clusters on Google Cloud with the following command:
Tip
You can display the available services in your project with the following command:
Execute the following command(s) in a terminal:
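The original command blocks are not reproduced here; a minimal sketch, assuming an authenticated gcloud CLI (the guard keeps the snippet harmless on machines without it). The API identifier is the standard one for Google Kubernetes Engine:

```shell
# The API identifier for Google Kubernetes Engine
GKE_API="container.googleapis.com"

if command -v gcloud >/dev/null 2>&1; then
  # Enable the API on the current project
  gcloud services enable "$GKE_API"

  # Tip: display the services available to the project
  gcloud services list --available
fi
```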
Create the Kubernetes cluster
Create the Google Kubernetes cluster with the Google Cloud CLI.
Export the cluster name as an environment variable. Replace <my_cluster_name> with a cluster name of your choice. It must be lowercase, with words separated by hyphens.
Warning
The cluster name must be unique across all Google Cloud projects and users. For example, use mlops-<surname>-cluster, where surname is based on your name. Change the cluster name if the command fails.
Execute the following command(s) in a terminal:
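A sketch of the export, with an illustrative value; the variable name is a placeholder of this rewrite's choosing, not necessarily the one the original guide uses:

```shell
# Replace the value with your own cluster name
export GCP_K8S_CLUSTER_NAME="mlops-surname-cluster"
```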
Export the cluster zone as an environment variable. You can view the available zones at Regions and zones. You should ideally select a zone close to where most of the expected traffic will come from. Replace <my_cluster_zone> with your own zone (for example, europe-west6-a for Zurich, Switzerland).
Execute the following command(s) in a terminal:
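As above, a sketch with a placeholder variable name and the example zone from the text:

```shell
# Replace the value with your chosen zone
export GCP_K8S_CLUSTER_ZONE="europe-west6-a"
```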
Create the Kubernetes cluster. You can also view the available machine types with the gcloud compute machine-types list command:
Info
This can take several minutes. Please be patient.
Execute the following command(s) in a terminal:
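A hedged sketch of the cluster-creation command, assuming an authenticated gcloud CLI and the environment variables from the previous steps (defaults are provided so the snippet stands alone). The machine type and node count are illustrative choices, not requirements of the guide:

```shell
# Default the variables so the snippet is self-contained
GCP_K8S_CLUSTER_NAME="${GCP_K8S_CLUSTER_NAME:-mlops-surname-cluster}"
GCP_K8S_CLUSTER_ZONE="${GCP_K8S_CLUSTER_ZONE:-europe-west6-a}"

# Create the cluster (skipped when gcloud is absent); this can take minutes
if command -v gcloud >/dev/null 2>&1; then
  gcloud container clusters create "$GCP_K8S_CLUSTER_NAME" \
    --zone="$GCP_K8S_CLUSTER_ZONE" \
    --machine-type=e2-standard-2 \
    --num-nodes=2
fi
```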
The output should be similar to this:
Validate kubectl can access the Kubernetes cluster¶
Validate kubectl can access the Kubernetes cluster:
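A sketch of the validation, assuming the variables from the previous steps (defaulted here so the snippet stands alone). Fetching credentials is usually done automatically by cluster creation, but running it explicitly is harmless; listing the namespaces is one simple way to confirm connectivity:

```shell
GCP_K8S_CLUSTER_NAME="${GCP_K8S_CLUSTER_NAME:-mlops-surname-cluster}"
GCP_K8S_CLUSTER_ZONE="${GCP_K8S_CLUSTER_ZONE:-europe-west6-a}"

if command -v gcloud >/dev/null 2>&1; then
  # Write credentials for the cluster into the local kubeconfig
  gcloud container clusters get-credentials "$GCP_K8S_CLUSTER_NAME" \
    --zone="$GCP_K8S_CLUSTER_ZONE"
fi

# Runs only when kubectl is installed and a cluster is reachable
if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  kubectl get namespaces
fi
```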
The output should be similar to this:
Create the Kubernetes configuration files¶
In order to deploy the model on Kubernetes, you will need to create the Kubernetes configuration files. These files describe the deployment and service of the model.
Create a new directory called kubernetes in the root of the project.
Create a new file called deployment.yaml in the kubernetes directory with the following content. Replace <docker_image> with the Docker image you have created in the previous steps:
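The original manifest is not reproduced here; below is a minimal sketch of what deployment.yaml could look like. The resource names and labels (classifier-…) are illustrative assumptions, and containerPort 3000 is BentoML's default serving port:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: classifier-deployment   # illustrative name
  labels:
    app: classifier
spec:
  replicas: 1
  selector:
    matchLabels:
      app: classifier
  template:
    metadata:
      labels:
        app: classifier
    spec:
      containers:
        - name: classifier
          image: <docker_image>     # replace with your image reference
          ports:
            - containerPort: 3000   # BentoML's default serving port
```

Note that the selector's matchLabels must match the pod template's labels, otherwise the Deployment will not manage its pods.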
Tip
You can find the Docker image with the following command:
Create a new file called service.yaml in the kubernetes directory with the following content:
kubernetes/service.yaml:
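A minimal sketch of what service.yaml could contain, assuming the illustrative classifier labels from the deployment above; a LoadBalancer service maps public port 80 to BentoML's default port 3000:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: classifier-service   # illustrative name
  labels:
    app: classifier
spec:
  type: LoadBalancer         # requests a public IP from the cloud provider
  ports:
    - port: 80               # public port
      targetPort: 3000       # port the container listens on
      protocol: TCP
  selector:
    app: classifier          # must match the pod labels of the deployment
```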
The deployment.yaml file describes the deployment of the model. It contains the number of replicas, the image to use, and the labels to use.
The service.yaml file describes the service of the model. It contains the type of service, the ports to use, and the labels to use.
Deploy the containerised model on Kubernetes¶
To deploy the containerised Bento model artifact on Kubernetes, you will need to apply the Kubernetes configuration files.
Apply the Kubernetes configuration files with the following commands:
Execute the following command(s) in a terminal:
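A sketch of the apply step; the guard means the commands run only when kubectl is installed and a cluster is reachable:

```shell
# Directory holding the manifests created above
K8S_DIR="kubernetes"

if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  kubectl apply -f "$K8S_DIR/deployment.yaml"
  kubectl apply -f "$K8S_DIR/service.yaml"
fi
```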
The output should be similar to this:
Open the cluster interface on the cloud provider and check that the model has been deployed.
Open the Kubernetes Engine page in the Google Cloud Console and click on your cluster to access its details.
Access the model¶
To access the model, you will need to find the external IP address of the service. You can do so with the following command:
Info
The external IP address of the service can take a few minutes to be available.
Execute the following command(s) in a terminal:
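A sketch, assuming the illustrative service name from the manifests above; describing the service prints the field mentioned below:

```shell
# Service name as defined in service.yaml (illustrative)
K8S_SERVICE_NAME="classifier-service"

if command -v kubectl >/dev/null 2>&1 && kubectl cluster-info >/dev/null 2>&1; then
  # Look for the "LoadBalancer Ingress" field in the output
  kubectl describe services "$K8S_SERVICE_NAME"
fi
```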
The output should be similar to this:
The LoadBalancer Ingress field contains the external IP address of the service. In this case, it is 34.65.255.92.
Try to access the model on port 80 using the external IP address of the service. You should be able to access the FastAPI documentation page at http://<load balancer ingress ip>:80. In this case, it is http://34.65.255.92:80.
Check the changes¶
Check the changes with Git to ensure that all the necessary files are tracked:
Execute the following command(s) in a terminal:
The output should look similar to this:
Commit the changes to Git¶
Commit the changes to Git.
Execute the following command(s) in a terminal:
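A sketch of the commit step; the commit message is illustrative, and the guard makes the snippet run only when the kubernetes directory exists inside a Git working tree:

```shell
# Illustrative commit message
COMMIT_MESSAGE="Deploy the model on Kubernetes"

if git rev-parse --is-inside-work-tree >/dev/null 2>&1 && [ -d kubernetes ]; then
  # Stage the new Kubernetes configuration files and commit them
  git add kubernetes
  git commit -m "$COMMIT_MESSAGE"
fi
```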
Summary¶
Congratulations! You have successfully deployed the model on Kubernetes with BentoML and Docker, and accessed it from an external IP address.
You can now use the model from anywhere.
In this chapter, you have successfully:
- Created the Kubernetes configuration files and deployed the BentoML model artifact on Kubernetes
- Accessed the model
State of the MLOps process¶
- Notebook has been transformed into scripts for production
- Codebase and dataset are versioned
- Steps used to create the model are documented and can be re-executed
- Changes done to a model can be visualized with parameters, metrics and plots to identify differences between iterations
- Codebase can be shared and improved by multiple developers
- Dataset can be shared among the developers and is placed in the right directory in order to run the experiment
- Experiment can be executed on a clean machine with the help of a CI/CD pipeline
- CI/CD pipeline is triggered on pull requests and reports the results of the experiment
- Changes to model can be thoroughly reviewed and discussed before integrating them into the codebase
- Model can be saved and loaded with all required artifacts for future usage
- Model can be easily used outside of the experiment context
- Model publication to the artifact registry is automated
- Model is accessible from the Internet and can be used anywhere
- Model requires manual deployment on the cluster
- Model cannot be trained on hardware other than the local machine
- Model cannot be trained on custom hardware for specific use-cases
You will address these issues in the next chapters for improved efficiency and collaboration. Continue the guide to learn how.
Sources¶
Highly inspired by: