Chapter 3.7 - Use a self-hosted runner for the CI/CD pipeline¶
Introduction¶
Warning
This chapter is a work in progress. For now, it focuses solely on GitHub. Please check back later for updates specific to using GitLab.
Thank you!
Training experiments locally can be challenging, as they often demand significantly more computational power than your personal machine can provide, or they may require specific hardware to operate.
As you may lack the necessary hardware or prefer not to use your local machine for training, you can shift the model training to the more powerful Kubernetes cluster by using a self-hosted runner.
In this chapter, you will learn how to:
- Create a self-hosted runner Docker container image
- Publish the runner image to the container registry and deploy it
- Configure the CI/CD to use the self-hosted runner for training on the Kubernetes cluster
The following diagram illustrates the control flow of the experiment at the end of this chapter:
flowchart TB
dot_dvc[(.dvc)] <-->|dvc pull
dvc push| s3_storage[(S3 Storage)]
dot_git[(.git)] <-->|git pull
git push| repository[(Repository)]
workspaceGraph <-....-> dot_git
data[data/raw]
subgraph cacheGraph[CACHE]
dot_dvc
dot_git
end
subgraph workspaceGraph[WORKSPACE]
data --> code[*.py]
subgraph dvcGraph["dvc.yaml"]
code
end
params[params.yaml] -.- code
code <--> bento_model[classifier.bentomodel]
subgraph bentoGraph[bentofile.yaml]
bento_model
serve[serve.py] <--> bento_model
end
bento_model <-.-> dot_dvc
end
subgraph remoteGraph[REMOTE]
s3_storage
subgraph gitGraph[Git Remote]
repository[(Repository)] <--> action[Action]
end
action --> |dvc pull
dvc repro
bentoml build
bentoml containerize
docker push|registry
s3_storage -.- |...|repository
subgraph clusterGraph[Kubernetes]
action -->|dvc pull
dvc repro| pod_runner["Runner"]
bento_service_cluster[classifier.bentomodel] --> k8s_fastapi[FastAPI]
end
pod_runner -->|cml publish| action
pod_runner -->|dvc push| s3_storage
registry[(Container
registry)] --> bento_service_cluster
action --> |kubectl apply|bento_service_cluster
end
subgraph browserGraph[BROWSER]
k8s_fastapi <--> publicURL["public URL"]
end
style workspaceGraph opacity:0.4,color:#7f7f7f80
style dvcGraph opacity:0.4,color:#7f7f7f80
style cacheGraph opacity:0.4,color:#7f7f7f80
style data opacity:0.4,color:#7f7f7f80
style dot_git opacity:0.4,color:#7f7f7f80
style dot_dvc opacity:0.4,color:#7f7f7f80
style code opacity:0.4,color:#7f7f7f80
style bentoGraph opacity:0.4,color:#7f7f7f80
style serve opacity:0.4,color:#7f7f7f80
style bento_model opacity:0.4,color:#7f7f7f80
style params opacity:0.4,color:#7f7f7f80
style remoteGraph opacity:0.4,color:#7f7f7f80
style gitGraph opacity:0.4,color:#7f7f7f80
style repository opacity:0.4,color:#7f7f7f80
style bento_service_cluster opacity:0.4,color:#7f7f7f80
style registry opacity:0.4,color:#7f7f7f80
style clusterGraph opacity:0.4,color:#7f7f7f80
style k8s_fastapi opacity:0.4,color:#7f7f7f80
style browserGraph opacity:0.4,color:#7f7f7f80
style publicURL opacity:0.4,color:#7f7f7f80
linkStyle 0 opacity:0.4,color:#7f7f7f80
linkStyle 1 opacity:0.4,color:#7f7f7f80
linkStyle 2 opacity:0.4,color:#7f7f7f80
linkStyle 3 opacity:0.4,color:#7f7f7f80
linkStyle 4 opacity:0.4,color:#7f7f7f80
linkStyle 5 opacity:0.4,color:#7f7f7f80
linkStyle 6 opacity:0.4,color:#7f7f7f80
linkStyle 7 opacity:0.4,color:#7f7f7f80
linkStyle 8 opacity:0.4,color:#7f7f7f80
linkStyle 9 opacity:0.4,color:#7f7f7f80
linkStyle 10 opacity:0.0
linkStyle 12 opacity:0.4,color:#7f7f7f80
linkStyle 15 opacity:0.4,color:#7f7f7f80
linkStyle 16 opacity:0.4,color:#7f7f7f80
linkStyle 17 opacity:0.4,color:#7f7f7f80
Steps¶
Create a self-hosted runner container image¶
Jobs in a CI/CD workflow are executed on applications known as runners. These can be physical servers, virtual machines (like the default runner used for our workflow so far), or container images, and may operate on a public cloud or on-premises within your own infrastructure.
We will create a custom Docker container image for a self-hosted runner, store it in the Container Registry and deploy it on our Kubernetes cluster. An instance of this runner will then listen for jobs from GitHub Actions and execute them.
This container image will include all the necessary dependencies to run the workflows.
Note
For our self-hosted Docker image storage, we opted to use the GitHub Container Registry because of its close integration with our existing GitHub environment. This decision allows us to restrict our CI/CD processes to the GitHub infrastructure while also demonstrating its use. However, we could have also used our existing Google Cloud Container Registry.
At the root level of your Git repository, create a `docker` folder. The following table describes the files that you will create in this folder:
| File | Description | Role |
|---|---|---|
| `Dockerfile` | Instructions for building a Docker container image | Package runner files and dependencies |
| `startup.sh` | The entrypoint for the Docker image | Initialize the container when launched |
Create the Dockerfile¶
The `Dockerfile` provides the instructions needed to create a custom Docker container image that incorporates the GitHub Actions runner along with the workflow files and all their necessary dependencies.

Replace `<my_repository_url>` with your own Git repository URL.
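The guide's exact `Dockerfile` is not reproduced here, but a minimal sketch could look like the following. The base image, runner version, and dependency list are illustrative assumptions, not the guide's exact content:

```dockerfile
# Sketch of a self-hosted runner image -- versions and packages are
# assumptions; adapt them to the dependencies your workflow needs.
FROM ubuntu:22.04

# GitHub Actions runner version to install (assumed, check for the latest)
ARG GITHUB_RUNNER_VERSION=2.316.0

# Install tools required by the runner and the workflow
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates curl git jq python3 python3-pip \
    && rm -rf /var/lib/apt/lists/*

# The GitHub Actions runner refuses to run as root
RUN useradd -m runner
USER runner
WORKDIR /home/runner

# Download and unpack the runner binaries
RUN curl -fsSL -o runner.tar.gz \
        "https://github.com/actions/runner/releases/download/v${GITHUB_RUNNER_VERSION}/actions-runner-linux-x64-${GITHUB_RUNNER_VERSION}.tar.gz" \
    && tar xzf runner.tar.gz \
    && rm runner.tar.gz

# Copy the entrypoint script that registers the runner at startup
COPY --chown=runner:runner startup.sh .
ENTRYPOINT ["./startup.sh"]
```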
Create the startup script¶
This `startup.sh` script will act as the entrypoint for the Docker image. It will be used to initialize our Docker container when launched from the image we are creating. The primary purpose of this script is to register a new self-hosted GitHub runner instance for our repository each time a new container is started from the image.

Since we use the GitHub Container Registry, replace `<my_username>` and `<my_repository_name>` with your own GitHub username and repository name.

Note that the `GITHUB_RUNNER_LABEL` variable will be used to identify the runner in subsequent steps.
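The guide's exact script is not shown here; as a hedged sketch, the registration logic of such a script typically looks like this. Everything except the `GITHUB_RUNNER_LABEL` value is an assumption based on the standard GitHub runner registration flow (the `GH_RUNNER_PAT` variable is injected into the container later via a Kubernetes secret):

```sh
#!/bin/sh
# Sketch of the entrypoint -- adapt it to the guide's actual script.

# Label used later to target this runner from the workflow file
GITHUB_RUNNER_LABEL="base-runner"

# Exchange the PAT for a short-lived runner registration token
# via the GitHub API
REGISTRATION_TOKEN=$(curl -fsSL -X POST \
    -H "Authorization: token ${GH_RUNNER_PAT}" \
    "https://api.github.com/repos/<my_username>/<my_repository_name>/actions/runners/registration-token" \
    | jq -r .token)

# Register this container as a new self-hosted runner for the repository
./config.sh --unattended \
    --url "https://github.com/<my_username>/<my_repository_name>" \
    --token "${REGISTRATION_TOKEN}" \
    --labels "${GITHUB_RUNNER_LABEL}"

# Start listening for jobs
exec ./run.sh
```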
Authenticate with the GitHub Container Registry¶
Before proceeding, you will need to create a personal access token. This token will be used to authenticate you on the GitHub Container Registry, allowing you to push the image there.
Follow the Managing Personal Access Token - GitHub docs guide to create a personal access token (classic) named `GHCR_PAT` with the `write:packages` scope.

Export your token as a variable. Replace `<my_personal_access_token>` with your own token.
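For example (the variable name matches the token name used above):

```sh
# Export the personal access token for use in the next step.
# Replace the placeholder with your own token.
export GHCR_PAT="<my_personal_access_token>"
```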
Execute the following command(s) in a terminal:
Authenticate to the Container Registry. Replace `<my_username>` with your own username.
Execute the following command(s) in a terminal:
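A typical way to do this (reading the token from stdin keeps it out of the shell history):

```sh
# Authenticate against the GitHub Container Registry with the PAT
echo "$GHCR_PAT" | docker login ghcr.io -u <my_username> --password-stdin
```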
Build and push the image to the container registry¶
With the entrypoint script ready, we can now build the Docker image before pushing it to the Container Registry.
To build the Docker image, navigate to the `docker` folder and run the following command. Make sure to adjust the `my_username` and `my_repository_name` variables in the tag of the Docker image to match your own GitHub username and repository name.
Execute the following command(s) in a terminal:
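As a hedged sketch of the build command; the exact image tag is an assumption here, following the GHCR `ghcr.io/<owner>/<path>` convention:

```sh
# Build the runner image for 64-bit Linux, even on Apple Silicon hosts
docker build \
    --platform linux/amd64 \
    -t ghcr.io/<my_username>/<my_repository_name>/github-runner:latest \
    .
```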
Note
Please note that the `--platform` parameter is important to set if your machine does not use the x86_64 architecture (like Apple Silicon). This is necessary because the runner on which the Docker image will be deployed operates in a 64-bit Linux environment.
The output should be similar to this:
Push the docker image to the GitHub Container Registry:
Execute the following command(s) in a terminal:
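Assuming the tag used at build time, the push looks like:

```sh
# Upload the image to the GitHub Container Registry
docker push ghcr.io/<my_username>/<my_repository_name>/github-runner:latest
```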
Adjust image visibility¶
Make sure to set the image visibility to Public in the GitHub Container Registry settings.
In your repository page, click on Packages on the right-hand side, then on your github-runner package. In Package settings, in the Danger Zone section, choose Change package visibility and set the package to public.
Configure security¶
It is important to understand that using a self-hosted runner allows other users to execute code on your infrastructure. Specifically, forks of your public repository will trigger the workflow when a pull request is created.
Consequently, other users can potentially run malicious code on your self-hosted runner machine by executing a workflow.
While our self-hosted runner will be set up in a containerized, isolated environment that limits the impact of any malicious code, unwanted pull requests in forks could still exhaust the computational resources for which you are responsible.
To mitigate these risks, it is advisable to secure your runner by disabling workflow triggers by forks. In the repository, go to Settings > Actions > General. In the Fork pull request workflows section, ensure the Run workflows from fork pull requests checkbox is disabled and click on Save.
Danger
Make sure to secure your runner and restrict access to the repository. For more information, see Self-hosted runner security.
More generally, it is recommended that you only use self-hosted runners with private repositories.
You can change the repository visibility in Settings > General. In the Danger Zone section, choose Change visibility and set the repository to private.
Set the self-hosted runner¶
We will now deploy our self-hosted GitHub runner to our Kubernetes cluster with the help of a YAML configuration file. As a reminder, the runner is used to execute the GitHub Action workflows defined in the repository.
The runner will use the custom Docker image that we pushed to the GitHub Container Registry. This image is identified by the label named `GITHUB_RUNNER_LABEL`, which is set to the value `base-runner`.
Create a new file called `runner.yaml` in the `kubernetes` directory with the following content. Also replace `<my_username>` and `<my_repository_name>` with your own GitHub username and repository name.
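The guide's full manifest is not reproduced here; as a minimal sketch, shown as a single pod for brevity, it could look like this (the secret name `gh-runner-pat`, its `token` key, and the image tag are assumptions that must match the surrounding steps):

```yaml
# kubernetes/runner.yaml -- illustrative sketch only
apiVersion: v1
kind: Pod
metadata:
  name: github-runner
  labels:
    app: github-runner
spec:
  containers:
    - name: github-runner
      image: ghcr.io/<my_username>/<my_repository_name>/github-runner:latest
      env:
        # PAT used by startup.sh to register the runner, read from
        # the Kubernetes secret created in the next section
        - name: GH_RUNNER_PAT
          valueFrom:
            secretKeyRef:
              name: gh-runner-pat
              key: token
```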
Add Kubeconfig secret¶
To enable the registration of the self-hosted runner, authentication via a secret is required. Initially, you need to generate a Personal Access Token (PAT) to authenticate with the GitHub repository. This token will subsequently be used to create a secret, allowing the use of the `kubectl` command on your machine.
Follow the Managing Personal Access Token - GitHub docs guide to create a personal access token (classic) named `GH_RUNNER_PAT` with the `repo` and `read:org` scopes.

Export your token as a variable. Replace `<my_repository_token>` with your own token.
Run the following command to create the secret:
Execute the following command(s) in a terminal:
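A sketch of the command, assuming the secret and key names referenced by `runner.yaml` (`gh-runner-pat` and `token` here are illustrative):

```sh
# Store the PAT in the cluster; the runner pod reads it via secretKeyRef
kubectl create secret generic gh-runner-pat \
    --from-literal=token="$GH_RUNNER_PAT"
```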
The created secret is stored within the Kubernetes cluster itself. As such, the secret is securely kept within the cluster and can be accessed by Kubernetes components running in that cluster.
Deploy the runner¶
To deploy the runner to the Kubernetes cluster, run the following command:
Execute the following command(s) in a terminal:
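Assuming the manifest was saved as `kubernetes/runner.yaml`:

```sh
# Create the runner pod on the cluster
kubectl apply -f kubernetes/runner.yaml
```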
This will deploy a GitHub runner pod named `github-runner` in your current Kubernetes namespace. The runner will automatically register itself to the repository.
You can check the status of the pod with the following command:
Execute the following command(s) in a terminal:
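For example:

```sh
# List the pods in the current namespace; the runner should
# eventually reach the Running status
kubectl get pods
```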
The output should be similar to this:
Info
This can take several minutes.
You can connect to the pod once it is running with:
Execute the following command(s) in a terminal:
You can then check the runner logs with:
Execute the following command(s) in a terminal:
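A sketch of both commands; the diagnostic log path inside the pod is an assumption and may differ between runner versions:

```sh
# Open an interactive shell inside the runner pod
kubectl exec -it github-runner -- bash

# Then, inside the pod, follow the runner's diagnostic logs
tail -f _diag/Runner_*.log
```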
The output should be similar to this:
Exit the process by pressing Ctrl+C in the terminal, then exit the pod by entering `exit`.

In addition, in Settings > Actions > Runners, you should now be able to see the `github-runner` runner listed with the Idle status.
Note
To remove the runner from the Kubernetes cluster, run the following command:
Execute the following command(s) in a terminal:
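Assuming the same manifest path as before:

```sh
# Remove the runner pod from the cluster
kubectl delete -f kubernetes/runner.yaml
```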
The runner will also automatically be unregistered from the repository.
Update the CI/CD configuration file¶
You will now update the CI/CD configuration file to initiate a runner on the Kubernetes cluster, which will be responsible for training the model. The trained model will be uploaded to the remote bucket using DVC, making it available for publishing and deployment.
Info
Kubernetes restricts the direct execution of Docker within a container due to security and architectural reasons. As a result, only the training and reporting steps will be executed on the self-hosted runner. The trained model will be accessible to the main runner running on a traditional virtual machine via the remote bucket using DVC. This environment, which supports Docker, allows the model artifact to be built, containerized, and stored in the container registry prior to deployment.
Tip
For those interested in fully utilizing a self-hosted runner, including the Dockerization of the trained model, tools like KubeVirt and Kaniko can be employed. These tools can be particularly beneficial for scenarios involving the use of a complete on-premise infrastructure or strong data privacy.
Additionally, since the experiment is now being trained directly from the CI/CD pipeline, the workflow will be modified to automatically push the results to the remote storage using DVC and to commit the updated lock file to the repository automatically.
As a result, when proposing changes to the model files in a branch, you no longer need to run `dvc repro` locally before pushing the changes with `git push`. After the proposed changes are integrated into the main branch, you can obtain the updated `dvc.lock` file and model by using `git pull` and `dvc pull`.
Update the `.github/workflows/mlops.yaml` file.
Take some time to understand the new steps:
`.github/workflows/mlops.yaml`
Here, the following should be noted:
- The `train-report` job runs on the self-hosted runner on pull requests. It trains the model and pushes the trained model to the remote bucket with DVC.
- The `publish-and-deploy` job runs on the main runner when pull requests are merged. It retrieves the model with DVC, then containerizes and deploys the model artifact.
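As a hedged illustration of how the two jobs could target their runners (only the job names and the `base-runner` label come from the text; the remaining keys are assumptions):

```yaml
jobs:
  train-report:
    # Runs on the self-hosted runner via its label
    runs-on: [self-hosted, base-runner]
    # ... train the model, dvc push, publish the CML report ...

  publish-and-deploy:
    # Runs on a GitHub-hosted runner, where Docker is available
    runs-on: ubuntu-latest
    # ... dvc pull, bentoml containerize, docker push, kubectl apply ...
```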
Check the differences with Git to validate the changes.
Execute the following command(s) in a terminal:
The output should be similar to this:
Take some time to understand the changes made to the file.
Check the changes¶
Check the changes with Git to ensure that all the necessary files are tracked.
Execute the following command(s) in a terminal:
The output should look like this.
Push the CI/CD pipeline configuration file to Git¶
Push the CI/CD pipeline configuration file to Git.
Execute the following command(s) in a terminal:
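A typical sequence (the file list and commit message are illustrative):

```sh
# Stage, commit and push the updated workflow and runner manifest
git add .github/workflows/mlops.yaml kubernetes/runner.yaml
git commit -m "Train the model on the self-hosted runner"
git push
```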
Check the results¶
On GitHub, you can see the pipeline running on the Actions page.
On GitLab, you can see the pipeline running on the CI/CD > Pipelines page.
On Google Cloud Console, you can see that the self-hosted runner has been created on the Kubernetes Engine > Workloads page.
This guide has been written with Google Cloud in mind. We are open to contributions to add support for other cloud providers such as Amazon Web Services, Exoscale, Microsoft Azure or Self-hosted Kubernetes but we might not officially support them.
If you want to contribute, please open an issue or a pull request on the GitHub repository. Your help is greatly appreciated!
This chapter is done, you can check the summary.
Summary¶
Congratulations! You can now train your model on a custom infrastructure with custom hardware for specific use cases.
In this chapter, you have successfully:
- Created a self-hosted runner Docker container image
- Published the containerized runner image to the container registry
- Deployed the self-hosted runner on Kubernetes
- Trained the model on a specialized pod on the Kubernetes cluster
State of the MLOps process¶
- Notebook has been transformed into scripts for production
- Codebase and dataset are versioned
- Steps used to create the model are documented and can be re-executed
- Changes done to a model can be visualized with parameters, metrics and plots to identify differences between iterations
- Codebase can be shared and improved by multiple developers
- Dataset can be shared among the developers and is placed in the right directory in order to run the experiment
- Experiment can be executed on a clean machine with the help of a CI/CD pipeline
- CI/CD pipeline is triggered on pull requests and reports the results of the experiment
- Changes to model can be thoroughly reviewed and discussed before integrating them into the codebase
- Model can be saved and loaded with all required artifacts for future usage
- Model can be easily used outside of the experiment context
- Model publication to the artifact registry is automated
- Model can be accessed from a Kubernetes cluster
- Model is continuously deployed with the CI/CD
- Model can be trained on a custom infrastructure
- Model can be trained on a custom infrastructure with custom hardware for specific use-cases
You can now safely continue to the next chapter of this guide concluding your journey and the next things you could do with your model.
Sources¶
Highly inspired by:
- Adding self-hosted runners - GitHub docs
- GitHub Actions self-hosted runners on Google Cloud - github.blog
- Self-hosted runner security - GitHub docs
- Security for self-managed runners - GitLab docs
- Install kubectl and configure cluster access - cloud.google.com
- Deploying to Google Kubernetes Engine - GitHub docs