Tools¶
Introduction to the tools used in this guide.
What are the tools used in this guide?¶
In this guide, you will use the following tools to demonstrate the MLOps process:
- Code management: Git
- Package management: pip
- Data management: DVC
- Model reproducibility: DVC
- Model tracking: DVC & CML
- Model orchestration: GitHub Actions or GitLab CI
- Cloud provider: Google Cloud (a Google Cloud account is required)
- Model serving and distributing: BentoML and Docker
- Model deploying: Kubernetes
- Data annotation: Label Studio
Using another cloud provider? Read this!
This guide has been written with Google Cloud in mind. We are open to contributions that add support for other cloud providers, such as Amazon Web Services, Exoscale, Microsoft Azure, or self-hosted Kubernetes, but we might not officially support them.
If you want to contribute, please open an issue or a pull request on the GitHub repository. Your help is greatly appreciated!
Each tool is covered in detail in the following parts of this guide.
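Before starting, it can help to see which of these tools are already installed locally. The following is a minimal sketch, not part of the guide's own steps: it assumes each tool exposes a command-line executable under the name shown in `TOOLS` (these names are assumptions and may differ on your system), and it only checks whether that executable is on your `PATH`.

```python
import shutil

# Assumed executable names for the tools listed above.
# Adjust these if your installation uses different names.
TOOLS = {
    "Git": "git",
    "pip": "pip",
    "DVC": "dvc",
    "CML": "cml",
    "BentoML": "bentoml",
    "Docker": "docker",
    "Kubernetes CLI": "kubectl",
    "Label Studio": "label-studio",
}


def find_missing_tools(tools):
    """Return the names of tools whose executable is not found on PATH."""
    return [name for name, exe in tools.items() if shutil.which(exe) is None]


if __name__ == "__main__":
    missing = find_missing_tools(TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed tools were found on PATH.")
```

A tool reported as missing is not a problem at this stage, since the following parts of the guide cover each tool in turn.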
Related tools¶
While this guide covers only the setup and use of the tools listed above, alternative tools exist for each stage of the workflow.
The following sections list related tools that can be explored as alternatives. Another useful compilation of tools is available at https://mlops.toys.
Data management¶
These are alternatives to DVC.
- lakeFS - Transform your data lake into a Git-like repository
- DagsHub - Open Source Data Science Collaboration
- DoltHub - DoltHub is where people collaboratively build, manage, and distribute structured data
- Delta Lake - An open-source storage framework that enables building a Lakehouse architecture with compute engines
Monitoring/tracking¶
These are alternatives to CML.
- Guild AI - An open-source experiment tracking toolkit. Use it to build better machine learning models faster
- Aim - An open-source, self-hosted ML experiment tracking tool
- Evidently AI - A first-of-its-kind monitoring tool that makes debugging machine learning models simple and interactive
Data annotation¶
At the moment, Label Studio is the only solution that supports annotating many kinds of data. Competing tools each support only a specific kind of data. Have a look at the awesome-data-labeling Git repository for specific alternatives.
Model management/deployment¶
These are alternatives to BentoML.
- Kubeflow - The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable
- MLEM - The open-source tool to simplify your ML model deployments
- Cog - An open-source tool that lets you package machine learning models in a standard, production-ready container
End-to-end¶
These tools can be used to manage the entire lifecycle of an ML experiment. They were considered when this guide was first drafted, but were omitted because most of them are opinionated and may lack the flexibility needed for the scope of this project.