# Chapter 3.2 - Serve the model locally with BentoML
## Introduction
Now that the model is saved with BentoML, which captures its metadata on save, you will serve it with the help of FastAPI to create local endpoints for interacting with the model.
In this chapter, you will learn how to:
- Serve the model with BentoML and FastAPI
- Push the changes to DVC and Git
The following diagram illustrates the control flow of the experiment at the end of this chapter:
```mermaid
flowchart TB
    dot_dvc[(.dvc)] <-->|dvc pull
                         dvc push| s3_storage[(S3 Storage)]
    dot_git[(.git)] <-->|git pull
                         git push| gitGraph[Git Remote]
    workspaceGraph <-....-> dot_git
    data[data/raw]
    subgraph remoteGraph[REMOTE]
        s3_storage
        subgraph gitGraph[Git Remote]
            repository[(Repository)] --> action[Action]
            action[Action] --> |...|request[PR]
            request --> repository[(Repository)]
        end
    end
    subgraph cacheGraph[CACHE]
        dot_dvc
        dot_git
    end
    subgraph workspaceGraph[WORKSPACE]
        data --> code[*.py]
        subgraph dvcGraph["dvc.yaml"]
            code
        end
        params[params.yaml] -.- code
        subgraph bentoGraph[" "]
            bento_model[classifier.bentomodel]
            serve[serve.py] <--> bento_model
            fastapi[FastAPI] <--> |bentoml serve serve:classifierService| serve
        end
        bento_model <-.-> dot_dvc
        code <--> bento_model
    end
    subgraph browserGraph[BROWSER]
        localhost <--> fastapi
    end
    style workspaceGraph opacity:0.4,color:#7f7f7f80
    style dvcGraph opacity:0.4,color:#7f7f7f80
    style cacheGraph opacity:0.4,color:#7f7f7f80
    style data opacity:0.4,color:#7f7f7f80
    style dot_git opacity:0.4,color:#7f7f7f80
    style dot_dvc opacity:0.4,color:#7f7f7f80
    style code opacity:0.4,color:#7f7f7f80
    style params opacity:0.4,color:#7f7f7f80
    style s3_storage opacity:0.4,color:#7f7f7f80
    style repository opacity:0.4,color:#7f7f7f80
    style action opacity:0.4,color:#7f7f7f80
    style request opacity:0.4,color:#7f7f7f80
    style remoteGraph opacity:0.4,color:#7f7f7f80
    style gitGraph opacity:0.4,color:#7f7f7f80
    linkStyle 0 opacity:0.4,color:#7f7f7f80
    linkStyle 1 opacity:0.4,color:#7f7f7f80
    linkStyle 2 opacity:0.4,color:#7f7f7f80
    linkStyle 3 opacity:0.4,color:#7f7f7f80
    linkStyle 4 opacity:0.4,color:#7f7f7f80
    linkStyle 5 opacity:0.4,color:#7f7f7f80
    linkStyle 6 opacity:0.4,color:#7f7f7f80
    linkStyle 7 opacity:0.4,color:#7f7f7f80
    linkStyle 10 opacity:0.4,color:#7f7f7f80
    linkStyle 11 opacity:0.4,color:#7f7f7f80
    linkStyle 12 opacity:0.4,color:#7f7f7f80
```
## Steps
### Create the BentoML service
BentoML services allow you to define the serving logic of machine learning models. A BentoML service is a class that defines all the endpoints and the logic to serve the model using FastAPI.
Create a new file `src/serve.py` and add the following code:
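The listing below is a minimal sketch reconstructed from the description that follows it: the model tag `celestial_bodies_classifier_model`, the Keras integration, and the `preprocess`/`postprocess` custom object keys are assumptions carried over from the previous chapter, and the class name matches the `serve:classifierService` target shown in the diagram above.

```python
import json

import bentoml
from PIL.Image import Image


# The class name matches the `serve:classifierService` target used by the
# `bentoml serve` command in the diagram above.
@bentoml.service
class classifierService:
    # Reference the model in the BentoML model store
    # (tag assumed from the previous chapter)
    bento_model = bentoml.models.get("celestial_bodies_classifier_model")

    def __init__(self) -> None:
        # Load the preprocess and postprocess functions stored as custom
        # objects alongside the model
        self.preprocess = self.bento_model.custom_objects["preprocess"]
        self.postprocess = self.bento_model.custom_objects["postprocess"]
        # Load the model itself (assumed saved with the Keras integration)
        self.model = bentoml.keras.load_model(self.bento_model)

    @bentoml.api()
    def predict(self, image: Image) -> str:
        # Pre-process the input image
        data = self.preprocess(image)
        # Run the model on the pre-processed data
        predictions = self.model.predict(data)
        # Post-process the predictions and return them as a JSON string
        return json.dumps(self.postprocess(predictions))
```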
This service will be used to serve the model with FastAPI and will do the following:

- The model is loaded from the BentoML model store
- The `preprocess` function is loaded from the model's custom objects
- The `postprocess` function is loaded from the model's custom objects
- The `predict` method is decorated with `@bentoml.api()` to create an endpoint
- The endpoint accepts an image as input
- The endpoint returns a JSON response
- The image is pre-processed
- The predictions are made from the model
- The predictions are post-processed and returned as a JSON string
### Serve the model
Serve the model by executing the following command(s) in a terminal:
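The service target `serve:classifierService` comes from the diagram above; the `--working-dir` flag, pointing at `src/` where `serve.py` lives, is an assumption about the project layout:

```sh
# Serve the model locally; `serve:classifierService` refers to the
# `classifierService` class in src/serve.py
bentoml serve --working-dir ./src serve:classifierService
```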
BentoML will load the model, create the FastAPI app and start it. You can then access the auto-generated model documentation at http://localhost:3000.
The following endpoint has been created:

- `/predict`: Upload a `png` or `jpg` image and get a prediction from the model.
You can try out predictions by submitting some images to the model through the REST API!
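As an alternative to the web UI, you can also query the endpoint from a script. The following is a hedged sketch using the `requests` package; it assumes the multipart field is named `image` (matching the `predict` parameter of the service) and uses a hypothetical local file name:

```python
import requests

# Hypothetical test image; replace with any png or jpg on your machine
with open("moon.jpg", "rb") as image_file:
    response = requests.post(
        "http://localhost:3000/predict",
        # The field name must match the `predict` parameter of the service
        files={"image": ("moon.jpg", image_file, "image/jpeg")},
    )

# The service returns the prediction as a JSON string
print(response.text)
```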
### Try out the prediction endpoint
The following images are available in the `extra-data` branch of the guide's repository, which you will use in a future chapter: https://github.com/swiss-ai-center/a-guide-to-mlops/tree/extra-data/extra_data. Here are some examples you can use.
Warning
Please be aware that this model is for demonstration purposes. Some inputs may be incorrectly predicted.
#### Moon example
Download the following image of the moon to your computer.
Upload it to the `/predict` endpoint and check the prediction.
The output should be similar to this:
#### Makemake example
Download the following image of Makemake to your computer.
Upload it to the `/predict` endpoint and check the prediction.
The output should be similar to this:
#### Neptune example
Download the following image of Neptune to your computer.
Upload it to the `/predict` endpoint and check the prediction.
The output should be similar to this:

You may notice the model got it wrong and predicted Uranus instead!
### Check the changes
Check the changes with Git to ensure that all the necessary files are tracked.
Execute the following command(s) in a terminal:
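The check itself is a plain `git status`:

```sh
# Check the status of the Git repository
git status
```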
The output should look like this:
### Commit the changes to Git
Commit the changes to Git.
Execute the following command(s) in a terminal:
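A minimal sequence is shown below; the commit message is an example, not the guide's exact wording:

```sh
# Stage the new service file and any other changes
git add .

# Commit the changes (example message)
git commit -m "Serve the model locally with BentoML and FastAPI"
```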
### Check the results
Congratulations! You now have a model served over a REST API!
This chapter is done; you can check the summary below.
## Summary
In this chapter, you have successfully:
- Served the model with BentoML and FastAPI
- Pushed the changes to Git
You have also fixed some of the issues identified earlier:
- Model can be easily used outside of the experiment context
You could serve this model from anywhere. Other services can now send prediction requests to your model, and FastAPI provides automatically documented endpoints for interacting with it.
You can now safely continue to the next chapter.
## State of the MLOps process
- [x] Notebook has been transformed into scripts for production
- [x] Codebase and dataset are versioned
- [x] Steps used to create the model are documented and can be re-executed
- [x] Changes done to a model can be visualized with parameters, metrics and plots to identify differences between iterations
- [x] Codebase can be shared and improved by multiple developers
- [x] Dataset can be shared among the developers and is placed in the right directory in order to run the experiment
- [x] Experiment can be executed on a clean machine with the help of a CI/CD pipeline
- [x] CI/CD pipeline is triggered on pull requests and reports the results of the experiment
- [x] Changes to model can be thoroughly reviewed and discussed before integrating them into the codebase
- [x] Model can be saved and loaded with all required artifacts for future usage
- [x] Model can be easily used outside of the experiment context
- [ ] Model requires manual publication to the artifact registry
- [ ] Model is not accessible on the Internet and cannot be used anywhere
- [ ] Model requires manual deployment on the cluster
- [ ] Model cannot be trained on hardware other than the local machine
- [ ] Model cannot be trained on custom hardware for specific use-cases
You will address these issues in the next chapters for improved efficiency and collaboration. Continue the guide to learn how.