Chapter 4.1 - Set up Label Studio
Introduction
In the previous chapters, we successfully deployed and accessed our model on Kubernetes, set up continuous deployment with a CI/CD pipeline, and trained the model on a Kubernetes pod. Now, we will focus on labeling new data to further improve our model's performance.
The quality of data is crucial for any machine learning model. The saying "garbage in, garbage out" holds true: if the data fed into the model is of poor quality, the predictions of the model will also be poor. Therefore, giving careful attention to the data labeling process is essential to guarantee high-quality, accurate data.
In supervised learning tasks, collecting and labeling data is usually not a one-time task but an iterative process. Just as developing a machine learning model involves multiple iterations of training and parameter adjustments, the data collection and labeling process also requires continuous refinement. As new data becomes available and the requirements of the model evolve, additional rounds of data labeling and quality checks are necessary to maintain and improve the model's performance.
Label Studio is an open-source data labeling tool that supports various data types, including text, images, audio, and video. In this chapter, we will guide you through setting up Label Studio in your environment. This includes installing the necessary dependencies, configuring the tool, and preparing it for data labeling tasks.
In this chapter, you will learn how to:
- Set up Label Studio to have a fully functional instance ready to label new data
- Import supplemental data for labeling
The new data will be used in subsequent chapters to retrain and improve your model.
The following diagram illustrates the control flow of the experiment at the end of this chapter:
flowchart TB
extra_data -->|upload| labelStudioTasks
subgraph workspaceGraph[WORKSPACE]
extra_data[extra-data/extra_data]
end
subgraph labelStudioGraph[LABEL STUDIO]
labelStudioTasks[Tasks]
end
Steps
Download the Data
Before configuring Label Studio, you will need to download additional data used for labeling.
Execute the following command(s) in a terminal:
The downloaded archive must be decompressed and renamed:
Execute the following command(s) in a terminal:
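After extraction, it can be worth confirming that the images ended up in the folder the rest of this chapter expects. The following is a minimal sketch, not part of the original steps: `count_files` is a hypothetical helper, and the only path it relies on is `extra-data/extra_data`, the folder used later for the upload.

```shell
# Hypothetical helper: print the number of regular files under a directory.
# Returns a non-zero status if the directory does not exist.
count_files() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "Directory not found: $dir" >&2
        return 1
    fi
    # tr strips the padding some wc implementations add around the number.
    find "$dir" -type f | wc -l | tr -d ' '
}

# Folder named in this chapter; adjust if your layout differs.
DATA_DIR="extra-data/extra_data"
if n="$(count_files "$DATA_DIR")"; then
    echo "$DATA_DIR contains $n file(s)"
fi
```

If the directory is missing, the helper reports it on standard error, which usually means the archive was extracted or renamed differently than expected.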
Finally, add the `extra-data` folder to the `.gitignore` file:
Execute the following command(s) in a terminal:
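One way to add the entry is to append it from the shell. The sketch below is an idempotent version under two assumptions: it is run from the repository root, and `extra-data/` is the desired ignore pattern (the exact entry is an assumption). It only appends the line if it is not already present.

```shell
# Append the extra-data folder to .gitignore unless it is already listed.
# Assumptions: run from the repository root; "extra-data/" is the pattern.
ENTRY="extra-data/"

# Create the file if it does not exist yet, so grep has something to read.
touch .gitignore

# -q: quiet, -x: match the whole line, -F: treat ENTRY as a fixed string.
if ! grep -qxF "$ENTRY" .gitignore; then
    # Add a newline first if the file does not already end with one.
    if [ -n "$(tail -c1 .gitignore)" ]; then
        echo >> .gitignore
    fi
    printf '%s\n' "$ENTRY" >> .gitignore
fi
```

Running the snippet a second time leaves the file unchanged, which keeps the subsequent Git diff limited to a single added line.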
Check the differences with Git to validate the changes:
Execute the following command(s) in a terminal:
The output should be similar to this:
Install Label Studio
Next, we will install Label Studio in our environment. Add the main `label-studio` dependency to the `requirements.txt` file:
Check the differences with Git to validate the changes:
Execute the following command(s) in a terminal:
The output should be similar to this:
Install the package and update the freeze file.
Warning
Prior to running any pip commands, it is crucial to ensure the virtual environment is activated to avoid potential conflicts with system-wide Python packages.
To check its status, simply run `pip -V`. If the virtual environment is active, the output will show a pip installation path located inside the virtual environment (under `.venv`). If it is not, you can activate it with `source .venv/bin/activate`.
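As a complement to `pip -V`, the check can also be scripted. The sketch below relies on the `VIRTUAL_ENV` environment variable, which the standard `activate` script exports and `deactivate` unsets; `in_virtualenv` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: succeed (exit status 0) when a virtual environment
# is active in the current shell, i.e. when VIRTUAL_ENV is set and non-empty.
in_virtualenv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_virtualenv; then
    echo "Virtual environment active: $VIRTUAL_ENV"
else
    echo "No virtual environment active; run: source .venv/bin/activate"
fi
```

This only detects environments activated in the current shell session, so it should be run in the same terminal used for the `pip` commands.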
Execute the following command(s) in a terminal:
Commit the changes to Git
Commit the changes to Git.
Execute the following command(s) in a terminal:
Start Label Studio
You can now start Label Studio with the following command:
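The `label-studio` package installs a `label-studio` executable, and `label-studio start` is its usual entry point for launching the local server. The sketch below wraps that call in a hypothetical helper, `start_label_studio`, which first checks that the executable is on the `PATH`; this catches the common case of an inactive virtual environment. Any additional options your setup may need are not covered here.

```shell
# Hypothetical helper: launch Label Studio if its executable is available.
# Assumption: `label-studio start` with no extra options suits your setup.
start_label_studio() {
    if ! command -v label-studio >/dev/null 2>&1; then
        echo "label-studio not found; is the virtual environment active?" >&2
        return 127
    fi
    # Runs in the foreground until interrupted with Ctrl+C.
    label-studio start
}
```

Calling `start_label_studio` in the terminal then starts the server, or prints a hint if the executable cannot be found.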
Label Studio will start on http://localhost:8080. Open the URL in your browser and sign up for an account.
Note
Account creation happens entirely offline, on your machine. It is not tied to any hosted service or enterprise offer from Label Studio, and it only needs to be done once to create a local ID.
Create a New Project
Once you have signed up, you can create a new project in Label Studio:
- Click Create Project to create a project.
- Give your project a name (e.g., `MLOps Guide`).
- Select the Data Import tab and click on the Upload File button. Select all the images from the `extra-data/extra_data` folder you downloaded earlier.

  Tip for WSL2 users
  The Linux distribution is accessible through the `\\wsl.localhost\` address in the file explorer. The current directory can also be opened directly from the shell with the `explorer.exe .` command.

- Select the Labeling Setup tab and choose Image Classification under the Computer Vision menu.
- Under Labeling Interface select Code and paste the following configuration:

  Here we simply define the choices for the image classification task.

  Info
  You can read more about the Label Studio configuration in the official documentation.

  The configuration should look like this:

- Click Save to create the project.
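The configuration pasted into the Code editor in the Labeling Interface step uses Label Studio's tag-based XML format. As a rough sketch of what an image classification configuration looks like, the snippet below writes one to a file so it can be copied from there. The file name `label_config.xml` is hypothetical, and the two `Choice` values are placeholders: replace them with the actual class names of your dataset.

```shell
# Write a minimal image-classification labeling configuration to a file.
# Assumptions: label_config.xml is a hypothetical file name, and class_a /
# class_b are placeholder labels, not the classes used in this guide.
# The quoted 'EOF' keeps the shell from expanding $image.
cat > label_config.xml <<'EOF'
<View>
  <Image name="image" value="$image"/>
  <Choices name="choice" toName="image">
    <Choice value="class_a"/>
    <Choice value="class_b"/>
  </Choices>
</View>
EOF

# Print the result so it can be copied into the Code editor.
cat label_config.xml
```

The `Image` tag displays each uploaded file, and the `Choices` tag, linked to it through `toName`, lists the labels an annotator can pick from.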
Summary
Congratulations! You have successfully set up Label Studio in your environment and imported new data. You are now ready to start labeling your data!
State of the labeling process
- Labeling of supplemental data needs to be systematic and uniform
- Labeling of supplemental data is time intensive
- Model needs to be retrained using higher-quality data