Welcome to the Stray SDK documentation!
Rotated bounding box detection using Detectron2
This tutorial will show you:
How to produce rotated bounding box labels using labelme (https://github.com/wkentaro/labelme) for custom datasets
How to configure and set up training for rotated bounding box detection using Detectron2
How to visualize predictions and labels
We are using labelme (https://github.com/wkentaro/labelme) polygon annotation to label the container ships in the images.
Data labeling
Install labelme
Labelme offers various options for installing the labeling GUI; please refer to the instructions here: https://github.com/wkentaro/labelme
Creating polygons
Start labelme with labelme --autosave --nodata --keep-prev. The GUI allows you to select the images to label one by one or load an entire directory. It is highly recommended to place the images you want to label in a single directory, since a json file with the labels is produced in the same location as each image file.
The --autosave flag enables automatic saving when moving from image to image. The --nodata flag skips saving the actual image data in the json file that is produced for every image. The --keep-prev flag can be considered optional, but it is very useful if the images are, for example, consecutive frames from a video, since it copies the labels from the previously labeled image to the current image.
You can create the polygon annotations with the “Create polygons” option in the GUI. The polygons will be used to determine the minimum-area rotated bounding box in the next step as a part of the model training. The logic to do this is provided out-of-the-box in the stray package that is used in the example scripts.
This is an example of a json file that is produced for each image (in this case the file is called 1.json):
{
  "version": "5.0.1",
  "flags": {},
  "shapes": [
    {
      "label": "ship",
      "points": [
        [
          239.0990099009901,
          420.2970297029703
        ],
        [
          423.25742574257424,
          338.1188118811881
        ],
        [
          444.54455445544556,
          345.049504950495
        ],
        [
          434.64356435643566,
          365.84158415841586
        ],
        [
          253.45544554455444,
          446.53465346534654
        ]
      ],
      "group_id": null,
      "shape_type": "polygon",
      "flags": {}
    }
  ],
  "imagePath": "1.png",
  "imageData": null,
  "imageHeight": 480,
  "imageWidth": 640
}
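The stray package performs the polygon-to-box conversion for you during data loading, but the idea can be sketched with OpenCV's cv2.minAreaRect, which computes the minimum-area rotated rectangle around a set of points. The snippet below is only an illustrative assumption (the angle conventions of OpenCV and Detectron2 differ, and OpenCV changed its convention in version 4.5), not the actual stray implementation:

import json

import cv2
import numpy as np

# Load the labelme annotation shown above.
with open("1.json") as f:
    annotation = json.load(f)

for shape in annotation["shapes"]:
    polygon = np.array(shape["points"], dtype=np.float32)
    # cv2.minAreaRect returns ((cx, cy), (w, h), angle) for the
    # minimum-area rotated rectangle enclosing the polygon.
    (cx, cy), (w, h), angle = cv2.minAreaRect(polygon)
    # Detectron2's rotated box format is (cx, cy, w, h, angle) with the angle
    # in degrees measured counter-clockwise, hence the sign flip here.
    rotated_box = [cx, cy, w, h, -angle]
    print(shape["label"], rotated_box)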
Installing Detectron2
Before we can visualize the annotations using the visualization tools provided by Detectron2 and train the model, we need to install the package. Warning: this step may cause headaches.
Install PyTorch, OpenCV, and Detectron2
Before installing Detectron2, we need to have PyTorch installed. This means that we can’t provide a clean requirements.txt file with all the dependencies, as there is no way to tell pip in which order to install the listed packages. Detectron2 also does not include PyTorch in its install requirements for compatibility reasons.
Depending on whether you want to use a CPU or a GPU (if available) with Detectron2, install the proper version from https://pytorch.org/. The Detectron2 installation documentation also offers some background and debug steps if there are issues.
After installing torch, install Detectron2 using the instructions in the Detectron2 installation documentation.
Install the rest of the dependencies with pip install -r requirements.txt.
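For example, on a typical setup the install order could look roughly like the following; check https://pytorch.org/ and the Detectron2 installation documentation for the exact commands matching your platform and CUDA version:
pip install torch torchvision
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
pip install -r requirements.txt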
Test the installation and visualize the dataset
The polygons are used to determine the rotated bounding boxes.
To see if everything works properly, you can run the visualization script (from stray/examples/detectron2) with python visualize_dataset.py <path-to-dataset> to visualize the annotations. As you can see, the polygons are turned into rotated bounding boxes in the data loading step.
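Under the hood, visualizing an annotation boils down to drawing the converted box on the image. A minimal sketch of that step, assuming Detectron2's Visualizer and a rotated box in (cx, cy, w, h, angle) format, might look like this:

import cv2
from detectron2.utils.visualizer import Visualizer

image = cv2.imread("1.png")[:, :, ::-1]  # OpenCV loads BGR; the visualizer expects RGB
rotated_box = (320, 240, 200, 80, 15)  # example (cx, cy, w, h, angle) values

visualizer = Visualizer(image)
output = visualizer.draw_rotated_box_with_label(rotated_box, label="ship")
cv2.imwrite("visualization.png", output.get_image()[:, :, ::-1])  # back to BGR for saving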
Training the model
To run the training, run python train_rotated_bbox.py <path-to-dataset> --num-gpus <gpus>. The script and the rotated_bbox_config.yaml file contain various ways to configure the training; see the files for details. By default, the final and intermediate weights of the model are saved in the current working directory (model_*.pth).
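For context, rotated bounding box training in Detectron2 generally swaps in the rotated variants of the region proposal and ROI components. The actual values used here live in rotated_bbox_config.yaml; the snippet below only illustrates, as an assumption, the kind of options such a configuration touches:

from detectron2.config import get_cfg

cfg = get_cfg()
# Rotated-box detection replaces the standard components with rotated variants.
cfg.MODEL.ANCHOR_GENERATOR.NAME = "RotatedAnchorGenerator"
cfg.MODEL.PROPOSAL_GENERATOR.NAME = "RRPN"
cfg.MODEL.RPN.BBOX_REG_WEIGHTS = (1, 1, 1, 1, 1)
cfg.MODEL.ROI_HEADS.NAME = "RROIHeads"
cfg.MODEL.ROI_BOX_HEAD.POOLER_TYPE = "ROIAlignRotated"
cfg.MODEL.ROI_BOX_HEAD.BBOX_REG_WEIGHTS = (10, 10, 5, 5, 1)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only the "ship" class in this tutorial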
Predictions
Predictions from the trained model.
To visualize predictions from the trained model, run python visualize_predictions.py <path-to-dataset> --weights <path-to-pth-model>.
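If you want to run inference from your own code rather than through the visualization script, a minimal sketch using Detectron2's DefaultPredictor could look like the following (it assumes rotated_bbox_config.yaml is a standard Detectron2 config file and that the final weights were saved as model_final.pth):

import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("rotated_bbox_config.yaml")  # same options as during training
cfg.MODEL.WEIGHTS = "model_final.pth"
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("1.png"))
# For rotated-box models the predicted boxes are in (cx, cy, w, h, angle) format.
instances = outputs["instances"].to("cpu")
print(instances.pred_boxes, instances.scores)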
Automatic pick and place with computer vision and RoboDK
The complete pick and place example.
This example shows how to build datasets with the Stray Robots Toolkit for picking and placing cardboard boxes of variable sizes within a RoboDK simulation environment. The source code is available here: https://github.com/StrayRobots/stray/tree/main/examples/robodk.
The purpose of this simulation is to showcase how computer vision can be used in dynamic pick and place applications. In this case, our robot is tasked to move boxes rolling on one conveyor belt to the other.
The robot waits for boxes to come down the conveyor belt, where it detects the top corners of the box. We then compute a picking location at the center top of the cardboard box and command the robot to pick it up with its suction cup gripper. The robot then moves the box over to the other conveyor belt and starts over.
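Conceptually, the picking pose is derived from the detected corner keypoints. The actual logic lives in pick.py and detect.py; as a simplified, hypothetical sketch, the pick point can be taken as the centroid of the detected top corners, with the suction cup approaching from above:

import numpy as np

# Hypothetical detected top corners of a box, in robot base coordinates (meters).
top_corners = np.array([
    [0.55, 0.10, 0.20],
    [0.55, 0.30, 0.20],
    [0.75, 0.30, 0.20],
    [0.75, 0.10, 0.20],
])

# Pick at the center of the top face and approach from a few centimeters above.
pick_point = top_corners.mean(axis=0)
approach_point = pick_point + np.array([0.0, 0.0, 0.05])
print("approach:", approach_point, "pick:", pick_point)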
Project structure
The main files in this project are:
pick.py
The main script containing the picking logic.
scan.py
This script is used to collect datasets for training the object detector.
detect.py
Contains the object detection logic.
simulation.py
Contains the simulation logic for the conveyor belt, resetting and spawning boxes, etc.
train.py
Contains the logic for training a detection model.
convert_model.py
Contains the logic for converting a PyTorch Lightning checkpoint into a serialized model that can be run in production.
picking_setup.rdk
This is the RoboDK simulation environment file.
model.pt
A pretrained PyTorch object detection model.
Installing dependencies
We recommend using Anaconda for package management. To create a new environment and install the dependencies, run:
conda create -n robodk python=3.8 && conda activate robodk
pip install -r requirements.txt
Follow the instructions here to install the Stray Robots Toolkit.
Collecting a dataset and training an object detector
As is common these days, the object detection algorithm used is learning-based. To train this algorithm, we need to collect example data from the robot’s workspace and annotate it to teach our robot to recognize the objects we want to pick. The rest of this section will show you how to collect a custom dataset. Alternatively, you can download a sample dataset and skip ahead to model training in the next section. The sample includes scans of 20 boxes.
First open the simulation in the RoboDK UI by opening RoboDK, then select File > Open… and open the picking_setup.rdk file.
To collect data, we provide the script scan.py, which runs the simulation. It can be triggered to stop the production line and scan the current state of the line. This is done by running the following command while the picking simulation is open in the RoboDK UI:
python scan.py <out>
To stop the line and scan the conveyor belt, press the s key.
The scans are saved in the path given by out. For each performed scan, a subdirectory will be created within that directory containing the captured color and depth images, along with the camera poses.
After scanning, we need to process the scans to compute the 3D representation from the captured camera and depth images. This is done using the stray integrate
command. Run it with:
stray integrate <out> --skip-mapping --voxel-size 0.005
As our camera in this case is mounted on a calibrated robot, we use the --skip-mapping parameter to tell the system that we already know the camera poses and that they do not have to be inferred.
We then annotate each of the scans by opening them up in the Stray Studio user interface. A scan can be opened with:
stray studio boxes/0001
In this case, we want to detect the top corners of each box. Therefore, we opt to annotate each scanned box with a rectangle annotation type. Here is what an annotated scan looks like:
Training a picking model
Once the dataset is collected (or the sample downloaded), we can go ahead and train the model. Alternatively, you can also use the pretrained model and proceed to the next section.
The model training can be run with
python train.py <out> --eval-folder <path-to-eval-data> --num-epochs <num-epochs> --primitive rectangle --num-instances 1
The main available parameters are:
out
Path to the directory containing the scans.
--eval-folder
Path to the directory containing evaluation scans. For testing purposes this can be the same as out.
--num-epochs
For how many epochs the model should be trained. 100 is enough for the example dataset of 20 scans.
--primitive
Which primitive type to use for determining the keypoints on the box. In this example we use rectangle.
--num-instances
How many instances of primitive should be detected. The instances should be labeled with unique instance ids in Studio, ranging from 0 to num-instances - 1. In this case there is only one instance per scene/image.
--batch-size (default 4)
Batch size to use during training. Adjust this as high as possible as long as there are no memory errors.
For additional settings refer to the train.py
file.
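Putting the parameters together, a concrete invocation for the 20-scan sample dataset might look like the following (the directory name boxes is hypothetical, and the evaluation folder is simply set to the training folder):
python train.py boxes --eval-folder boxes --num-epochs 100 --primitive rectangle --num-instances 1 --batch-size 4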
The training is implemented using PyTorch Lightning. Logs of the training are saved to ./train_logs, but this can be adjusted with the --logs flag. Intermediate versions of the model are saved to ./train_logs/default/version_x/checkpoints.
Once the training is completed, we can pick one of the checkpoints from ./train_logs/default/version_x/checkpoints and convert it into a serialized model that can be used in production and in the example picking script.
The model can be converted with python convert_model.py <path-to-checkpoint> and it will be saved as model.pt.
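The exact serialization format is defined by convert_model.py, but assuming it produces a TorchScript archive, loading the converted model elsewhere could look roughly like this (the input shape and preprocessing below are assumptions):

import torch

# Load the serialized model produced by convert_model.py.
model = torch.jit.load("model.pt")
model.eval()

# Hypothetical input: a single RGB image tensor with values in [0, 1].
image = torch.rand(1, 3, 480, 640)
with torch.no_grad():
    output = model(image)
print(output)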
Running the picking simulation with the trained model
Again, open the simulation in the RoboDK UI by opening RoboDK, then select File > Open… and open the picking_setup.rdk file.
To run the simulation with the trained model, use the command python pick.py --model model.pt.
You should now see the simulation running with the robot picking boxes and moving them over to the other conveyor belt.