Simple object detection and segmentation on video using the Hugging Face `transformers` library. Detected objects are tracked across frames with OpenCV's CSRT tracker.
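
The detect-then-track flow described above could look roughly like the following. This is a minimal sketch, not the project's actual code: the function names `to_xywh` and `track_video`, the model checkpoint `facebook/detr-resnet-50`, and the `0.9` score threshold are all assumptions chosen for illustration. It also assumes `opencv-contrib-python` is installed (the CSRT tracker ships in the contrib build, and some OpenCV versions expose it as `cv2.legacy.TrackerCSRT_create` instead), and that `timm` is available for the DETR checkpoint. Segmentation is left out to keep the example short.

```python
from typing import Dict, Iterator, List, Tuple

# (x, y, width, height): the box layout OpenCV trackers expect
BBox = Tuple[int, int, int, int]


def to_xywh(box: Dict[str, int]) -> BBox:
    """Convert a pipeline box {xmin, ymin, xmax, ymax} to OpenCV's (x, y, w, h)."""
    return (
        box["xmin"],
        box["ymin"],
        box["xmax"] - box["xmin"],
        box["ymax"] - box["ymin"],
    )


def track_video(
    video_path: str,
    model_name: str = "facebook/detr-resnet-50",  # assumed checkpoint
    threshold: float = 0.9,  # assumed score cutoff
) -> Iterator[List[Tuple[str, BBox]]]:
    """Detect objects on the first frame, then follow each one with CSRT.

    Yields, for every later frame, a list of (label, bbox) pairs for the
    trackers that still report a successful update.
    """
    # Heavy dependencies are imported lazily so that to_xywh can be used
    # (and tested) without OpenCV or transformers installed.
    import cv2
    from PIL import Image
    from transformers import pipeline

    detector = pipeline("object-detection", model=model_name)
    cap = cv2.VideoCapture(video_path)
    try:
        ok, frame = cap.read()
        if not ok:
            raise IOError(f"Could not read a frame from {video_path}")

        # OpenCV decodes frames as BGR; the pipeline expects an RGB image.
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        detections = detector(image, threshold=threshold)

        # One CSRT tracker per detected object, seeded with its box.
        trackers = []
        for det in detections:
            tracker = cv2.TrackerCSRT_create()
            tracker.init(frame, to_xywh(det["box"]))
            trackers.append((det["label"], tracker))

        while True:
            ok, frame = cap.read()
            if not ok:
                break  # end of video
            results = []
            for label, tracker in trackers:
                found, bbox = tracker.update(frame)
                if found:
                    x, y, w, h = (int(v) for v in bbox)
                    results.append((label, (x, y, w, h)))
            yield results
    finally:
        cap.release()
```

Iterating with `for boxes in track_video("input.mp4"):` (where `input.mp4` is a placeholder path) gives the tracked boxes frame by frame. Running the detector only once keeps the loop cheap, at the cost of missing objects that enter the scene later; re-running detection every N frames is one common way to mitigate that.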