Fully-Convolutional Siamese Networks for Object Tracking

Luca Bertinetto *, Jack Valmadre *, João F. Henriques, Andrea Vedaldi, Philip H.S. Torr

University of Oxford

{name.surname}@eng.ox.ac.uk

 

News: SiamFC ranks very high in all six new benchmarks presented in 2018 (GOT-10k, LaSOT, OxUvA, TrackingNet, VOT-LT and TLP).

Note: the old SiamFC implementation has been discontinued; please use the SiamFC conv5 baseline of CVPR'17 CFNet instead.

News: we won the VOT-17 real-time challenge (conv5 baseline of CVPR'17 CFNet)

News: added VOT-17 results and TraX-compatible code.

 
[Figure: pipeline picture]
   

The problem of arbitrary object tracking has traditionally been tackled by learning a model of the object's appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.
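As a rough illustration of the idea, the matching step of a Siamese tracker can be sketched as a cross-correlation between an embedded exemplar (the target, taken from the first frame) and an embedded search region (the current frame). The sketch below is not the released code: it uses plain Python lists in place of learned convolutional features, and the names `cross_correlate` and `locate_peak` are hypothetical helpers introduced only for this example.

```python
def cross_correlate(search, exemplar):
    """Slide `exemplar` over `search`; return the map of dot-product scores.

    Both inputs are 2-D lists standing in for single-channel feature maps.
    Assumption: a real tracker would use multi-channel embeddings produced
    by the same (Siamese) convolutional network for both inputs.
    """
    sh, sw = len(search), len(search[0])
    eh, ew = len(exemplar), len(exemplar[0])
    scores = []
    for i in range(sh - eh + 1):
        row = []
        for j in range(sw - ew + 1):
            total = 0.0
            for di in range(eh):
                for dj in range(ew):
                    total += search[i + di][j + dj] * exemplar[di][dj]
            row.append(total)
        scores.append(row)
    return scores


def locate_peak(score_map):
    """Return (row, col) of the first maximum in the score map."""
    best_pos, best_val = (0, 0), score_map[0][0]
    for i, row in enumerate(score_map):
        for j, val in enumerate(row):
            if val > best_val:
                best_pos, best_val = (i, j), val
    return best_pos


# Toy data: the exemplar pattern is embedded in the search map at row 1, col 2.
exemplar = [[1, 2],
            [3, 4]]
search = [[0, 0, 0, 0],
          [0, 0, 1, 2],
          [0, 0, 3, 4],
          [0, 0, 0, 0]]

score_map = cross_correlate(search, exemplar)
peak = locate_peak(score_map)
print(peak)  # (1, 2)
```

Because the score map is computed in one pass over the whole search region and no weights are updated online, a tracker built this way avoids the per-video gradient descent that the abstract identifies as the speed bottleneck.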

* equal contribution

 

We refer to the most recent version as SiamFC v2. It is described in our CFNet CVPR'17 paper as "baseline conv5". For data, raw results, and pretrained networks of SiamFC, please check the CFNet project page.

 

Paper (v1/ECCVw'16) [bibtex]

 

Paper (v2/CVPR'17)

 

Code (v2)

 

TensorFlow port (v2, inference/tracking only) 

 

▸ Pre-trained networks (v1) [ color ] [ color+gray ]

 

▸ Results (v1) [ OTB-13 (v1) ] [ OTB-100 (v1) ] [ VOT-17 (v2) ]

 

Code (v2) compatible with the VOT TraX protocol  

 

High-level overview (@ ReWork DL in Healthcare London summit)

 

▸ Example videos (appearance learnt from first frame only!)