Kalman and Optical Flow Tracking

Implementation of KOFT from [9]. It uses optical-flow-enhanced Kalman filters to link detections through time. It usually outperforms the Kalman linker with optical flow.

class byotrack.implementation.linker.frame_by_frame.koft.KOFTLinkerParameters(association_threshold: float, *, detection_std: float | Tensor = 3.0, flow_std: float | Tensor = 1.0, process_std: float | Tensor = 1.5, kalman_order: int = 1, n_valid=3, n_gap=3, association_method: str | AssociationMethod = AssociationMethod.OPT_SMOOTH, anisotropy: Tuple[float, float, float] = (1.0, 1.0, 1.0), cost: str | Cost = Cost.EUCLIDEAN, track_building: str | TrackBuilding = TrackBuilding.FILTERED, split_factor: float = 0.0, merge_factor: float = 0.0, extract_flows_on_detections=False, always_measure_velocity=True, online_process_std=0.0, initial_std_factor=10.0)

Bases: KalmanLinkerParameters

Parameters of KOFTLinker

Note

The merging and splitting features are still experimental.

association_threshold

This is the main hyperparameter: it defines the distance threshold above which a track is not linked to a detection. It prevents linking tracks with false positive detections.

Type:: float
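As an illustration of the gating role of association_threshold, here is a minimal sketch in plain Python. The helper gate_candidates is hypothetical and not part of byotrack; it assumes a plain Euclidean cost in pixels and an inclusive bound.

```python
import math


def gate_candidates(track_pos, detections, association_threshold):
    """Return the indices of detections close enough to be linked to the track."""
    kept = []
    for i, det in enumerate(detections):
        dist = math.dist(track_pos, det)  # Euclidean distance in pixels
        # Assumption: the bound is inclusive; the library may use a strict one.
        if dist <= association_threshold:
            kept.append(i)
    return kept


track = (10.0, 10.0)
dets = [(11.0, 10.0), (30.0, 40.0), (10.0, 14.0)]
candidates = gate_candidates(track, dets, association_threshold=5.0)
```

The far detection at (30, 40) is discarded before any matching takes place, which is how a well-chosen threshold filters out false positives.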

detection_std

Expected measurement noise on the detection process. The detection process is modeled with a Gaussian noise with this given std. (You can provide a different noise for each dimension). See torch_kf.ckf.constant_kalman_filter. Default: 3.0 pixels

Type:: Union[float, torch.Tensor]

flow_std

Expected measurement noise on the optical flow process. The optical flow process is modeled with a Gaussian noise with this given std. (You can provide a different noise for each dimension). Default: 1.0 pixels

Type:: Union[float, torch.Tensor]

process_std

Expected process noise. See torch_kf.ckf.constant_kalman_filter: the process is modeled as constant order-th derivative motion. This quantifies how much the supposedly “constant” order-th derivative can change between two consecutive frames. A common rule of thumb is to use 3 * process_std ~= max_t(| dx^(order)(t+1) - dx^(order)(t)|). (It can be provided for each dimension.) Default: 1.5 pixels

Type:: Union[float, torch.Tensor]
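The rule of thumb above can be applied to a known trajectory. The sketch below assumes kalman_order = 1 (so the "constant" derivative is the velocity), a single axis, and finite differences between consecutive frames; process_std_from_positions is a hypothetical helper, not a byotrack function.

```python
def process_std_from_positions(positions):
    """Estimate process_std for an order-1 model from 1D positions (one per frame)."""
    # Finite-difference velocities between consecutive frames
    velocities = [b - a for a, b in zip(positions, positions[1:])]
    # Absolute change of velocity between consecutive frames
    changes = [abs(b - a) for a, b in zip(velocities, velocities[1:])]
    # Rule of thumb: 3 * process_std ~= largest change
    return max(changes) / 3.0


xs = [0.0, 2.0, 4.5, 6.0, 9.0]
estimate = process_std_from_positions(xs)
```

Here the velocities are 2.0, 2.5, 1.5 and 3.0 pixels per frame, so the largest change is 1.5 and the estimate is 0.5 pixels.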

kalman_order

Order of the Kalman filter to use: 0 for Brownian motion (it predicts a zero velocity), 1 for directed Brownian motion, 2 for accelerated Brownian motion, etc. Default: 1

Type:: int
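To make the meaning of the order concrete, the following sketch builds the textbook transition matrix of a constant order-th derivative model for one axis and applies one prediction step. The state layout (position, velocity, acceleration, ...) is an assumption made for illustration and may differ from the layout used by torch_kf.

```python
from math import factorial


def transition_matrix(order, dt=1.0):
    """Transition matrix for a single axis, state = [x, dx, ..., dx^(order)]."""
    n = order + 1
    return [
        [dt ** (j - i) / factorial(j - i) if j >= i else 0.0 for j in range(n)]
        for i in range(n)
    ]


def predict(state, order, dt=1.0):
    """One prediction step (noise-free) of the constant order-th derivative model."""
    matrix = transition_matrix(order, dt)
    return [sum(f * s for f, s in zip(row, state)) for row in matrix]


static = predict([5.0], order=0)                  # position only, no motion predicted
directed = predict([5.0, 2.0], order=1)           # position advances by the velocity
accelerated = predict([5.0, 2.0, 1.0], order=2)   # velocity advances by the acceleration
```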

n_valid

Number of associated detections required to validate the track after its creation. Default: 3

Type:: int

n_gap

Number of consecutive frames without association before the track termination. Default: 3

Type:: int
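The interplay of n_valid and n_gap can be sketched as a small counter-based state machine. This is only an illustration under stated assumptions: the exact boundary conditions (>= versus >) and the handling of unvalidated tracks that miss a detection are guesses, and track_lifecycle is a hypothetical helper.

```python
def track_lifecycle(linked_per_frame, n_valid=3, n_gap=3):
    """Return (validated, terminated) for a track given its per-frame link flags."""
    hits, misses, validated = 0, 0, False
    for linked in linked_per_frame:
        if linked:
            hits += 1
            misses = 0  # any association resets the gap counter
            validated = validated or hits >= n_valid
        else:
            misses += 1
            # Assumption: the track ends once n_gap consecutive frames are missed
            if misses >= n_gap:
                return validated, True
    return validated, False


kept = track_lifecycle([True, True, True, False, False, True])
dropped = track_lifecycle([True, False, False, False])
```

The first track is validated after three associations and survives a two-frame gap; the second one never reaches n_valid and is terminated after three consecutive misses.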

association_method

The frame-by-frame association to use. See AssociationMethod. It can be provided as a string. (Choice: GREEDY, [SPARSE_]OPT_HARD, [SPARSE_]OPT_SMOOTH) Default: OPT_SMOOTH

Type:: AssociationMethod

anisotropy

Anisotropy of images (ratio of the pixel sizes for each axis, depth first). This will be used to scale distances. It only impacts EUCLIDEAN[_SQ] costs. For probabilistic costs, anisotropy should already be integrated in the stds of the Kalman filter (by providing one std for each dimension). Default: (1., 1., 1.)

Type:: Tuple[float, float, float]
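A minimal sketch of how anisotropy may scale a Euclidean distance, assuming each axis difference is multiplied by its ratio before the norm is taken (axes ordered depth first). The helper anisotropic_distance is hypothetical and the exact convention used by the library is not confirmed here.

```python
import math


def anisotropic_distance(a, b, anisotropy=(1.0, 1.0, 1.0)):
    """Euclidean distance between two points after per-axis scaling."""
    # Assumption: each coordinate difference is multiplied by its axis ratio
    return math.sqrt(sum((s * (x - y)) ** 2 for s, x, y in zip(anisotropy, a, b)))


p, q = (0.0, 0.0, 0.0), (2.0, 3.0, 6.0)
isotropic = anisotropic_distance(p, q)                       # all axes weigh the same
stretched = anisotropic_distance(p, q, (2.0, 1.0, 1.0))      # depth pixels twice as large
```

With a depth ratio of 2, the same displacement along the depth axis counts twice as much, so the stretched distance is larger than the isotropic one.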

cost

The cost to use. It can be provided as a string. See Cost. It also determines the unit of association_threshold. Default: EUCLIDEAN

Type:: Cost

track_building

Tells the linker how to build the final tracks. Either from detections, or from filtered/smoothed positions computed by the Kalman filter. See TrackBuilding. It can be provided as a string. Default: FILTERED

Type:: TrackBuilding

split_factor

Allow splitting of tracks, using a second association step. The association threshold in this case is split_factor * association_threshold. Default: 0.0 (No splits)

Type:: float

merge_factor

Allow merging of tracks, using a second association step. The association threshold in this case is merge_factor * association_threshold. Default: 0.0 (No merges)

Type:: float
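The thresholds of the second association step follow directly from the two factors. A minimal sketch, assuming a factor of 0.0 disables the feature as stated by the defaults; secondary_threshold is a hypothetical helper.

```python
def secondary_threshold(association_threshold, factor):
    """Threshold of the second association step, or None when the feature is off."""
    if factor <= 0.0:
        return None  # Default 0.0: no splits / no merges
    return factor * association_threshold


split_thr = secondary_threshold(5.0, factor=2.0)   # splits allowed up to 10 pixels
merge_thr = secondary_threshold(5.0, factor=0.0)   # merges disabled
```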

extract_flows_on_detections

If True, it extracts the optical flow at the detection location if possible. Otherwise, it extracts the flow from the current estimate of the track position. Default: False

Type:: bool
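The choice of the location at which the flow is read can be sketched as follows. It assumes that "if possible" means "when the track has a linked detection on this frame", which is an interpretation and not something the text states; flow_query_position is a hypothetical helper.

```python
def flow_query_position(track_estimate, detection, extract_flows_on_detections=False):
    """Pick the position at which the optical flow is sampled for one track."""
    # Assumption: detection is None when the track was not linked on this frame
    if extract_flows_on_detections and detection is not None:
        return detection
    return track_estimate


estimate, linked_detection = (12.3, 40.1), (12.0, 41.0)
default_pos = flow_query_position(estimate, linked_detection)
on_detection = flow_query_position(estimate, linked_detection, True)
fallback = flow_query_position(estimate, None, True)
```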

always_measure_velocity

Update the velocity of all tracks, even those not linked to a detection. If set to False, it implements KOFT-- from the paper. This is sub-optimal, so you should keep it True. Default: True

Type:: bool

online_process_std

Recomputes the process std online following “A. Genovesio, et al, 2004, October. Adaptive gating in Gaussian Bayesian multi-target tracking. ICIP’04. (Vol. 1, pp. 147-150). IEEE.” Each track has its own process std, depending on the errors made in the past. It automatically adjusts to process errors, which allows the validation gate to grow. It should be used in conjunction with a MAHALANOBIS or LIKELIHOOD cost. As this may be detrimental, it is disabled by default. Default: 0.0 (process_std is constant)

Type:: float

initial_std_factor

The uncertainties on initial velocities/accelerations are set to initial_std_factor * process_std. A small factor prevents the linker from correctly handling new tracks that are already moving on their first frames. Large values, however, lead to a large uncertainty on the first prediction, making it hard to associate it with a detection when using MAHALANOBIS or LIKELIHOOD methods. Typical values lie between 3.0 and 10.0. Default: 10.0

Type:: float
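To give an order of magnitude for this trade-off, the sketch below computes the position uncertainty of the first prediction for an order-1 model. It relies on simplifying assumptions that are not taken from the library: the initial position std is taken equal to detection_std, the position and velocity are initially uncorrelated, and the process noise added during the prediction is ignored. first_prediction_std is a hypothetical helper.

```python
import math


def first_prediction_std(initial_std_factor, detection_std=3.0, process_std=1.5, dt=1.0):
    """Approximate std of the predicted position one frame after track creation."""
    pos_var = detection_std ** 2  # Assumption: initial position noise = detection noise
    vel_var = (initial_std_factor * process_std) ** 2  # From the parameter definition
    # Constant-velocity prediction: var(x + dt * v) with uncorrelated x and v
    return math.sqrt(pos_var + dt ** 2 * vel_var)


wide = first_prediction_std(10.0)   # default factor
narrow = first_prediction_std(3.0)  # lower end of the typical range
```

Under these assumptions and with the default stds, a factor of 10.0 gives a first prediction std of about 15.3 pixels, against about 5.4 pixels for a factor of 3.0.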

class byotrack.implementation.linker.frame_by_frame.koft.KOFTLinker(specs: KOFTLinkerParameters, optflow: OpticalFlow | None = None, features_extractor: FeaturesExtractor | None = None, save_all=False)

Bases: KalmanLinker

Kalman and Optical Flow Tracking [9]

Motion is modeled with a Kalman filter of a specified order >= 1 (see torch_kf.ckf). Positions are measured through the detection process. A second update step is performed to measure the velocity of all tracks using optical flow.

Matching is done to optimize the given cost.
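The two update steps described above can be sketched with a textbook Kalman filter on a single axis, with state [position, velocity]. This is an illustration only, not the byotrack or torch_kf implementation: the function names are hypothetical, the process noise is simplified to act on the velocity only, and the initial covariance is chosen from the default stds for the sake of the example.

```python
def predict(x, P, q, dt=1.0):
    """Constant-velocity prediction: x = F x, P = F P F^T + Q, F = [[1, dt], [0, 1]]."""
    x_pred = [x[0] + dt * x[1], x[1]]
    p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1]
    p01 = P[0][1] + dt * P[1][1]
    p10 = P[1][0] + dt * P[1][1]
    p11 = P[1][1] + q  # Assumption: process noise on the velocity only
    return x_pred, [[p00, p01], [p10, p11]]


def update_scalar(x, P, z, idx, r):
    """Kalman update for a direct scalar measurement z of state component idx."""
    s = P[idx][idx] + r                      # innovation variance
    k = [P[0][idx] / s, P[1][idx] / s]       # Kalman gain
    innovation = z - x[idx]
    x_new = [x[0] + k[0] * innovation, x[1] + k[1] * innovation]
    P_new = [[P[i][j] - k[i] * P[idx][j] for j in range(2)] for i in range(2)]
    return x_new, P_new


detection_std, flow_std, process_std = 3.0, 1.0, 1.5
x0, P0 = [0.0, 0.0], [[detection_std ** 2, 0.0], [0.0, (10.0 * process_std) ** 2]]

x_pred, P_pred = predict(x0, P0, q=process_std ** 2)
# First update: position measured by the detection process
x1, P1 = update_scalar(x_pred, P_pred, z=2.0, idx=0, r=detection_std ** 2)
# Second update: velocity measured by optical flow
x2, P2 = update_scalar(x1, P1, z=2.0, idx=1, r=flow_std ** 2)
```

In this sketch the flow measurement sharply reduces the velocity variance after the position update, which is the benefit the second update step is meant to bring.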

Note

This implementation requires torch-kf. (pip install torch-kf)

See KalmanLinker for the other attributes.

specs

Parameters specifications of the algorithm. See KOFTLinkerParameters.

Type:: KOFTLinkerParameters

last_detections

The last detections used in update. Optionally used to extract flows at the detection positions rather than at the track state. Required for motion_model.

Type:: byotrack.Detections

reset(dim=2) → None

Reset the linking algorithm

Flush all data stored from a previous linking and prepare a new linking.

Parameters:: dim (int) – The dimension of the data. Default: 2

motion_model() → None

Optional motion modeling for tracks

It can be used to update some internal state of the tracker after the optical flow computation and before the distance computation.

cost(_: ndarray, detections: Detections) → Tuple[Tensor, float]

Compute the association cost between active tracks and detections

It also returns the threshold to use. (Depending on the distance you use, association_threshold could be related to a more meaningful quantity than the cost itself.) For instance, when using a squared Euclidean distance, the association threshold could be expressed as a distance in pixels, and this function could square it. For likelihood association, you could provide the association threshold as a probability and use -log(threshold) as the true threshold. (See KalmanLinker and NearestNeighborLinker)

Parameters:

frame (np.ndarray) – The current frame of the video Shape: (H, W, C), dtype: float
detections (byotrack.Detections) – Detections for the given frame

Returns:

torch.Tensor: The cost matrix between active tracks and detections. Shape: (n_tracks, n_dets), dtype: float
float: The association threshold to use. It can differ from self.association_threshold depending on the distance built here.

Return type:

Tuple[torch.Tensor, float]
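The threshold conversions mentioned in the description of cost can be sketched as below. The helper true_threshold and its string keys are hypothetical and only mirror the two examples given in the text (squared Euclidean distance and likelihood); they do not reproduce the library's code.

```python
import math


def true_threshold(association_threshold, cost):
    """Convert a user-facing threshold into the one compared against the cost."""
    if cost == "EUCLIDEAN":
        return association_threshold            # already a distance in pixels
    if cost == "EUCLIDEAN_SQ":
        return association_threshold ** 2       # distance in pixels, squared
    if cost == "LIKELIHOOD":
        return -math.log(association_threshold)  # probability -> negative log
    raise ValueError(f"Unsupported cost in this sketch: {cost}")


squared = true_threshold(5.0, "EUCLIDEAN_SQ")
neg_log = true_threshold(0.01, "LIKELIHOOD")
```

This keeps the user-facing threshold in an interpretable unit (pixels or probability) while the comparison is done in the unit of the cost matrix.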

post_association(_: ndarray, detections: Detections, active_mask: Tensor)

Update the internal state of the tracker after update_active_tracks

It should update any internal model/data. It is also responsible to register the position of each active track in all_positions for the current time frame.

Parameters:

frame (np.ndarray) – The current frame of the video Shape: (H, W, C), dtype: float
detections (byotrack.Detections) – Detections for the given frame
active_mask (torch.Tensor) – Boolean tensor indicating True for still active tracks Shape: (N_tracks), dtype: bool