Kalman Linker

Implementation of SKT from [9]

class byotrack.implementation.linker.frame_by_frame.kalman_linker.Cost(*values)

Bases: Enum

The cost to use for association.

LIKELIHOOD
Build the cost matrix to maximize the likelihood of the association. The cost is the negative log-likelihood that detection j comes from track i: C[i, j] = -log P(det_j | track_i). The cost limit is expected to be the smallest probability to accept.
MAHALANOBIS
Mahalanobis distance, using the track uncertainty to correct the Euclidean distance. The cost limit is expected to be the largest Mahalanobis distance to accept.
MAHALANOBIS_SQ
Squared Mahalanobis distance. The cost limit is expected to be the largest Mahalanobis distance to accept (not squared).
EUCLIDEAN
Euclidean distance. The cost limit is expected to be the largest Euclidean distance to accept.
EUCLIDEAN_SQ
Squared Euclidean distance. The cost limit is expected to be the largest Euclidean distance to accept (not squared).
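The relation between these costs can be illustrated with a small, library-independent sketch. It assumes a diagonal covariance for the predicted track position (the actual filter may produce a full covariance) and uses the Gaussian density as the likelihood. The helper names are hypothetical and are not part of byotrack.

```python
import math

def euclidean(track_pos, det_pos):
    """Euclidean distance between a predicted track position and a detection."""
    return math.sqrt(sum((d - t) ** 2 for t, d in zip(track_pos, det_pos)))

def mahalanobis(track_pos, track_var, det_pos):
    """Mahalanobis distance, assuming a diagonal covariance (per-axis variances)."""
    return math.sqrt(
        sum((d - t) ** 2 / v for t, d, v in zip(track_pos, det_pos, track_var))
    )

def neg_log_likelihood(track_pos, track_var, det_pos):
    """-log N(det | track_pos, diag(track_var)), a LIKELIHOOD-style cost."""
    m_sq = mahalanobis(track_pos, track_var, det_pos) ** 2
    log_det = sum(math.log(v) for v in track_var)
    dim = len(track_pos)
    return 0.5 * (m_sq + log_det + dim * math.log(2 * math.pi))

track_pos, track_var = (10.0, 20.0), (4.0, 1.0)  # illustrative values
det_pos = (12.0, 21.0)

d_euc = euclidean(track_pos, det_pos)
d_mah = mahalanobis(track_pos, track_var, det_pos)
nll = neg_log_likelihood(track_pos, track_var, det_pos)

# With a likelihood cost, a "smallest probability to accept" of 1e-3
# corresponds to a cost limit of -log(1e-3).
accept = nll <= -math.log(1e-3)
```

The axis with the larger variance contributes less to the Mahalanobis distance, which is how the track uncertainty corrects the Euclidean distance.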

class byotrack.implementation.linker.frame_by_frame.kalman_linker.TrackBuilding(*values)

Bases: Enum

How to build the final tracks.

DETECTION
Build tracks from detections, without filtering or gap filling
FILTERED
Build tracks from the Kalman filter outputs
SMOOTHED
Add a backward run of the Kalman filter (RTS smoothing) to smooth track positions
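The difference between filtered and smoothed positions can be sketched on a single axis. The example below assumes an order-0 (random walk) model with scalar noise and is not the byotrack or torch_kf implementation; `filter_and_smooth` is a hypothetical helper.

```python
def filter_and_smooth(measurements, process_std, detection_std):
    """1D random-walk Kalman filter followed by an RTS backward pass."""
    q, r = process_std ** 2, detection_std ** 2
    x, p = measurements[0], r  # initialize on the first detection
    filtered, variances = [x], [p]
    predicted, pred_variances = [x], [p]
    for z in measurements[1:]:
        x_pred, p_pred = x, p + q  # predict (identity transition)
        gain = p_pred / (p_pred + r)
        x = x_pred + gain * (z - x_pred)  # update with the detection
        p = (1.0 - gain) * p_pred
        filtered.append(x)
        variances.append(p)
        predicted.append(x_pred)
        pred_variances.append(p_pred)

    # Backward (RTS) pass: each estimate is corrected using the next smoothed one.
    smoothed = list(filtered)
    for t in range(len(measurements) - 2, -1, -1):
        c = variances[t] / pred_variances[t + 1]
        smoothed[t] = filtered[t] + c * (smoothed[t + 1] - predicted[t + 1])
    return filtered, smoothed

filtered, smoothed = filter_and_smooth([0.0, 0.0, 0.0, 10.0], 1.0, 1.0)
```

The last smoothed value equals the last filtered value, while earlier positions are pulled toward the later jump because smoothing also uses future detections.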

class byotrack.implementation.linker.frame_by_frame.kalman_linker.KalmanLinkerParameters(association_threshold: float = -1.0, *, detection_std: float | Tensor = 0.0, process_std: float | Tensor = 0.0, kalman_order: int = 1, n_valid=3, n_gap=3, association_method: str | AssociationMethod = AssociationMethod.SPARSE_OPT_SMOOTH, anisotropy: tuple[float, float, float] = (1.0, 1.0, 1.0), cost: str | Cost = Cost.LIKELIHOOD, track_building: str | TrackBuilding = TrackBuilding.SMOOTHED, split_factor: float = 0.0, merge_factor: float = 0.0, online_process_std: float = 0.0, initial_std_factor: float = 5.0)

Bases: FrameByFrameLinkerParameters

Parameters of KalmanLinker.

Note

Most parameters can be estimated automatically from the detections using estimate.

association_threshold

This is the main hyperparameter. It defines the distance threshold beyond which a track is not linked to a detection. A low threshold typically reduces wrong assignments and ID switches, but may increase track fragmentation. A higher threshold reduces track fragmentation, but a track whose detection was missed may be linked to a wrong detection. Depending on cost, it is expressed as the maximum Euclidean distance (pixels), the maximum Mahalanobis distance, or the minimum likelihood (probability). Default: -1.0 (to be estimated, see estimate)

Type:: float

detection_std

Expected measurement noise (in pixels) of the detection process. The detection process is modeled with Gaussian noise of this standard deviation. You may provide a different noise for each dimension. See torch_kf.ckf.constant_kalman_filter. Default: 0.0 (to be estimated, see estimate)

Type:: float | torch.Tensor

process_std

Expected process noise (in pixels). See torch_kf.ckf.constant_kalman_filter. The process is modeled as a constant order-th derivative motion with Gaussian noise. This quantifies how much the supposedly “constant” order-th derivative can change between two consecutive frames. A common rule of thumb is 4 * process_std ~= max_t(|dx^(order)(t+1) - dx^(order)(t)|) (see estimate_process_std_from_tracks). It can be provided for each dimension. Default: 0.0 (to be estimated, see estimate)

Type:: float | torch.Tensor
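The rule of thumb for process_std can be applied by hand to a single coordinate of one trajectory. This is a minimal sketch with hypothetical helper names and made-up positions; it uses plain finite differences and is not the estimator implemented by estimate_process_std_from_tracks.

```python
def finite_difference(values, order):
    """Apply `order` successive frame-to-frame differences to a 1D sequence."""
    for _ in range(order):
        values = [b - a for a, b in zip(values, values[1:])]
    return values

def rule_of_thumb_process_std(positions, order):
    """process_std ~= max_t |dx^(order)(t+1) - dx^(order)(t)| / 4 for one axis."""
    derivative = finite_difference(positions, order)
    changes = [abs(b - a) for a, b in zip(derivative, derivative[1:])]
    return max(changes) / 4.0

xs = [0.0, 1.0, 2.5, 3.5, 6.0, 7.0]  # positions along one axis, in pixels

std_order0 = rule_of_thumb_process_std(xs, order=0)  # from the largest displacement
std_order1 = rule_of_thumb_process_std(xs, order=1)  # from the largest velocity change
```

For order 0 the largest displacement is 2.5 pixels, giving 0.625; for order 1 the largest velocity change is 1.5 pixels per frame, giving 0.375.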

kalman_order

Order of the Kalman filter to use: 0 for Brownian motion, 1 for directed Brownian motion, 2 for accelerated Brownian motion, etc. Default: 1

Type:: int
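What the order changes can be sketched with the transition matrix of a constant order-th derivative model on one axis. The example assumes the standard Taylor-expansion form with a time step of one frame; the exact state layout used by torch_kf is not shown in this section, and `transition_matrix` is a hypothetical helper.

```python
import math

def transition_matrix(order, dt=1.0):
    """Transition matrix for the state [x, dx, ..., d^order x] on a single axis."""
    n = order + 1
    return [
        [dt ** (j - i) / math.factorial(j - i) if j >= i else 0.0 for j in range(n)]
        for i in range(n)
    ]

def predict(state, order):
    """Propagate a single-axis state by one frame."""
    matrix = transition_matrix(order)
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]

f0 = transition_matrix(0)  # position only
f1 = transition_matrix(1)  # position and velocity
f2 = transition_matrix(2)  # position, velocity and acceleration

next_state = predict([10.0, 2.0], order=1)  # position 10, velocity 2
```

With order 1, the predicted position moves by the velocity while the velocity itself stays constant; the process noise then accounts for the deviations from this model.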

n_valid

Number of detections required to validate the track after its creation. If a track is missed during its first n_valid frames, it is dropped. This provides robustness to false positive detections. With no false positives, it can be set to 1 (a detection always belongs to a track). Higher values remove false positives that are not consistent over time, but may prune real tracks whose detections were missed. Default: 3

Type:: int

n_gap

Number of consecutive frames without any association (missed detections) before the track is terminated. This provides robustness to false negative detections. Without any false negatives, it can be set to 0. Higher values support larger gaps in the track, but may lead to wrong assignments. Default: 3

Type:: int
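The interplay of n_valid and n_gap can be sketched as a small state machine over the association history of one track. This is one plausible reading of the two descriptions above, not the byotrack implementation; in particular, it assumes a track is terminated once it has been missed on more than n_gap consecutive frames. `track_status` is a hypothetical helper.

```python
def track_status(hits, n_valid=3, n_gap=3):
    """Classify a track from its per-frame association history.

    `hits` holds one boolean per frame since the track's creation
    (True when a detection was associated on that frame).
    """
    misses = 0
    for age, hit in enumerate(hits, start=1):
        if hit:
            misses = 0
            continue
        if age <= n_valid:
            return "dropped"  # missed before being validated
        misses += 1
        if misses > n_gap:
            return "terminated"  # gap longer than n_gap frames
    return "confirmed" if len(hits) >= n_valid else "tentative"

status_short_gap = track_status([True] * 3 + [False] * 3 + [True])
status_long_gap = track_status([True] * 3 + [False] * 4)
status_false_positive = track_status([True, False])
```

A gap of three frames is bridged with the default n_gap=3, a gap of four frames ends the track, and a track missed on its second frame is discarded as a likely false positive.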

association_method

The frame-by-frame association to use. See AssociationMethod. It can be provided as a string. Choice: GREEDY, OPT_HARD, OPT_SMOOTH, SPARSE_OPT_HARD, SPARSE_OPT_SMOOTH. Default: SPARSE_OPT_SMOOTH

Type:: AssociationMethod

anisotropy

Anisotropy of images (ratio of the pixel sizes for each axis, depth first). This is used to scale distances. Note that it only impacts EUCLIDEAN[_SQ] costs; for probabilistic costs, anisotropy should already be integrated within the stds of the Kalman filter (by providing one std for each dimension). Default: (1., 1., 1.)

Type:: tuple[float, float, float]
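The effect of anisotropy on a Euclidean cost can be sketched as follows. The example assumes positions given in (z, y, x) pixel coordinates and that each axis difference is multiplied by its ratio before the norm is taken; how byotrack applies the ratios internally is not shown in this section, and `anisotropic_distance` is a hypothetical helper.

```python
import math

def anisotropic_distance(a, b, anisotropy=(1.0, 1.0, 1.0)):
    """Euclidean distance after scaling each axis by its pixel-size ratio."""
    return math.sqrt(
        sum((s * (q - p)) ** 2 for p, q, s in zip(a, b, anisotropy))
    )

a, b = (0.0, 0.0, 0.0), (1.0, 3.0, 4.0)

d_isotropic = anisotropic_distance(a, b)
d_anisotropic = anisotropic_distance(a, b, anisotropy=(2.0, 1.0, 1.0))
```

With a depth axis twice as coarse as the others, a one-pixel offset in z counts as two units, so the same pair of points ends up farther apart.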

cost

The cost to use. See Cost. It can be provided as a string. Choice: EUCLIDEAN, EUCLIDEAN_SQ, MAHALANOBIS, MAHALANOBIS_SQ, LIKELIHOOD. This also defines the unit of association_threshold (pixels for Euclidean, no unit for Mahalanobis, and a probability for likelihood). Default: LIKELIHOOD

Type:: Cost

track_building

How the linker will build the final tracks. See TrackBuilding. Either from detections, or from filtered/smoothed positions computed by the Kalman filter. It can be provided as a string. Choice: DETECTION, FILTERED, SMOOTHED. Default: SMOOTHED

Type:: TrackBuilding

split_factor

Allow splitting of tracks, using a second association step. The association threshold in this case is split_factor * association_threshold. Default: 0.0 (No splits)

Type:: float

merge_factor

Allow merging of tracks, using a second association step. The association threshold in this case is merge_factor * association_threshold. Default: 0.0 (No merges)

Type:: float

online_process_std

Recomputes the process std online following “A. Genovesio, et al, 2004, October. Adaptive gating in Gaussian Bayesian multi-target tracking. ICIP’04. (Vol. 1, pp. 147-150). IEEE.” Each track has its own process std, which depends on its past errors. It automatically adjusts to process errors, which allows the validation gate to widen. It should be used in conjunction with a MAHALANOBIS or LIKELIHOOD cost. As this may be detrimental, it is disabled by default. Default: 0.0 (process_std is constant)

Type:: float

initial_std_factor

The uncertainties on the initial velocities/accelerations are set to initial_std_factor * process_std. See KalmanLinker.build_initial_covariance. A small factor prevents tracks that start with a large velocity from being handled correctly on their first frames. A large factor leads to a large uncertainty on the first prediction, making it hard to associate to a detection with MAHALANOBIS or LIKELIHOOD costs. Typical values lie between 3.0 and 10.0. Default: 5.0

Type:: float

check(): Check the specification for invalid values.

estimate(detections_sequence: Sequence[byotrack.Detections]) → KalmanLinkerParameters

Estimate parameters from the given detections.

Estimation is triggered by providing negative dummy values for positive parameters. The dummy values are then replaced by their estimate.

Estimators:

detection_std: average_radius / 2 (i.e. localization is rarely predicted outside the target)
process_std: average_radius (i.e. unmodeled motion is ~the size of targets) (Consider using estimate_process_std_from_tracks instead)
association_threshold: steady_state_covariance * 3 (See estimate_association_threshold).
anisotropy: Computed from statistics.anisotropy.
split_factor: 1.0 if the number of detections increases by more than 30% over the full sequence.
merge_factor: 1.0 if the number of detections decreases by more than 30% over the full sequence.

Parameters:: detections_sequence (Sequence[byotrack.Detections]) – Detections for the current sequence.
Returns:: self with updated parameters.
Return type:: KalmanLinkerParameters
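The simplest estimators listed above can be sketched without the library. The example assumes that the average radius is already known and that the 30% change is measured between the detection counts of the first and last frames; the statistic actually used by estimate is not specified in this section, and both helpers are hypothetical.

```python
def stds_from_radius(average_radius):
    """detection_std = radius / 2 and process_std = radius, as listed above."""
    return {
        "detection_std": average_radius / 2.0,
        "process_std": average_radius,
    }

def split_merge_factors(n_first, n_last, rel_change=0.3):
    """Enable splits (or merges) when the detection count grows (or shrinks) by > 30%."""
    split_factor = 1.0 if n_last > n_first * (1.0 + rel_change) else 0.0
    merge_factor = 1.0 if n_last < n_first * (1.0 - rel_change) else 0.0
    return split_factor, merge_factor

stds = stds_from_radius(4.0)
growing = split_merge_factors(100, 140)    # e.g. dividing cells
shrinking = split_merge_factors(100, 60)   # e.g. fusing particles
stable = split_merge_factors(100, 110)
```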

estimate_association_threshold(dim: int, mahalanobis_threshold: float = 3.0) → None

Estimate association_threshold based on the steady state covariance of the filter.

Modifies association_threshold in place so that it cuts at mahalanobis_threshold when the filter is in its steady state (i.e. after a few consecutive successful associations).

Parameters:

dim (int) – Dimension of the video (2D or 3D).
mahalanobis_threshold (float) – Threshold on the mahalanobis distance in the steady state. Default: 3.0
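The notion of steady state can be sketched on one axis. The example assumes an order-0 (random walk) filter with scalar noise and iterates the variance recursion until it settles; it then converts a Mahalanobis threshold into a distance in pixels. It is an illustration only: the conversion byotrack performs for each cost is not shown here, and `steady_state_innovation_var` is a hypothetical helper.

```python
import math

def steady_state_innovation_var(process_std, detection_std, n_iter=200):
    """Innovation variance of a 1D random-walk filter after many updates."""
    q, r = process_std ** 2, detection_std ** 2
    p = r  # posterior variance right after the first detection
    s = p + q + r
    for _ in range(n_iter):
        p_pred = p + q            # predict
        s = p_pred + r            # innovation (projection) variance
        gain = p_pred / s
        p = (1.0 - gain) * p_pred  # update
    return s

s = steady_state_innovation_var(process_std=1.0, detection_std=0.5)

# A Mahalanobis threshold of 3 in the steady state corresponds to this
# many pixels along the axis.
pixel_threshold = 3.0 * math.sqrt(s)
```

For this scalar model the fixed point has a closed form, P_pred = (q + sqrt(q^2 + 4 q r)) / 2, which the iteration reproduces.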

estimate_process_std_from_tracks(tracks: Collection[byotrack.Track], quantile=0.99993) → None

Estimate process_std based on the given tracks.

It modifies in place process_std so that it roughly fits the maximum unmodeled motion.

NOTE: Without annotations, you may set process_std according to the following rule: for kalman_order=0, it can be set to the maximum velocity (in pixels per frame) divided by 4, where the velocity is estimated visually (a rough estimate is enough). For kalman_order=1, it can similarly be set to the maximum variation of velocity between consecutive frames divided by 4, again with a rough visual estimate of the velocity variation.

Parameters:

tracks (Collection[byotrack.Track]) – Partial ground-truth tracks. Note that for a given kalman_order, only the tracks with length >= kalman_order + 2 are used. If these are manually annotated tracks, consider using an RTSSmoother to reduce the annotation noise.
quantile (float) – Quantile to extract the maximum value. Can be reduced to ignore some false positive links. Default: 0.99993

build_filter(dim: int) → KalmanFilter

Build the Kalman filter used by the Linker.

See torch_kf.ckf.constant_kalman_filter.

class byotrack.implementation.linker.frame_by_frame.kalman_linker.KalmanLinker(specs: KalmanLinkerParameters, optflow: OpticalFlow | None = None, features_extractor: FeaturesExtractor | None = None, *, save_all=False)

Bases: FrameByFrameLinker

Frame by frame linker using Kalman filters.

Motion is modeled with a Kalman filter of a specified order (see torch_kf.ckf). Matching is done to optimize the given cost. If optical flow is provided, it is used online to warp the predicted positions of the Kalman filter. This works, but it is sub-optimal: consider using KOFTLinker, which exploits optical flow in a finer way inside the Kalman filter.

This is an implementation of Simple Kalman Tracking (SKT) from KOFT [9].

See FrameByFrameLinker for the other attributes.

specs

Parameters specifications of the algorithm. See KalmanLinkerParameters.

Type:: KalmanLinkerParameters

kalman_filter

The Kalman filter.

Type:: torch_kf.KalmanFilter

active_states

The Kalman filter estimation for each track. Shape: mean=(N, D * (order + 1), 1), covariance=(N, D * (order + 1), D * (order + 1)) dtype: float

Type:: torch_kf.GaussianState

projections

The Kalman filter projection for each track. Shape: mean=(N, D, 1), covariance=(N, D, D), precision=(N, D, D) dtype: float

Type:: torch_kf.GaussianState

process_noises

The Kalman filter process noise for each track. Only used when online_process_std > 0.0. It is used to compute an adaptive process_std, and therefore an adaptive gating, for each track. Shape: (N, D, 1), dtype: float

Type:: torch.Tensor

all_states

The Kalman filter estimation for each track at each seen frame. States are only registered when save_all=True or when tracks are built from RTS smoothing. Shape: mean=(N, D * (order + 1), 1), covariance=(N, D * (order + 1), D * (order + 1)) dtype: float

Type:: list[torch_kf.GaussianState]

reset(dim=2) → None

Reset the linking algorithm.

Flush all data stored from a previous linking and prepare a new linking.

Parameters:: dim (int) – The dimension of the data. Default: 2

collect() → list[Track]

Collect and build all the tracks up to the last given frame.

Returns:: Tracks from the last reset to the last given frame.
Return type:: Collection[byotrack.Track]

motion_model() → None

Optional modelisation of motion for tracks.

It can be used to update some internal state of the tracker after the optical flow computation and before the distance computation.

cost(frame: np.ndarray | None, detections: byotrack.Detections) → tuple[torch.Tensor, float]

Compute the association cost between active tracks and detections.

It also returns the threshold to use (depending on the distance you use, association_threshold could relate to a more meaningful quantity than the cost itself). For instance, when using a squared Euclidean distance, the association threshold could be expressed as a distance in pixels, and this function could square it. For likelihood association, you could provide the association threshold as a probability and use -log(threshold) as the true threshold. (See KalmanLinker and NearestNeighborLinker)

Parameters:

frame (np.ndarray | None) – The current optional frame of the video Shape: (H, W, C), dtype: float
detections (byotrack.Detections) – Detections for the given frame

Returns:

torch.Tensor: The cost matrix between active tracks and detections. Shape: (n_tracks, n_dets), dtype: float
float: The association threshold to use. It can differ from self.association_threshold, depending on the distance built here.

Return type:

tuple[torch.Tensor, float]
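The threshold conversion described above can be sketched as a small function. The cost names follow the Cost enum, but the function itself is a hypothetical illustration of the idea and not the code of KalmanLinker.cost.

```python
import math

def effective_threshold(cost_name, association_threshold):
    """Map a user-facing threshold to the limit applied on the cost matrix."""
    if cost_name in ("EUCLIDEAN_SQ", "MAHALANOBIS_SQ"):
        # The threshold is given as a (non-squared) distance.
        return association_threshold ** 2
    if cost_name == "LIKELIHOOD":
        # The threshold is given as the smallest probability to accept.
        return -math.log(association_threshold)
    return association_threshold  # EUCLIDEAN, MAHALANOBIS

limit_sq = effective_threshold("EUCLIDEAN_SQ", 5.0)
limit_nll = effective_threshold("LIKELIHOOD", 1e-3)
limit_mah = effective_threshold("MAHALANOBIS", 3.0)
```

This keeps association_threshold in a unit that is easy to reason about (pixels or a probability), whatever cost is actually minimized.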

post_association(frame: np.ndarray | None, detections: byotrack.Detections, active_mask: torch.Tensor) → None

Update the internal state of the tracker after update_active_tracks.

It should update any internal model/data. It is also responsible to register the position of each active track in all_positions for the current time frame.

Parameters:

frame (np.ndarray | None) – The optional current frame of the video Shape: (H, W, C), dtype: float
detections (byotrack.Detections) – Detections for the given frame
active_mask (torch.Tensor) – Boolean tensor indicating True for still active tracks Shape: (N_tracks), dtype: bool

build_initial_covariance(dim: int) → Tensor

Build the diagonal initial covariance matrix.

The position is initially unknown: the initial belief is set to the position of the first detection, with an uncertainty of detection_std.

The velocity (and higher order derivatives) are assumed to be 0.0 with a relatively high uncertainty: initial_std_factor * process_std.

Note that a large initial_std_factor (>10) may decrease performance: the first prediction becomes largely uncertain, leading to low probabilities for every association. In KOFT, as the velocity is measured before the first prediction, the initial_std_factor can be increased to reduce this bias toward a null initial velocity. We found that initial_std_factor=5.0 is a good trade-off in practice.
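The structure of this diagonal covariance can be sketched as follows. The example assumes scalar stds and a state that stores all positions first, followed by the higher-order derivatives; the actual ordering of the state in torch_kf is not stated in this section, and `initial_covariance_diagonal` is a hypothetical helper.

```python
def initial_covariance_diagonal(
    dim, order, detection_std, process_std, initial_std_factor=5.0
):
    """Diagonal entries of the initial covariance (variances, not stds)."""
    position_var = detection_std ** 2
    derivative_var = (initial_std_factor * process_std) ** 2
    # Positions use the detection noise, derivatives the inflated process noise.
    return [position_var] * dim + [derivative_var] * (dim * order)

diag_2d = initial_covariance_diagonal(
    dim=2, order=1, detection_std=0.5, process_std=1.0
)
diag_3d = initial_covariance_diagonal(
    dim=3, order=2, detection_std=0.5, process_std=1.0
)
```

With the default factor of 5.0, the initial velocity std is five times process_std, so the velocity variance is 25 times the process variance.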