Video

Video

class byotrack.video.video.VideoTransformConfig(aggregate: bool = False, normalize: bool = False, selected_channel: int | None = None, q_min: float = 0.0, q_max: float = 1.0, smooth_clip: float = 0.0, compute_stats_on: int = 50)

Bases: object

Configuration for video transformations.

aggregate

Aggregate channels

Type:

bool

normalize

Scale and Normalize the video in [0, 1]

Type:

bool

selected_channel

Channel to use for aggregation. If None, the channels are averaged; otherwise the given channel is selected.

Type:

int | None

q_min

Minimum quantile to use when scaling the video

Type:

float

q_max

Maximum quantile to use when scaling the video

Type:

float

smooth_clip

Smoothness of the clipping process (log clipping). See ScaleAndNormalize: the highest values (above the q_max quantile) are log clipped. If 0.0, hard clipping is done.

Type:

float

compute_stats_on

Number of frames to use to compute the quantiles.

Type:

int

class byotrack.video.video.Video(data_source: str | PathLike | VideoReader, **kwargs: Any)

Bases: Sequence[ndarray]

Video: Iterable, indexable and sliceable sequence of frames wrapping a VideoReader.

It wraps VideoReader in order to add video transformation (Channel Aggregation, Scaling, Normalization) and to add useful pythonic protocols (Sliceable, Indexable, Iterable).

Frames are 2D or 3D with a channel axis. It behaves similarly to a 4D/5D numpy array of shape (T[, D], H, W, C).

Example

import byotrack

# Read a video (Usually 2D RGB)
video = byotrack.Video(video_path)

# Add a transform that will aggregate channel and normalize in [0, 1] the intensities
transform_config = byotrack.VideoTransformConfig(aggregate=True, normalize=True, q_min=0.01, q_max=0.999)
video.set_transform(transform_config)

# Iterate through the video
for frame in video:
    pass

# Temporal slicing
sliced = video[10:50:3]  # Take one frame every three from frame 10 to frame 50.

# Spatial slicing
sliced = video[:, 100:200, 150:250]  # All frames on the roi (100:200 x 150:250)

ndim

Either 4 (2D) or 5 (3D). (T, H, W, C) in 2D or (T, D, H, W, C) in 3D.

Type:

int

shape

Shape of the video (Time, [Depth, ]Height, Width)

Type:

tuple[int, …]

channels

Number of channels

Type:

int

reader

Underlying video reader

Type:

byotrack.VideoReader

set_transform(transform_config: VideoTransformConfig) None

Set the transform (channel_selector and normalizer).

Parameters:

transform_config (byotrack.VideoTransformConfig) – Configuration of the transformations

transform(frame: ndarray) ndarray

Transform a frame using channel aggregation and normalization.

__getitem__(index: int) ndarray
__getitem__(slice_: slice) Video
__getitem__(slices: tuple[slice, ...]) Video

Indexing and slicing operations.

When indexed, it returns the ith frame in the slice. When sliced, it duplicates the video (wrapper) with the right slice.

Parameters:

key (int | slice | tuple[slice, ...]) – index or slice of the video

Returns:

Frame at index or a shallow copy of the video with the right slice

Return type:

np.ndarray | Video

byotrack.video.video.compose_slice(slice_1: slice, slice_2: slice, length: int) slice

Compose two slices in the given order.
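As an illustration of what composing two slices means, here is a minimal sketch. It assumes the intended contract is that the composed slice selects the same elements as applying slice_1 and then slice_2 to a sequence of the given length. The name compose_slices_sketch is hypothetical and this is not necessarily how byotrack implements compose_slice.

```python
def compose_slices_sketch(slice_1: slice, slice_2: slice, length: int) -> slice:
    """Return s such that seq[s] == seq[slice_1][slice_2] when len(seq) == length.

    Hypothetical helper: it relies on the fact that slicing a range
    yields another range holding exactly the selected indices.
    """
    composed = range(length)[slice_1][slice_2]
    if len(composed) == 0:
        return slice(0, 0, 1)  # Canonical empty slice
    # With a negative step the range may stop below 0. In a slice, a negative
    # stop would be read as an index from the end, so None is used instead.
    stop = composed.stop if composed.stop >= 0 else None
    return slice(composed.start, stop, composed.step)


frames = list(range(20))
composed = compose_slices_sketch(slice(2, 15, 3), slice(None, None, -1), len(frames))
assert frames[composed] == frames[2:15:3][::-1]
```

Composing slices this way lets a sliced wrapper keep a single slice over the original frames instead of stacking views.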

Video Reader

byotrack.video.reader.slice_length(slice_: slice, shape: int) int

Compute the number of elements in a slice.
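One way to compute such a length in plain Python is sketched below. It assumes the second argument is the size of the sliced axis. The name slice_length_sketch is hypothetical and the actual implementation may differ.

```python
def slice_length_sketch(slice_: slice, length: int) -> int:
    """Number of elements selected by slice_ on an axis of the given length."""
    # slice.indices clamps start/stop/step to the axis length,
    # and the resulting range has exactly one item per selected index.
    return len(range(*slice_.indices(length)))


# One frame every three from frame 10 to frame 50, on a 100-frame video
n_frames = slice_length_sketch(slice(10, 50, 3), 100)
```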

class byotrack.video.reader.MetaVideoReader(cls_name: str, bases: tuple, attributes: dict)

Bases: type

MetaClass for Video Readers.

Each VideoReader has to define a list of supported extensions. The last constructed VideoReader to claim an extension will be used to open the video. If no reader has claimed an extension, the default OpenCVVideoReader is used.
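The registration rule described above ("last constructed reader wins, with a default fallback") can be sketched with a small metaclass. All names below (ReaderRegistryMeta, BaseReader, DefaultReader, TiffReaderA, TiffReaderB, pick_reader) are hypothetical and only illustrate the mechanism; they are not byotrack classes.

```python
import pathlib


class ReaderRegistryMeta(type):
    """Hypothetical metaclass that records which class claims which extension."""

    registry: dict = {}

    def __new__(mcs, cls_name, bases, attributes):
        cls = super().__new__(mcs, cls_name, bases, attributes)
        # Only extensions declared in this class body are registered,
        # so the last constructed class to claim an extension wins.
        for extension in attributes.get("supported_extensions", []):
            mcs.registry[extension] = cls
        return cls


class BaseReader(metaclass=ReaderRegistryMeta):
    supported_extensions: list = []


class DefaultReader(BaseReader):
    """Fallback used when no class has claimed the extension."""


class TiffReaderA(BaseReader):
    supported_extensions = [".tif", ".tiff"]


class TiffReaderB(BaseReader):
    supported_extensions = [".tif"]  # Constructed later: takes over ".tif"


def pick_reader(path: str) -> type:
    """Choose a reader class from the file extension."""
    suffix = pathlib.PurePath(path).suffix.lower()
    return ReaderRegistryMeta.registry.get(suffix, DefaultReader)
```

With these definitions, pick_reader("movie.tif") returns TiffReaderB, pick_reader("movie.tiff") returns TiffReaderA, and any unclaimed extension falls back to DefaultReader.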

class byotrack.video.reader.VideoReader(path: str | os.PathLike, **kwargs: Any)

Bases: object

Unified video reader api.

Close to the OpenCV API, but with a few key differences:

  • There is always a frame loaded

  • Frame ids go from 0 to length - 1

  • Frames are loaded in RGB.

  • It supports any number of channels and 2D/3D

  • Read method is very different:

    • It retrieves the current frame then grabs the next (The other way around in opencv)

    • It therefore returns a ndarray and a bool rather than a bool and a ndarray

    • The boolean returned indicates whether we can continue to read, not whether the read operation has failed

  • Easy to check main attributes like:

    • frame_id

    • length

    • channels

    • shape

    • fps if known (-1 otherwise)
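The read semantics listed above can be illustrated with a toy in-memory reader. ListReader and read_all are hypothetical names, not byotrack classes, and the sketch assumes that grab leaves frame_id unchanged when there is no next frame.

```python
class ListReader:
    """Hypothetical reader over a non-empty list of frames."""

    def __init__(self, frames):
        if not frames:
            raise ValueError("At least one frame is required")
        self.frames = list(frames)
        self.length = len(self.frames)
        self.frame_id = 0  # There is always a current frame

    def retrieve(self):
        return self.frames[self.frame_id]

    def grab(self) -> bool:
        if self.frame_id + 1 >= self.length:
            return False  # No next frame (assumed: frame_id is unchanged)
        self.frame_id += 1
        return True

    def read(self):
        # Retrieve first, then grab: the bool tells whether reading can continue
        frame = self.retrieve()
        return frame, self.grab()


def read_all(reader) -> list:
    """Consume every frame from the current position."""
    frames = []
    more = True
    while more:
        frame, more = reader.read()
        frames.append(frame)  # The frame is valid even when more is False
    return frames
```

Because the returned boolean describes the next frame rather than the current one, the last frame is still appended before the loop stops.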

supported_extensions

Static attribute used by open method to automatically choose which VideoReader to use.

Type:

list[str]

path

Path of the current video

Type:

pathlib.Path

released

True when release has been called (close and release memory)

Type:

bool

fps

Frame rate (-1 if unknown)

Type:

int

shape

Spatial dimensions of frames (Height, Width[, Depth])

Type:

tuple[int, …]

channels

Number of channels

Type:

int

length

Number of frames

Type:

int

frame_id

Current frame id

Type:

int

release() None

Close the file and free memory.

grab() bool

Grab the next frame.

Can be faster than self.seek(self.frame_id + 1)

Returns:

True if able to grab next frame

Return type:

bool

retrieve() ndarray

Retrieve the current frame.

Returns:

The current frame

Shape: ([D, ]H, W, C)

Return type:

np.ndarray

read() tuple[ndarray, bool]

Consume a frame. Is equivalent to retrieve + grab.

In this implementation there is always a current frame, so the order is reversed compared to OpenCV: it first retrieves the current frame, then grabs the next one.

Returns:

The current frame (Shape: ([D, ]H, W, C)) and whether there is a next frame to read

Return type:

tuple[np.ndarray, bool]

seek(frame_id: int) None

Seek frame_id (will update the current frame).

Valid frame ids from 0 to length - 1

Raises:

EOFError if seeking an invalid frame

tell() int

Returns self.frame_id.

Returns:

current frame_id

Return type:

int

static open(path: str | os.PathLike, **kwargs: Any) VideoReader

Open a video file.

Use the extension to know which VideoReader to use

Parameters:
  • path (str | os.PathLike) – File to open

  • kwargs – Any additional args for the underlying video reader

Returns:

VideoReader

class byotrack.video.reader.OpenCVVideoReader(path: str | os.PathLike, **kwargs: Any)

Bases: VideoReader

Wrapper around opencv VideoCapture.

Default VideoReader when opening a file.

It only supports 2D images (grayscale or RGB).

video

VideoCapture from opencv

Type:

cv2.VideoCapture

release() None

Close the file and free memory.

grab() bool

Grab the next frame.

Can be faster than self.seek(self.frame_id + 1)

Returns:

True if able to grab next frame

Return type:

bool

retrieve() ndarray

Retrieve the current frame.

Returns:

The current frame

Shape: ([D, ]H, W, C)

Return type:

np.ndarray

seek(frame_id: int) None

Seek frame_id (will update the current frame).

Valid frame ids from 0 to length - 1

Raises:

EOFError if seeking an invalid frame

class byotrack.video.reader.PILVideoReader(path: str | os.PathLike, **kwargs: Any)

Bases: VideoReader

Old PIL video reader. Works well for 2D multi-frame Tiff files that are not supported by OpenCV.

It only supports 2D videos. TiffVideoReader should have broader support for Tiff files.

See VideoReader for inherited attributes.

video

PIL image (animated)

Type:

PIL.Image.Image

release() None

Close the file and free memory.

grab() bool

Grab the next frame.

Can be faster than self.seek(self.frame_id + 1)

Returns:

True if able to grab next frame

Return type:

bool

retrieve() ndarray

Retrieve the current frame.

Returns:

The current frame

Shape: ([D, ]H, W, C)

Return type:

np.ndarray

seek(frame_id: int) None

Seek frame_id (will update the current frame).

Valid frame ids from 0 to length - 1

Raises:

EOFError if seeking an invalid frame

class byotrack.video.reader.TiffVideoReader(path: str | os.PathLike, level=0, axes: str | None = None, ax_slice: dict[str, slice] | None = None, **kwargs: Any)

Bases: VideoReader

Tiff video reader with tifffile. Handles 2D and 3D videos with any number of channels.

Axes are inferred from the tifffile metadata and converted into (T, [D, ]H, W, C) (<=> T[Z]YXC). We may not support all formats, or your specific metadata can be wrong/missing. In this case, you can also provide the expected axes of the tifffile using an ordered string.

For example: “TYX” for 2D videos without channel, “TCZYX” for 3D videos with channels (ordered by time, channel then stack), “ZTYX” for 3D videos without channels (ordered by stack then time).
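To make the axes strings more concrete, the sketch below computes the axis permutation that reorders data stored as "TCZYX" into the "TZYXC" output order. The helpers axes_permutation and permute_shape are hypothetical and only cover the case where both strings contain the same letters; the reader additionally handles missing axes, which is not shown here.

```python
def axes_permutation(in_axes: str, out_axes: str) -> tuple:
    """Permutation p such that output axis i is input axis p[i]."""
    if sorted(in_axes) != sorted(out_axes):
        raise ValueError(f"Cannot reorder {in_axes!r} into {out_axes!r}")
    return tuple(in_axes.index(axis) for axis in out_axes)


def permute_shape(shape: tuple, permutation: tuple) -> tuple:
    """Shape obtained after reordering the axes with the permutation."""
    return tuple(shape[i] for i in permutation)


# 10 time points, 2 channels, 5 stacks of 64 x 32 images stored as TCZYX
permutation = axes_permutation("TCZYX", "TZYXC")
out_shape = permute_shape((10, 2, 5, 64, 32), permutation)
```

The permutation follows the convention of numpy.transpose, so numpy.transpose(array, permutation) would perform the same reordering on actual data.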

Note

With tifffile syntax, we use X for width, Y for height, Z for depth and C/S for channels (C and S are not supported together) and T for time. Any other letter (I, O, Q, …) is first interpreted as T if it is missing, then as Z if it is missing; otherwise it will yield an error.

It also supports reading the tiff at a specific resolution level.

See VideoReader for inherited attributes.

out_axes

Axes order of the outputs

Type:

str

level

Resolution level (if any). Default: 0 (finest level)

Type:

int

in_axes

Parsed in axes

Type:

dict[str, int]

ax_slice

Optional slices to use on axes.

Type:

dict[str, slice]

release() None

Close the file and free memory.

grab() bool

Grab the next frame.

Can be faster than self.seek(self.frame_id + 1)

Returns:

True if able to grab next frame

Return type:

bool

seek(frame_id: int) None

Seek frame_id (will update the current frame).

Valid frame ids from 0 to length - 1

Raises:

EOFError if seeking an invalid frame

retrieve() ndarray

Retrieve the current frame.

Returns:

The current frame

Shape: ([D, ]H, W, C)

Return type:

np.ndarray

class byotrack.video.reader.FrameTiffLoader(level=0, axes: str | None = None, ax_slice: dict[str, slice] | None = None)

Bases: object

Load a single frame stored in a TiffFile with tifffile.

It handles 2D and 3D videos with any number of channels. Axes are inferred from the tifffile metadata and converted into ([D, ]H, W, C) (<=> [Z]YXC).

We may not support all formats, or your specific metadata can be wrong/missing. In this case, you can also provide the expected axes of the tifffile using an ordered string.

For example: “YX” for 2D videos without channel, “CZYX” for 3D videos with channels (ordered by channel then stack).

Note

With tifffile syntax, we use X for width, Y for height, Z for depth and C/S for channels (C and S are not supported together). Any other letter (T, I, O, Q, …) is either interpreted as Z if it is missing, or it will yield an error.

It also supports reading the tiff at a specific resolution level.

out_axes

Axes order of the outputs

Type:

str

level

Resolution level (if any). Default: 0 (finest level)

Type:

int

axes

Override the axes found in the tiff metadata. Default: None (Parse the metadata)

Type:

str | None

byotrack.video.reader.pil_loader(path: str | os.PathLike) np.ndarray

Load an image with PIL. Shape: (H, W, C).

It only supports 2D images

class byotrack.video.reader.MultiFrameReader(path: str | os.PathLike, paths: list[str | os.PathLike] | None = None, extension: str | None = None, frame_loader: Callable[[str | os.PathLike], np.ndarray] | None = None, **kwargs: Any)

Bases: VideoReader

Read video from a list of files inside a folder.

By default, it will find the alphanumerically sorted list of paths that share the most common extension in the folder. The extension may be provided by the user.

You can provide your own list of paths (absolute paths). The folder path is then ignored.

Finally, you may also provide your own loading function to load each frame as a numpy array.
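The default path discovery described above can be approximated as follows. This is a hedged sketch: find_frame_paths is a hypothetical helper, and it uses Python's plain lexicographic sort, which may differ from the reader's alphanumerical ordering (for instance "frame_10" sorts before "frame_2" here).

```python
from collections import Counter
from pathlib import Path
from typing import Optional


def find_frame_paths(folder, extension: Optional[str] = None) -> list:
    """List the files of a folder that share the given (or most common) extension."""
    files = [path for path in Path(folder).iterdir() if path.is_file()]
    if extension is None:
        # Pick the extension shared by the largest number of files
        counts = Counter(path.suffix for path in files if path.suffix)
        if not counts:
            raise FileNotFoundError(f"No file with an extension in {folder}")
        extension = counts.most_common(1)[0][0]
    # Simplification: lexicographic order on the full path
    return sorted(path for path in files if path.suffix == extension)
```

Zero-padding the frame numbers in the file names (frame_001, frame_002, ...) keeps the lexicographic order identical to the temporal order.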

See VideoReader for inherited attributes.

paths

Sorted list of Paths to each frame of the video.

Type:

list[pathlib.Path]

frame_loader

Loads frames from their associated files.

Type:

Callable[[str | os.PathLike], np.ndarray]

grab() bool

Grab the next frame.

Can be faster than self.seek(self.frame_id + 1)

Returns:

True if able to grab next frame

Return type:

bool

seek(frame_id: int) None

Seek frame_id (will update the current frame).

Valid frame ids from 0 to length - 1

Raises:

EOFError if seeking an invalid frame

retrieve() ndarray

Retrieve the current frame.

Returns:

The current frame

Shape: ([D, ]H, W, C)

Return type:

np.ndarray

Video transforms

class byotrack.video.transforms.ChannelSelect(channel: int)

Bases: object

Select a given channel.

channel

Channel to keep

Type:

int

Parameters:

frame (np.ndarray) – Frame of a video. Shape: (…, C)

Returns:

Filtered frame with a single channel

Shape: (…, 1)

Return type:

np.ndarray

class byotrack.video.transforms.ChannelAvg

Bases: object

Average channels into a single one.

Parameters:

frame (np.ndarray) – Frame of a video. Shape: (…, C)

Returns:

Average of channels

Shape: (…, 1)

Return type:

np.ndarray

class byotrack.video.transforms.ScaleAndNormalize(q_min: float, q_max: float, smooth_clip: float = 0, compute_stats_on: int = 50)

Bases: object

Scale and Normalize each channel into [0, 1].

The min and max values are computed using quantiles of the video to improve stability.

q_min

Quantile of the minimum value to consider

Type:

float

q_max

Quantile of the maximum value to consider

Type:

float

mini

Minimum value kept (one for each channel). Shape: (C, ), dtype: float32

Type:

np.ndarray

maxi

Maximum value kept (one for each channel). Shape: (C, ), dtype: float32

Type:

np.ndarray

smooth_clip

Smoothness of the clipping process. If 0, values are clipped on mini/maxi. Else, values above maxi are log clipped: v = 1 + a log((v - 1)/a + 1) for v > 1, with a the smooth_clip factor. Typical values are between 0 and 1. Default: 0 (hard clipping)

Type:

float

max

True maximum values (one for each channel) when using smooth clipping. Shape: (C, ), dtype: float32

Type:

np.ndarray

compute_stats_on

Max number of frames to compute stats on. It prevents the heavy computations that can occur with large videos. Default: 50

Type:

int

Parameters:

frame (np.ndarray) – Frame of the video. Shape: (…, C)

Returns:

Normalized version of the frame in [0, 1]

Shape: (…, C), dtype: float32

Return type:

np.ndarray

update_stats(frames: ndarray) None

Update mini and maxi values based on the given frames.

Parameters:

frames (np.ndarray) – Several frames of the same video to compute the stats. Shape: (…, C)
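
The smooth_clip formula documented for ScaleAndNormalize can be checked on scalar values with the sketch below. The function scale_and_clip is hypothetical. It assumes that values are first scaled with (value - mini) / (maxi - mini), that values below mini are always hard clipped to 0, and it does not apply any further rescaling that the class may perform with its max attribute.

```python
import math


def scale_and_clip(value: float, mini: float, maxi: float, smooth_clip: float = 0.0) -> float:
    """Scale a value with mini/maxi, then clip it (hard or log clipping)."""
    scaled = (value - mini) / (maxi - mini)
    scaled = max(scaled, 0.0)  # Assumed: hard clipping on the lower bound
    if scaled <= 1.0:
        return scaled
    if smooth_clip == 0.0:
        return 1.0  # Hard clipping on the upper bound
    # Log clipping: v = 1 + a log((v - 1) / a + 1) for v > 1
    return 1.0 + smooth_clip * math.log((scaled - 1.0) / smooth_clip + 1.0)


hard = scale_and_clip(20.0, 0.0, 10.0)  # Twice the kept maximum, hard clipped
soft = scale_and_clip(20.0, 0.0, 10.0, smooth_clip=1.0)  # Same value, log clipped
```

With log clipping, values above maxi stay ordered (a brighter pixel remains brighter) while growing only logarithmically, and the mapping is continuous at v = 1.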