pipeline

Pipeline stage functions, runner classes, and supporting contracts exported from corridorkey.

Stage Functions

Scan a path for processable clips.

Accepts:

- A clips directory containing multiple clip subfolders
- A single clip folder (must contain Input/ and optionally AlphaHint/)
- A single video file (reorganised in-place into a clip folder structure)

Parameters:

Name Type Description Default
path str | Path

Path to a clips directory, a single clip folder, or a video file.

required
reorganise bool

If True (default), loose video files are moved into an Input/ subfolder in-place. If False, loose videos are reported as skipped rather than silently ignored.

True
events PipelineEvents | None

Optional PipelineEvents for streaming clip discovery to a GUI. on_clip_found fires for each valid clip as it is discovered. on_clip_skipped fires for each path that could not be used.

None

Returns:

Type Description
ScanResult

ScanResult with clips ready for the loader stage and any skipped paths.

Raises:

Type Description
ClipScanError

If the path does not exist or is an unrecognised file type.

PermissionError

If the top-level directory cannot be read.

OSError

If video reorganisation fails.
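The three accepted path shapes can be illustrated with a small pure-Python stand-in. This is not corridorkey's implementation — the real scan function also builds Clip objects, reorganises loose videos, and fires PipelineEvents — and the video suffix set here is an assumption for illustration.

```python
from pathlib import Path

VIDEO_SUFFIXES = {".mp4", ".mov", ".mkv"}  # assumed set, for illustration only

def classify(path: Path) -> str:
    """Return how the scan stage would treat `path`, per the rules above."""
    if path.is_file():
        if path.suffix.lower() in VIDEO_SUFFIXES:
            return "loose video (reorganised into a clip folder)"
        return "skipped (unrecognised file type)"
    if (path / "Input").is_dir():
        return "single clip folder"
    if path.is_dir():
        return "clips directory (scan subfolders)"
    raise FileNotFoundError(path)  # the real scan raises ClipScanError here
```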

Validate a clip and return its manifest.

For image sequence inputs, reads directly from Input/ and AlphaHint/. For video inputs, extracts frames into Frames/ and AlphaFrames/ without touching the original files.

Parameters:

Name Type Description Default
clip Clip

A Clip from stage 0.

required
events PipelineEvents | None

Optional PipelineEvents for extraction progress reporting.

None
png_compression int

PNG compression level for video extraction (0–9). Default 1 is recommended for intermediate frames.

DEFAULT_PNG_COMPRESSION

Returns:

Type Description
ClipManifest

ClipManifest ready for preprocessing, or for the interface to generate alpha externally via resolve_alpha() if needs_alpha is True.

Raises:

Type Description
FrameMismatchError

If frame validation fails (for example, the input and alpha frame counts do not match).

ExtractionError

If video extraction fails.

Update a manifest with an externally generated alpha sequence.

Called by the interface layer (CLI, GUI, etc.) after it has generated alpha frames using an external tool. Alpha generation is not a pipeline stage — it is entirely the interface's responsibility.

Validates the provided alpha directory matches the stored frame count (using the count already in the manifest — no re-scan of the input directory), then returns an updated manifest with needs_alpha=False and alpha_frames_dir set, ready for preprocessing.

Parameters:

Name Type Description Default
manifest ClipManifest

A ClipManifest with needs_alpha=True.

required
alpha_frames_dir Path

Path to the directory containing the generated alpha frame sequence.

required

Returns:

Type Description
ClipManifest

Updated ClipManifest with needs_alpha=False, ready for preprocessing.

Raises:

Type Description
ValueError

If manifest already has alpha or the directory doesn't exist.

FrameMismatchError

If the alpha frame count doesn't match manifest.frame_count.

Read VideoMetadata from the clip root directory, if present.

Parameters:

Name Type Description Default
clip_root Path

Clip root directory.

required

Returns:

Type Description
VideoMetadata | None

VideoMetadata if video_meta.json exists, None otherwise.

Preprocess one frame from a clip for model inference.

Parameters:

Name Type Description Default
manifest ClipManifest

ClipManifest from stage 1. Must have needs_alpha=False.

required
i int

Frame index within manifest.frame_range.

required
config PreprocessConfig

Preprocessing configuration.

required
image_files list[Path] | None

Pre-sorted image paths (build once per clip, pass every frame).

None
alpha_files list[Path] | None

Pre-sorted alpha paths (build once per clip, pass every frame).

None

Returns:

Type Description
PreprocessedFrame

PreprocessedFrame containing the model input tensor on device plus FrameMeta for postprocessing.

Raises:

Type Description
ValueError

If manifest still needs alpha or i is out of range.

FrameReadError

If a frame file cannot be read.

Load a GreenFormer model from a checkpoint.

Constructs the model architecture, moves it to the configured device, loads the checkpoint, applies dtype fixups, and optionally compiles the model with torch.compile.

Parameters:

Name Type Description Default
config InferenceConfig

InferenceConfig with checkpoint_path, device, img_size, use_refiner, model_precision, and refiner_mode.

required
resolved_refiner_mode str | None

The concrete refiner mode after "auto" has been resolved (either "full_frame" or "tiled"). If None, uses config.refiner_mode directly (must not be "auto").

None

Returns:

Type Description
Module

GreenFormer in eval mode on config.device, ready for inference.

Raises:

Type Description
FileNotFoundError

If the checkpoint file does not exist.

RuntimeError

If the checkpoint cannot be loaded.

Run model inference on a single preprocessed frame.

Parameters:

Name Type Description Default
frame PreprocessedFrame

Output of the preprocessing stage. tensor is [1, 4, H, W] on config.device, already ImageNet-normalised.

required
model Module

Loaded GreenFormer in eval mode.

required
config InferenceConfig

Inference configuration.

required
resolved_refiner_mode str | None

Pre-resolved refiner mode ("full_frame" or "tiled"). When provided, skips the per-frame VRAM probe that _should_tile_refiner would otherwise perform for "auto" mode. TorchBackend.run() always passes this to avoid pynvml overhead on every frame.

None

Returns:

Type Description
InferenceResult

InferenceResult with alpha and fg tensors on device, plus FrameMeta.
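A sketch combining the model-load and per-frame inference stages. `load_model` is named in the text above; the inference function name `run_inference` and the `preprocessed_frames` iterable are assumptions standing in for names the extracted page omits.

```python
from pathlib import Path
import torch
from corridorkey.pipeline import InferenceConfig, load_model, run_inference  # `run_inference` name assumed

config = InferenceConfig(
    checkpoint_path=Path("checkpoints/greenformer.pth"),
    device="cuda",
    img_size=2048,                 # native training resolution
    use_refiner=True,
    model_precision=torch.float32,
    refiner_mode="full_frame",     # must be concrete; "auto" is resolved upstream
)
model = load_model(config, resolved_refiner_mode="full_frame")

for frame in preprocessed_frames:  # PreprocessedFrame objects from the preprocessing stage
    # Passing resolved_refiner_mode skips the per-frame VRAM probe.
    result = run_inference(frame, model, config, resolved_refiner_mode="full_frame")
    # result.alpha: [1, 1, S, S] and result.fg: [1, 3, S, S], both on config.device
```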

Postprocess a single inference result into output-ready numpy arrays.

Parameters:

Name Type Description Default
result InferenceResult

InferenceResult from the inference stage.

required
config PostprocessConfig

Postprocessing options (despill, despeckle, source_passthrough).

required
stem str

Filename stem for output naming (e.g. "frame_000001"). Defaults to "frame_{frame_index:06d}" when empty.

''
output_dir Path | None

Root output directory. Required when config.debug_dump=True.

None

Returns:

Type Description
PostprocessedFrame

PostprocessedFrame with all arrays at source resolution, float32.
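The default stem rule above ("frame_{frame_index:06d}" when the stem argument is empty) can be written as a one-line helper; the helper name is ours, for illustration.

```python
def output_stem(frame_index: int, stem: str = "") -> str:
    """Mirror the postprocessor's stem default: fall back to a zero-padded frame name."""
    return stem if stem else f"frame_{frame_index:06d}"
```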

Write all enabled outputs for one postprocessed frame to disk.

Parameters:

Name Type Description Default
frame PostprocessedFrame

PostprocessedFrame from the postprocessor stage.

required
config WriteConfig

WriteConfig controlling which outputs to write and where.

required

Raises:

Type Description
OSError

If any cv2.imwrite call fails.

Runner Classes

Runs the full pipeline for a single clip on a single device.

Instantiate once per clip, call run(), then discard.

Parameters:

Name Type Description Default
manifest ClipManifest

ClipManifest from the loader stage. Must have needs_alpha=False.

required
config PipelineConfig

Pipeline configuration.

required

run()

Run the pipeline for the clip. Blocks until all frames are done.
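A hypothetical per-clip driver, one PipelineRunner per clip as prescribed above. PipelineRunner and the config classes appear on this page; the stage-function names `scan` and `load_clip` are assumptions, since the extracted page omits the function names.

```python
from pathlib import Path
from corridorkey.pipeline import (  # `scan` / `load_clip` names assumed
    PipelineConfig, PipelineRunner, PreprocessConfig, InferenceConfig,
    PostprocessConfig, load_clip, scan,
)

checkpoint = Path("checkpoints/greenformer.pth")
for clip in scan(Path("projects/shoot_01/clips")).clips:
    manifest = load_clip(clip)            # stage 1: validate / extract frames
    if manifest.needs_alpha:
        continue                          # interface must run resolve_alpha() first
    config = PipelineConfig(
        preprocess=PreprocessConfig(device="cuda"),
        inference=InferenceConfig(checkpoint_path=checkpoint, device="cuda"),
        postprocess=PostprocessConfig(),
        write=None,                       # default WriteConfig derived from manifest output_dir
    )
    PipelineRunner(manifest, config).run()  # instantiate once per clip, run, discard
```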

Configuration for the full pipeline runner.

Attributes:

Name Type Description
preprocess PreprocessConfig

Preprocessing stage config (img_size, device, strategy).

inference InferenceConfig | None

Inference stage config (checkpoint, device, precision). None means inference is skipped (preprocess-only mode).

postprocess PostprocessConfig

Postprocessing stage config (despill, despeckle, checkerboard).

write WriteConfig | None

Writer stage config (formats, enabled outputs). None means a default WriteConfig is derived from the manifest output_dir.

input_queue_depth int

Max preprocessed frames waiting for inference. Keep small — each frame is ~64MB on GPU at 2048 resolution.

output_queue_depth int

Max inference outputs waiting for postprocessing.

resolved_refiner_mode str | None

Pre-resolved refiner mode ("full_frame" or "tiled"). When set, PipelineRunner skips the VRAM probe entirely and uses this value directly. Populated by to_pipeline_config() so the VRAM probe that already ran for img_size resolution is not repeated.

events PipelineEvents | None

Optional event callbacks for all pipeline stages. Pass a PipelineEvents instance to receive per-frame and per-stage progress notifications.

Runs the pipeline across multiple GPUs in parallel (frame-level dispatch).

Each GPU gets its own model instance loaded in its own thread. All GPU workers pull from a single shared input queue and push to a single shared output queue. The preprocessor and postwriter are single-threaded as usual.

Frame ordering: output frames may arrive out of order when multiple GPUs are running at different speeds. The PostWriteWorker writes frames as they arrive — if strict ordering is required, the caller should sort by InferenceResult.meta.frame_index after the run.

Parameters:

Name Type Description Default
manifest ClipManifest

ClipManifest from the loader stage.

required
config MultiGPUConfig

MultiGPUConfig with a list of device strings.

required

run()

Run the pipeline. Blocks until all frames are written.
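A hypothetical multi-GPU sketch. MultiGPUConfig and resolve_devices are named on this page; the runner class name `MultiGPURunner` is an assumption, as is an already-resolved `manifest` with needs_alpha=False.

```python
from pathlib import Path
from corridorkey.pipeline import (  # `MultiGPURunner` name assumed
    MultiGPUConfig, MultiGPURunner, InferenceConfig, PreprocessConfig,
    PostprocessConfig, resolve_devices,
)

devices = resolve_devices("all")           # e.g. ["cuda:0", "cuda:1"]
config = MultiGPUConfig(
    devices=devices,
    inference=InferenceConfig(             # device field is overridden per worker
        checkpoint_path=Path("checkpoints/greenformer.pth"), device="cuda",
    ),
    preprocess=PreprocessConfig(),
    postprocess=PostprocessConfig(),
    input_queue_depth=2 * len(devices),    # one frame in flight + one buffered per GPU
)
MultiGPURunner(manifest, config).run()     # blocks until all frames are written
```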

Configuration for multi-GPU frame-level parallel inference.

Each device gets its own model instance and InferenceWorker thread. All workers share a single input queue (preprocessed frames) and a single output queue (inference results), so the pipeline naturally load-balances — whichever GPU finishes first picks up the next frame.

Attributes:

Name Type Description
devices list[str]

List of PyTorch device strings to use (e.g. ["cuda:0", "cuda:1"]). Must have at least one entry. Use resolve_devices("all") to populate this from all available CUDA GPUs.

inference InferenceConfig

Base InferenceConfig. The device field is overridden per-worker — all other fields (checkpoint, precision, etc.) are shared across all GPUs.

preprocess PreprocessConfig

Preprocessing config. Runs on CPU (device-agnostic).

postprocess PostprocessConfig

Postprocessing config.

write WriteConfig | None

Writer config. None → default derived from manifest output_dir.

input_queue_depth int

Shared input queue depth. Scale with GPU count — a depth of 2×N gives each GPU a frame in flight plus one buffered.

output_queue_depth int

Shared output queue depth.

events PipelineEvents | None

Optional pipeline event callbacks.

Contracts

Bases: BaseModel

A clip ready for stage 1. Output contract of stage 0.

Frozen — all fields are immutable after construction. Downstream stages must never mutate a Clip; create a new one if a field needs to change.

Attributes:

Name Type Description
name str

Human-readable clip name derived from the folder name.

root Path

Absolute path to the clip folder.

input_path Path

Path to the input asset. Either the Input/ directory (for pre-structured clips) or a video file inside Input/ (for normalised videos).

alpha_path Path | None

Path to the alpha hint asset. None if absent — the interface must generate alpha externally and call resolve_alpha() before proceeding.

Bases: BaseModel

Output contract of stage 1. Input to all downstream stages.

Frozen — all fields are immutable after construction. Use model_copy() to produce an updated manifest (e.g. in resolve_alpha).

Downstream stages only receive what they need — resolved frame paths, output destination, and clip metadata. All discovery, validation, and extraction decisions are made in stage 1 and are not repeated.

Attributes:

Name Type Description
clip_name str

Clip name, carried through for logging and output naming.

clip_root Path

Absolute path to the clip folder. Parent of Input/, Output/, Frames/, AlphaHint/, etc.

frames_dir Path

Directory containing the input frame sequence. Points to Input/ for image sequence inputs, or Frames/ for video inputs (extracted by stage 1).

alpha_frames_dir Path | None

Directory containing the alpha hint frame sequence. Points to AlphaHint/ for image sequence inputs, or AlphaFrames/ for video inputs. None if absent — the interface layer is responsible for generating alpha externally and calling resolve_alpha() to update the manifest before proceeding.

output_dir Path

Directory where stage 6 writes all output images. Created by stage 1 at clip/Output/.

needs_alpha bool

True if alpha is absent. The interface layer must generate alpha externally and call resolve_alpha() before proceeding.

frame_count int

Total number of input frames.

frame_range tuple[int, int]

Half-open range (start, end) of frames to process. Defaults to (0, frame_count) — the full sequence. Can be narrowed for partial runs or testing.

is_linear bool

True if input frames are in linear light (e.g. .exr).

video_meta_path Path | None

Path to the video_meta.json sidecar file written by stage 1 during video extraction. None for image sequence inputs. Stage 6 reads this to re-encode output with matching properties.

png_compression int

PNG compression level used during extraction (0–9). Stored so the writer stage can use the same level for consistency.
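The frame_range contract above is a half-open (start, end) pair; a trivial helper (name ours, for illustration) makes the exclusivity of `end` explicit.

```python
def frames_to_process(frame_range: tuple[int, int]) -> list[int]:
    """Expand a half-open (start, end) frame_range into concrete frame indices."""
    start, end = frame_range
    return list(range(start, end))  # half-open: `end` itself is excluded
```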

Bases: BaseModel

Source video metadata captured at extraction time.

Carried through the pipeline so stage 6 can re-encode output with matching properties.

Attributes:

Name Type Description
filename str

Original video filename (stem + suffix).

width int

Frame width in pixels.

height int

Frame height in pixels.

fps_num int

Framerate numerator.

fps_den int

Framerate denominator.

pix_fmt str

Pixel format string (e.g. "yuv420p").

codec_name str

Video codec name (e.g. "h264", "prores").

frame_count int

Total frame count as reported by the container. 0 if the container does not report it (use duration_s / fps instead).

duration_s float | None

Total duration in seconds. None if not reported by container.

has_audio bool

True if the source container has at least one audio stream.

color_space str | None

Color space string (e.g. "bt709"). None if not reported.

color_transfer str | None

Transfer characteristic (e.g. "bt709"). None if not reported.

color_primaries str | None

Color primaries (e.g. "bt709"). None if not reported.

estimated_frame_count property

Best-effort frame count: container value if available, else duration * fps.

fps property

Framerate as a float.

Output contract of the preprocessing stage.

Attributes:

Name Type Description
tensor Tensor

Model input tensor [1, 4, img_size, img_size] on device. Channels: [R_norm, G_norm, B_norm, alpha_hint]. dtype is float32 unless PreprocessConfig.half_precision=True.

meta FrameMeta

Original frame dimensions and optional source image for postprocessing.

Metadata carried alongside the tensor for use by postprocessing.

Attributes:

Name Type Description
frame_index int

Index of this frame within the clip's frame_range.

original_h int

Frame height before resizing, in pixels.

original_w int

Frame width before resizing, in pixels.

source_image ndarray | None

Original sRGB image [H, W, 3] float32, RGB channel order, at source resolution, used by postprocessor source_passthrough to replace model FG in opaque interior regions. None if source passthrough is disabled.

alpha_hint ndarray | None

Raw alpha hint [H, W, 1] float32 0-1 at source resolution, used by postprocessor hint_sharpen to produce a hard binary mask that eliminates soft edge tails introduced by upscaling. None if no alpha hint was provided.

Configuration for the preprocessing stage.

Attributes:

Name Type Description
img_size int

Square resolution the model runs at. 2048 is the native training resolution — do not change unless retraining.

device str

PyTorch device string ("cuda", "mps", "cpu").

image_upsample_mode ImageUpsampleMode

Interpolation mode used when the source image is smaller than img_size. "bicubic" (default) gives the sharpest result. "bilinear" is faster but slightly softer. Has no effect when downscaling — area mode is always used then.

half_precision bool

If True, cast tensors to float16 before inference.

source_passthrough bool

If True, carry the original sRGB source image in FrameMeta so the postprocessor can replace model FG in opaque interior regions with original source pixels.

sharpen_strength float

Unsharp mask strength applied after upscaling. 0.3 (default) recovers softness from the antialias filter. 0.0 disables. Has no effect when downscaling.

Runtime configuration for the inference stage.

Attributes:

Name Type Description
checkpoint_path Path

Path to the .pth model checkpoint file.

device str

PyTorch device string ("cuda", "cuda:0", "mps", "cpu").

img_size int

Square resolution the model runs at. Must be one of 0, 512, 1024, 1536, or 2048. 0 means auto-select based on VRAM (resolved by pipeline.to_inference_config() before the model is loaded — do not pass 0 directly to load_model). 2048 is the native training resolution and produces the best output. Smaller values reduce VRAM usage at the cost of output quality.

use_refiner bool

Whether to enable the CNN refiner module. The refiner corrects transformer macroblocking artifacts at subject edges. Disabling it is faster but produces visibly coarser alpha mattes.

mixed_precision bool

Run the forward pass under autocast (fp16/bf16). Ignored on CPU. Reduces VRAM usage with minimal quality impact.

model_precision dtype

Weight dtype for the model forward pass. float32 is safe everywhere. float16/bfloat16 saves VRAM on CUDA but may reduce numerical stability.

refiner_mode RefinerMode

Controls how the CNN refiner executes. "auto" — probe VRAM; <12 GB → tiled, else → full_frame. "full_frame" — run the refiner on the full image at once. Best performance on GPUs with 12+ GB VRAM. "tiled" — run the refiner in 512×512 overlapping tiles. Keeps peak VRAM flat. Identical output quality to full_frame. Required on low-VRAM GPUs.

refiner_scale float

Multiplier applied to the CNN refiner's delta output. 1.0 applies full refinement. 0.0 disables the refiner output entirely (equivalent to use_refiner=False but without skipping the forward pass). Values between 0 and 1 blend between no refinement and full refinement.

backend BackendChoice

Which inference backend to use. "auto" — Apple Silicon + corridorkey-mlx installed → mlx, else → torch. "torch" — always use PyTorch (CUDA / ROCm / MPS / CPU). "mlx" — always use MLX (Apple Silicon only, optional package). Can also be overridden via the CORRIDORKEY_BACKEND env var.
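The "auto" refiner-mode rule stated above (<12 GB free VRAM → tiled, otherwise full_frame) reduces to a small function; the threshold comes from the text, the helper name is ours.

```python
def resolve_refiner_mode(mode: str, free_vram_gb: float) -> str:
    """Resolve "auto" to a concrete refiner mode using the documented 12 GB cutoff."""
    if mode != "auto":
        return mode                      # "full_frame" or "tiled" pass through unchanged
    return "tiled" if free_vram_gb < 12 else "full_frame"
```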

Output contract of the inference stage. Input to postprocessing.

Attributes:

Name Type Description
alpha Tensor

Predicted alpha matte [1, 1, img_size, img_size], sigmoid-activated, range 0-1, on device.

fg Tensor

Predicted foreground colour [1, 3, img_size, img_size], sigmoid-activated sRGB range 0-1, on device.

meta FrameMeta

Original frame dimensions and index, carried through from preprocessing so postprocessing can resize outputs back to source resolution.

Configuration for the postprocessor stage.

Attributes:

Name Type Description
fg_upsample_mode FgUpsampleMode

Interpolation mode for upscaling the foreground when the model resolution is smaller than the source. "lanczos4" (default) gives the sharpest result. "bicubic" is slightly faster. "bilinear" is fastest. Downscaling always uses INTER_AREA.

alpha_upsample_mode AlphaUpsampleMode

Interpolation mode for upscaling the alpha matte. "lanczos4" (default) gives the sharpest matte edges. "bilinear" is faster. Downscaling always uses INTER_AREA.

despill_strength float

Green spill suppression strength (0.0 = off, 1.0 = full).

auto_despeckle bool

Remove small disconnected alpha islands.

despeckle_size int

Minimum connected region area in pixels to keep.

despeckle_dilation int

Dilation radius in pixels applied after component removal to recover edges lost during binarisation. Default 25.

despeckle_blur int

Gaussian blur radius applied after dilation to soften the hard mask edge. Default 5.

checkerboard_size int

Tile size in pixels for the preview composite background.

source_passthrough bool

Replace model FG in opaque interior regions with the original source pixels. Eliminates dark fringing caused by background contamination in the model FG prediction. Requires source_image in FrameMeta (set PreprocessConfig.source_passthrough=True).

edge_erode_px int

Erosion radius (pixels) for the interior mask used by source_passthrough. Shrinks the interior region inward so the blend seam sits inside the subject rather than at the raw alpha edge.

edge_blur_px int

Gaussian blur radius for the source_passthrough blend seam. Higher values produce a softer transition between model FG and source.

hint_sharpen bool

Apply a hard binary mask derived from the alpha hint to eliminate soft edge tails introduced by upscaling. Requires an alpha hint in FrameMeta. Default True.

hint_sharpen_dilation int

Dilation radius in pixels applied to the binarised hint before masking. Gives breathing room so fine model edge detail is not clipped. Default 3.

debug_dump bool

Save raw inference output (before any postprocessing) to a debug/ subfolder alongside the normal outputs. Writes four PNGs per frame: raw_alpha, raw_fg, post_hint_alpha, post_hint_fg. Useful for diagnosing whether quality issues originate in the model or in postprocessing. Default False.

Output contract of the postprocessor stage. Input to the writer stage.

All arrays are at original source resolution, float32, numpy.

Attributes:

Name Type Description
alpha ndarray

Alpha matte [H, W, 1], linear, range 0-1.

fg ndarray

Foreground RGB [H, W, 3], sRGB straight, range 0-1. In transparent regions the values are undefined — use processed for compositing work.

processed ndarray

Premultiplied linear RGBA [H, W, 4], range 0-1. This is the primary output for compositing. Transparent regions are correctly zeroed out (fg * alpha), so no black-blob artefacts.

comp ndarray

Preview composite over checkerboard [H, W, 3], sRGB, range 0-1.

frame_index int

Frame index carried through from FrameMeta.

source_h int

Original frame height in pixels.

source_w int

Original frame width in pixels.

stem str

Filename stem for output naming (e.g. "frame_000001").
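The premultiplied relationship between `processed`, `fg`, and `alpha` can be shown with numpy. Note this sketch elides the sRGB-to-linear conversion the real postprocessor performs before premultiplying.

```python
import numpy as np

def premultiply(fg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """fg [H, W, 3] and alpha [H, W, 1] -> RGBA [H, W, 4]; fg is zeroed where alpha == 0."""
    return np.concatenate([fg * alpha, alpha], axis=-1).astype(np.float32)
```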

Configuration for the writer stage.

Controls which outputs are written and in what format. All output subdirectories are created under output_dir.

Attributes:

Name Type Description
output_dir Path

Root directory for all outputs.

alpha_enabled bool

Write the alpha matte.

alpha_format ImageFormat

File format for alpha output ("png" or "exr").

fg_enabled bool

Write the straight sRGB foreground colour image.

fg_format ImageFormat

File format for fg output ("png" or "exr").

processed_enabled bool

Write the premultiplied linear RGBA output. This is the primary compositor output — transparent regions are correctly zeroed out. Saved as EXR (float32) by default.

processed_format ImageFormat

File format for processed output ("png" or "exr").

comp_enabled bool

Write the checkerboard preview composite.

comp_format Literal['png']

File format for comp output (always "png").

exr_compression str

EXR compression codec name. One of: "none", "rle", "zips", "zip", "piz", "pxr24", "dwaa", "dwab".