Lifting API Reference

The lifting module exposes tasks that convert 2D image data into 3D representations — depth maps, point clouds, and meshes.


DepthEstimation

The primary entry point for the depth estimation task. Instantiate once and call .run() with a DepthEstimationCommand.

vizion3d.lifting.DepthEstimation

Facade for the Depth Estimation task.

This class serves as the primary entry point for triggering monocular depth estimation inference via direct Python import.

Example

```python
from vizion3d.lifting import (
    DepthEstimation,
    DepthEstimationAdvanceConfig,
    DepthEstimationCommand,
)

cmd = DepthEstimationCommand(
    image_input=b"...",
    return_point_cloud=True,
    advanced_config=DepthEstimationAdvanceConfig(
        fx=615.0, fy=615.0, cx=320.0, cy=240.0, depth_trunc=5.0
    ),
)
result = DepthEstimation().run(cmd)
```
Source code in vizion3d/lifting/__init__.py

````python
class DepthEstimation:
    """
    Facade for the Depth Estimation task.

    This class serves as the primary entry point for triggering monocular depth
    estimation inference via direct Python import.

    Example:
        ```python
        from vizion3d.lifting import (
            DepthEstimation,
            DepthEstimationAdvanceConfig,
            DepthEstimationCommand,
        )

        cmd = DepthEstimationCommand(
            image_input=b"...",
            return_point_cloud=True,
            advanced_config=DepthEstimationAdvanceConfig(
                fx=615.0, fy=615.0, cx=320.0, cy=240.0, depth_trunc=5.0
            ),
        )
        result = DepthEstimation().run(cmd)
        ```
    """

    experimental: bool = False

    def run(self, command: DepthEstimationCommand) -> DepthEstimationResult:
        """
        Dispatches the provided command through the CQRS bus to the registered handler.

        Args:
            command (DepthEstimationCommand): The inference parameters and flags.

        Returns:
            DepthEstimationResult: The resultant depth map and optional generated files.
        """
        return command_bus.dispatch(command)
````

run(command)

Dispatches the provided command through the CQRS bus to the registered handler.

Parameters:

- command (DepthEstimationCommand, required): The inference parameters and flags.

Returns:

- DepthEstimationResult: The resultant depth map and optional generated files.

Source code in vizion3d/lifting/__init__.py

```python
def run(self, command: DepthEstimationCommand) -> DepthEstimationResult:
    """
    Dispatches the provided command through the CQRS bus to the registered handler.

    Args:
        command (DepthEstimationCommand): The inference parameters and flags.

    Returns:
        DepthEstimationResult: The resultant depth map and optional generated files.
    """
    return command_bus.dispatch(command)
```
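run() simply forwards the command to a process-wide command bus, which routes it to whichever handler was registered for that command type. A minimal sketch of that dispatch pattern (the CommandBus and register names here are illustrative, not the actual vizion3d internals):

```python
class CommandBus:
    """Minimal command bus: one registered handler per command type."""

    def __init__(self):
        self._handlers = {}

    def register(self, command_type, handler):
        self._handlers[command_type] = handler

    def dispatch(self, command):
        # Look up the handler registered for this command's type and invoke it.
        handler = self._handlers[type(command)]
        return handler(command)


# Toy usage with a stand-in command class:
class EchoCommand:
    def __init__(self, payload):
        self.payload = payload


bus = CommandBus()
bus.register(EchoCommand, lambda cmd: cmd.payload.upper())
result = bus.dispatch(EchoCommand("depth"))  # → "DEPTH"
```

The indirection means DepthEstimation itself stays free of inference logic; swapping the registered handler swaps the backend without touching the facade.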

DepthEstimationCommand

Input contract for the depth estimation task. All inference parameters are declared here.

vizion3d.lifting.commands.DepthEstimationCommand dataclass

Bases: Command[DepthEstimationResult]

Command payload to trigger a depth estimation inference task.

Attributes:

- image_input (str | bytes): The input image. Pass a file-path string or raw image bytes. The handler auto-detects which form is supplied.
- model_backend (str): Model backend to use for inference.
  • Default value is the vizion3D release checkpoint URL (depth_anything_v2_vitb.pth), which is downloaded on first use and cached under ~/.cache/vizion3d/models/. Set VIZION3D_MODEL_CACHE to override the cache directory.
  • A local .pth or .pt path is loaded directly as a Depth Anything V2 checkpoint — no download occurs.
  • Any HTTPS URL is downloaded to the cache directory and loaded as a checkpoint.
- return_depth_image (bool): When True, the result includes a 16-bit grayscale open3d.geometry.Image (dtype uint16) mapping [min_depth, max_depth] to the full 0–65535 range. Requires Open3D (Python 3.12).
- return_point_cloud (bool): When True, the result includes an open3d.geometry.PointCloud unprojected from the RGB-D image using the camera intrinsics in advanced_config. Point coordinates are in metres. Requires Open3D (Python 3.12).
- advanced_config (DepthEstimationAdvanceConfig): Camera intrinsics and depth range settings. Override any field to customise — e.g. advanced_config=DepthEstimationAdvanceConfig(fx=615.0, fy=615.0). Unspecified fields keep their defaults (PrimeSense values).

Source code in vizion3d/lifting/commands.py

```python
@dataclass
class DepthEstimationCommand(Command[DepthEstimationResult]):
    """
    Command payload to trigger a depth estimation inference task.

    Attributes:
        image_input: The input image. Pass a file-path string or raw image bytes.
            The handler auto-detects which form is supplied.
        model_backend: Model backend to use for inference.

            - Default value is the vizion3D release checkpoint URL
              (`depth_anything_v2_vitb.pth`), which is downloaded on first use and
              cached under `~/.cache/vizion3d/models/`.
              Set `VIZION3D_MODEL_CACHE` to override the cache directory.
            - A local `.pth` or `.pt` path is loaded directly as a Depth Anything V2
              checkpoint — no download occurs.
            - Any HTTPS URL is downloaded to the cache directory and loaded as a
              checkpoint.

        return_depth_image: When `True`, the result includes a 16-bit grayscale
            `open3d.geometry.Image` (dtype `uint16`) mapping `[min_depth, max_depth]`
            to the full 0–65535 range. Requires Open3D (Python 3.12).
        return_point_cloud: When `True`, the result includes an
            `open3d.geometry.PointCloud` unprojected from the RGB-D image using
            the camera intrinsics in `advanced_config`. Point coordinates are in metres.
            Requires Open3D (Python 3.12).
        advanced_config: Camera intrinsics and depth range settings. Override any
            field to customise — e.g.
            ``advanced_config=DepthEstimationAdvanceConfig(fx=615.0, fy=615.0)``.
            Unspecified fields keep their defaults (PrimeSense values).
    """

    image_input: str | bytes
    model_backend: str = DEFAULT_DEPTH_MODEL_URL
    return_depth_image: bool = False
    return_point_cloud: bool = False
    advanced_config: DepthEstimationAdvanceConfig = field(
        default_factory=DepthEstimationAdvanceConfig
    )
```
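The model_backend resolution rules above can be sketched as a small helper. This is illustrative only, not the vizion3d implementation: resolve_backend is a hypothetical name, the example.com URL is a stand-in, and the actual download step is elided.

```python
from pathlib import Path
from urllib.parse import urlparse


def resolve_backend(model_backend: str, cache_dir: Path) -> Path:
    """Mirror the documented rules: HTTPS URLs resolve into the cache
    directory (download elided); local .pth/.pt paths load directly."""
    if model_backend.startswith("https://"):
        # The cached file is named after the last path segment of the URL.
        filename = Path(urlparse(model_backend).path).name
        return cache_dir / filename
    # Local checkpoint path: used as-is, no download occurs.
    return Path(model_backend)


cache = Path.home() / ".cache" / "vizion3d" / "models"
# Hypothetical release URL: resolves to cache / "depth_anything_v2_vitb.pth"
resolve_backend("https://example.com/releases/depth_anything_v2_vitb.pth", cache)
resolve_backend("./checkpoints/my_model.pt", cache)  # local path, returned unchanged
```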

DepthEstimationAdvanceConfig

Camera intrinsics and depth range settings. Pass an instance of this model as advanced_config on DepthEstimationCommand to override the PrimeSense defaults used for point cloud unprojection.

vizion3d.lifting.models.DepthEstimationAdvanceConfig

Bases: BaseModel

Camera intrinsics and depth range settings for depth estimation.

All fields are optional overrides — unspecified fields retain their defaults, which match the Open3D PrimeSense preset (640×480 RGB-D sensor).

Attributes:

- fx (float): Horizontal focal length in pixels. Controls the horizontal field of view: a larger value means a narrower FOV and more perspective compression.
- fy (float): Vertical focal length in pixels. Usually equal to fx for square pixels; differs on sensors with non-square pixels.
- cx (float): Principal point x — the pixel column of the optical axis, typically near the horizontal image centre.
- cy (float): Principal point y — the pixel row of the optical axis, typically near the vertical image centre.
- depth_scale (float): Divisor applied to raw uint16 depth values to convert them to metres. 1000 means the raw values are in millimetres (the standard for RealSense, Kinect, and PrimeSense sensors).
- depth_trunc (float): Maximum depth in metres. Points beyond this distance are discarded from the point cloud.

Source code in vizion3d/lifting/models.py

```python
class DepthEstimationAdvanceConfig(BaseModel):
    """
    Camera intrinsics and depth range settings for depth estimation.

    All fields are optional overrides — unspecified fields retain their defaults,
    which match the Open3D PrimeSense preset (640×480 RGB-D sensor).

    Attributes:
        fx: Horizontal focal length in pixels. Controls the horizontal field of
            view: a larger value means a narrower FOV and more perspective compression.
        fy: Vertical focal length in pixels. Usually equal to ``fx`` for square
            pixels; differs on sensors with non-square pixels.
        cx: Principal point x — the pixel column of the optical axis, typically
            near the horizontal image centre.
        cy: Principal point y — the pixel row of the optical axis, typically near
            the vertical image centre.
        depth_scale: Divisor applied to raw uint16 depth values to convert them to
            metres. ``1000`` means the raw values are in millimetres (the standard
            for RealSense, Kinect, and PrimeSense sensors).
        depth_trunc: Maximum depth in metres. Points beyond this distance are
            discarded from the point cloud.
    """

    fx: float = 525.0
    fy: float = 525.0
    cx: float = 319.5
    cy: float = 239.5
    depth_scale: float = 1000.0
    depth_trunc: float = 10.0
```
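To see how these fields interact during point cloud unprojection, here is a sketch of the standard pinhole back-projection for a single pixel, using the PrimeSense defaults above. The function unproject_pixel is illustrative and not part of the vizion3d API; the real unprojection is performed by Open3D over the whole image.

```python
def unproject_pixel(u, v, raw_depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5,
                    depth_scale=1000.0, depth_trunc=10.0):
    """Back-project pixel (u, v) with a raw uint16 depth value into metric 3D
    using the pinhole camera model and the PrimeSense defaults."""
    z = raw_depth / depth_scale          # raw sensor units -> metres
    if z <= 0 or z > depth_trunc:
        return None                      # invalid or truncated point
    x = (u - cx) * z / fx                # pixel offset from principal point,
    y = (v - cy) * z / fy                # scaled by depth over focal length
    return (x, y, z)


unproject_pixel(319.5, 239.5, 1000)   # at the principal point, 1 m away → (0.0, 0.0, 1.0)
unproject_pixel(0, 0, 20000)          # 20 m exceeds depth_trunc → None
```

This makes the roles concrete: depth_scale fixes the metric unit, fx/fy/cx/cy control the ray direction per pixel, and depth_trunc culls far points.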

DepthEstimationResult

Output contract returned by DepthEstimation.run(). All fields are always present; optional geometry fields are None when the corresponding return_* flag was not set.

vizion3d.lifting.models.DepthEstimationResult

Bases: BaseModel

Result payload returned after a depth estimation inference task.

Attributes:

- depth_map (list[list[float]]): Raw floating-point depth array, shape [H][W]. Values are relative (not metric) for monocular models — closer objects have higher values for inverse-depth outputs.
- min_depth (float): Minimum value in depth_map.
- max_depth (float): Maximum value in depth_map. Guaranteed max_depth >= min_depth.
- backend_used (str): Resolved model identifier that processed the request (local file path).
- depth_image (Image | None): 16-bit grayscale open3d.geometry.Image (dtype uint16), present when return_depth_image=True was set on the command. The full 0–65535 range maps linearly to [min_depth, max_depth].
- point_cloud (PointCloud | None): Coloured open3d.geometry.PointCloud unprojected from the RGB-D image, present when return_point_cloud=True. Coordinates are already in metres; no rescaling is needed (see point_cloud_scale).
- point_cloud_scale (float): Scale factor for the point cloud coordinate space: multiplying a distance measured between two points in the returned point cloud by this value gives the equivalent distance in metres. Always 1.0, since Open3D produces point cloud coordinates directly in metres.

Source code in vizion3d/lifting/models.py

```python
class DepthEstimationResult(BaseModel):
    """
    Result payload returned after a depth estimation inference task.

    Attributes:
        depth_map: Raw floating-point depth array, shape `[H][W]`. Values are
            relative (not metric) for monocular models — closer objects have
            higher values for inverse-depth outputs.
        min_depth: Minimum value in `depth_map`.
        max_depth: Maximum value in `depth_map`. Guaranteed `max_depth >= min_depth`.
        backend_used: Resolved model identifier that processed the request
            (local file path).
        depth_image: 16-bit grayscale `open3d.geometry.Image` (dtype `uint16`),
            present when `return_depth_image=True` was set on the command.
            The full 0–65535 range maps linearly to `[min_depth, max_depth]`.
        point_cloud: Coloured `open3d.geometry.PointCloud` unprojected from the
            RGB-D image, present when `return_point_cloud=True`. Coordinates are
            in metres — multiply distances by `point_cloud_scale` (always `1.0`)
            to confirm the unit.
        point_cloud_scale: Scale factor for the point cloud coordinate space.
            Multiply any distance measured between two points in the returned
            point cloud by this value to get the equivalent distance in metres.
            Always `1.0` — Open3D produces point cloud coordinates directly in metres.
    """

    depth_map: list[list[float]]
    min_depth: float
    max_depth: float
    backend_used: str
    depth_image: O3dImage | None = None
    point_cloud: O3dPointCloud | None = None
    point_cloud_scale: float = 1.0

    model_config = ConfigDict(arbitrary_types_allowed=True)
```
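The linear quantisation behind depth_image can be reproduced from depth_map alone, without Open3D. A minimal sketch in pure Python; depth_to_uint16 is illustrative, not a vizion3d function:

```python
def depth_to_uint16(depth_map, min_depth, max_depth):
    """Linearly map [min_depth, max_depth] onto the full 0-65535 uint16 range,
    mirroring how the returned depth_image quantises the float depth_map."""
    span = max_depth - min_depth
    if span == 0:
        # Degenerate flat depth map: every pixel maps to 0.
        return [[0 for _ in row] for row in depth_map]
    return [
        [round((value - min_depth) / span * 65535) for value in row]
        for row in depth_map
    ]


depth_to_uint16([[0.5, 1.0], [1.5, 2.5]], 0.5, 2.5)
# → [[0, 16384], [32768, 65535]]
```

Because the mapping is relative to [min_depth, max_depth], pixel intensities are only comparable within a single result, not across images.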