Depth Estimation

Category: Lifting (2D → 3D)
Experimental: No

Depth estimation predicts the per-pixel distance from the camera for every pixel in a 2D RGB image, producing a depth map and optionally unprojecting it into a 3D point cloud. vizion3d uses Depth Anything V2 as its default backend.

Model backends

Default checkpoint download: depth_anything_v2_vitb.pth

curl -L \
  https://github.com/OlafenwaMoses/vizion3D/releases/download/essentials-v1/depth_anything_v2_vitb.pth \
  -o depth_anything_v2_vitb.pth

Value	What happens
(default)	Downloads the vizion3D release checkpoint (`depth_anything_v2_vitb.pth`) to `~/.cache/vizion3d/models/` on first use, then loads it directly
An HTTPS URL ending in `.pth` or `.pt`	Downloaded to the cache directory on first use, then loaded as a Depth Anything V2 checkpoint
A local `.pth` or `.pt` file path	Loaded directly as a Depth Anything V2 checkpoint — never downloaded

Models are kept in memory after the first inference in the current process. Subsequent calls to any DepthEstimation instance reuse the loaded weights.

Set VIZION3D_MODEL_CACHE in your environment to change the default cache directory.

Command parameters

DepthEstimationCommand is the input contract for this task.

Parameter	Type	Required	Default	Description
`image_input`	`str \\| bytes`	Yes	—	Image to process. Pass a file path string or raw image bytes.
`model_backend`	`str`	No	vizion3D release checkpoint URL	Model backend identifier. See Model backends above.
`return_depth_image`	`bool`	No	`False`	If `True`, the result includes a 16-bit grayscale Open3D Image of the depth map.
`return_point_cloud`	`bool`	No	`False`	If `True`, the result includes an Open3D PointCloud unprojected from the RGB-D image.
`advanced_config`	`DepthEstimationAdvanceConfig`	No	PrimeSense defaults	Camera intrinsics and depth range settings. See Advanced config below.

Result fields

DepthEstimationResult is the output contract for this task.

Field	Type	Always present	Description
`depth_map`	`list[list[float]]`	Yes	Raw floating-point depth array, shape `[H][W]`. Values are relative (not metric) — closer objects have higher values for inverse-depth models.
`min_depth`	`float`	Yes	Minimum value in `depth_map`.
`max_depth`	`float`	Yes	Maximum value in `depth_map`. Guaranteed `max_depth >= min_depth`.
`backend_used`	`str`	Yes	Resolved model identifier that processed the request (local file path).
`depth_image`	`open3d.geometry.Image \\| None`	When `return_depth_image=True`	16-bit grayscale image, dtype `uint16`, shape `(H, W)`. The full 0–65535 range maps to `[min_depth, max_depth]`.
`point_cloud`	`open3d.geometry.PointCloud \\| None`	When `return_point_cloud=True`	Coloured 3D point cloud unprojected from the RGB-D image using the intrinsics in `advanced_config`. Coordinates are in metres.
`point_cloud_scale`	`float`	Yes	Scale factor: multiply any distance measured between two points in the point cloud by this value to get the equivalent distance in metres. Always `1.0` — Open3D produces point cloud coordinates directly in metres.

1. Direct Python import — image bytes

The most common usage: read an image file into bytes and dispatch the command.

from vizion3d.lifting import DepthEstimation, DepthEstimationCommand

with open("scene.png", "rb") as f:
    img_bytes = f.read()

cmd = DepthEstimationCommand(image_input=img_bytes)
result = DepthEstimation().run(cmd)

print(f"Depth map shape : {len(result.depth_map)} rows × {len(result.depth_map[0])} cols")
print(f"Depth range     : {result.min_depth:.4f} → {result.max_depth:.4f}")
print(f"Backend used    : {result.backend_used}")

2. Direct Python import — file path

Pass a file path string instead of bytes; the handler opens it automatically.

from vizion3d.lifting import DepthEstimation, DepthEstimationCommand

cmd = DepthEstimationCommand(image_input="scene.png")
result = DepthEstimation().run(cmd)

print(f"Depth range: {result.min_depth:.4f} → {result.max_depth:.4f}")

3. Depth image (16-bit PNG)

Request a 16-bit grayscale Open3D Image of the depth map for visualization or downstream processing.

import numpy as np
from PIL import Image as PILImage
from vizion3d.lifting import DepthEstimation, DepthEstimationCommand

cmd = DepthEstimationCommand(
    image_input="scene.png",
    return_depth_image=True,
)
result = DepthEstimation().run(cmd)

# result.depth_image is an open3d.geometry.Image (uint16)
depth_array = np.asarray(result.depth_image)   # shape (H, W), dtype uint16
print(f"Depth image shape: {depth_array.shape}, dtype: {depth_array.dtype}")

# Save as 16-bit PNG via PIL
PILImage.fromarray(depth_array).save("depth.png")

4. Point cloud

Request a coloured 3D point cloud unprojected from the RGB-D image. Distances between points are in metres (point_cloud_scale == 1.0).

import numpy as np
import open3d as o3d
from vizion3d.lifting import DepthEstimation, DepthEstimationCommand

cmd = DepthEstimationCommand(
    image_input="scene.png",
    return_point_cloud=True,
)
result = DepthEstimation().run(cmd)

pcd = result.point_cloud                          # open3d.geometry.PointCloud
points = np.asarray(pcd.points)                   # shape (N, 3), float64, in metres
colors = np.asarray(pcd.colors)                   # shape (N, 3), float64, range [0, 1]

print(f"Points : {len(points)}")
print(f"Scale  : {result.point_cloud_scale} metre per unit")

# Measure real-world distance between two points
dist_metres = np.linalg.norm(points[0] - points[1]) * result.point_cloud_scale
print(f"Distance p0→p1: {dist_metres:.4f} m")

# Save as PLY
o3d.io.write_point_cloud("scene.ply", pcd)

5. All outputs at once

Both optional outputs can be requested in a single inference pass.

import numpy as np
import open3d as o3d
from vizion3d.lifting import DepthEstimation, DepthEstimationCommand

cmd = DepthEstimationCommand(
    image_input="scene.png",
    return_depth_image=True,
    return_point_cloud=True,
)
result = DepthEstimation().run(cmd)

# Depth map
print(f"Depth range : {result.min_depth:.4f} → {result.max_depth:.4f}")

# 16-bit depth image
depth_arr = np.asarray(result.depth_image)        # uint16 (H, W)

# Point cloud
pcd = result.point_cloud
o3d.io.write_point_cloud("scene.ply", pcd)

6. Custom model backend

Use a local .pth checkpoint or a remote URL to a .pth file.

from vizion3d.lifting import DepthEstimation, DepthEstimationCommand

# Local checkpoint
cmd = DepthEstimationCommand(
    image_input="scene.png",
    model_backend="/models/depth_anything_v2_vitl.pth",
)
result = DepthEstimation().run(cmd)
print(f"Backend: {result.backend_used}")

# Remote checkpoint URL (downloaded and cached on first use)
cmd = DepthEstimationCommand(
    image_input="scene.png",
    model_backend=(
        "https://github.com/OlafenwaMoses/vizion3D/releases/download/"
        "essentials-v1/depth_anything_v2_vitb.pth"
    ),
)
result = DepthEstimation().run(cmd)
print(f"Backend: {result.backend_used}")

7. REST API

Start the server with all REST features enabled:

pip / Poetry

vizion3d-serve-rest

uv

uv run vizion3d-serve-rest

To preload a depth-estimation checkpoint into memory at startup, pass --depth_model. This also enables the depth-estimation endpoint. If this flag is omitted, the default vizion3D release model is downloaded on first inference and cached under ~/.cache/vizion3d/models/.

uv run vizion3d-serve-rest --depth_model /models/depth_anything_v2_vitb.pth

The REST server can also expose only selected features. If none of --depth_estimation, --stereo_depth, --depth_model, or --stereo_model is provided, all features are enabled. If any of those flags is provided, only the selected features are enabled. A model path flag selects and preloads its feature:

# Only POST /lifting/depth-estimation
uv run vizion3d-serve-rest --depth_estimation

# Only depth estimation, with the model loaded before the first request
uv run vizion3d-serve-rest \
  --depth_estimation \
  --depth_model /models/depth_anything_v2_vitb.pth

Send a request with multipart/form-data:

curl -X POST "http://localhost:8000/lifting/depth-estimation" \
  -F "image=@scene.png" \
  -F "return_point_cloud=true"

The response is a JSON-serialised DepthEstimationResult. Binary fields (depth_image, point_cloud_ply) are base64-encoded in the JSON response.

8. gRPC API

Start the server:

pip / Poetry

vizion3d-serve-grpc

uv

uv run vizion3d-serve-grpc

Call from any gRPC client using the generated stubs:

import grpc
from vizion3d.proto import lifting_pb2, lifting_pb2_grpc

channel = grpc.insecure_channel("localhost:50051")
stub = lifting_pb2_grpc.LiftingServiceStub(channel)

with open("scene.png", "rb") as f:
    img_bytes = f.read()

request = lifting_pb2.DepthEstimationRequest(
    image_bytes=img_bytes,
    return_point_cloud=True,
)

response = stub.RunDepthEstimation(request)
print(f"Min depth : {response.min_depth}")
print(f"Max depth : {response.max_depth}")
print(f"Backend   : {response.backend_used}")

9. Advanced config: camera intrinsics & depth range

DepthEstimationAdvanceConfig lets you supply the actual camera intrinsics and depth range for your sensor, replacing the built-in PrimeSense defaults. This is required for accurate metric 3D geometry when your camera is not a 640×480 PrimeSense sensor.

from vizion3d.lifting import (
    DepthEstimation,
    DepthEstimationAdvanceConfig,
    DepthEstimationCommand,
)

result = DepthEstimation().run(
    DepthEstimationCommand(
        image_input="scene.png",
        return_point_cloud=True,
        advanced_config=DepthEstimationAdvanceConfig(
            fx=909.15,
            fy=908.48,
            cx=640.0,
            cy=360.0,
            depth_trunc=6.0,
        ),
    )
)

The same config is available in the REST and gRPC entry points. See Advanced Config for the full field reference, formulas, entry-point examples, and camera presets.

Known limitations

Relative depth only — the default monocular backend produces relative (inverse) depth, not metric depth. Point cloud distances are internally consistent but not calibrated to real-world scale without a known reference distance.
Python 3.12 required for Open3D — return_depth_image and return_point_cloud require Open3D, which currently only supports Python 3.12 in this project.