Advanced Config: Camera Intrinsics & Depth Range
DepthEstimationAdvanceConfig lets you override the camera intrinsics and depth range parameters that control how a raw depth map is lifted into a 3D point cloud. Without it, vizion3d uses built-in PrimeSense defaults; with it, you can match your actual camera and scene requirements precisely.
Background: the pinhole camera model
Every point in a point cloud is computed by inverting the pinhole camera projection. Given a pixel at image coordinates (u, v) with a depth value d (in metres), its 3D position (X, Y, Z) is:
Z = d
X = (u - cx) * d / fx
Y = (v - cy) * d / fy
All four intrinsic parameters — fx, fy, cx, cy — appear in this formula. Getting them wrong produces a point cloud that is geometrically distorted: correct topology but wrong angles, skewed shapes, or objects that appear compressed or stretched.
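As a minimal sketch of the formula above for a single pixel: the helper name `backproject_pixel` is hypothetical and not part of vizion3d, and its default arguments are simply the PrimeSense defaults documented on this page.

```python
def backproject_pixel(u, v, d, fx=525.0, fy=525.0, cx=319.5, cy=239.5):
    """Invert the pinhole projection for one pixel (u, v) at depth d metres."""
    z = d
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return x, y, z


# A pixel on the optical axis maps to a point straight ahead of the camera.
print(backproject_pixel(319.5, 239.5, 2.0))  # (0.0, 0.0, 2.0)

# A pixel 105 px to the right of the principal point, 2 m away.
print(backproject_pixel(424.5, 239.5, 2.0))  # (0.4, 0.0, 2.0)
```

A real pipeline applies the same arithmetic to every pixel of the depth map at once; the single-pixel form is only meant to make the role of each intrinsic visible.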
Config fields
fx — horizontal focal length (pixels)
Default: 525.0
The horizontal focal length of the camera in pixels. It is the product of the physical focal length (mm) and the horizontal pixel density (pixels/mm). A larger fx means the camera has a narrower horizontal field of view: the same scene width spans more pixels.
Effect on the point cloud: fx controls the horizontal spread of 3D points. Because X is inversely proportional to fx, a value that is too small stretches the point cloud horizontally, and a value that is too large compresses it.
How to find it: Use K[0][0] from your camera's calibration matrix, or compute it from the horizontal field of view FoV_h:
fx = (image_width / 2) / tan(FoV_h / 2)
fy — vertical focal length (pixels)
Default: 525.0
The vertical focal length in pixels. For cameras with square pixels, fy ≈ fx. Cameras with non-square pixels may have fy ≠ fx.
Effect on the point cloud: Controls vertical spread analogously to fx. Incorrect fy produces vertically compressed or stretched geometry.
How to find it: K[1][1] from the calibration matrix, or:
fy = (image_height / 2) / tan(FoV_v / 2)
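The two field-of-view formulas above can also be inverted as a quick sanity check on a set of intrinsics. The sketch below uses a hypothetical helper, `fov_degrees`, which is not part of vizion3d; it only rearranges the formula to recover the field of view implied by a focal length.

```python
import math


def fov_degrees(focal_px, image_size_px):
    """Field of view (degrees) implied by a focal length and image extent, both in pixels."""
    return math.degrees(2.0 * math.atan((image_size_px / 2.0) / focal_px))


# Defaults on a 640x480 image: roughly 62.7 degrees horizontally and 49.1 vertically.
print(round(fov_degrees(525.0, 640), 1), round(fov_degrees(525.0, 480), 1))
```

If the implied field of view is far from what the camera datasheet states, the focal length is probably expressed for a different resolution.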
cx — horizontal principal point (pixels)
Default: 319.5
The horizontal image coordinate of the optical axis — ideally the exact centre of the sensor. For a 640-wide image the ideal value is 319.5; for a 1920-wide image it is typically near 959.5.
Effect on the point cloud: Shifts points left or right by an amount proportional to their depth (a cx error of Δ pixels changes X by Δ · d / fx in magnitude). A wrong cx makes the scene appear to be viewed from an off-centre vantage point, introducing a lateral tilt.
cy — vertical principal point (pixels)
Default: 239.5
The vertical image coordinate of the optical axis. For a 480-tall image the ideal value is 239.5.
Effect on the point cloud: Shifts points up or down by an amount proportional to their depth. Like cx, an incorrect value introduces a tilt, vertical in this case.
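To get a feel for how much a principal-point error matters, the pinhole formula can be evaluated directly. This is an illustrative sketch only: `lateral_error_m` is a hypothetical helper, and the 10-pixel error is an arbitrary example value.

```python
def lateral_error_m(cx_error_px, depth_m, fx=525.0):
    """X offset (metres) caused by a principal-point error of cx_error_px pixels."""
    # From X = (u - cx) * d / fx: adding cx_error_px to cx changes X by -cx_error_px * d / fx.
    return -cx_error_px * depth_m / fx


# The same 10 px error displaces far points more than near ones.
for depth in (1.0, 2.0, 4.0):
    print(depth, lateral_error_m(10.0, depth))
```

The same reasoning applies to cy and Y, with fy in place of fx.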
depth_scale — depth value scale factor
Default: 1000.0
The scale factor that converts the raw uint16 depth buffer to metres. It is passed to Open3D's RGBDImage.create_from_color_and_depth, which divides each stored integer depth by depth_scale to obtain a value in metres. With the default of 1000.0, the uint16 range [0, 65535] maps to [0, 65.535] metres.
Effect on the point cloud: Changing depth_scale rescales all Z values (and therefore X/Y values, since X = (u - cx) * Z / fx). Doubling depth_scale halves all distances. This does not change the relative shape of the cloud — only the metric scale.
When to adjust: Only change this if you are supplying a depth buffer in a different unit (e.g. centimetres instead of millimetres). In vizion3d the depth map is internally normalised before being encoded into uint16, so the default 1000.0 is correct for the standard workflow.
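The division described above is simple enough to check by hand. The sketch below uses a hypothetical helper, `raw_to_metres`, to reproduce the arithmetic; it does not call Open3D or vizion3d.

```python
def raw_to_metres(raw_depth, depth_scale=1000.0):
    """Convert a stored uint16 depth value to metres by dividing by depth_scale."""
    return raw_depth / depth_scale


print(raw_to_metres(1500))                      # 1.5
print(raw_to_metres(65535))                     # 65.535
print(raw_to_metres(1500, depth_scale=2000.0))  # 0.75
```

The last line shows the effect of doubling depth_scale: the same raw value now represents half the distance.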
depth_trunc — maximum depth clip distance (metres)
Default: 10.0
Points with a depth value greater than depth_trunc metres are discarded by Open3D before building the point cloud. This controls the far clipping plane.
Effect on the point cloud: Lowering depth_trunc removes distant background points and produces a cleaner cloud focused on near objects. Setting it to a very small value will discard almost all points. Setting it too large can include noisy, low-confidence depth estimates at the scene boundary.
Practical guidance:
- Indoor close-up scenes: 2.0–5.0 m
- Room-scale scenes: 5.0–10.0 m (default)
- Outdoor or large-scale: 10.0–30.0 m
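The clipping itself happens inside Open3D, but its effect can be mimicked in plain Python to preview how many points a given threshold would keep. In this sketch `truncate_depths` is a hypothetical helper, and treating zero depth as invalid is an assumption rather than documented vizion3d behaviour.

```python
def truncate_depths(depths_m, depth_trunc=10.0):
    """Keep only depth values that are positive and no greater than depth_trunc metres."""
    # Assumption: a depth of zero marks an invalid pixel and is dropped as well.
    return [d for d in depths_m if 0.0 < d <= depth_trunc]


depths = [0.8, 2.5, 4.9, 7.2, 12.0]
print(truncate_depths(depths))                   # [0.8, 2.5, 4.9, 7.2]
print(truncate_depths(depths, depth_trunc=3.0))  # [0.8, 2.5]
```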
Default values and PrimeSense
The built-in defaults match the PrimeSense / Microsoft Kinect v1 sensor at 640×480 VGA resolution:
| Parameter | Default | PrimeSense VGA |
|---|---|---|
| fx | 525.0 | 525.0 px |
| fy | 525.0 | 525.0 px |
| cx | 319.5 | 319.5 px |
| cy | 239.5 | 239.5 px |
| depth_scale | 1000.0 | n/a |
| depth_trunc | 10.0 | n/a |
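These are reasonable placeholders for any RGB camera with a ~60° horizontal FoV at 640×480 resolution; at other resolutions the focal lengths and principal point must be scaled to match. For accurate metric reconstruction, always supply intrinsics from your actual camera calibration.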
Usage: Direct Python
import numpy as np
from vizion3d.lifting import (
    DepthEstimation,
    DepthEstimationAdvanceConfig,
    DepthEstimationCommand,
)
# Full custom intrinsics (e.g. Intel RealSense D435 at 1280×720)
config = DepthEstimationAdvanceConfig(
    fx=909.15,
    fy=908.48,
    cx=640.0,
    cy=360.0,
    depth_scale=1000.0,
    depth_trunc=6.0,
)
with open("scene.png", "rb") as f:
    img_bytes = f.read()
result = DepthEstimation().run(
    DepthEstimationCommand(
        image_input=img_bytes,
        return_point_cloud=True,
        advanced_config=config,
    )
)
points = np.asarray(result.point_cloud.points)
print(f"Points: {len(points)}")
Partial overrides work too — unspecified fields keep their defaults:
# Only change depth_trunc; everything else stays at PrimeSense defaults
result = DepthEstimation().run(
    DepthEstimationCommand(
        image_input=img_bytes,
        return_point_cloud=True,
        advanced_config=DepthEstimationAdvanceConfig(depth_trunc=3.0),
    )
)
Usage: REST API
All six config parameters are optional form fields on the POST /lifting/depth-estimation endpoint.
# Full custom intrinsics
curl -X POST "http://localhost:8000/lifting/depth-estimation" \
-F "image=@scene.png" \
-F "return_point_cloud=true" \
-F "fx=909.15" \
-F "fy=908.48" \
-F "cx=640.0" \
-F "cy=360.0" \
-F "depth_scale=1000.0" \
-F "depth_trunc=6.0"
Partial overrides — omit any field to keep its default:
# Only override depth_trunc
curl -X POST "http://localhost:8000/lifting/depth-estimation" \
-F "image=@scene.png" \
-F "return_point_cloud=true" \
-F "depth_trunc=3.0"
Python requests equivalent:
import requests
with open("scene.png", "rb") as f:
    img_bytes = f.read()
response = requests.post(
    "http://localhost:8000/lifting/depth-estimation",
    files={"image": ("scene.png", img_bytes, "image/png")},
    data={
        "return_point_cloud": "true",
        "fx": "909.15",
        "fy": "908.48",
        "cx": "640.0",
        "cy": "360.0",
        "depth_trunc": "6.0",
    },
)
data = response.json()
print(f"Depth range: {data['min_depth']:.4f} → {data['max_depth']:.4f}")
Usage: gRPC API
The DepthEstimationAdvanceConfig proto message mirrors the Python model. All fields are optional, so any omitted field falls back to the server-side default.
import grpc
from vizion3d.proto import lifting_pb2, lifting_pb2_grpc
channel = grpc.insecure_channel("localhost:50051")
stub = lifting_pb2_grpc.LiftingServiceStub(channel)
with open("scene.png", "rb") as f:
    img_bytes = f.read()
# Full custom intrinsics
request = lifting_pb2.DepthEstimationRequest(
    image_bytes=img_bytes,
    return_point_cloud=True,
    advanced_config=lifting_pb2.DepthEstimationAdvanceConfig(
        fx=909.15,
        fy=908.48,
        cx=640.0,
        cy=360.0,
        depth_scale=1000.0,
        depth_trunc=6.0,
    ),
)
response = stub.RunDepthEstimation(request)
print(f"Depth range: {response.min_depth:.4f} → {response.max_depth:.4f}")
Partial override — only depth_trunc:
request = lifting_pb2.DepthEstimationRequest(
    image_bytes=img_bytes,
    return_point_cloud=True,
    advanced_config=lifting_pb2.DepthEstimationAdvanceConfig(depth_trunc=3.0),
)
response = stub.RunDepthEstimation(request)
How to get your camera intrinsics
Option 1: camera datasheet or SDK
Most camera SDKs expose the intrinsic matrix directly:
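# Intel RealSense
import pyrealsense2 as rs
from vizion3d.lifting import DepthEstimationAdvanceConfig
pipeline = rs.pipeline()
profile = pipeline.start()
try:
    # Read the colour stream's intrinsics from the active profile
    intr = profile.get_stream(rs.stream.color).as_video_stream_profile().intrinsics
finally:
    pipeline.stop()  # release the camera once the intrinsics are read
config = DepthEstimationAdvanceConfig(
    fx=intr.fx, fy=intr.fy, cx=intr.ppx, cy=intr.ppy
)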
Option 2: OpenCV calibration
Run a standard checkerboard calibration with cv2.calibrateCamera. The returned camera_matrix is:
[[fx, 0, cx],
[ 0, fy, cy],
[ 0, 0, 1]]
import cv2
import numpy as np
# After calibrating…
_, camera_matrix, _, _, _ = cv2.calibrateCamera(obj_points, img_points, image_size, None, None)
config = DepthEstimationAdvanceConfig(
    fx=float(camera_matrix[0, 0]),
    fy=float(camera_matrix[1, 1]),
    cx=float(camera_matrix[0, 2]),
    cy=float(camera_matrix[1, 2]),
)
Option 3: approximate from field of view
If you know the camera's horizontal field of view FoV_h (in degrees) and image dimensions:
import math
image_width = 1920
image_height = 1080
fov_h_deg = 69.0 # horizontal FoV in degrees
fx = (image_width / 2) / math.tan(math.radians(fov_h_deg / 2))
fy = fx # assumes square pixels
cx = image_width / 2 - 0.5
cy = image_height / 2 - 0.5
config = DepthEstimationAdvanceConfig(fx=fx, fy=fy, cx=cx, cy=cy)
Common camera presets
These are approximate values for common cameras. Always prefer calibrated values over these presets.
| Camera | Resolution | fx | fy | cx | cy |
|---|---|---|---|---|---|
| PrimeSense / Kinect v1 | 640×480 | 525.0 | 525.0 | 319.5 | 239.5 |
| Intel RealSense D415 | 1920×1080 | 1382.0 | 1382.0 | 960.5 | 540.5 |
| Intel RealSense D435 | 1280×720 | 909.0 | 908.0 | 640.0 | 360.0 |
| iPhone 14 wide (approx.) | 4032×3024 | 5500.0 | 5500.0 | 2016.0 | 1512.0 |
| Webcam 1080p (typical) | 1920×1080 | 1400.0 | 1400.0 | 960.0 | 540.0 |