barathbee 3 hours ago
How it works:
- YOLO11n detects objects in 2D (5MB, runs on Neural Engine)
- BoxerNet lifts each 2D box to a 7-DoF 3D bounding box (center, size, yaw) using DINOv3 visual features + LiDAR depth + Plücker ray encoding
- 3D boxes are placed in AR via SceneKit
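For anyone unfamiliar with the Plücker ray encoding mentioned above, here's a minimal numpy sketch of one common formulation: each camera ray becomes a 6-D vector (unit direction d, moment o × d). The function name, arguments, and normalization here are my illustration of the general idea, not necessarily the exact encoding BoxerNet uses.

```python
import numpy as np

def plucker_rays(K_inv, pixels, cam_origin):
    """Encode per-pixel camera rays as 6-D Pluecker coordinates (d, o x d).

    K_inv: 3x3 inverse camera intrinsics
    pixels: (N, 2) pixel coordinates
    cam_origin: (3,) camera center in the reference frame
    """
    # Lift pixels to homogeneous coords, then to ray directions
    ones = np.ones((pixels.shape[0], 1))
    dirs = (K_inv @ np.concatenate([pixels, ones], axis=1).T).T
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)  # unit directions
    # Moment part: origin cross direction (invariant to sliding along the ray)
    moments = np.cross(np.broadcast_to(cam_origin, dirs.shape), dirs)
    return np.concatenate([dirs, moments], axis=1)  # (N, 6)
```

The appeal of this encoding is that it makes the network pose-aware per pixel: two cameras looking at the same point produce related ray features, which helps when lifting 2D boxes to metric 3D.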
BoxerNet (100M params, DINOv3 backbone) is exported to ONNX for on-device inference.
Both models run as float16 ONNX (~196MB total) with ONNX Runtime.
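As a quick sanity check on the float16 footprint (my back-of-envelope arithmetic, not numbers from the repo):

```python
# float16 stores 2 bytes per weight, so a ~100M-param model alone is
# ~200MB. Treating "100M" as a round number and adding YOLO11n's ~5MB,
# that lines up with the ~196MB total reported above.
boxernet_params = 100e6   # ~100M parameters (from the post)
bytes_per_fp16 = 2
size_mb = boxernet_params * bytes_per_fp16 / 1e6
print(size_mb)  # 200.0
```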
Based on: https://facebookresearch.github.io/boxer/
Code: https://github.com/Barath19/Boxer3D
Would love feedback.