This article argues that trustworthy digital twins in mixed reality depend on a tight chain from sensing to persistence: robust pose estimation, environment reconstruction, and repeatable world-locked anchoring that survives time, tracking loss, and multi-device use. It outlines the sensor stack underpinning metric correctness, comparing LiDAR, structured light, time-of-flight, and stereo depth, then shows why depth alone is insufficient without anchor lifecycles, permissions, and recovery behaviours embedded at platform level. It maps the post-cloud landscape following the retirement of Azure Spatial Anchors towards vendor-native systems and emerging portable semantics in OpenXR spatial entities, with implications for cross-platform test matrices and maintenance burden [2], [3]. For richer twins, it positions volumetric capture and point-cloud compression (including V-PCC) alongside neural scene representations such as Gaussian Splatting, stressing the need to register visual assets to metric frames and anchor graphs to avoid drift and false physical interactions [9], [10]. The piece closes with design rules and a layered reference architecture that prioritise textured localisation features, anchor redundancy, privacy-aware depth access, and governance models that measure stability as infrastructure rather than novelty [1].
[1] Meta, “Spatial anchors, overview and tutorials,” May–Aug. 2024. [Online]. Available: developers.meta.com
[2] The Khronos Group, “Khronos releases OpenXR 1.1,” Apr. 15, 2024; “OpenXR Spatial Entities Extensions,” Jun. 10–14, 2025. [Online]. Available: khronos.org
[3] Microsoft Azure, “Azure Spatial Anchors retirement,” Dec. 2, 2023; GitHub CLI deprecation follow-up, Feb. 20, 2025. [Online]. Available: azure.microsoft.com; github.com
[4] Apple, “RoomPlan framework and samples; ARKit tracking guidance,” 2022–2024. [Online]. Available: developer.apple.com
[5] Google, “ARCore Geospatial API and anchors,” Oct. 31, 2024. [Online]. Available: developers.google.com
[6] Apple, “Capturing depth using the LiDAR camera,” sample code and WWDC session, 2022; “TrueDepth camera overview,” 2017. [Online]. Available: developer.apple.com
[7] The Verge, “Microsoft kills Kinect again,” Aug. 21, 2023; Orbbec, “Products based on Microsoft iToF depth technology,” Aug. 17, 2023; PR Newswire, “Orbbec showcases Azure Kinect DK replacement,” Mar. 20, 2024. [Online].
[8] Intel, “Product Change Notice 807360-01; RealSense LiDAR and tracking EOL,” Apr. 4, 2024; support notes updated May 28, 2024. [Online]. Available: intel.com; support.realsenseai.com
[9] ISO/IEC, “23090-5, Visual volumetric video-based coding and V-PCC,” 2023–2025 editions; DVB Study Mission on Volumetric Video, Feb. 2024. [Online].
[10] B. Kerbl et al., “3D Gaussian Splatting for real-time radiance field rendering,” 2023; and follow-on works on depth-aware or anchored splatting, 2024–2025. [Online]. Available: arXiv; Inria repo.
[11] Apple, “Apple Vision Pro privacy overview,” 2024. [Online]. Available: apple.com/privacy
[12] Epic Games, The Virtual Production Field Guide, 2019.
[13] E. Malakhatka et al., “XR Experience Design and Evaluation Framework,” in Human-Technology Interaction, Springer, 2025.
[14] BoF and McKinsey, The State of Fashion 2025, 2024.
If a digital twin is going to be trusted, it must respect the physics and geometry of its real counterpart. That reliability starts with how we sense space and how we anchor content. In mixed reality, LiDAR, depth cameras and modern spatial anchoring pipelines convert raw photons into coordinates that remain stable across time and devices, which is why virtual objects appear to “stick” to walls, tables and people with convincing precision.
Mixed reality systems have to solve three problems at once, and at interactive rates. First, they estimate pose, the six-degree-of-freedom position and orientation of the device. Second, they reconstruct geometry and semantics of the environment, for example planes, edges and surfaces. Third, they persist world-locked references so that content reappears in the same place later, or on another device in the same space. These steps map to visual-inertial odometry and SLAM, scene understanding, and anchoring or persistence. On modern platforms, anchoring is no longer an application-specific trick; it is a first-class feature with lifecycle, permissions and sharing models. Meta documents anchors as world-locked frames that persist between sessions, with explicit guidance on creation, updates and sharing, reflecting how core this has become to MR reliability [1].
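That anchor lifecycle can be sketched as a small state machine. The class, state and method names below are illustrative, not any vendor's API; the point is that an anchor carries explicit states for tracking loss, recovery and persistence rather than being a bare pose:

```python
from dataclasses import dataclass
from enum import Enum, auto

class AnchorState(Enum):
    CREATED = auto()      # pose assigned, not yet persisted
    TRACKED = auto()      # world tracking is actively resolving the anchor
    NOT_TRACKED = auto()  # tracking lost; pose is stale, hide attached content
    PERSISTED = auto()    # saved for a later session or another device

@dataclass
class Anchor:
    anchor_id: str
    pose: tuple  # (x, y, z, qx, qy, qz, qw) in the world frame
    state: AnchorState = AnchorState.CREATED

    def on_tracking_lost(self):
        # Do not delete: keep the anchor and wait for relocalisation.
        self.state = AnchorState.NOT_TRACKED

    def on_tracking_recovered(self):
        # Resume rendering against the refreshed world-locked pose.
        self.state = AnchorState.TRACKED

    def persist(self, store: dict):
        # A real store would be the platform's anchor service, not a dict.
        store[self.anchor_id] = self.pose
        self.state = AnchorState.PERSISTED
```

Treating "not tracked" as a first-class state, rather than an error, is what lets content reappear in place instead of drifting while tracking is degraded.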
Depth imaging on phones and head-mounted displays typically uses one of three approaches. Structured light projects a known infrared pattern and triangulates disparities, used by Apple’s TrueDepth camera on the front of several devices. Time-of-flight measures phase or time delay of modulated light, used by Apple’s rear LiDAR Scanner for room-scale mapping. Passive stereo uses two RGB sensors and triangulation. Apple’s documentation distinguishes structured light on TrueDepth from LiDAR’s time-of-flight capture pipelines, and exposes sample code for streaming LiDAR depth for measurement and occlusion [6].
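The geometry behind these approaches reduces to two formulas: continuous-wave time-of-flight recovers depth from the phase shift of modulated light, while stereo and structured light triangulate from disparity. A minimal sketch, with illustrative focal-length, baseline and modulation values:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def tof_depth(phase_shift_rad: float, mod_freq_hz: float) -> float:
    """Continuous-wave time-of-flight: the modulated signal travels to the
    surface and back, so depth = c * phase / (4 * pi * f_mod)."""
    return C * phase_shift_rad / (4 * math.pi * mod_freq_hz)

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Two-view triangulation (also the core of structured light):
    depth = f * B / d, with focal length f in pixels and baseline B in metres."""
    return focal_px * baseline_m / disparity_px
```

At a 20 MHz modulation frequency the unambiguous ToF range is c / (2·f_mod), roughly 7.5 m, which is one reason these sensors target room scale; and because disparity appears in the denominator, stereo depth error grows roughly with the square of distance.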
Industrial depth cameras remain central to professional capture stages and robotics. Microsoft’s Azure Kinect DK, widely used in volumetric capture rigs, was discontinued in 2023, with Microsoft directing developers to partner devices; Orbbec’s Femto series now fills that role using Microsoft iToF depth technology [7]. Intel also formally ended its LiDAR and tracking product lines, noting last-time-buy status and a focus on stereo depth [8]. This matters because production teams planning multi-sensor capture or mobile rovers for site scanning need to consider supply stability, driver support and calibration options across mixed fleets.
Depth alone does not guarantee stability. Anchoring depends on robust frames of reference that can be recovered after tracking loss and across sessions. In 2024, Microsoft retired Azure Spatial Anchors, a cloud service that provided cross-device persistence. Developers have moved to platform-native alternatives or to standards-based stacks. The deprecation and retirement notices are explicit, and several developer ecosystems have since updated their anchor guidance [3].
Today, vendors expose anchor and scene APIs with clearer semantics. Meta’s documentation covers shared anchors, depth permissions and scene models that unify planes and meshes under a privacy-gated permission [1]. Apple provides world tracking, room scanning and RoomPlan for parametric models, with examples for multi-room capture and recovery when tracking quality drops. The WWDC material notes that when the system falls back to orientation-only tracking, anchors can be temporarily “not tracked,” then resume when world tracking recovers, which is essential for user trust [4]. Google’s ARCore complements device-local anchors with the Geospatial API that uses VPS imagery for outdoor scale, an approach suited to city-scale digital twins [5].
At the standardisation layer, OpenXR 1.1 consolidates common extensions and signals improved handling of spatial entities, while the 2025 Spatial Entities Extensions move plane, marker and persistent anchors toward portable semantics across runtimes [2]. For teams building multi-device digital twins, that shift reduces bespoke per-platform code and makes test matrices manageable.
A faithful digital twin often needs people, garments or machinery recorded volumetrically. Traditional volumetric video uses multi-camera arrays of RGB or RGB-D sensors, then compresses dynamic point clouds using standards like MPEG’s V-PCC under ISO/IEC 23090-5, which reached updated editions through 2024–2025 and is under active work [9]. Rig architects should plan for V3C or V-PCC in their storage and delivery pipelines to avoid lock-in and to meet bitrate constraints.
Real-time neural scene representations have expanded the toolkit. Gaussian Splatting achieves fast, photoreal novel view synthesis from camera images, which enables quick capture of small spaces and props for MR previews. The technique is already widely implemented in open source and commercial tools. For MR anchoring, two cautions apply. First, many radiance field methods optimise visual quality rather than metric accuracy, so splat clouds must be aligned to device tracking frames or fused with metric depth if content must interact physically with site geometry. Second, long-term persistence still relies on anchors that can be resolved reliably, so splat assets should be registered to anchor sets or room meshes. Recent research explores depth-aware or anchored splatting that reduces drift, which is directly relevant to MR stability [10].
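Registering a visually optimised reconstruction to a metric frame comes down to estimating a similarity transform from corresponding points. A minimal closed-form sketch in 2D, assuming clean correspondences; real pipelines solve the 3D case, usually with robust estimation such as RANSAC, and all names here are illustrative:

```python
import math

def align_similarity_2d(src, dst):
    """Estimate scale s, rotation theta and translation t such that
    s * R(theta) @ p + t matches q for corresponding 2D points src[i] -> dst[i],
    e.g. a splat cloud's floor plan registered to surveyed anchor positions."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n; cy_s = sum(p[1] for p in src) / n
    cx_d = sum(q[0] for q in dst) / n; cy_d = sum(q[1] for q in dst) / n
    s_dot = s_cross = var_s = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py = px - cx_s, py - cy_s  # centre both clouds
        qx, qy = qx - cx_d, qy - cy_d
        s_dot += px * qx + py * qy     # aligned component
        s_cross += px * qy - py * qx   # rotated component
        var_s += px * px + py * py
    theta = math.atan2(s_cross, s_dot)
    scale = math.hypot(s_dot, s_cross) / var_s
    tx = cx_d - scale * (math.cos(theta) * cx_s - math.sin(theta) * cy_s)
    ty = cy_d - scale * (math.sin(theta) * cx_s + math.cos(theta) * cy_s)
    return scale, theta, (tx, ty)
```

Three well-spread fiducials or surveyed points are enough to fix scale, heading and position in the plane, which is exactly why the capture guidance below insists on them.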
For production teams in creative industries, virtual production guides outline how depth capture and real-time engines shorten iteration cycles and bring VFX decisions earlier into the shoot, which is one reason volumetric capture is finding a home on LED stages and XR volumes [12].
From shipping SDKs and production practice, several design rules have emerged. Create anchors against textured, feature-rich surfaces rather than blank walls, since relocalisation depends on distinctive visual features. Use redundant anchors per zone so content survives the loss of any single reference. Gate depth and scene access behind explicit permissions, treating spatial maps as sensitive data. And measure stability continuously, treating drift budgets as infrastructure metrics rather than demo-day novelty.
A robust digital twin for a venue, factory or store can follow this layered architecture.
Capture layer: combine device-mounted LiDAR scans for base geometry, photogrammetry or Gaussian Splatting for visual look, and volumetric rigs for people and dynamic assets, selected per scene. Maintain consistent scale with fiducials or surveyed points during shoots to ensure metric alignment later.
Spatial data model: store meshes and planes with semantic tags, then manage anchor graphs per room or zone. Use platform SDKs for short-term anchors, and Geospatial or room maps for long-term persistence [5]. Track provenance and versioning, since twins change as spaces change.
Compression and delivery: for volumetric sequences, consider V-PCC for efficient streaming to headsets that lack native point cloud decoders [9]. For splat assets, package alongside alignment transforms to platform anchors so they resolve to the same world coordinates as meshes.
Runtime: target OpenXR 1.1 where possible to reduce platform divergences, and monitor the progress of the Spatial Entities Extensions for portable anchors and surfaces [2]. This will make multi-vendor deployments, for example iPad capture to headset playback, simpler to maintain.
Governance: align with emerging XR experience design frameworks that measure usability, safety and effectiveness of MR interactions [13], particularly for enterprise rollouts where repeatability and comfort are as important as spectacle.
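The spatial data model layer can be sketched as a per-zone anchor store with versioned, provenance-tagged records. The schema below is illustrative rather than any SDK's:

```python
import time
from dataclasses import dataclass, field

@dataclass
class AnchorRecord:
    anchor_id: str
    zone: str                   # room or zone the anchor belongs to
    pose: tuple                 # position/orientation in the zone's local frame
    version: int = 1
    source: str = "lidar-scan"  # provenance: which capture produced this pose
    updated_at: float = field(default_factory=time.time)

class AnchorGraph:
    """Minimal per-zone anchor store with versioned updates, since twins
    change as spaces change and stale poses must be distinguishable."""

    def __init__(self):
        self.zones = {}  # zone name -> {anchor_id -> AnchorRecord}

    def upsert(self, rec: AnchorRecord):
        zone = self.zones.setdefault(rec.zone, {})
        old = zone.get(rec.anchor_id)
        if old is not None:
            rec.version = old.version + 1  # keep version history explicit
        zone[rec.anchor_id] = rec

    def anchors_in(self, zone: str):
        return list(self.zones.get(zone, {}).values())
```

A real deployment would back this with the platform's anchor service and a database, but the shape — zone scoping, versioning, provenance — is the part that survives any particular SDK.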
Across recent deployments, realistic “stickiness” presents as sub-centimetre drift for tabletop content and single-digit centimetres at room scale, measured after users walk out and return. This level of stability is achievable on commodity hardware when anchors are created against reliable planes, when depth permissions are granted, and when scene relocalisation is allowed a short settling period. Vendor guidance and field notes echo these thresholds and workflows [1].
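Those thresholds only mean something if drift is actually measured. A minimal leave-and-return check, with budget values mirroring the figures above and hypothetical function names:

```python
import math

def anchor_drift_m(before: dict, after: dict) -> dict:
    """Euclidean drift per anchor (metres) between positions logged before
    leaving the space and after returning and relocalising."""
    return {a: math.dist(before[a], after[a]) for a in before.keys() & after.keys()}

def within_budget(drift: dict, budget_m: float) -> bool:
    """Check a stability budget: e.g. 0.01 m for tabletop content,
    0.09 m (single-digit centimetres) at room scale."""
    return all(d <= budget_m for d in drift.values())
```

Logging these numbers per zone over time is what turns "it feels stable" into an infrastructure metric a governance process can track.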
For fashion and retail, the same foundations matter. As the industry explores store-scale digital twins for merchandising and assisted selling, the difference between a product model that “floats” and one that remains aligned to a fixture is the difference between novelty and utility. Broader sector reports forecast a cautious 2025 with consumer price sensitivity [14]; if MR is to justify its cost in such a climate, it must deliver accuracy and repeatability first.
Two threads will shape the next cycle. First, standardised spatial entities in OpenXR are likely to reduce fragmentation around anchors, markers and planes, which will help cross-platform MR twins mature [2]. Second, neural capture, including depth-aware Gaussian Splatting and related techniques, will continue to speed up look-dev and rapid surveys, provided teams tie them back to metric anchors and scene meshes for physical correctness [10].
Bottom line: LiDAR and depth cameras provide the metric ground truth, volumetric pipelines bring people and motion into the scene, and anchors turn all of that into a stable coordinate system you can rely on. Get those layers right, and your digital twin stops being a demo; it becomes infrastructure.