Quality WolvesBook a call
← All posts

ARKit & LiDAR: Building Point Clouds in Swift (Part 1)

iOSAR/VR

ARKit & LiDAR Part 1 cover

ARKit is Apple's powerful augmented reality framework that allows developers to craft immersive, interactive AR experiences specifically designed for iOS devices. For LiDAR-equipped devices, ARKit leverages depth-sensing capabilities that substantially improve environmental scanning precision. Unlike traditional LiDAR systems that tend to be bulky and expensive, iPhone's LiDAR technology is compact, affordable, and integrated directly into consumer hardware — making advanced depth sensing accessible to a broader developer community.

LiDAR enables creation of point clouds — collections of data points representing object surfaces in three-dimensional space. This article demonstrates extracting LiDAR data and converting it into individual points within an AR 3D environment. Part 2 covers merging continuously received points into a unified cloud and exporting to .PLY format.

Prerequisites

  • Xcode 16 with Swift 6
  • SwiftUI for UI design
  • Swift Concurrency for efficient multithreading
  • A LiDAR-equipped iPhone or iPad

Setting Up and Creating the UI

Create a new project named PointCloudExample, removing unnecessary files except PointCloudExampleApp.swift. Next, create ARManager.swift as an actor managing ARSCNView and the AR session. Since SwiftUI lacks native ARSCNView support, UIKit bridging is necessary.

In the ARManager initializer, set it as delegate for ARSession and start the session with ARWorldTrackingConfiguration, ensuring the sceneDepth property is set for LiDAR-equipped devices:

import Foundation
import ARKit

actor ARManager: NSObject, ARSessionDelegate, ObservableObject {

    @MainActor let sceneView = ARSCNView()

    @MainActor
    override init() {
        super.init()

        sceneView.session.delegate = self

        // start session
        let configuration = ARWorldTrackingConfiguration()
        configuration.frameSemantics = .sceneDepth
        sceneView.session.run(configuration)
    }
}

In PointCloudExampleApp.swift, create an ARManager state object and present the AR view to SwiftUI:

struct UIViewWrapper<V: UIView>: UIViewRepresentable {
    let view: UIView
    func makeUIView(context: Context) -> some UIView { view }
    func updateUIView(_ uiView: UIViewType, context: Context) { }
}

@main
struct PointCloudExampleApp: App {
    @StateObject var arManager = ARManager()
    var body: some Scene {
        WindowGroup {
            UIViewWrapper(view: arManager.sceneView).ignoresSafeArea()
        }
    }
}

Obtaining LiDAR Depth Data

The AR session continuously generates frames containing depth and camera image data. To maintain real-time performance, frames are skipped while one is being processed. Since ARManager is an actor, processing occurs on separate threads, preventing UI freezing.

actor ARManager: NSObject, ARSessionDelegate, ObservableObject {

    @MainActor private var isProcessing = false
    @MainActor @Published var isCapturing = false

    nonisolated func session(_ session: ARSession, didUpdate frame: ARFrame) {
        Task { await process(frame: frame) }
    }

    @MainActor
    private func process(frame: ARFrame) async {
        guard !isProcessing else { return }
        isProcessing = true
        // processing code here
        isProcessing = false
    }
}

Each ARFrame provides:

  • depthMap — Float32 buffer where each pixel contains distance in meters from camera to surface
  • confidenceMap — UInt8 buffer with values 1–3 indicating depth measurement confidence per pixel
  • capturedImage — YCbCr format pixel buffer

Accessing Pixel Data from CVPixelBuffer

Depth and Confidence Buffers

Reading from CVPixelBuffer requires locking the buffer for exclusive data access. The pixel offset formula is: Value = Y × bytesPerRow + X.

struct Size {
    let width: Int
    let height: Int
    var asFloat: simd_float2 { simd_float2(Float(width), Float(height)) }
}

final class PixelBuffer<T> {
    let size: Size
    let bytesPerRow: Int
    private let pixelBuffer: CVPixelBuffer
    private let baseAddress: UnsafeMutableRawPointer

    init?(pixelBuffer: CVPixelBuffer) {
        self.pixelBuffer = pixelBuffer
        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)

        guard let baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) else {
            CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
            return nil
        }
        self.baseAddress = baseAddress
        size = .init(width: CVPixelBufferGetWidth(pixelBuffer),
                     height: CVPixelBufferGetHeight(pixelBuffer))
        bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    }

    func value(x: Int, y: Int) -> T {
        let rowPtr = baseAddress.advanced(by: y * bytesPerRow)
        return rowPtr.assumingMemoryBound(to: T.self)[x]
    }

    deinit { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
}

YCbCr Captured Image Buffer

YCbCr separates luminance (Y) from chrominance (Cb, Cr), requiring conversion to RGB:

final class YCBCRBuffer {
    let size: Size
    private let pixelBuffer: CVPixelBuffer
    private let yPlane: UnsafeMutableRawPointer
    private let cbCrPlane: UnsafeMutableRawPointer

    init?(pixelBuffer: CVPixelBuffer) {
        self.pixelBuffer = pixelBuffer
        CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)

        guard let yPlane = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0),
              let cbCrPlane = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1) else {
            CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
            return nil
        }
        self.yPlane = yPlane
        self.cbCrPlane = cbCrPlane
        size = .init(width: CVPixelBufferGetWidth(pixelBuffer),
                     height: CVPixelBufferGetHeight(pixelBuffer))
    }

    func color(x: Int, y: Int) -> simd_float4 {
        let yIndex = y * CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0) + x
        let uvIndex = y / 2 * CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1) + x / 2 * 2

        let yValue = yPlane.advanced(by: yIndex)
            .assumingMemoryBound(to: UInt8.self).pointee
        let cbValue = cbCrPlane.advanced(by: uvIndex)
            .assumingMemoryBound(to: UInt8.self).pointee
        let crValue = cbCrPlane.advanced(by: uvIndex + 1)
            .assumingMemoryBound(to: UInt8.self).pointee

        let y = Float(yValue) - 16
        let cb = Float(cbValue) - 128
        let cr = Float(crValue) - 128

        let r = 1.164 * y + 1.596 * cr
        let g = 1.164 * y - 0.392 * cb - 0.813 * cr
        let b = 1.164 * y + 2.017 * cb

        return simd_float4(max(0, min(255, r)) / 255.0,
                           max(0, min(255, g)) / 255.0,
                           max(0, min(255, b)) / 255.0, 1.0)
    }

    deinit { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }
}

Reading Depth and Color

With buffers ready, define a Vertex structure and iterate through depth pixels:

struct Vertex {
    let position: SCNVector3
    let color: simd_float4
}
func process(frame: ARFrame) async {
    guard let depth = (frame.smoothedSceneDepth ?? frame.sceneDepth),
          let depthBuffer = PixelBuffer<Float32>(pixelBuffer: depth.depthMap),
          let confidenceMap = depth.confidenceMap,
          let confidenceBuffer = PixelBuffer<UInt8>(pixelBuffer: confidenceMap),
          let imageBuffer = YCBCRBuffer(pixelBuffer: frame.capturedImage) else { return }

    for row in 0..<depthBuffer.size.height {
        for col in 0..<depthBuffer.size.width {
            let confidenceRawValue = Int(confidenceBuffer.value(x: col, y: row))
            guard let confidence = ARConfidenceLevel(rawValue: confidenceRawValue) else { continue }
            if confidence != .high { continue }

            let depth = depthBuffer.value(x: col, y: row)
            if depth > 2 { return }

            let normalizedCoord = simd_float2(Float(col) / Float(depthBuffer.size.width),
                                              Float(row) / Float(depthBuffer.size.height))
            let imageSize = imageBuffer.size.asFloat
            let pixelRow = Int(round(normalizedCoord.y * imageSize.y))
            let pixelColumn = Int(round(normalizedCoord.x * imageSize.x))
            let color = imageBuffer.color(x: pixelColumn, y: pixelRow)
        }
    }
}

Converting Points to 3D Scene Coordinates

Calculate the 2D screen coordinate, then use camera intrinsics to convert to 3D camera space:

let screenPoint = simd_float3(normalizedCoord * imageSize, 1)
let localPoint = simd_inverse(frame.camera.intrinsics) * screenPoint * depth

The iPhone camera orientation differs from the phone orientation. Apply a flip transformation to align axes:

func makeRotateToARCameraMatrix(orientation: UIInterfaceOrientation) -> matrix_float4x4 {
    let flipYZ = matrix_float4x4(
        [1, 0, 0, 0],
        [0, -1, 0, 0],
        [0, 0, -1, 0],
        [0, 0, 0, 1]
    )
    let rotationAngle: Float = switch orientation {
        case .landscapeLeft: .pi
        case .portrait: .pi / 2
        case .portraitUpsideDown: -.pi / 2
        default: 0
    }
    let quaternion = simd_quaternion(rotationAngle, simd_float3(0, 0, 1))
    let rotationMatrix = matrix_float4x4(quaternion)
    return flipYZ * rotationMatrix
}

let rotateToARCamera = makeRotateToARCameraMatrix(orientation: .portrait)
let cameraTransform = frame.camera.viewMatrix(for: .portrait).inverse * rotateToARCamera

let worldPoint = cameraTransform * simd_float4(localPoint, 1)
let resultPosition = worldPoint / worldPoint.w

Conclusion

This first installment established the foundational workflow: obtaining depth data from LiDAR alongside corresponding camera images, and transforming each pixel into a colored 3D point. Filtering by confidence level ensures data accuracy.

Part 2 covers merging captured points into a unified point cloud, visualizing results in the AR view, and exporting to .PLY format.