Photogrammetry for the Categorically Minded

The Premise

You have played video games. You know that a 3D engine takes geometry and renders a flat image from a virtual camera. This is a projection \(\pi:\mathbb{R}^3\dashrightarrow\mathbb{R}^2\). It is surjective but not injective: every pixel corresponds to a ray, and the depth along that ray is destroyed.
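The loss of depth can be checked numerically. The sketch below assumes an ideal pinhole camera at the origin looking down the positive z-axis; the function name `project` and the `focal_length` parameter are illustrative and not taken from any particular engine.

```python
def project(point, focal_length=1.0):
    """Pinhole projection of a camera-frame point (x, y, z) onto the image plane.

    The map is only defined for z > 0, which is why the arrow above is dashed.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


near = (1.0, 2.0, 4.0)
far = tuple(3.0 * c for c in near)  # same ray through the camera center, three times deeper

print(project(near))  # (0.25, 0.5)
print(project(far))   # (0.25, 0.5)
```

Both points land on the same pixel, so the image alone cannot tell them apart.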

Photogrammetry is the inverse problem: given a collection of flat images taken from different viewpoints, recover the 3D geometry. If rendering is the pullback \(p^*\) along a cover \(p: Y\to X\), photogrammetry is descent along \(p\).

One photograph is not enough — the depth fiber is killed. But multiple photographs from different positions each kill a different fiber direction. The collection of cameras forms a cover, and the problem becomes: given coherent local data on the cover, reconstruct the global object on the base.

The Capture Rig

Professional studios use static multi-camera rigs — 30 to 200+ cameras arranged in rings and domes around a central volume. All cameras fire simultaneously, freezing the subject. The rig geometry is fixed and pre-calibrated; only the subject changes between captures.

Below: cameras in three rings (low, mid, high) surround a bust. Drag to orbit. Click any camera to see its 2D projection — the pullback \(p^*E\).

[Interactive figure: Static Capture Rig. Controls: cameras per ring, frustums. Legend: low ring, mid ring, high ring. Panel: selected view.]
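A three-ring rig of this kind can be laid out in a few lines. This is a minimal sketch, not a description of any real studio: the camera count per ring, the radius, and the ring heights are hypothetical values, and the z-axis is taken as up.

```python
import math


def ring_cameras(count, radius, height, target=(0.0, 0.0, 0.0)):
    """Place `count` cameras evenly on a horizontal circle, each aimed at `target`."""
    cameras = []
    for k in range(count):
        angle = 2.0 * math.pi * k / count
        position = (radius * math.cos(angle), radius * math.sin(angle), height)
        # Unit viewing direction from the camera toward the target.
        offset = tuple(t - p for t, p in zip(target, position))
        norm = math.sqrt(sum(c * c for c in offset))
        direction = tuple(c / norm for c in offset)
        cameras.append((position, direction))
    return cameras


# Hypothetical rig: three rings of 12 cameras at radius 2 (arbitrary units).
RING_HEIGHTS = {"low": -0.8, "mid": 0.0, "high": 0.8}
rig = {name: ring_cameras(12, 2.0, h) for name, h in RING_HEIGHTS.items()}

print(sum(len(cams) for cams in rig.values()))  # 36
```

Because the positions and directions are fixed once, they play the role of the pre-calibrated rig geometry: only the subject at `target` changes between captures.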

The Reconstruction Pipeline

Once the photographs are captured, reconstruction proceeds through stages — each with a categorical analog. Step through them below.

[Interactive figure: Reconstruction Pipeline. Control: noise.]

Triangulation: Recovering the Fiber

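Two cameras, two rays, one intersection. Each matched feature determines a ray from each camera center through its image point. For an exact correspondence the two rays meet in a unique 3D point: effective descent. Two arbitrary rays in space are generically skew, so with noisy matches the rays miss each other slightly and the reconstruction takes the point closest to both.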

Drag the scene point. Watch the rays converge.

[Interactive figure: Triangulation (Effective Descent). Drag the amber dot to move the scene point. Control: camera separation. Legend: cameras, scene point P, reconstructed P̂, rays.]
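Two-ray triangulation can be sketched with the midpoint method, one common choice among several; nothing here says the figure uses it. The function name and the example coordinates are illustrative. The returned `gap` is the distance between the two rays at closest approach: zero for an exact correspondence, positive when the match is noisy.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1, d1, c2, d2, eps=1e-12):
    """Closest-approach midpoint of the rays c1 + s*d1 and c2 + t*d2.

    Returns (midpoint, gap), where gap is the distance between the two rays.
    """
    w = tuple(p - q for p, q in zip(c1, c2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b  # vanishes exactly when the rays are parallel
    if abs(denom) <= eps * a * c:
        raise ValueError("rays are (nearly) parallel; no stable intersection")
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = tuple(p + s * q for p, q in zip(c1, d1))
    p2 = tuple(p + t * q for p, q in zip(c2, d2))
    midpoint = tuple((u + v) / 2.0 for u, v in zip(p1, p2))
    return midpoint, math.dist(p1, p2)


scene_point = (0.5, 0.2, 4.0)
cam_left, cam_right = (-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray_left = tuple(p - c for p, c in zip(scene_point, cam_left))
ray_right = tuple(p - c for p, c in zip(scene_point, cam_right))

estimate, gap = triangulate_midpoint(cam_left, ray_left, cam_right, ray_right)

# Perturb one ray to imitate a slightly wrong feature match: the rays become skew.
ray_right_noisy = (ray_right[0], ray_right[1] + 0.01, ray_right[2])
noisy_estimate, noisy_gap = triangulate_midpoint(cam_left, ray_left, cam_right, ray_right_noisy)
print(estimate, gap, noisy_gap)
```

With exact rays the estimate reproduces the scene point up to rounding; with the perturbed ray the gap becomes nonzero and the midpoint is only an approximation.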

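When cameras are too close, rays become nearly parallel and the intersection becomes ill-conditioned: a small error in a matched pixel moves the reconstructed point a long way along the depth direction. This is the analog of a cover that is faithful but not "flat enough."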
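The effect of a short baseline can be made quantitative under a simplifying assumption that is not part of the text above: a rectified two-camera pair with focal length \(f\) in pixels, baseline \(B\), and disparity \(d\) (the horizontal pixel offset between the two matched features). Depth follows from similar triangles, and differentiating shows how a matching error \(\delta d\) propagates:

\(Z=\dfrac{fB}{d},\qquad |\delta Z|\approx\dfrac{fB}{d^{2}}\,|\delta d|=\dfrac{Z^{2}}{fB}\,|\delta d|\)

For a fixed depth \(Z\) and a fixed pixel error, the depth error scales like \(1/B\): halving the camera separation doubles the uncertainty along the ray.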

The Cocycle Condition

With three or more views, correspondences must be transitive:

\(\varphi_{ik}=\varphi_{jk}\circ\varphi_{ij}\)
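On discrete matches the condition can be checked directly. The sketch below assumes each \(\varphi_{ij}\) is stored as a dictionary from keypoint indices in view \(i\) to keypoint indices in view \(j\); that representation and the sample matches are hypothetical.

```python
def compose(phi_ij, phi_jk):
    """Return phi_jk after phi_ij, defined only on keypoints tracked through view j."""
    return {a: phi_jk[b] for a, b in phi_ij.items() if b in phi_jk}


def cocycle_violations(phi_ij, phi_jk, phi_ik):
    """Keypoints of view i where phi_ik disagrees with phi_jk after phi_ij."""
    via_j = compose(phi_ij, phi_jk)
    return sorted(a for a in via_j if a in phi_ik and phi_ik[a] != via_j[a])


# Hypothetical matches between three views, keyed by keypoint index.
phi_12 = {0: 5, 1: 7, 2: 9}
phi_23 = {5: 3, 7: 4, 9: 8}
phi_13 = {0: 3, 1: 4, 2: 6}  # keypoint 2: going through view 2 gives 8, the direct match says 6

print(cocycle_violations(phi_12, phi_23, phi_13))  # [2]
```

An empty list means the three sets of matches are transitive on every keypoint seen in all three views.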

Failure is measured by reprojection error: project each 3D point back into every camera that observed it and measure the pixel residual against the detected feature. Bundle adjustment minimizes the sum of squared residuals jointly over all camera poses and 3D points, enforcing the cocycle condition in a least-squares sense. Crank the noise slider:

[Interactive figure: Cocycle Condition & Bundle Adjustment. Control: noise (obstruction). Readout: reprojection error in px. Legend: cameras, true points, reconstructed, error.]
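The residual that bundle adjustment minimizes can be evaluated as follows. This is a sketch under strong simplifications: the cameras are unrotated pinholes looking down +z, and the focal length of 800 px and the camera centers are made-up values. It only evaluates the objective for one point; a real bundle adjuster would also optimize over the poses and every point at once.

```python
import math


def project(point, center, focal_px=800.0):
    """Pinhole projection for an unrotated camera at `center` looking down +z."""
    x, y, z = (p - c for p, c in zip(point, center))
    return (focal_px * x / z, focal_px * y / z)


def rms_reprojection_error(point, observations, focal_px=800.0):
    """Root-mean-square pixel distance between observed and reprojected positions.

    `observations` is a list of (camera_center, observed_pixel) pairs.
    """
    squared = []
    for center, observed in observations:
        predicted = project(point, center, focal_px)
        squared.append((predicted[0] - observed[0]) ** 2 + (predicted[1] - observed[1]) ** 2)
    return math.sqrt(sum(squared) / len(squared))


centers = [(-1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
true_point = (0.2, -0.1, 5.0)
observations = [(c, project(true_point, c)) for c in centers]

exact_error = rms_reprojection_error(true_point, observations)
shifted_error = rms_reprojection_error((0.2, -0.1, 5.5), observations)
print(exact_error)    # 0.0
print(shifted_error)  # positive: the shifted point no longer explains all three views
```

A point that is consistent with every view has zero residual; pushing it half a unit deeper makes it disagree with the observations in all three cameras.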

The Dictionary

The correspondence is not an analogy — it is a functor.

Descent Theory	Photogrammetry	Gloss
Base \(X\)	3D scene	The unknown geometry
Cover \(Y\to X\)	Camera rig	Viewpoints whose union covers the scene
Faithful flatness	≥ 60% overlap	Every point visible to ≥ 2 cameras
Pullback \(p^*E\)	Photograph	Depth fiber killed
\(Y\!\times_X\! Y\)	Overlap region	Points visible to cameras \(i\) and \(j\)
Descent datum \(\varphi_{ij}\)	Feature correspondence	Matched keypoints across views
Cocycle on \(Y^3_X\)	Trifocal tensor	Three-view consistency
Cocycle condition	Bundle adjustment	Transitivity of correspondences
Effective descent	3D reconstruction	Point cloud recovered
\(H^1\) obstruction	Reprojection error	The red residuals
\(\mathrm{SE}(3)\)	Camera pose group	Rigid motions
Gauge ambiguity	Scale ambiguity	Up to global similarity
Section	Ground control point	Fixes the gauge
Sheaf, no descent	NeRF	\(T\mapsto\mathrm{view}(T)\) directly

The last row is the punchline. A NeRF learns \((\mathrm{pos},\mathrm{dir})\mapsto(\mathrm{color},\mathrm{density})\) — the functor of views — without ever constructing a mesh. It is a sheaf that refuses to descend.
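How a view comes out of such a field, with no mesh in between, can be sketched with the standard alpha-compositing quadrature along a ray. The `toy_field` below is a hand-written stand-in for a trained network, and the sample count and ray bounds are arbitrary; none of this is taken from a specific NeRF implementation.

```python
import math


def toy_field(position, direction):
    """Stand-in for a trained network: a dense red ball of radius 1 at the origin.

    `direction` is unused here; a real NeRF uses it for view-dependent color.
    """
    inside = sum(c * c for c in position) < 1.0
    density = 5.0 if inside else 0.0
    return (1.0, 0.0, 0.0), density


def render_ray(field, origin, direction, near=0.0, far=6.0, samples=64):
    """Accumulate color along a ray by alpha compositing evenly spaced samples."""
    step = (far - near) / samples
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for i in range(samples):
        t = near + (i + 0.5) * step
        position = tuple(o + t * d for o, d in zip(origin, direction))
        rgb, density = field(position, direction)
        alpha = 1.0 - math.exp(-density * step)
        for ch in range(3):
            color[ch] += transmittance * alpha * rgb[ch]
        transmittance *= 1.0 - alpha
    return tuple(color)


hit = render_ray(toy_field, (0.0, 0.0, -3.0), (0.0, 0.0, 1.0))   # passes through the ball
miss = render_ray(toy_field, (2.0, 0.0, -3.0), (0.0, 0.0, 1.0))  # passes beside it
print(hit, miss)
```

The pixel color is computed straight from the field along the ray, which is the sense in which the geometry is never constructed explicitly.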