# Wildfire Documentation through Wave-Based Light Field Localization: A Frequency-Domain Approach to Camera Network Calibration

**Author:** Stephen Guerin

**Affiliations:**
- Harvard University, Department of Earth and Planetary Sciences, Visualization Research and Teaching Laboratory
- Harvard University Graduate School of Design, Landscape Architecture Program
- SimTable LLC, Santa Fe, New Mexico

**Date:** January 2, 2026

## Abstract

This research proposes a novel approach to localizing and calibrating decentralized camera networks through wave-based light field analysis, with immediate application to documenting the 2025 Pacific Palisades wildfire from hundreds of uncalibrated street-level videos. Drawing on principles from quantum optics, signal processing, and computational photography, we treat camera networks as sparse samplers of a continuous light field and apply frequency-domain fingerprinting methods analogous to acoustic recognition systems (e.g., Shazam) to establish correspondence with georeferenced panoramic imagery. The approach extends traditional epipolar geometry from point-line dualities to wave function sampling across 2D pixel arrays, enabling robust localization even under severe scene changes caused by fire, smoke, and structural damage.

## 1. Introduction

### 1.1 Motivation

Wildfire documentation presents a critical challenge for emergency response, forensic analysis, and scientific understanding of fire behavior. During the January 2025 Pacific Palisades fire, hundreds of videos were captured by residents, emergency personnel, and media from street-level perspectives. These videos contain invaluable spatiotemporal information about fire progression, structure ignition sequences, and evacuation dynamics. However, most footage lacks reliable metadata about camera position, orientation, or intrinsic parameters, limiting its utility for quantitative analysis and integration with simulation models.
Traditional photogrammetric approaches to camera localization require calibration targets, dense feature correspondence across multiple views, or structure-from-motion pipelines that assume static scenes. All of these assumptions fail during active wildfires, where scenes change rapidly, smoke obscures features, and footage is captured opportunistically by uncalibrated devices.

### 1.2 Theoretical Foundation

This research builds on three key theoretical insights:

**Wave-based perspective on imaging:** Following Carver Mead's interpretation of quantum mechanics and Alvy Ray Smith's insight that "a pixel is not a little square," we treat images not as collections of discrete intensity measurements but as discrete samples of continuous electromagnetic field configurations. Each image samples the light field—the spatially-varying distribution of radiance as a function of position and direction—at a particular location with a particular point spread function determined by the optical system.

**Light field structure and constraints:** The plenoptic function L(x, y, z, θ, φ, t), describing radiance as a function of 3D position, 2D direction, and time, is highly constrained by the physics of light transport, scene geometry, and material properties. These constraints mean that sparse samples of the light field contain sufficient information to infer both scene structure and camera parameters, provided we exploit the field's mathematical structure rather than treating samples as independent measurements.

**Bidirectional field reciprocity:** Drawing on Wheeler-Feynman absorber theory and Helmholtz reciprocity, we recognize that light field observations involve transactions between cameras (absorbers) and scene elements (emitters/reflectors). The path light takes from scene point P to sensor point S is identical (with conjugate properties) to the path a projected ray from S would take to reach P.
This bidirectionality suggests calibration strategies based on resonance between forward propagation (scene→camera) and backward projection (camera→scene).

### 1.3 Core Innovation

We propose treating camera localization as a resonance search problem in spatial frequency space. By computing multi-scale frequency fingerprints from uncalibrated video and matching them against a pre-computed catalog of frequency signatures extracted from georeferenced panoramic imagery (Google Street View), we can localize cameras to specific GPS coordinates and viewing directions without requiring feature correspondence, scene reconstruction, or calibration targets.

The approach extends epipolar geometry from traditional point-to-line constraints to wave-based field consistency requirements. Instead of matching discrete features, we match the spatial frequency structure of 2D image patches, treating them as samples of underlying wave functions. This frequency-domain approach is robust to appearance changes from fire and smoke, as geometric structure persists in mid-range spatial frequencies even when absolute intensities and fine details are altered.

## 2. Technical Approach

### 2.1 Hierarchical Light Field Representation

We organize all spatial data—digital elevation models (DEMs), OpenStreetMap vector features, 3D photogrammetric tiles, and georeferenced panoramic imagery—using a hierarchical tile pyramid in EPSG:4326 (unprojected WGS84 geographic coordinates). This coordinate system preserves angular relationships and aligns naturally with spherical geometry, unlike Web Mercator projections that introduce metric distortions.

At zoom level Z, the world is divided into 2^Z × 2^Z tiles, each covering a geographic rectangle of constant latitude-longitude extent. For the Pacific Palisades study area (approximately 34.05°N, 118.5°W), zoom levels 16-20 provide meter to submeter resolution appropriate for street-level localization.
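The tile indexing this pyramid implies is straightforward. The sketch below is a minimal illustration, assuming a square 2^Z × 2^Z grid spanning the full −180°…180° longitude and −90°…90° latitude extent; the function and variable names are illustrative, not from an existing codebase. Note the hierarchical property the search pipeline relies on: integer-dividing a tile index by two yields its parent at zoom Z−1.

```python
def tile_index(lat, lon, zoom):
    """Map WGS84 coordinates to (x, y) tile indices in a 2^Z x 2^Z
    EPSG:4326 pyramid (a linear lat/lon mapping, no Mercator warp).
    Illustrative sketch; names are hypothetical."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)   # west -> east
    y = int((90.0 - lat) / 180.0 * n)    # north -> south
    # clamp edge cases at lat = -90 or lon = 180 into the last tile
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)

# Pacific Palisades study area at zoom 16
print(tile_index(34.05, -118.5, 16))  # → (11195, 20370)
```

Because child tiles subdivide their parent exactly in half along each axis, `tile_index(lat, lon, Z + 1)` always lands in a child of `tile_index(lat, lon, Z)`, which is what makes the coarse-to-fine descent in Section 2.3 well-defined.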
For each tile at each zoom level, we pre-compute:

- **Terrain elevation statistics** from DEM data
- **Geometric constraints** from OpenStreetMap (road networks, building footprints, known landmarks)
- **Light field frequency signatures** from all Street View panoramas within the tile
- **3D geometric models** from photogrammetric reconstruction tiles

### 2.2 Frequency Fingerprinting from Panoramic Imagery

For each Street View panorama with known GPS coordinates (latitude λ, longitude φ):

1. **Discretize viewing directions:** Sample N azimuthal bins (e.g., N=32, yielding 11.25° angular resolution)
2. **Generate perspective projections:** For each azimuth α_i, synthesize a perspective view from the equirectangular panorama with a field of view matching typical smartphone cameras (e.g., 60-80° horizontal FOV)
3. **Multi-scale frequency decomposition:** Divide each perspective view into overlapping patches at multiple scales (e.g., 64×64, 128×128, 256×256 pixels). For each patch:
   - Compute the 2D Discrete Fourier Transform
   - Extract the magnitude spectrum |F(k_x, k_y)|
   - Identify peak frequencies and their orientations
   - Build rotation-invariant descriptors (e.g., log-polar transform of the spectrum)
   - Optionally estimate phase structure via Transport of Intensity methods
4. **Compact descriptor generation:** Encode each patch's frequency characteristics as a compact binary descriptor (e.g., 128-512 bits) suitable for fast approximate nearest neighbor search
5. **Hierarchical aggregation:** For each tile at each zoom level, compute aggregate frequency statistics summarizing the characteristic spatial frequency distributions observed across all panoramas and viewing directions within that tile

### 2.3 Video Frame Localization Pipeline

For each uncalibrated video frame from wildfire footage:

**Step 1: Multi-scale frequency extraction**
- Divide the frame into overlapping patches
- Compute frequency fingerprints identical to the panoramic imagery processing
- Generate descriptors at multiple scales to handle unknown camera zoom

**Step 2: Hierarchical tile search**
- Begin at a coarse zoom level (e.g., Z=12, covering a ~5 km region)
- Compare frame fingerprints against aggregate tile signatures
- Identify the top K candidate tiles by frequency correlation
- Refine the search by descending to the child tiles (Z+1) of promising candidates
- Continue the hierarchical descent to the finest zoom level (e.g., Z=20)

**Step 3: Pose hypothesis generation**
- For each high-scoring fine-resolution tile, retrieve the associated Street View panoramas
- For each panorama, test multiple viewing directions (azimuth bins)
- Generate candidate camera poses: (λ, φ, α, focal_length)

**Step 4: Geometric consistency verification**
- For each pose hypothesis, check consistency across multiple image patches
- Different patches should match panorama regions separated by angles consistent with the camera field of view
- Use RANSAC-style robust estimation to reject outlier patch matches
- Verify against geometric constraints: DEM (elevation), OSM (position must be on accessible ground), 3D tiles (line-of-sight consistency)

**Step 5: Wave propagation validation**
- Treat matched Street View patches as field sources
- Propagate the field forward to the candidate camera position using the angular spectrum method
- Compare the predicted field structure against the observed video frame
- Similarly, back-propagate the observed frame to the Street View location
- Score the pose by bidirectional consistency: forward prediction ↔
backward inference

**Step 6: Temporal tracking** (for video sequences)
- Use optical flow and feature tracking between consecutive frames
- Constrain frame-to-frame pose changes by physical motion limits
- Track the camera trajectory through the tile hierarchy
- Enforce spatial continuity: consecutive frames should map to the same or adjacent tiles

### 2.4 Extension of Epipolar Constraints to Wave Functions

Traditional epipolar geometry relates point correspondences across views through the fundamental matrix F or essential matrix E. Given a point p₁ in image 1, the corresponding point p₂ in image 2 must lie on the epipolar line l₂ = F p₁.

We extend this to wave-based constraints on 2D image patches:

**Frequency-domain epipolar consistency:** Given image patches P₁(x,y) and P₂(x,y) from two cameras with known relative pose (R, t):

1. Compute the 2D Fourier transforms F₁(k_x, k_y) and F₂(k_x, k_y)
2. The spatial frequency structure should be related by:

   F₂(k'_x, k'_y) ≈ H(k_x, k_y; R, t, Z(x,y)) · F₁(k_x, k_y)

   where:
   - (k'_x, k'_y) = R(k_x, k_y) (rotated frequency coordinates)
   - H is a propagation kernel accounting for distance and depth Z
   - Z(x,y) is the depth map relating the two views
3. For patches viewing the same scene region with different parallax, certain frequency bands should show consistent phase relationships determined by scene depth
4. Cross-correlation of the frequency magnitudes |F₁| and |F₂| should show peaks at offsets corresponding to the parallax displacement

This wave-based formulation is more robust than point matching because:

- Frequency structure persists even when individual features are obscured
- Mid-range frequencies encode the geometric regularity of building facades and street patterns
- Low frequencies (coarse structure) are insensitive to smoke and lighting changes
- High frequencies can be down-weighted when image quality is degraded

### 2.5 Handling Fire-Induced Scene Changes

The wildfire alters scene appearance through:

- **Structural damage:** Buildings collapse, vegetation burns
- **Smoke obscuration:** Atmospheric scattering affects different wavelengths differently
- **Illumination changes:** Fire light sources create unusual lighting
- **Temporal evolution:** The scene changes over minutes to hours

Our frequency-based approach handles these challenges through:

**Frequency band weighting:** Low-to-mid frequencies (corresponding to building scale, 1-10 m features) are most stable; high frequencies (texture details) degrade first. We weight frequency bands by their expected reliability given fire conditions.

**Geometric persistence:** Street layout, terrain topology, and distant landmarks persist even when the foreground is damaged. We prioritize matching frequencies from depth layers likely to be unchanged.

**Multi-temporal calibration:** Early video frames (before severe damage) localize more reliably. Once localized, we track camera motion forward through degraded conditions using temporal coherence.

**Synthetic view rendering:** Using 3D tiles and known fire progression data, we can render "expected" views from candidate camera positions accounting for known damage, improving match quality for later footage.

## 3. Data Sources and Study Area

### 3.1 Pacific Palisades Fire (January 2025)

- **Geographic extent:** Approximately 34.0°N to 34.1°N, 118.6°W to 118.5°W
- **Terrain:** Elevation range 0-500 m, steep topography, Santa Monica Mountains
- **Urban density:** Mixed residential/wildland interface; street network well-documented in OSM
- **Video corpus:** ~100-300 videos from social media, news media, and resident footage
- **Duration:** Fire evolution over 24-48 hours

### 3.2 Prior Geometric Data

**Digital Elevation Models:**
- Source: USGS 3DEP, 1/3 arc-second (~10 m) or better resolution
- Coverage: Complete for the study area
- Format: GeoTIFF tiles in EPSG:4326
- Use: Terrain constraints, line-of-sight calculations, height priors for ground-level cameras

**OpenStreetMap Vector Data:**
- Features: Road centerlines, building footprints, amenities, landmarks
- Quality: High coverage in urban areas, updated frequently
- Use: Camera position constraints (must be on accessible ground), semantic landmarks for matching, street furniture as geometric references

**Google Street View Panoramas:**
- Density: High coverage along all public roads, typical spacing 10-20 m
- Resolution: Equirectangular panoramas, ~13312×6656 pixels typical
- Metadata: GPS coordinates, capture date, compass heading
- Use: Reference light field for frequency matching; pre-fire appearance baseline

**3D Photogrammetric Tiles:**
- Source: Google 3D Tiles or similar photogrammetry-derived meshes
- Resolution: Variable, submeter in urban cores
- Use: Geometric ground truth for validation, synthetic view rendering

### 3.3 Video Characteristics

- **Capture devices:** Smartphones (various models), consumer cameras, dashcams, news cameras
- **Resolution:** 720p to 4K, variable
- **Metadata reliability:** GPS often absent or inaccurate; timestamps may be present
- **Motion characteristics:** Handheld (shaky), vehicle-mounted (smooth), static (tripod)
- **Duration:** 10 seconds to several minutes per clip
- **Viewing
conditions:** Extreme: smoke, fire light, low visibility

## 4. Validation and Evaluation

### 4.1 Ground Truth Establishment

For a subset of videos where location can be determined through manual landmark identification:

- Human annotators identify distinctive features (building facades, street signs, geographic landmarks)
- Cross-reference with Street View and satellite imagery to establish ground truth GPS coordinates and viewing directions
- Accuracy is typically ±5-10 m in position and ±10° in azimuth

### 4.2 Evaluation Metrics

**Localization accuracy:**
- Position error (meters) relative to ground truth
- Azimuth error (degrees) relative to ground truth
- Success rate at various thresholds (% localized within 10 m, 25 m, 50 m)

**Temporal consistency:**
- For video sequences, smoothness of the estimated camera trajectory
- Violations of physical motion constraints (impossible velocities/accelerations)

**Geometric consistency:**
- Agreement with DEM elevation constraints
- Consistency with the OSM road network (camera should be on accessible ground)
- Line-of-sight consistency with 3D tiles (visible features should not be occluded by geometry)

**Cross-video consistency:**
- When multiple videos show overlapping scene regions, do their localized poses agree on shared geometry?
- Triangulation accuracy for shared landmarks

### 4.3 Ablation Studies

To validate the wave-based approach, we compare against:

**Traditional feature matching:** SIFT/ORB/SuperPoint feature extraction and matching against Street View imagery, without frequency-domain fingerprinting

**Direct pixel-intensity matching:** Template correlation without frequency decomposition

**Geometry-only localization:** Using only geometric priors (DEM, OSM) without light field matching

**Point-based epipolar geometry:** Classical structure-from-motion pipelines

We expect frequency-based matching to show superior robustness to appearance changes while maintaining comparable accuracy on undamaged scenes.

## 5. Implementation Considerations

### 5.1 Computational Requirements

**Pre-processing (one-time cost):**
- Extract and fingerprint all Street View panoramas in the study area
- Estimate: ~10,000 panoramas × 32 viewing directions × 3 scales ≈ 1M frequency descriptors
- Processing time: ~1 second per view on GPU → ~10 hours total
- Storage: ~500 bytes per descriptor → ~500 MB total

**Per-frame localization (real-time target):**
- Frequency extraction from a video frame: ~100 ms on GPU
- Hierarchical tile search (6-8 zoom levels): ~50-200 descriptor comparisons with locality-sensitive hashing
- Geometric verification: ~10-50 ms
- Total: ~200-500 ms per frame, enabling near-real-time processing

**Video corpus processing:**
- 200 videos × 60 seconds average × 1 frame/second = 12,000 frames
- At 500 ms/frame: ~2 hours total processing time on a single GPU
- Parallelizes trivially across videos

### 5.2 Software Architecture

**Core components:**
- **Tile server:** PostgreSQL/PostGIS database for hierarchical tile storage and spatial queries
- **Frequency extraction:** Python/PyTorch for 2D FFT, descriptor generation, and GPU acceleration
- **Descriptor indexing:** FAISS (Facebook AI Similarity Search) for approximate nearest neighbor search
- **Geometric reasoning:** GDAL/OGR for DEM and vector operations, Open3D for 3D geometry
- **Visualization:** Web-based interface for displaying localized videos on a map, with trajectory animation

**Data flow:**
1. Video ingestion → frame extraction → frequency fingerprinting
2. Hierarchical search → pose hypothesis generation
3. Geometric verification → trajectory smoothing
4. Output: CSV of (timestamp, latitude, longitude, azimuth, confidence)
5. Visualization: Georeferenced video playback with uncertainty bounds

### 5.3 Open Source Strategy

All code, algorithms, and processed datasets will be released under permissive open source licenses (MIT/Apache 2.0) to enable:

- Replication and validation by other researchers
- Application to future wildfire events and other disaster documentation scenarios
- Integration with emergency management workflows (e.g., the SimTable fire simulation platform)

## 6. Applications and Impact

### 6.1 Immediate Applications

**Fire progression analysis:** By localizing and timestamping video footage, we can reconstruct the spatiotemporal evolution of the fire front, identify structure ignition sequences, and measure fire spread rates. This provides ground truth for validating fire behavior models.

**Evacuation route analysis:** Understanding where and when roads became impassable helps evaluate evacuation effectiveness and improve future evacuation planning.

**Structural vulnerability assessment:** By documenting which structures survived versus which were destroyed, and correlating with construction methods, defensible space, and micro-topography, we can improve building codes and mitigation strategies.

### 6.2 Broader Research Contributions

**Camera network calibration theory:** The wave-based approach to sparse light field sampling has applications beyond wildfire documentation:

- Autonomous vehicle sensor fusion (localizing dashcam footage)
- Augmented reality (aligning user-generated content with geographic databases)
- Forensic video analysis (geolocating video evidence)
- Archaeological documentation (registering historical photographs)

**Computational landscape analysis:** This work extends the concept of landscapes as computational substrates. The light field itself becomes a readable medium encoding spatial structure, and the hierarchical tile organization provides a natural multi-scale framework for analyzing landscape processes.
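The coarse-to-fine descent that underlies this multi-scale framework (Section 2.3, Step 2) can be sketched as a toy implementation. This is not the project's code: it assumes per-tile aggregate descriptors stored in a plain dict keyed by (zoom, x, y), uses cosine similarity in place of the binary-descriptor search described in Section 5.2, and all names are illustrative.

```python
import numpy as np

def hierarchical_search(frame_desc, tile_descs, z_min=12, z_max=20, top_k=3):
    """Coarse-to-fine tile search (toy sketch of Section 2.3, Step 2).

    tile_descs: dict mapping (zoom, x, y) -> aggregate descriptor vector.
    Scores candidate tiles against the frame fingerprint, keeps the
    top_k, descends into their four children, and repeats until z_max."""
    def score(key):
        d = tile_descs.get(key)
        if d is None:
            return -np.inf
        # cosine similarity between frame and tile fingerprints
        return float(np.dot(frame_desc, d) /
                     (np.linalg.norm(frame_desc) * np.linalg.norm(d) + 1e-9))

    candidates = [k for k in tile_descs if k[0] == z_min]
    for z in range(z_min, z_max):
        best = sorted(candidates, key=score, reverse=True)[:top_k]
        # descend: tile (z, x, y) has children (z+1, 2x+dx, 2y+dy)
        candidates = [(z + 1, 2 * x + dx, 2 * y + dy)
                      for (_, x, y) in best for dx in (0, 1) for dy in (0, 1)]
        candidates = [k for k in candidates if k in tile_descs]
    return sorted(candidates, key=score, reverse=True)[:top_k]
```

The payoff of the hierarchy is the cost profile quoted in Section 5.1: instead of scoring every fine tile, each level scores only `4 * top_k` children, so the total number of descriptor comparisons stays in the tens to hundreds across 6-8 zoom levels.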
**Bidirectional field methods:** The resonance-based calibration approach, inspired by Wheeler-Feynman absorber theory, demonstrates how reciprocity principles can solve inverse problems in computer vision. This has potential applications in:

- Medical imaging (CT/MRI reconstruction)
- Seismic imaging (subsurface structure from wave propagation)
- Radio astronomy (interferometric imaging)

### 6.3 Integration with SimTable Platform

The localized video corpus will be integrated with SimTable's agent-based wildfire simulation platform:

**Model validation:** Observed fire progression from georeferenced video provides spatiotemporal ground truth for validating SimTable's fire spread predictions

**Data assimilation:** Real-time video localization during future fires could feed into SimTable's simulation engine, allowing dynamic updating of fire perimeter estimates and improved short-term predictions

**Training data generation:** The dataset of localized fire behavior observations can train machine learning models for fire behavior prediction, particularly for structure-to-structure fire spread in wildland-urban interface settings

## 7. Timeline and Milestones

**Months 1-2: Data acquisition and preprocessing**
- Collect and organize the Pacific Palisades video corpus
- Download and tile-organize Street View panoramas, DEM, and OSM data for the study area
- Implement the frequency fingerprinting pipeline
- Build the tile database and indexing system

**Months 3-4: Algorithm development**
- Implement hierarchical search and pose hypothesis generation
- Develop geometric verification and wave propagation validation modules
- Create temporal tracking and trajectory smoothing algorithms
- Establish a ground truth dataset through manual annotation

**Months 5-6: Validation and refinement**
- Run localization on the full video corpus
- Compare against ground truth and compute accuracy metrics
- Conduct ablation studies
- Refine algorithms based on failure mode analysis

**Months 7-8: Analysis and documentation**
- Analyze fire progression patterns from localized videos
- Generate spatiotemporal visualizations
- Write a research paper for submission to a computer vision and/or remote sensing journal
- Prepare the open source code release

**Months 9-12: Broader applications and dissemination**
- Apply the methodology to additional wildfire events (if available)
- Integrate with SimTable validation workflows
- Present at conferences (CVPR, ICCV, or domain conferences such as IGARSS and Fire and Forest Meteorology)
- Engage with the emergency management community on operational deployment planning

## 8. Theoretical Implications and Future Directions

### 8.1 Light Field as Physical Field

This research takes seriously the idea that the light field is not merely a geometric abstraction but a manifestation of the electromagnetic field configuration in space. By treating image samples as measurements of wave functions rather than collections of discrete intensities, we gain access to wave-theoretic tools (interference, diffraction, propagation operators) that are more powerful than purely geometric methods.
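The propagation operators mentioned above have a standard discrete form: the angular spectrum method invoked for wave propagation validation in Section 2.3, Step 5. The sketch below is a generic textbook implementation, not code from this project; the grid pitch and wavelength used in the test are illustrative values chosen so that all sampled frequencies are propagating.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, dx, dz):
    """Propagate a sampled complex 2-D field by distance dz (meters)
    with the angular spectrum method: FFT, multiply by the free-space
    transfer function exp(i*kz*dz), inverse FFT. All units in meters."""
    ny, nx = field.shape
    fx = np.fft.fftfreq(nx, d=dx)   # spatial frequencies, cycles/m
    fy = np.fft.fftfreq(ny, d=dx)
    FX, FY = np.meshgrid(fx, fy)
    # kz from the free-space dispersion relation; a negative argument
    # yields imaginary kz, so evanescent components decay rather than
    # propagate
    arg = (1.0 / wavelength) ** 2 - FX**2 - FY**2
    kz = 2.0 * np.pi * np.sqrt(arg.astype(complex))
    H = np.exp(1j * kz * dz)
    return np.fft.ifft2(np.fft.fft2(field) * H)
```

Two properties make this operator attractive for the bidirectional consistency scoring of Section 2.3: for propagating components |H| = 1, so energy is conserved, and propagating by dz followed by −dz is the identity, which is exactly the forward/backward reciprocity the resonance search exploits.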
Future work could extend this by:

- Explicitly recovering phase information through transport-of-intensity methods or interferometric techniques
- Using coherence properties of light to establish correspondences (the mutual coherence function relates field samples at different points)
- Exploring holographic representations in which the full complex field is reconstructed from intensity measurements

### 8.2 Hierarchical Spatial Reasoning

The tile pyramid organization in EPSG:4326 demonstrates how hierarchical spatial decomposition aligns naturally with multi-scale physical phenomena. Coarse tiles capture large-scale structure (terrain, urban layout) while fine tiles resolve local details. This hierarchy mirrors:

- Wavelet decompositions in signal processing
- Octree/quadtree structures in computer graphics
- Multi-resolution analysis in numerical PDE solvers

Future research could explore how this hierarchical structure relates to:

- Scale-free or fractal organization in landscapes
- Renormalization group ideas from physics (how phenomena at one scale emerge from finer-scale interactions)
- Information-theoretic measures of landscape complexity at different scales

### 8.3 Bidirectional Causation and Resonance

The Wheeler-Feynman inspired approach of requiring consistency between forward (scene→camera) and backward (camera→scene) propagation represents a deeper principle: observations are transactions between systems, not unidirectional measurements. This suggests:

- Calibration problems can be framed as finding configurations that satisfy reciprocity constraints
- The "resonance" metaphor is precise: we are looking for parameter settings where the forward and backward models constructively interfere
- This connects to variational principles in physics, where true trajectories are those that extremize the action

Future theoretical work could formalize this connection:

- Can camera calibration be derived from a least-action principle?
- Do Noether-like symmetries relate conserved quantities to geometric invariances in the calibration problem?
- Does Step Theory (a proposed dual to Noether's theorem for far-from-equilibrium systems) apply to information propagation in sensor networks?

### 8.4 Landscape as Computational Substrate

This work contributes to a broader research program viewing landscapes not as passive geometric stages but as active computational media:

- The light field encodes information about scene structure
- Cameras sample this field sparsely
- The calibration process reconstructs the scene/field configuration from samples
- This is computation: input (samples) → processing (matching, propagation) → output (scene structure, camera poses)

The landscape itself (terrain, buildings, vegetation) acts as a constraint field shaping what light fields are possible. The dual-field reciprocity framework (constraint field ↔ potential field → observable patterns) manifests here as:

- Geometric constraints (DEM, OSM) ↔ Light propagation → Observable images

Neither field is primary; they co-determine each other. This perspective has implications for:

- Understanding how spatial structure generates observable patterns (the forward problem)
- Inferring spatial structure from observations (the inverse problem)
- Designing sensor networks to maximize information capture
- Thinking about ecological and geophysical processes as distributed computation

## 9. Conclusion

We have proposed a wave-based approach to camera network calibration that treats images as sparse samples of continuous light fields and uses spatial frequency fingerprinting to establish correspondence with georeferenced panoramic imagery. The methodology extends classical epipolar geometry to wave function sampling, enabling robust localization even under severe scene changes.
Application to the Pacific Palisades wildfire documentation problem demonstrates the practical value: hundreds of uncalibrated videos can be automatically georeferenced, providing unprecedented spatiotemporal data on fire progression. Beyond its immediate utility, this work contributes to fundamental understanding of:

- How sparse samples constrain continuous field reconstruction
- The role of reciprocity and resonance in inverse problems
- Hierarchical spatial reasoning across scales
- Landscapes as computational substrates encoding and transforming information

The open source implementation will enable both replication and extension to other domains where sensor localization from appearance matching is required despite scene changes: disaster response, forensic analysis, historical photograph registration, and autonomous navigation.

## References

*To be completed with full citations to:*

- Alvy Ray Smith's pixel-as-sample exposition
- Carver Mead's collective electrodynamics and quantum field interpretation
- Light field rendering literature (Levoy, Gortler, et al.)
- Structure from motion and SLAM literature
- Spherical harmonics in graphics (Ramamoorthi, Hanrahan, et al.)
- Wheeler-Feynman absorber theory
- Acoustic fingerprinting methods (Shazam/audio recognition)
- 3D Gaussian Splatting (Kerbl et al., 2023)
- Transport of Intensity equation for phase recovery
- SimTable fire simulation platform documentation

## Acknowledgments

This research builds on collaborations with Stuart Kauffman (theoretical foundations of self-organization), Craig Douglas (visualization pedagogy), Michael Mehaffy (pattern languages and emergence), and the Harvard Visualization Research and Teaching Laboratory. SimTable deployments with CAL FIRE, Texas A&M Forest Service, and Australian fire agencies have shaped our understanding of operational fire documentation needs.
---

**Contact:** Stephen Guerin, sguerin@simtable.com

**Project Repository:** [To be established upon funding]

**License:** MIT (code), CC-BY (documentation)