AlphaEarth Satellite Embeddings: Revolution or Gadget for Mineral Exploration?
Vectorizing the Crust: Operationalizing Satellite Embeddings for Regional Mineral Targeting
Current Situation Analysis
Remote sensing has long been the backbone of greenfield mineral exploration. Traditional workflows rely on spectral band math, atmospheric correction, and manual thresholding to isolate alteration halos, structural lineaments, and lithological boundaries. While theoretically sound, this approach has become a bottleneck in modern exploration programs.
The industry pain point is not a lack of data; it is the computational and interpretive overhead required to make that data analysis-ready. Processing raw Sentinel-2 or Landsat scenes demands rigorous atmospheric correction, cloud/shadow masking, topographic normalization, and seasonal compositing. Even after preprocessing, band ratios (e.g., SWIR combinations for clay minerals or iron oxides) are highly sensitive to local illumination, moisture content, and vegetation cover. A ratio calibrated for an arid porphyry system in Chile frequently fails when applied to a semi-arid epithermal district in Nevada, forcing teams to rebuild preprocessing pipelines for every new permit.
This problem is often overlooked because exploration teams treat remote sensing as a cartographic exercise rather than a pattern recognition problem. The focus remains on extracting physically interpretable indices, ignoring that modern self-supervised learning can compress multi-modal observations into stable, transferable representations. The result is a workflow that consumes 60–70% of project time on data wrangling, leaving minimal bandwidth for actual geological interpretation.
Google’s AlphaEarth initiative addresses this by shifting from explicit band math to learned vector representations. Instead of manipulating reflectance values, teams can now query a 64-dimensional embedding space that inherently fuses optical, radar, topographic, and climatic signals. The dataset (GOOGLE/SATELLITE_EMBEDDING/V1_ANNUAL) is pre-processed, cloud-mosaicked, and globally consistent, effectively decoupling data preparation from geological analysis.
WOW Moment: Key Findings
The transition from spectral ratios to learned embeddings fundamentally changes how exploration teams scale their targeting efforts. The following comparison highlights the operational shift:
| Approach | Preprocessing Overhead | Cross-Region Transferability | Interpretability | Temporal Granularity | Computational Cost |
|---|---|---|---|---|---|
| Traditional Spectral Ratios | High (atmospheric correction, masking, compositing) | Low (calibration drift across climates) | High (physical band relationships) | Multi-temporal (revisit intervals of days to weeks) | Client-side heavy (local processing) |
| AlphaEarth Embeddings | Near-zero (analysis-ready composites) | High (learned invariance across zones) | Low (statistical compression, no direct physical mapping) | Annual (2017–2025) | Server-side optimized (GEE execution) |
This finding matters because it enables similarity-driven exploration at continental scales. Instead of deriving new indices for each target type, teams can define a reference signature from a known deposit or field sample and propagate that vector across millions of hectares. The embedding space normalizes for phenology, illumination, and atmospheric noise, allowing direct comparison between geographically isolated regions. For exploration programs managing multiple greenfield licenses, this reduces the targeting phase from months to days while maintaining consistent analytical baselines.
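As a rough illustration of what "propagating a reference signature" means, the sketch below averages a few embedding vectors from a known target zone and ranks candidate cells by cosine similarity to that mean. The 4-dimensional vectors, the cell names, and the helper functions are hypothetical stand-ins for the real 64-dimensional embeddings, and the code runs locally in plain Python rather than in Earth Engine.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of equal-length vectors (the reference signature)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-D embeddings sampled over a known deposit (real ones have 64 dims)
deposit_pixels = [[0.8, 0.1, -0.5, 0.3], [0.7, 0.2, -0.6, 0.2]]

# Hypothetical candidate cells elsewhere in the exploration area
candidates = {
    "cell_a": [0.75, 0.15, -0.55, 0.25],
    "cell_b": [-0.6, 0.7, 0.3, -0.1],
}

reference = mean_vector(deposit_pixels)
scores = {name: cosine_similarity(vec, reference) for name, vec in candidates.items()}

# Rank candidates from most to least similar to the reference signature
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```

The same two steps (reduce a zone to one vector, then score every pixel against it) are what the server-side workflow performs at scale.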
Core Solution
Implementing satellite embeddings for mineral targeting requires a shift from pixel-wise arithmetic to vector-space operations. The workflow leverages Google Earth Engine’s server-side execution to avoid client-side memory constraints and ensure reproducible results.
Architecture Decisions & Rationale
- Server-Side Vector Operations: All similarity calculations must run within GEE. Transferring 64-band rasters to local memory for Python-based distance calculations will trigger memory limits and network timeouts.
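- Cosine Similarity Over Euclidean Distance: According to the dataset documentation, each pixel's 64-dimensional embedding is a unit-length vector, so individual values fall between -1 and +1 and the dot product of two pixel embeddings is already their cosine similarity. A reference vector averaged over many pixels is generally shorter than unit length, so it must be re-normalized before comparison. Cosine similarity measures angular alignment and is insensitive to that magnitude difference, whereas Euclidean distance against an un-normalized mean mixes direction and length and produces noisier rankings.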
- Annual Composites for Stability: Phenological cycles and cloud cover introduce high-frequency noise. Annual mosaics smooth out seasonal vegetation changes and atmospheric artifacts, aligning with the geological timescale of alteration halos.
- Threshold Calibration via Known Endmembers: Instead of arbitrary cutoffs, thresholds should be derived from statistical distributions of similarity scores within validated training zones.
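One possible way to derive thresholds from the score distribution inside validated training zones is to take percentiles of those scores. The sketch below is a minimal illustration under that assumption; the function names, the choice of the 10th and 50th percentiles, and the sample scores are all hypothetical and would need to be justified per project.

```python
import statistics

def calibrate_thresholds(training_scores, lower_pct=10, upper_pct=50):
    """Derive (lower, upper) cutoffs from percentiles of similarity scores
    sampled inside validated zones. Percentile choices are assumptions."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile
    cuts = statistics.quantiles(training_scores, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]

def classify(score, lower, upper):
    """Map a similarity score to a priority class."""
    if score >= upper:
        return "high"
    if score >= lower:
        return "medium"
    return "low"

# Hypothetical similarity scores sampled over known mineralized zones
training_scores = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90, 0.92, 0.94, 0.96, 0.98, 1.00]

lower, upper = calibrate_thresholds(training_scores)
print(lower, upper, classify(0.85, lower, upper))
```

Tying the cutoffs to the training distribution means they move with the data: a district where known occurrences score lower overall gets correspondingly lower thresholds.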
Implementation Workflow
The following implementation demonstrates a production-ready pipeline for regional similarity search. It uses a modular structure, server-side execution, and explicit threshold calibration.
import ee
import geemap  # only needed for the optional visualization at the end

# Initialize Earth Engine (assumes credentials are already configured)
ee.Initialize()


class GeoEmbeddingTargeter:
    def __init__(self, collection_id="GOOGLE/SATELLITE_EMBEDDING/V1_ANNUAL"):
        self.collection = ee.ImageCollection(collection_id)
        self.band_prefix = "A"
        self.band_count = 64
        # Band names run from A00 to A63
        self.band_names = [f"{self.band_prefix}{i:02d}" for i in range(self.band_count)]

    def load_annual_composite(self, year=2023, region=None):
        """Mosaics the annual embedding tiles for the specified year."""
        start_date = f"{year}-01-01"
        end_date = f"{year + 1}-01-01"  # filterDate treats the end date as exclusive
        annual = self.collection.filterDate(start_date, end_date)
        if region is not None:
            annual = annual.filterBounds(region)
        # The collection is stored as tiles, so first() would return only one tile
        return annual.mosaic().select(self.band_names)

    def extract_reference_signature(self, image, region, reducer=None):
        """Computes the mean embedding vector from a known target zone."""
        reducer = reducer or ee.Reducer.mean()
        stats = image.select(self.band_names).reduceRegion(
            reducer=reducer, geometry=region, scale=10, maxPixels=1e9
        )
        # Convert the dictionary of band statistics to a constant 64-band image
        return stats.toImage(self.band_names)

    def compute_cosine_similarity(self, embedding_image, reference_vector):
        """Calculates cosine similarity server-side across the entire region."""
        # Normalize both vectors to unit length
        def normalize(img):
            magnitude = img.pow(2).reduce(ee.Reducer.sum()).sqrt()
            return img.divide(magnitude)

        norm_embedding = normalize(embedding_image)
        norm_reference = normalize(reference_vector)
        # Dot product of normalized vectors equals cosine similarity
        similarity = norm_embedding.multiply(norm_reference).reduce(ee.Reducer.sum())
        return similarity.rename("cosine_sim")

    def generate_favorability_map(self, similarity_image, lower_threshold=0.85, upper_threshold=0.95):
        """Classifies similarity scores into exploration priority zones."""
        high_priority = similarity_image.gte(upper_threshold).selfMask()
        # Combine both conditions on the raw scores; chaining .lt() on the
        # 0/1 output of .gte() would invert the mask
        medium_priority = (
            similarity_image.gte(lower_threshold)
            .And(similarity_image.lt(upper_threshold))
            .selfMask()
        )
        return {
            "high": high_priority,
            "medium": medium_priority,
            "raw_similarity": similarity_image,
        }


# Usage Example
targeter = GeoEmbeddingTargeter()
annual_img = targeter.load_annual_composite(2023)

# Define known deposit geometry (replace with actual ROI)
known_deposit_roi = ee.Geometry.Point([-70.5, -25.0]).buffer(500)

# Extract reference vector
ref_vector = targeter.extract_reference_signature(annual_img, known_deposit_roi)

# Compute similarity across broader exploration area
similarity_map = targeter.compute_cosine_similarity(annual_img, ref_vector)

# Generate classified favorability zones
favorability = targeter.generate_favorability_map(similarity_map)

# Export or visualize
# geemap.Map().addLayer(favorability['high'], {'palette': ['red']}, 'High Priority')
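The normalization and dot-product steps in the similarity function correspond to the standard cosine similarity formula. As a sketch, writing $\mathbf{e}$ for a pixel embedding and $\mathbf{r}$ for the reference vector (symbols introduced here for illustration only):

$$
\cos\theta = \frac{\mathbf{e}\cdot\mathbf{r}}{\lVert\mathbf{e}\rVert\,\lVert\mathbf{r}\rVert}
= \sum_{i=0}^{63} \hat{e}_i\,\hat{r}_i,
\qquad
\hat{\mathbf{e}} = \frac{\mathbf{e}}{\lVert\mathbf{e}\rVert},\quad
\hat{\mathbf{r}} = \frac{\mathbf{r}}{\lVert\mathbf{r}\rVert}
$$

The link to Euclidean distance follows from expanding the squared norm:

$$
\lVert\mathbf{e}-\mathbf{r}\rVert^{2}
= \lVert\mathbf{e}\rVert^{2} + \lVert\mathbf{r}\rVert^{2}
- 2\,\lVert\mathbf{e}\rVert\,\lVert\mathbf{r}\rVert\cos\theta
$$

When both vectors are unit length this reduces to $2 - 2\cos\theta$, so the two measures rank pixels identically. They diverge only when a magnitude differs from 1, which is the case for a reference vector obtained by averaging, and is why the reference is normalized before the dot product.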
Why This Structure Works
- Modular Class Design: Encapsulates band naming, vector extraction, and similarity logic, making it reusable across different target types (e.g., porphyry Cu vs. orogenic Au).
- Server-Side Normalization: The `normalize` function computes magnitude and division entirely within GEE’s execution graph, preventing client-side data transfer.
- Explicit Thresholding: Separates raw similarity scores from operational decision layers, allowing teams to adjust sensitivity without reprocessing the entire dataset.
- Scalable Export: The images in the output dictionary can be passed directly to `ee.batch.Export.image` tasks or `geemap` visualization tools without intermediate file generation.
Pitfall Guide
1. Treating Embedding Bands as Physical Features
Explanation: The 64 dimensions (A00–A63) are statistical compressions, not spectral bands. Attempting to interpret A12 as "clay content" or A05 as "iron oxide" will lead to incorrect geological conclusions.
Fix: Validate embeddings against ground truth or traditional indices. Use them as similarity anchors, not direct mineralogical proxies.
2. Ignoring Canopy Penetration Limits
Explanation: While Sentinel-1 SAR data is integrated into the embeddings, Sentinel-1 is a C-band sensor and its penetration of dense canopy is limited. Dense tropical forests mask underlying lithology, causing the model to prioritize canopy structure over bedrock signatures.
Fix: Restrict embedding-based targeting to arid, semi-arid, or sparsely vegetated regions. In forested zones, combine with airborne geophysics or LiDAR-derived terrain models.
3. Using Euclidean Distance on Normalized Vectors
Explanation: Euclidean distance penalizes magnitude differences as well as directional ones. Per-pixel embeddings are unit length, but a reference vector averaged over a training zone is not, so its magnitude reflects how heterogeneous the zone is rather than any lithological property.
Fix: Use cosine similarity or angular distance, or re-normalize the reference vector to unit length before comparing. Cosine similarity measures directional alignment in the 64D space, which correlates better with consistent geological signatures.
4. Overlooking GEE Execution Quotas
Explanation: Processing global or continental-scale similarity maps can trigger compute limits or task timeouts. Running unoptimized client-side loops will fail silently or incur unexpected costs.
Fix: Use reduceRegion with maxPixels limits, chunk exports into tiles, and monitor the GEE Code Editor task queue. Implement exponential backoff for export retries.
5. Skipping Ground-Truth Calibration
Explanation: Similarity scores are relative. A 0.92 cosine score in one region may represent a different geological context than 0.92 in another due to training bias or local environmental factors.
Fix: Always calibrate thresholds using known deposits within the target region. Generate ROC curves comparing similarity scores against validated mineral occurrences to establish local cutoffs.
6. Assuming Temporal Stability for Active Mining
Explanation: The dataset provides annual composites. It is unsuitable for monitoring active pit walls, waste rock dumps, or seasonal hydrological changes.
Fix: Reserve embeddings for greenfield targeting and regional screening. Use high-frequency Sentinel-2/Landsat time series for operational mine monitoring and environmental compliance.
7. Cross-Climate Transfer Without Adjustment
Explanation: The model was likely trained with higher representation in arid zones. Applying the same similarity thresholds to humid tropical regions will yield false positives due to vegetation moisture masking bedrock signals.
Fix: Implement region-specific threshold calibration. Use climate stratification (e.g., Köppen-Geiger zones) to adjust similarity cutoffs before field deployment.
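The ROC-based calibration recommended under ground-truth calibration can be sketched in a few lines. This is an illustrative example only: the scores and labels are made up, the candidate thresholds are arbitrary, and selecting the cutoff by Youden's J statistic (true positive rate minus false positive rate) is one common choice among several, not a method prescribed by the dataset.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, TPR, FPR) for each candidate threshold.
    labels: 1 = validated mineral occurrence, 0 = validated barren site."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, tp / positives, fp / negatives))
    return points

def best_threshold(points):
    """Pick the threshold that maximizes Youden's J = TPR - FPR."""
    return max(points, key=lambda p: p[1] - p[2])[0]

# Hypothetical similarity scores at field-validated sites within one region
scores = [0.97, 0.93, 0.91, 0.88, 0.86, 0.80, 0.75, 0.70]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
thresholds = [0.70, 0.80, 0.85, 0.90, 0.95]

points = roc_points(scores, labels, thresholds)
local_cutoff = best_threshold(points)
print(points, local_cutoff)
```

Running the same procedure separately per region or climate zone yields the region-specific cutoffs the last pitfall calls for.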
Production Bundle
Action Checklist
- Initialize GEE environment and verify project quotas before scaling to continental extents
- Extract reference vectors from validated deposits or field-confirmed outcrops, not arbitrary pixels
- Implement server-side cosine similarity to avoid client-side memory bottlenecks
- Calibrate similarity thresholds using local ground truth; do not apply global cutoffs blindly
- Mask dense forest and permanent snow zones prior to similarity calculation
- Export results as tiled GeoTIFFs to comply with GEE export limits and ensure QGIS compatibility
- Cross-validate high-similarity zones with traditional spectral indices (e.g., ASTER 6/8, S2 AIT)
- Document threshold decisions and training data provenance for audit and reproducibility
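For the tiled-export item in the checklist, one simple approach is to split the exploration bounding box into a grid and submit one export per tile. The helper below is a hypothetical, pure-Python sketch of the grid step only; the function name, the degree-based tile size, and the example extent are assumptions. Each resulting tuple could then be turned into an `ee.Geometry.Rectangle` and used as the `region` of an export task.

```python
def tile_bounds(west, south, east, north, tile_deg):
    """Split a lon/lat bounding box into tiles of at most tile_deg degrees.
    Returns a list of (west, south, east, north) tuples, row by row."""
    tiles = []
    y = south
    while y < north:
        y_top = min(y + tile_deg, north)
        x = west
        while x < east:
            x_right = min(x + tile_deg, east)
            tiles.append((x, y, x_right, y_top))
            x = x_right
        y = y_top
    return tiles

# Hypothetical exploration extent split into 1-degree tiles
tiles = tile_bounds(-71.0, -26.0, -69.0, -24.5, 1.0)
for tile in tiles:
    print(tile)
```

Edge tiles are clipped to the bounding box, so the grid never extends beyond the requested extent.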
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Greenfield targeting across multiple permits | AlphaEarth Embeddings + Cosine Similarity | Rapid screening, cross-region consistency, minimal preprocessing | Low (GEE free tier covers research/NGO; compute costs scale linearly) |
| Precise clay/iron oxide mapping for metallurgy | Traditional Spectral Ratios (ASTER/S2) | Physically interpretable, validated for mineral quantification | Medium (requires atmospheric correction, local calibration) |
| Active mine monitoring & waste rock tracking | High-Frequency Time Series (Sentinel-2/Landsat) | Revisit intervals of days to weeks capture operational changes | High (requires cloud masking, frequent processing, storage) |
| Forested tropical exploration | Airborne Geophysics + LiDAR | Embeddings cannot penetrate dense canopy; SAR limited to surface structure | High (acquisition costs, but necessary for reliable targeting) |
| Regulatory reporting & publishable methods | Explicit Band Math + Statistical Validation | Transparent, reproducible, meets compliance standards | Low-Medium (higher analyst time, lower computational overhead) |
Configuration Template
# production_config.py
import ee

# Earth Engine Configuration
PROJECT_ID = "your-gcp-project-id"
GEE_SERVICE_ACCOUNT = "your-service@project.iam.gserviceaccount.com"
KEY_FILE_PATH = "/path/to/service-account-key.json"

# Embedding Pipeline Parameters
EMBEDDING_COLLECTION = "GOOGLE/SATELLITE_EMBEDDING/V1_ANNUAL"
TARGET_YEAR = 2023
SIMILARITY_LOWER_THRESHOLD = 0.85
SIMILARITY_UPPER_THRESHOLD = 0.95
EXPORT_SCALE = 10
MAX_PIXELS = 1e9
REGION_MASK_TYPES = ["dense_forest", "permanent_snow", "ocean"]

# Validation & Calibration
# Plain (lon, lat) coordinates; the ee.Geometry is built after initialization,
# since Earth Engine objects may require an initialized session
KNOWN_DEPOSIT_COORDS = [
    [-70.6, -25.1], [-70.4, -25.1], [-70.4, -24.9], [-70.6, -24.9]
]
CALIBRATION_METRIC = "cosine_similarity"
THRESHOLD_TUNING_METHOD = "local_roc_optimization"


def initialize_ee():
    """Secure GEE initialization with service account credentials."""
    credentials = ee.ServiceAccountCredentials(GEE_SERVICE_ACCOUNT, KEY_FILE_PATH)
    ee.Initialize(credentials, project=PROJECT_ID)
    print("GEE initialized successfully.")


# Run initialization, then build the geometry that depends on it
initialize_ee()
KNOWN_DEPOSIT_GEOMETRY = ee.Geometry.Polygon([KNOWN_DEPOSIT_COORDS])
Quick Start Guide
- Authenticate & Initialize: Set up a GCP service account with Earth Engine access. Run the initialization block to authenticate and verify quota limits.
- Load Annual Composite: Query `GOOGLE/SATELLITE_EMBEDDING/V1_ANNUAL` for your target year and area of interest to obtain the annual, cloud-mosaicked composite.
- Define Reference Zone: Digitize a polygon around a known deposit or field-validated outcrop. Extract the mean 64D vector using `reduceRegion`.
- Compute Similarity: Apply server-side cosine similarity across your exploration ROI. Classify results using locally calibrated thresholds (0.85–0.95 range).
- Export & Validate: Export high-priority zones as tiled GeoTIFFs. Overlay with regional geology maps and traditional spectral indices before committing to field campaigns.
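Export steps can fail transiently, which is why the pitfall guide suggests exponential backoff for retries. The helper below is a generic, hypothetical sketch of that pattern in plain Python: `action` stands for any callable that starts or polls an export, the delay parameters are illustrative defaults, and production code would catch a narrower exception type than `Exception`.

```python
import random
import time

def retry_with_backoff(action, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure.
    The sleep function is injectable so the logic can be exercised without waiting."""
    for attempt in range(max_attempts):
        try:
            return action()
        except Exception:  # assumption: narrow this to transient errors in practice
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% random jitter so parallel retries do not synchronize
            sleep(delay + random.uniform(0, delay * 0.1))

# Demonstration with a hypothetical action that fails twice, then succeeds
calls = {"n": 0}

def flaky_export():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"

recorded_delays = []
result = retry_with_backoff(flaky_export, sleep=recorded_delays.append)
print(result, calls["n"], recorded_delays)
```

Passing `recorded_delays.append` as the sleep function records the delays instead of waiting, which keeps the demonstration instantaneous.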
Satellite embeddings do not replace geological expertise; they compress data preparation into a single query. When integrated into a hybrid workflow—screening with vectors, validating with indices, confirming with field work—they transform regional targeting from a months-long bottleneck into a repeatable, scalable operation.
