Comprehensive geospatial science skill covering remote sensing, GIS, spatial analysis, machine learning for earth observation, and 30+ scientific domains. Supports satellite imagery processing (Sentinel, Landsat, MODIS, SAR, hyperspectral), vector and raster data operations, spatial statistics, point cloud processing, network analysis, and 7 programming languages (Python, R, Julia, JavaScript, C++, Java, Go) with 500+ code examples. Use for remote sensing workflows, GIS analysis, spatial ML, Earth observation data processing, terrain analysis, hydrological modeling, marine spatial analysis, atmospheric science, and any geospatial computation task.
GeoMaster is a comprehensive geospatial science skill covering the full spectrum of geographic information systems, remote sensing, spatial analysis, and machine learning for Earth observation. This skill provides expert knowledge across 70+ topics with 500+ code examples in 7 programming languages.
# Install via conda (recommended for geospatial dependencies)
conda install -c conda-forge gdal rasterio fiona shapely pyproj geopandas
# Or via uv
uv pip install geopandas rasterio fiona shapely pyproj
# Core remote sensing libraries
uv pip install rsgislib torchgeo eo-learn
# For Google Earth Engine
uv pip install earthengine-api
# For SNAP integration
# Download from: https://step.esa.int/main/download/
# QGIS Python bindings (usually installed with QGIS)
# ArcPy requires ArcGIS Pro installation
# GRASS GIS
conda install -c conda-forge grassgrass
# SAGA GIS
conda install -c conda-forge saga-gis
# Deep learning for remote sensing
uv pip install torch-geometric tensorflow-caney
# Spatial machine learning
uv pip install libpysal esda mgwr
uv pip install scikit-learn xgboost lightgbm
# LiDAR processing
uv pip install laspy pylas
# Point cloud manipulation
uv pip install open3d pdal
# Photogrammetry
uv pip install opendm
# Street network analysis
uv pip install osmnx networkx
# Routing engines
uv pip install osrm pyrouting
# Static mapping
uv pip install cartopy contextily mapclassify
# Interactive web maps
uv pip install folium ipyleaflet keplergl
# 3D visualization
uv pip install pydeck pythreejs
# Distributed geospatial processing
uv pip install dask-geopandas
# Xarray for multidimensional arrays
uv pip install xarray rioxarray
# Planetary Computer
uv pip install pystac-client planetary-computer
# PostGIS
conda install -c conda-forge postgis
# SpatiaLite
conda install -c conda-forge spatialite
# GeoAlchemy2 for SQLAlchemy
uv pip install geoalchemy2
# R geospatial packages
# install.packages(c("sf", "terra", "raster", "terra", "stars"))
# Julia geospatial packages
# import Pkg; Pkg.add(["ArchGDAL", "GeoInterface", "GeoStats.jl"])
# JavaScript (Node.js)
# npm install @turf/turf terraformer-arcgis-parser
# Java
# Maven: org.geotools:gt-main
import rasterio
import numpy as np
# Open Sentinel-2 imagery
with rasterio.open('sentinel2.tif') as src:
# Read red (B04) and NIR (B08) bands
red = src.read(4)
nir = src.read(8)
# Calculate NDVI
ndvi = (nir.astype(float) - red.astype(float)) / (nir + red)
ndvi = np.nan_to_num(ndvi, nan=0)
# Save result
profile = src.profile
profile.update(count=1, dtype=rasterio.float32)
with rasterio.open('ndvi.tif', 'w', **profile) as dst:
dst.write(ndvi.astype(rasterio.float32), 1)
print(f"NDVI range: {ndvi.min():.3f} to {ndvi.max():.3f}")
import geopandas as gpd
# Load spatial data
zones = gpd.read_file('zones.geojson')
points = gpd.read_file('points.geojson')
# Ensure same CRS
if zones.crs != points.crs:
points = points.to_crs(zones.crs)
# Spatial join (points within zones)
joined = gpd.sjoin(points, zones, how='inner', predicate='within')
# Calculate statistics per zone
stats = joined.groupby('zone_id').agg({
'value': ['count', 'mean', 'std', 'min', 'max']
}).round(2)
print(stats)
import ee
import pandas as pd
# Initialize Earth Engine
ee.Initialize(project='your-project-id')
# Define region of interest
roi = ee.Geometry.Point([-122.4, 37.7]).buffer(10000)
# Get Sentinel-2 collection
s2 = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
.filterBounds(roi)
.filterDate('2020-01-01', '2023-12-31')
.filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 20)))
# Add NDVI band
def add_ndvi(image):
ndvi = image.normalizedDifference(['B8', 'B4']).rename('NDVI')
return image.addBands(ndvi)
s2_ndvi = s2.map(add_ndvi)
# Extract time series
def extract_series(image):
stats = image.reduceRegion(
reducer=ee.Reducer.mean(),
geometry=roi.centroid(),
scale=10,
maxPixels=1e9
)
return ee.Feature(None, {
'date': image.date().format('YYYY-MM-dd'),
'ndvi': stats.get('NDVI')
})
series = s2_ndvi.map(extract_series).getInfo()
df = pd.DataFrame([f['properties'] for f in series['features']])
df['date'] = pd.to_datetime(df['date'])
print(df.head())
Understanding CRS is fundamental to geospatial work:
See coordinate-systems.md for comprehensive CRS reference.
Vector Data: Points, lines, polygons with discrete boundaries
Raster Data: Grid of cells with continuous values
| Type | Examples | Libraries |
|---|---|---|
| Vector | Shapefiles, GeoJSON, GeoPackage | GeoPandas, Fiona, GDAL |
| Raster | GeoTIFF, NetCDF, IMG | Rasterio, GDAL, Xarray |
| Point Cloud | LAZ, LAS, PCD | Laspy, PDAL, Open3D |
| Topology | TopoJSON, TopoArchive | TopoJSON, NetworkX |
| Spatiotemporal | Trajectories, Time-series | MovingPandas, PyTorch Geometric |
Key Open Geospatial Consortium standards:
import rasterio
import numpy as np
def calculate_indices(image_path, output_path):
"""Calculate NDVI, EVI, SAVI, and NDWI from Sentinel-2."""
with rasterio.open(image_path) as src:
# Read bands: B2=Blue, B3=Green, B4=Red, B8=NIR, B11=SWIR1
blue = src.read(2).astype(float)
green = src.read(3).astype(float)
red = src.read(4).astype(float)
nir = src.read(8).astype(float)
swir1 = src.read(11).astype(float)
# Calculate indices
ndvi = (nir - red) / (nir + red + 1e-8)
evi = 2.5 * (nir - red) / (nir + 6*red - 7.5*blue + 1)
savi = ((nir - red) / (nir + red + 0.5)) * 1.5
ndwi = (green - nir) / (green + nir + 1e-8)
# Stack and save
indices = np.stack([ndvi, evi, savi, ndwi])
profile = src.profile
profile.update(count=4, dtype=rasterio.float32)
with rasterio.open(output_path, 'w', **profile) as dst:
dst.write(indices)
# Usage
calculate_indices('sentinel2.tif', 'indices.tif')
from sklearn.ensemble import RandomForestClassifier
import geopandas as gpd
import rasterio
from rasterio.features import rasterize
import numpy as np
def classify_imagery(raster_path, training_gdf, output_path):
"""Train Random Forest classifier and classify imagery."""
# Load imagery
with rasterio.open(raster_path) as src:
image = src.read()
profile = src.profile
transform = src.transform
# Extract training data
X_train, y_train = [], []
for _, row in training_gdf.iterrows():
mask = rasterize(
[(row.geometry, 1)],
out_shape=(profile['height'], profile['width']),
transform=transform,
fill=0,
dtype=np.uint8
)
pixels = image[:, mask > 0].T
X_train.extend(pixels)
y_train.extend([row['class_id']] * len(pixels))
X_train = np.array(X_train)
y_train = np.array(y_train)
# Train classifier
rf = RandomForestClassifier(n_estimators=100, max_depth=20, n_jobs=-1)
rf.fit(X_train, y_train)
# Predict full image
image_reshaped = image.reshape(image.shape[0], -1).T
prediction = rf.predict(image_reshaped)
prediction = prediction.reshape(profile['height'], profile['width'])
# Save result
profile.update(dtype=rasterio.uint8, count=1)
with rasterio.open(output_path, 'w', **profile) as dst:
dst.write(prediction.astype(rasterio.uint8), 1)
return rf
import geopandas as gpd
from shapely.ops import unary_union
# Buffer analysis
gdf['buffer_1km'] = gdf.geometry.to_crs(epsg=32633).buffer(1000)
# Spatial relationships
intersects = gdf[gdf.geometry.intersects(other_geometry)]
contains = gdf[gdf.geometry.contains(point_geometry)]
# Geometric operations
gdf['centroid'] = gdf.geometry.centroid
gdf['convex_hull'] = gdf.geometry.convex_hull
gdf['simplified'] = gdf.geometry.simplify(tolerance=0.001)
# Overlay operations
intersection = gpd.overlay(gdf1, gdf2, how='intersection')
union = gpd.overlay(gdf1, gdf2, how='union')
difference = gpd.overlay(gdf1, gdf2, how='difference')
import rasterio
from rasterio.features import shapes
import numpy as np
def calculate_terrain_metrics(dem_path):
"""Calculate slope, aspect, hillshade from DEM."""
with rasterio.open(dem_path) as src:
dem = src.read(1)
transform = src.transform
# Calculate gradients
dy, dx = np.gradient(dem)
# Slope (in degrees)
slope = np.arctan(np.sqrt(dx**2 + dy**2)) * 180 / np.pi
# Aspect (in degrees, clockwise from north)
aspect = np.arctan2(-dy, dx) * 180 / np.pi
aspect = (90 - aspect) % 360
# Hillshade
azimuth = 315
altitude = 45
azimuth_rad = np.radians(azimuth)
altitude_rad = np.radians(altitude)
hillshade = (np.sin(altitude_rad) * np.sin(np.radians(slope)) +
np.cos(altitude_rad) * np.cos(np.radians(slope)) *
np.cos(np.radians(aspect) - azimuth_rad))
return slope, aspect, hillshade
import osmnx as ox
import networkx as nx
# Download street network
G = ox.graph_from_place('San Francisco, CA', network_type='drive')
# Add speeds and travel times
G = ox.add_edge_speeds(G)
G = ox.add_edge_travel_times(G)
# Find shortest path
orig_node = ox.distance.nearest_nodes(G, -122.4, 37.7)
dest_node = ox.distance.nearest_nodes(G, -122.3, 37.8)
route = nx.shortest_path(G, orig_node, dest_node, weight='travel_time')
# Calculate accessibility
accessibility = {}
for node in G.nodes():
subgraph = nx.ego_graph(G, node, radius=5, distance='time')
accessibility[node] = len(subgraph.nodes())
Comprehensive reference documentation is organized by topic:
import rasterio
import geopandas as gpd
from sklearn.ensemble import RandomForestClassifier
import numpy as np
# 1. Load training data
training = gpd.read_file('training_polygons.gpkg')
# 2. Load satellite imagery
with rasterio.open('sentinel2.tif') as src:
bands = src.read()
profile = src.profile
meta = src.meta
# 3. Extract training pixels
X, y = [], []
for _, row in training.iterrows():
mask = rasterize_features(row.geometry, profile['shape'])
pixels = bands[:, mask > 0].T
X.extend(pixels)
y.extend([row['class']] * len(pixels))
# 4. Train model
model = RandomForestClassifier(n_estimators=100, max_depth=20)
model.fit(X, y)
# 5. Classify image
pixels_reshaped = bands.reshape(bands.shape[0], -1).T
prediction = model.predict(pixels_reshaped)
classified = prediction.reshape(bands.shape[1], bands.shape[2])
# 6. Save result
profile.update(dtype=rasterio.uint8, count=1, nodata=255)
with rasterio.open('classified.tif', 'w', **profile) as dst:
dst.write(classified.astype(rasterio.uint8), 1)
# 7. Accuracy assessment (with validation data)
# ... (see references for complete workflow)
# 1. Download DEM (e.g., from ALOS AW3D30, SRTM, Copernicus)
# 2. Process DEM: fill sinks, calculate flow direction
# 3. Define flood scenarios (return periods)
# 4. Hydraulic modeling (HEC-RAS, LISFLOOD)
# 5. Generate inundation maps
# 6. Assess exposure (settlements, infrastructure)
# 7. Calculate damage estimates
# See references/hydrology.md for complete implementation
import ee
import pandas as pd
import matplotlib.pyplot as plt
# Initialize GEE
ee.Initialize(project='your-project')
# Define ROI
roi = ee.Geometry.Point([x, y]).buffer(5000)
# Get Landsat collection
landsat = ee.ImageCollection('LANDSAT/LC08/C02/T1_L2')\
.filterBounds(roi)\
.filterDate('2015-01-01', '2024-12-31')\
.filter(ee.Filter.lt('CLOUD_COVER', 20))
# Calculate NDVI time series
def add_ndvi(img):
ndvi = img.normalizedDifference(['SR_B5', 'SR_B4']).rename('NDVI')
return img.addBands(ndvi)
landsat_ndvi = landsat.map(add_ndvi)
# Extract time series
ts = landsat_ndvi.getRegion(roi, 30).getInfo()
df = pd.DataFrame(ts[1:], columns=ts[0])
df['date'] = pd.to_datetime(df['time'])
# Analyze trends
from scipy import stats
slope, intercept, r_value, p_value, std_err = stats.linregress(
range(len(df)), df['NDVI']
)
print(f"Trend: {slope:.6f} NDVI/year (p={p_value:.4f})")
import geopandas as gpd
import rasterio
import numpy as np
from sklearn.preprocessing import MinMaxScaler
# 1. Load criteria rasters
criteria = {
'slope': rasterio.open('slope.tif').read(1),
'distance_to_water': rasterio.open('water_dist.tif').read(1),
'soil_quality': rasterio.open('soil.tif').read(1),
'land_use': rasterio.open('landuse.tif').read(1)
}
# 2. Reclassify (lower is better for slope/distance)
weights = {'slope': 0.3, 'distance_to_water': 0.2,
'soil_quality': 0.3, 'land_use': 0.2}
# 3. Normalize (0-1, using fuzzy membership)
normalized = {}
for key, raster in criteria.items():
if key in ['slope', 'distance_to_water']:
# Decreasing suitability
normalized[key] = 1 - MinMaxScaler().fit_transform(raster.reshape(-1, 1))
else:
normalized[key] = MinMaxScaler().fit_transform(raster.reshape(-1, 1))
# 4. Weighted overlay
suitability = sum(normalized[key] * weights[key] for key in criteria)
suitability = suitability.reshape(criteria['slope'].shape)
# 5. Classify suitability levels
# (Low, Medium, High, Very High)
# 6. Save result
profile = rasterio.open('slope.tif').profile
profile.update(dtype=rasterio.float32, count=1)
with rasterio.open('suitability.tif', 'w', **profile) as dst:
dst.write(suitability.astype(rasterio.float32), 1)
Use Spatial Indexing: R-tree indexes speed up spatial queries by 10-100x
gdf.sindex # Automatically created by GeoPandas
Chunk Large Rasters: Process in blocks to avoid memory errors
with rasterio.open('large.tif') as src:
for window in src.block_windows():
block = src.read(window=window)
Use Dask for Big Data: Parallel processing on large datasets
import dask.array as da
dask_array = da.from_rasterio('large.tif', chunks=(1, 1024, 1024))
Enable GDAL Caching: Speed up repeated reads
import gdal
gdal.SetCacheMax(2**30) # 1GB cache
Use Arrow for I/O: Faster file reading/writing
gdf.to_file('output.gpkg', use_arrow=True)
Reproject Once: Do all analysis in a single projected CRS
Use Efficient Formats: GeoPackage > Shapefile, Parquet for large datasets
Simplify Geometries: Reduce complexity when precision isn't critical
gdf['geometry'] = gdf.geometry.simplify(tolerance=0.0001)
Use COG for Cloud: Cloud-Optimized GeoTIFF for remote data
Enable Parallel Processing: Most libraries support n_jobs=-1
Always Check CRS before any spatial operation
assert gdf1.crs == gdf2.crs, "CRS mismatch!"
Use Appropriate CRS:
Validate Geometries before operations
gdf = gdf[gdf.is_valid]
gdf['geometry'] = gdf.geometry.make_valid()
Handle Missing Data appropriately
gdf['geometry'] = gdf['geometry'].fillna(None)
Document Projections in metadata
Use Vector Tiles for web maps with many features
Apply Cloud Masking for optical imagery
Calibrate Radiometric Values for quantitative analysis
Preserve Lineage for reproducible research
Use Appropriate Spatial Resolution for your analysis scale