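Chapter 10: Advanced Spatial Analysis Steps

This chapter covers GeoPipeAgent's four advanced spatial analysis steps: Thiessen polygons (Voronoi), heatmaps (Heatmap), spatial interpolation (Interpolate), and spatial clustering (Cluster). They depend on optional scientific computing libraries (scipy and scikit-learn) and turn point data into polygons, density rasters, interpolated surfaces, and cluster labels.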

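10.1 Overview of the Spatial Analysis Steps

10.1.1 The Four Analysis Steps

Step ID	Function	Core algorithm	Output type
`analysis.voronoi`	Thiessen polygons	scipy.spatial.Voronoi	GeoDataFrame
`analysis.heatmap`	Heatmap	KDE (kernel density estimation)	raster_info
`analysis.interpolate`	Spatial interpolation	scipy.interpolate / IDW	raster_info
`analysis.cluster`	Spatial clustering	DBSCAN / KMeans	GeoDataFrame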

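10.1.2 The [analysis] Optional Dependencies

The advanced analysis steps require additional scientific computing libraries:

# Install the analysis step dependencies (quoted so that shells such as zsh do not expand the brackets)
pip install "geopipe-agent[analysis]"

This installs the following packages:

Dependency	Version	Purpose
`scipy`	≥1.10	Spatial algorithms (Voronoi, interpolation, KDE)
`scikit-learn`	≥1.2	Clustering algorithms (DBSCAN, KMeans)

If these dependencies are missing, the analysis step modules are still loaded and registered, because the @step decorator runs at module top level while the dependency imports sit inside the step functions. Running such a step then raises ImportError.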

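10.1.3 Lazy Import Strategy

The analysis steps import their optional dependencies lazily, so a missing optional dependency does not stop the whole application from starting:

@step(id="analysis.voronoi", ...)
def voronoi(ctx):
    # Lazy import: the dependency is only checked when the step runs
    from scipy.spatial import Voronoi as ScipyVoronoi
    from shapely.geometry import Polygon
    ...

This design means:

Without scipy installed, geopipe-agent list-steps still lists analysis.voronoi
The error is raised only when the step is actually executed
Other steps that do not depend on scipy are unaffected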
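The lazy import pattern above can be paired with a friendlier error message. The following is a minimal sketch using only the standard library; `require_optional` and `voronoi_step_body` are hypothetical names for illustration and are not part of GeoPipeAgent.

```python
import importlib


def require_optional(module_name, extra="analysis"):
    """Import an optional dependency on demand, with an actionable error.

    Hypothetical helper for illustration; not part of GeoPipeAgent.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this step. "
            f'Install it with: pip install "geopipe-agent[{extra}]"'
        ) from exc


def voronoi_step_body():
    # The import happens only when the step runs, not at module load time,
    # so registering the step never touches scipy.
    spatial = require_optional("scipy.spatial")
    return spatial.Voronoi
```

Because the import is deferred into the function body, merely defining `voronoi_step_body` succeeds even when scipy is absent; the error, with its install hint, appears only on the first call.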

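10.2 analysis.voronoi: Thiessen Polygons

10.2.1 Description

analysis.voronoi generates Thiessen polygons (a Voronoi diagram) from input point features. The diagram partitions the plane into regions, one per input point, such that every location inside a region is closer to that region's input point than to any other input point.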
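The defining property can be checked directly: a location belongs to the cell of whichever input point is nearest. The sketch below uses made-up coordinates and a brute-force search; it illustrates the definition only and is not how the step builds its polygons.

```python
import math


def nearest_site(query, sites):
    """Return the index of the input point (site) closest to `query`."""
    return min(range(len(sites)), key=lambda i: math.dist(query, sites[i]))


# Hypothetical station coordinates, for illustration only.
sites = [(0.0, 0.0), (4.0, 0.0), (2.0, 3.0)]

# Each query location falls in the Voronoi cell of its nearest site.
cell_a = nearest_site((0.5, 0.5), sites)  # closest to sites[0]
cell_b = nearest_site((3.5, 0.2), sites)  # closest to sites[1]
cell_c = nearest_site((2.0, 2.5), sites)  # closest to sites[2]
```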

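Typical applications of Thiessen polygons:

Weather station coverage: the area served by each weather station
Trade area analysis: the customers closest to each store
Resource allocation: service areas for public facilities such as hospitals and schools

10.2.2 Parameters

Parameter	Type	Required	Default	Description
`input`	`GeoDataFrame`	✅	—	Input point features
`clip_to_bounds`	`bool`	❌	`True`	Whether to clip the result to the bounding box of the input data
`buffer`	`float`	❌	`0.0`	Distance by which to expand the bounding box, in CRS units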

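10.2.3 How scipy.spatial.Voronoi Works

scipy.spatial.Voronoi computes the Voronoi diagram using the Qhull library:

import numpy as np
from scipy.spatial import Voronoi

# Input point coordinates
points = np.array([
    [0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]
])

# Compute the Voronoi diagram
vor = Voronoi(points)

# vor.vertices: coordinates of the Voronoi vertices
# vor.regions: vertex indices of each region (-1 marks a vertex outside the diagram)
# vor.ridge_vertices: vertex index pair of each ridge (edge)
# vor.point_region: region index for each input point

In the raw Voronoi diagram, the regions on the outer edge are unbounded (they extend to infinity). analysis.voronoi clips these unbounded regions to a finite extent.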

10.2.4 The clip_to_bounds Parameter

clip_to_bounds=False:                clip_to_bounds=True:
Cells extend to infinity             Cells clipped to the input bounding box

    ╱╲      ╱╲                      ┌────────────────┐
   ╱  ╲    ╱  ╲                     │╲      ╱╲      ╱│
  ╱  • ╲  ╱  • ╲                   │ ╲  • ╱  ╲  • ╱ │
 ╱      ╲╱      ╲                  │  ╲  ╱    ╲  ╱  │
╱────────╳────────╲                │   ╲╱  •   ╲╱   │
╲      ╱╲╱╲      ╱                 │   ╱╲      ╱╲   │
 ╲  • ╱    ╲  • ╱                  │  ╱  ╲    ╱  ╲  │
  ╲  ╱      ╲  ╱                   │ ╱  • ╲  ╱  • ╲ │
   ╲╱        ╲╱                    │╱      ╲╱      ╲│
                                   └────────────────┘

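With clip_to_bounds=True (the default), the output polygons are clipped to the bounding box of the input points. The buffer parameter expands that box outward on every side:

# buffer=0: clip to the exact bounding box of the points
# buffer=100: expand the bounding box outward by 100 units (CRS units)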
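The effect of buffer on the clipping rectangle is simple arithmetic. The sketch below is pure Python with a hypothetical helper name (`clip_bounds`); it mirrors the min/max computation described above rather than reproducing the step's code.

```python
def clip_bounds(points, buffer=0.0):
    """Bounding box of `points`, expanded by `buffer` on every side.

    Returns (minx, miny, maxx, maxy). Hypothetical helper for illustration.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - buffer, min(ys) - buffer,
            max(xs) + buffer, max(ys) + buffer)


# Two made-up points spanning a 10 x 5 extent.
exact = clip_bounds([(0, 0), (10, 5)])              # buffer=0
expanded = clip_bounds([(0, 0), (10, 5)], buffer=100)
```

With buffer=100 the box grows by 100 units on each side, so its width and height each increase by 200.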

10.2.5 Source Code Walkthrough

@step(id="analysis.voronoi", ...)
def voronoi(ctx):
    from scipy.spatial import Voronoi as ScipyVoronoi
    from shapely.geometry import Polygon, box
    from shapely.ops import clip_by_rect

    gdf = ctx.input("input")
    clip_to_bounds = ctx.param("clip_to_bounds")
    buffer_dist = ctx.param("buffer") or 0.0

    if clip_to_bounds is None:
        clip_to_bounds = True

    # Extract point coordinates
    coords = np.array([(geom.x, geom.y) for geom in gdf.geometry])

    # Compute the Voronoi diagram
    vor = ScipyVoronoi(coords)

    # Compute the bounding box, expanded by the buffer distance
    minx, miny = coords.min(axis=0) - buffer_dist
    maxx, maxy = coords.max(axis=0) + buffer_dist
    bounding_box = box(minx, miny, maxx, maxy)

    # Build one polygon per input point
    polygons = []
    for point_idx, region_idx in enumerate(vor.point_region):
        region = vor.regions[region_idx]
        if -1 in region or len(region) == 0:
            # Unbounded region: approximate with a large polygon, then clip
            polygon = _construct_unbounded_region(vor, point_idx, bounding_box)
        else:
            vertices = [vor.vertices[i] for i in region]
            polygon = Polygon(vertices)

        if clip_to_bounds and polygon is not None:
            polygon = polygon.intersection(bounding_box)

        polygons.append(polygon)

    # Build the output GeoDataFrame
    result = gpd.GeoDataFrame(
        gdf.drop(columns="geometry").copy(),
        geometry=polygons,
        crs=gdf.crs,
    )

    stats = {
        "feature_count": len(result),
        "input_point_count": len(gdf),
    }

    return StepResult(output=result, stats=stats)

10.2.6 Usage Example

# Generate Thiessen polygons for weather stations
steps:
  - id: read_stations
    step: io.read_vector
    params:
      path: "data/weather_stations.gpkg"

  - id: voronoi_regions
    step: analysis.voronoi
    params:
      input: "$steps.read_stations"
      clip_to_bounds: true
      buffer: 5000

  - id: save_voronoi
    step: io.write_vector
    params:
      input: "$steps.voronoi_regions"
      path: "output/station_service_areas.gpkg"

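10.3 analysis.heatmap: Heatmap

10.3.1 Description

analysis.heatmap generates a heatmap raster using kernel density estimation (KDE). A heatmap shows the spatial density of point features and is commonly used for:

Crime hotspot mapping
Population density visualization
Identifying areas with frequent traffic accidents
Analyzing the intensity of commercial activity

10.3.2 Parameters

Parameter	Type	Required	Default	Description
`input`	`GeoDataFrame`	✅	—	Input point features
`resolution`	`int`	❌	`100`	Output raster size: number of pixels along each side of the grid
`radius`	`float`	❌	`None`	Kernel radius (search distance); determined automatically when None
`weight_field`	`str`	❌	`None`	Name of the weight field (optional)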

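10.3.3 How KDE Works

Kernel density estimation is a non-parametric statistical method that spreads the influence of each data point over its surroundings through a kernel function:

Kernel of a single point (Gaussian kernel):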

      ╭───╮
     ╱     ╲
    ╱       ╲
   ╱    •    ╲
  ╱           ╲
 ╱             ╲
╱───────────────╲
     radius

Several points superimposed:

      ╭─╮  ╭─╮
     ╱   ╲╱   ╲
    ╱     ╳     ╲
   ╱   ╱╭─╮╲    ╲
  ╱   ╱ │  │ ╲   ╲
 ╱   ╱  │  │  ╲   ╲
╱───╱───┘  └───╲───╲
   •    •  •    •

Each point produces a kernel surface, and summing all of these surfaces gives the final density surface.
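The superposition can be written out in a few lines. The sketch below is a simplified, pure-Python version with a fixed isotropic bandwidth; the function name, the sample points, and the bandwidth value are assumptions for illustration. scipy's gaussian_kde, which the step uses, derives its kernel from the data covariance instead, so the numbers will not match it exactly.

```python
import math


def gaussian_density(x, y, points, bandwidth, weights=None):
    """Weighted sum of 2D Gaussian kernels centred on each point, at (x, y)."""
    if weights is None:
        weights = [1.0] * len(points)
    # Normalise so the surface integrates to 1 over the plane.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * sum(weights))
    total = 0.0
    for (px, py), w in zip(points, weights):
        d2 = (x - px) ** 2 + (y - py) ** 2
        total += w * math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total


# Made-up points: two close together, one isolated.
pts = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]

dense = gaussian_density(0.5, 0.0, pts, bandwidth=1.0)   # between the pair
sparse = gaussian_density(3.0, 3.0, pts, bandwidth=1.0)  # far from all points

# Weighting the isolated point 10x raises the density around it.
plain = gaussian_density(5.0, 5.0, pts, bandwidth=1.0)
weighted = gaussian_density(5.0, 5.0, pts, bandwidth=1.0, weights=[1, 1, 10])
```

The location between the two neighbouring points receives overlapping kernels and therefore a higher density than a location far from every point.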

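10.3.4 The resolution Parameter

resolution controls how fine the output raster is. The number of cells, and with it the KDE evaluation cost, grows with the square of resolution:

resolution	Raster size	Cost	Use case
50	50×50	Low	Quick preview
100	100×100	Medium	Default; a reasonable balance
500	500×500	High	High-quality output
1000	1000×1000	Very high	Print-quality output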
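To judge which resolution is appropriate, it helps to convert it into a ground pixel size. The sketch below assumes a hypothetical extent in meters and a hypothetical helper name; the pixel size formula (extent divided by resolution) is the same one that appears in the step's source.

```python
def grid_summary(bounds, resolution):
    """Pixel size and cell count for a square grid over `bounds`.

    `bounds` is (xmin, ymin, xmax, ymax). Hypothetical helper for illustration.
    """
    xmin, ymin, xmax, ymax = bounds
    return {
        "pixel_width": (xmax - xmin) / resolution,
        "pixel_height": (ymax - ymin) / resolution,
        "cell_count": resolution * resolution,
    }


# A made-up 10 km x 5 km extent at resolution=500.
summary = grid_summary((0, 0, 10000, 5000), 500)
```

Here each pixel covers 20 m by 10 m, and the density has to be evaluated at 250,000 cells. Note that pixels are only square when the extent itself is square.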

10.3.5 The weight_field Parameter

When weight_field is set, each point's contribution is scaled by its weight:

# Without weights: every point contributes equally
- step: analysis.heatmap
  params:
    input: "$steps.read_crimes"
    resolution: 200

# With weights: weight each point by severity
- step: analysis.heatmap
  params:
    input: "$steps.read_crimes"
    resolution: 200
    weight_field: "severity"

10.3.6 Source Code Walkthrough

@step(id="analysis.heatmap", ...)
def heatmap(ctx):
    from scipy.stats import gaussian_kde

    gdf = ctx.input("input")
    resolution = ctx.param("resolution") or 100
    radius = ctx.param("radius")
    weight_field = ctx.param("weight_field")

    # Extract coordinates
    coords = np.array([(geom.x, geom.y) for geom in gdf.geometry])
    x, y = coords[:, 0], coords[:, 1]

    # Extract weights
    weights = None
    if weight_field and weight_field in gdf.columns:
        weights = gdf[weight_field].values

    # Build the evaluation grid
    xmin, ymin, xmax, ymax = gdf.total_bounds
    xi = np.linspace(xmin, xmax, resolution)
    yi = np.linspace(ymin, ymax, resolution)
    xx, yy = np.meshgrid(xi, yi)
    grid_coords = np.vstack([xx.ravel(), yy.ravel()])

    # Kernel density estimation
    if radius is not None:
        # Express the radius as a bandwidth factor relative to the data spread
        bw_method = radius / np.std(coords, axis=0).mean()
    else:
        bw_method = "scott"

    kde = gaussian_kde(
        np.vstack([x, y]),
        bw_method=bw_method,
        weights=weights,
    )
    density = kde(grid_coords).reshape(resolution, resolution)

    # Flip the y axis (raster rows run from top to bottom)
    density = np.flipud(density)

    # Build raster_info
    pixel_width = (xmax - xmin) / resolution
    pixel_height = (ymax - ymin) / resolution
    transform = rasterio.transform.from_bounds(
        xmin, ymin, xmax, ymax, resolution, resolution
    )

    profile = {
        "driver": "GTiff",
        "dtype": "float32",
        "width": resolution,
        "height": resolution,
        "count": 1,
        "crs": gdf.crs,
        "transform": transform,
        "nodata": 0.0,
    }

    result = {
        "data": density[np.newaxis, ...].astype("float32"),
        "transform": transform,
        "crs": gdf.crs,
        "profile": profile,
        "path": None,
    }

    stats = {
        "resolution": resolution,
        "density_min": float(density.min()),
        "density_max": float(density.max()),
        "point_count": len(gdf),
    }

    return StepResult(output=result, stats=stats)
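The y-axis flip in the code above exists because the evaluation grid is built from ymin upward, while a north-up raster stores its first row at ymax. The sketch below shows the row and column to coordinate mapping for such a raster in plain Python; `pixel_center` is a hypothetical helper, and the mapping is the conventional north-up layout that rasterio.transform.from_bounds describes.

```python
def pixel_center(row, col, bounds, width, height):
    """Map a (row, col) index of a north-up raster to its centre coordinate."""
    xmin, ymin, xmax, ymax = bounds
    pixel_w = (xmax - xmin) / width
    pixel_h = (ymax - ymin) / height
    x = xmin + (col + 0.5) * pixel_w
    y = ymax - (row + 0.5) * pixel_h  # row 0 is the northernmost row
    return x, y


bounds = (0.0, 0.0, 100.0, 100.0)
top_left = pixel_center(0, 0, bounds, 10, 10)
bottom_right = pixel_center(9, 9, bounds, 10, 10)

# A grid computed south-to-north must be reversed row-wise before it is
# stored in a north-up raster (the pure-Python equivalent of np.flipud).
south_to_north = [[1, 2], [3, 4]]
north_up = south_to_north[::-1]
```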

10.3.7 Usage Example

# Generate a crime heatmap
steps:
  - id: read_crimes
    step: io.read_vector
    params:
      path: "data/crime_incidents.gpkg"

  - id: crime_heatmap
    step: analysis.heatmap
    params:
      input: "$steps.read_crimes"
      resolution: 500
      radius: 1000
      weight_field: "severity"

  - id: save_heatmap
    step: io.write_raster
    params:
      input: "$steps.crime_heatmap"
      path: "output/crime_density.tif"

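10.4 analysis.interpolate: Spatial Interpolation

10.4.1 Description

analysis.interpolate estimates a continuous distribution over the whole area from values at discrete sample points. Spatial interpolation is a core method for turning point data into surface (raster) data and is commonly used for:

Spatial distribution of temperature or precipitation
Groundwater table surfaces
Soil pollutant concentration mapping

10.4.2 Parameters

Parameter	Type	Required	Default	Description
`input`	`GeoDataFrame`	✅	—	Input point features
`value_field`	`str`	✅	—	Name of the attribute field to interpolate
`method`	`str`	❌	`"linear"`	Interpolation method
`resolution`	`int`	❌	`100`	Output raster resolution
`power`	`float`	❌	`2.0`	IDW power exponent (IDW method only)

10.4.3 The Four Interpolation Methods

Method	Description	Core algorithm	Use case
`linear`	Linear interpolation	Delaunay triangulation + linear (barycentric) interpolation	General purpose
`nearest`	Nearest-neighbor interpolation	Value of the closest sample point	Categorical data
`cubic`	Cubic interpolation	Piecewise cubic (Clough-Tocher) on the Delaunay triangulation	Smooth surfaces
`idw`	Inverse distance weighting	Average weighted by inverse distance	Meteorological data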

10.4.4 scipy.interpolate.griddata

The first three methods (linear, nearest, cubic) are based on scipy.interpolate.griddata. With linear and cubic, grid cells outside the convex hull of the sample points come back as NaN:

from scipy.interpolate import griddata

# points: sample point coordinates (N, 2)
# values: sample point values (N,)
# grid_points: grid coordinates (M, 2)

result = griddata(
    points,
    values,
    grid_points,
    method="linear",  # or "nearest", "cubic"
)

How the methods differ (values rounded):

Sample points:           linear:                 nearest:
  10        30           ┌───────────┐           ┌───────────┐
  •         •            │10 17 23 30│           │10 10 30 30│
                         │13 20 27 33│           │10 10 30 30│
  20        40           │17 23 30 37│           │20 20 40 40│
  •         •            │20 27 33 40│           │20 20 40 40│
                         └───────────┘           └───────────┘
                         Smooth gradient         Stepped
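The difference between the two methods can be reproduced by hand for a single triangle. The sketch below is pure Python with made-up coordinates and hypothetical function names; it shows barycentric (linear) interpolation inside one Delaunay triangle next to nearest-neighbor lookup, and it is not scipy's implementation.

```python
import math


def linear_in_triangle(p, tri, values):
    """Barycentric (linear) interpolation of `values` at p inside triangle `tri`."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (p[0] - x3) + (x3 - x2) * (p[1] - y3)) / det
    l2 = ((y3 - y1) * (p[0] - x3) + (x1 - x3) * (p[1] - y3)) / det
    l3 = 1.0 - l1 - l2
    return l1 * values[0] + l2 * values[1] + l3 * values[2]


def nearest_value(p, points, values):
    """Value of the sample point closest to p."""
    i = min(range(len(points)), key=lambda k: math.dist(p, points[k]))
    return values[i]


# One triangle of samples with values 10, 30, and 20 at its corners.
tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vals = [10.0, 30.0, 20.0]

lin_mid = linear_in_triangle((0.5, 0.5), tri, vals)      # blend of 30 and 20
lin_inner = linear_in_triangle((0.25, 0.25), tri, vals)  # blend of all three
near_inner = nearest_value((0.25, 0.25), tri, vals)      # snaps to the 10 corner
```

Linear interpolation yields intermediate values (25 and 17.5 here), while nearest-neighbor returns one of the original sample values unchanged, which is why it suits categorical data.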

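10.4.5 IDW (Inverse Distance Weighting) Interpolation

IDW (Inverse Distance Weighting) is one of the classic interpolation methods in GIS. The estimate at a location is the weighted average of the sample values, with weights that decrease with distance:

           Σ(wi × vi)
Z(x,y) = ──────────────
              Σ wi

where wi = 1 / di^p

di = distance from (x, y) to the i-th sample point
vi = value of the i-th sample point
p  = power exponent (the power parameter)

Effect of the power parameter:

power	Effect
1.0	Weights decay slowly with distance; the result is smoother
2.0	Default; a balanced rate of decay
3.0+	Weights decay quickly; nearby points dominate and the result is "sharper"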
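A small worked example makes the effect of power concrete. The sketch below evaluates the formula at one location with two made-up samples: value 10 at distance 1 and value 20 at distance 2. The helper name is hypothetical, and all distances are assumed to be greater than zero.

```python
def idw_point(distances, values, power):
    """IDW estimate at one location; assumes every distance is > 0."""
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


distances = [1.0, 2.0]
values = [10.0, 20.0]

# power=1: weights 1 and 0.5    -> (10 + 10) / 1.5    = approx. 13.33
# power=2: weights 1 and 0.25   -> (10 + 5) / 1.25    = 12.0
# power=3: weights 1 and 0.125  -> (10 + 2.5) / 1.125 = approx. 11.11
estimates = {p: idw_point(distances, values, p) for p in (1.0, 2.0, 3.0)}
```

As power increases, the estimate moves toward 10, the value of the nearer sample, which is the "sharper" behaviour described in the table.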

# Custom IDW implementation
import numpy as np

def idw_interpolate(points, values, grid_points, power=2.0):
    result = np.zeros(len(grid_points))

    for i, (gx, gy) in enumerate(grid_points):
        # Euclidean distance from this grid cell to every sample point
        distances = np.sqrt(
            (points[:, 0] - gx)**2 + (points[:, 1] - gy)**2
        )

        # Avoid division by zero: a cell that coincides with a sample takes its value
        zero_mask = distances == 0
        if zero_mask.any():
            result[i] = values[zero_mask][0]
        else:
            weights = 1.0 / distances ** power
            result[i] = np.sum(weights * values) / np.sum(weights)

    return result

10.4.6 Source Code Walkthrough

@step(id="analysis.interpolate", ...)
def interpolate(ctx):
    from scipy.interpolate import griddata

    gdf = ctx.input("input")
    value_field = ctx.param("value_field")
    method = ctx.param("method") or "linear"
    resolution = ctx.param("resolution") or 100
    power = ctx.param("power") or 2.0

    # Extract coordinates and values
    coords = np.array([(geom.x, geom.y) for geom in gdf.geometry])
    values = gdf[value_field].values.astype(float)

    # Build the grid
    xmin, ymin, xmax, ymax = gdf.total_bounds
    xi = np.linspace(xmin, xmax, resolution)
    yi = np.linspace(ymin, ymax, resolution)
    xx, yy = np.meshgrid(xi, yi)

    if method == "idw":
        # Custom IDW implementation
        grid_points = np.column_stack([xx.ravel(), yy.ravel()])
        result = _idw(coords, values, grid_points, power)
        grid_z = result.reshape(resolution, resolution)
    else:
        # Use scipy griddata
        grid_z = griddata(
            coords, values,
            (xx, yy),
            method=method,
        )

    # Flip the y axis
    grid_z = np.flipud(grid_z)

    # Replace NaN (cells outside the convex hull of the samples) with 0.0
    grid_z = np.nan_to_num(grid_z, nan=0.0)

    # Build raster_info
    transform = rasterio.transform.from_bounds(
        xmin, ymin, xmax, ymax, resolution, resolution
    )

    profile = {
        "driver": "GTiff",
        "dtype": "float32",
        "width": resolution,
        "height": resolution,
        "count": 1,
        "crs": gdf.crs,
        "transform": transform,
        "nodata": 0.0,
    }

    raster_info = {
        "data": grid_z[np.newaxis, ...].astype("float32"),
        "transform": transform,
        "crs": gdf.crs,
        "profile": profile,
        "path": None,
    }

    stats = {
        "method": method,
        "resolution": resolution,
        "value_min": float(np.nanmin(grid_z)),
        "value_max": float(np.nanmax(grid_z)),
        "sample_count": len(gdf),
    }

    return StepResult(output=raster_info, stats=stats)

10.4.7 Usage Example

# Interpolate a temperature surface from weather station data
steps:
  - id: read_stations
    step: io.read_vector
    params:
      path: "data/weather_stations.gpkg"

  # IDW interpolation
  - id: temp_surface
    step: analysis.interpolate
    params:
      input: "$steps.read_stations"
      value_field: "temperature"
      method: idw
      resolution: 200
      power: 2.0

  - id: save_temp
    step: io.write_raster
    params:
      input: "$steps.temp_surface"
      path: "output/temperature_surface.tif"

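10.5 analysis.cluster: Spatial Clustering

10.5.1 Description

analysis.cluster performs spatial clustering on point features to identify groups of points that are close together in space. The result is added to the input data as a cluster_id column. Common uses:

Crime hotspot identification
Clustering analysis of retail outlets
Detecting clusters of disease transmission
Identifying urban functional zones

10.5.2 Parameters

Parameter	Type	Required	Default	Description
`input`	`GeoDataFrame`	✅	—	Input point features
`method`	`str`	❌	`"dbscan"`	Clustering method
`n_clusters`	`int`	❌	`5`	Number of clusters (KMeans only)
`eps`	`float`	❌	`0.5`	Neighborhood radius, in CRS units (DBSCAN only)
`min_samples`	`int`	❌	`5`	Minimum number of samples (DBSCAN only)

10.5.3 DBSCAN Clustering

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-based clustering algorithm that is well suited to spatial data.

Core concepts:

Concept	Description
eps	Neighborhood radius: the maximum distance at which two points count as neighbors
min_samples	Minimum number of points within eps for a point to be a core point (scikit-learn counts the point itself)
Core point	A point with at least min_samples points within radius eps
Border point	A point within eps of a core point that is not itself a core point
Noise point	A point that is neither a core point nor a border point (`cluster_id = -1`)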
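The three point types fall out of a short algorithm. The following is a minimal pure-Python sketch of DBSCAN with brute-force neighbor search, written to illustrate the concepts in the table; the function name and sample coordinates are made up, and the step itself relies on scikit-learn's implementation.

```python
import math


def dbscan(points, eps, min_samples):
    """Minimal DBSCAN: returns one label per point, -1 for noise."""
    n = len(points)
    # Neighbourhoods include the point itself, as in scikit-learn.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unlabelled core point.
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    # Only core points keep expanding; border points stop here.
                    if is_core[q]:
                        stack.append(q)
        cluster += 1
    return labels


# Two tight groups and one isolated point (made-up coordinates).
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
labels = dbscan(pts, eps=1.5, min_samples=3)
```

The two groups receive labels 0 and 1, and the isolated point keeps the label -1 because no core point lies within eps of it.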

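DBSCAN example (eps=1, min_samples=3):

    •₁  •₁  •₁          cluster 1
      •₁  •₁

                         (gap larger than eps)

        •₂  •₂
      •₂  •₂  •₂        cluster 2

   ×                     noise point (cluster_id = -1)

Advantages of DBSCAN:

The number of clusters does not need to be specified in advance
It can find clusters of arbitrary shape
It identifies noise points (outliers)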

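10.5.4 KMeans Clustering

KMeans partitions the data into K clusters so that each point belongs to the cluster with the nearest centroid, minimizing the sum of squared distances from points to their centroids:

KMeans example (n_clusters=3):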

    •₁  •₁               cluster 1 (centroid: ★₁)
      ★₁  •₁

        •₂  ★₂           cluster 2 (centroid: ★₂)
      •₂  •₂

   •₃  •₃                cluster 3 (centroid: ★₃)
     ★₃

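Characteristics of KMeans:

Characteristic	Description
K must be specified	The number of clusters must be chosen in advance
Spherical clusters	Tends to find roughly spherical clusters of similar size
No notion of noise	Every point is assigned to some cluster
Computationally efficient	Scales to large datasets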
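The alternation between assigning points and moving centroids (Lloyd's algorithm) can be sketched in pure Python. This is an illustration only: it initialises the centroids with the first k points to stay deterministic, whereas scikit-learn's KMeans uses k-means++ initialisation with several restarts. The function name and coordinates are made up.

```python
import math


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a deterministic start (the first k points)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids


# Two well-separated groups (made-up coordinates).
pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
labels, centroids = kmeans(pts, k=2)
```

Unlike DBSCAN, every point ends up with a non-negative label; there is no noise category.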

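10.5.5 DBSCAN vs. KMeans

Property	DBSCAN	KMeans
Number of clusters	Determined automatically	Must be specified
Cluster shape	Arbitrary	Spherical (convex)
Noise handling	Identifies noise points	None
Parameters	eps, min_samples	n_clusters
Best suited to	Irregularly shaped clusters with outliers; because eps is global, clusters should have broadly similar density	Compact clusters of similar size with a known cluster count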

10.5.6 The cluster_id Column

The clustering result is added to the original GeoDataFrame as a new cluster_id column:

# Before clustering
│ id │ name    │ geometry      │
├────┼─────────┼───────────────┤
│ 1  │ Point A │ POINT(1, 2)   │
│ 2  │ Point B │ POINT(1.1, 2) │
│ 3  │ Point C │ POINT(5, 5)   │

# After DBSCAN clustering (for example with eps=0.5, min_samples=2)
│ id │ name    │ geometry      │ cluster_id │
├────┼─────────┼───────────────┼────────────┤
│ 1  │ Point A │ POINT(1, 2)   │ 0          │
│ 2  │ Point B │ POINT(1.1, 2) │ 0          │
│ 3  │ Point C │ POINT(5, 5)   │ -1         │  ← noise point
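Once the cluster_id values exist, summarising them is straightforward. The sketch below is plain Python over a list of labels; the helper name and the `cluster_sizes` key are assumptions for illustration, while `n_clusters_found` and `n_noise_points` follow the names used in the step's stats.

```python
from collections import Counter


def summarize_labels(labels):
    """Count clusters, noise points, and cluster sizes from cluster_id values."""
    counts = Counter(labels)
    noise = counts.pop(-1, 0)  # -1 marks noise and is not a cluster
    return {
        "n_clusters_found": len(counts),
        "n_noise_points": noise,
        "cluster_sizes": dict(counts),
    }


# The cluster_id values from the table above: A and B clustered, C is noise.
summary = summarize_labels([0, 0, -1])
```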

10.5.7 Source Code Walkthrough

@step(id="analysis.cluster", ...)
def cluster(ctx):
    from sklearn.cluster import DBSCAN, KMeans

    gdf = ctx.input("input")
    method = ctx.param("method") or "dbscan"
    n_clusters = ctx.param("n_clusters") or 5
    eps = ctx.param("eps") or 0.5
    min_samples = ctx.param("min_samples") or 5

    # Extract coordinates
    coords = np.array([(geom.x, geom.y) for geom in gdf.geometry])

    # Run the clustering
    if method == "dbscan":
        model = DBSCAN(eps=eps, min_samples=min_samples)
        labels = model.fit_predict(coords)
    elif method == "kmeans":
        model = KMeans(n_clusters=n_clusters, random_state=42, n_init=10)
        labels = model.fit_predict(coords)
    else:
        raise ValueError(f"Unsupported clustering method: {method}")

    # Add the cluster_id column
    result = gdf.copy()
    result["cluster_id"] = labels

    # Statistics
    unique_labels = set(labels)
    n_clusters_found = len(unique_labels - {-1})
    n_noise = int((labels == -1).sum())

    stats = {
        "feature_count": len(result),
        "method": method,
        "n_clusters_found": n_clusters_found,
        "n_noise_points": n_noise,
    }

    return StepResult(output=result, stats=stats)

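10.5.8 Usage Example

# DBSCAN clustering
- id: cluster_crimes
  step: analysis.cluster
  params:
    input: "$steps.read_crime_points"
    method: dbscan
    eps: 500        # 500 m neighborhood radius (requires a projected CRS in meters)
    min_samples: 10 # at least 10 points are needed to form a core point

# KMeans clustering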
- id: cluster_shops
  step: analysis.cluster
  params:
    input: "$steps.read_shops"
    method: kmeans
    n_clusters: 8

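10.6 Combining the Spatial Analysis Steps in Practice

10.6.1 A Combined Spatial Analysis Pipeline

The following pipeline walks through an urban crime hotspot analysis: read the crime data → reproject and clip to the city → identify hotspots with DBSCAN → keep the clustered (non-noise) points → generate a heatmap → save the results.

name: Urban crime hotspot analysis
description: Clustering and heatmap analysis of crime data

steps:
  # 1. Read the data
  - id: read_crimes
    step: io.read_vector
    params:
      path: "data/crime_incidents.gpkg"

  - id: read_boundary
    step: io.read_vector
    params:
      path: "data/city_boundary.gpkg"

  # 2. Reproject (so that distances are in meters)
  - id: crimes_utm
    step: vector.reproject
    params:
      input: "$steps.read_crimes"
      target_crs: "EPSG:32650"

  # Reproject the boundary as well, so both clip inputs share one CRS
  - id: boundary_utm
    step: vector.reproject
    params:
      input: "$steps.read_boundary"
      target_crs: "EPSG:32650"

  # 3. Clip to the city extent
  - id: crimes_clipped
    step: vector.clip
    params:
      input: "$steps.crimes_utm"
      clip_geometry: "$steps.boundary_utm"

  # 4. DBSCAN clustering
  - id: crime_clusters
    step: analysis.cluster
    params:
      input: "$steps.crimes_clipped"
      method: dbscan
      eps: 500
      min_samples: 10

  # 5. Keep non-noise points (points that belong to a cluster)
  - id: clustered_crimes
    step: vector.query
    params:
      input: "$steps.crime_clusters"
      expression: "cluster_id >= 0"

  # 6. Generate the crime heatmap
  - id: crime_heatmap
    step: analysis.heatmap
    params:
      input: "$steps.crimes_clipped"
      resolution: 500
      radius: 1000

  # 7. Read the analysis grid points (input for a follow-up interpolation step that is not part of this pipeline)
  - id: read_grid_points
    step: io.read_vector
    params:
      path: "data/analysis_grid.gpkg"

  # 8. Save all results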
  - id: save_clusters
    step: io.write_vector
    params:
      input: "$steps.crime_clusters"
      path: "output/crime_clusters.gpkg"
      format: gpkg

  - id: save_heatmap
    step: io.write_raster
    params:
      input: "$steps.crime_heatmap"
      path: "output/crime_heatmap.tif"

10.6.2 Weather Data Analysis Pipeline

name: Weather station spatial analysis
description: Thiessen polygon partitioning and temperature interpolation from weather station data

steps:
  # 1. Read the weather station data
  - id: read_stations
    step: io.read_vector
    params:
      path: "data/weather_stations.gpkg"

  - id: read_province
    step: io.read_vector
    params:
      path: "data/province_boundary.gpkg"

  # 2. Reproject
  - id: stations_utm
    step: vector.reproject
    params:
      input: "$steps.read_stations"
      target_crs: "EPSG:32650"

  # Reproject the province boundary as well, so both clip inputs share one CRS
  - id: province_utm
    step: vector.reproject
    params:
      input: "$steps.read_province"
      target_crs: "EPSG:32650"

  # 3. Thiessen polygons (weather station service areas)
  - id: station_voronoi
    step: analysis.voronoi
    params:
      input: "$steps.stations_utm"
      clip_to_bounds: true
      buffer: 10000

  # 4. Clip to the province boundary
  - id: voronoi_clipped
    step: vector.clip
    params:
      input: "$steps.station_voronoi"
      clip_geometry: "$steps.province_utm"

  # 5. IDW interpolation of temperature
  - id: temp_idw
    step: analysis.interpolate
    params:
      input: "$steps.stations_utm"
      value_field: "temperature"
      method: idw
      resolution: 300
      power: 2.0

  # 6. Linear interpolation of precipitation
  - id: precip_linear
    step: analysis.interpolate
    params:
      input: "$steps.stations_utm"
      value_field: "precipitation"
      method: linear
      resolution: 300

  # 7. Save the results
  - id: save_voronoi
    step: io.write_vector
    params:
      input: "$steps.voronoi_clipped"
      path: "output/station_regions.gpkg"
      format: gpkg

  - id: save_temp
    step: io.write_raster
    params:
      input: "$steps.temp_idw"
      path: "output/temperature.tif"

  - id: save_precip
    step: io.write_raster
    params:
      input: "$steps.precip_linear"
      path: "output/precipitation.tif"

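10.7 Chapter Summary

This chapter covered GeoPipeAgent's four advanced spatial analysis steps:

analysis.voronoi: generates Thiessen polygons with scipy.spatial.Voronoi, supports clip_to_bounds clipping and buffer expansion of the bounding box, and is commonly used to delineate service areas
analysis.heatmap: generates a heatmap raster with Gaussian kernel density estimation (KDE), supports custom resolution, radius, and weight_field, and is commonly used for density visualization
analysis.interpolate: supports four interpolation methods (linear / nearest / cubic / idw); the first three use scipy.interpolate.griddata, while IDW is a custom implementation whose power parameter controls distance decay
analysis.cluster: supports the DBSCAN and KMeans clustering algorithms; DBSCAN identifies noise points and clusters of arbitrary shape, KMeans requires the number of clusters, and the result is added to the data as a cluster_id column

These steps rely on the [analysis] optional dependencies (scipy, scikit-learn) and use lazy imports so that other functionality keeps working when they are not installed. Combined with the basic IO and vector steps, they can be assembled into complete spatial analysis workflows.

Chapters 6 through 10 have covered the step registry, IO reading and writing, vector analysis, raster analysis, and advanced spatial analysis. The next part of the tutorial goes deeper into network analysis, data quality checks, the multi-backend system, the execution engine, and the data model.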
