Chapter 20: Best Practices and a Comprehensive Case Study

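This chapter summarizes best practices for GeoPipeAgent and walks through a complete case study that shows the framework applied end to end in a real GIS project.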

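20.1 Pipeline Design Best Practices

1. Parameterize all paths and key parameters

Move file paths, thresholds, and other key values into variables so the pipeline can be reused by overriding them with the --var command-line option:

# ✅ Recommended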
variables:
  input_path: "data/input.shp"
  buffer_dist: 500
  output_crs: "EPSG:3857"

# ❌ Avoid (hard-coded paths are difficult to reuse)
params:
  path: "data/input.shp"
  distance: 500
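
To make the override mechanism concrete, here is a minimal Python sketch of how `${name}` placeholders could be filled from defaults plus `key=value` overrides. It is an illustration under assumptions, not GeoPipeAgent's actual implementation: the helper names `parse_overrides` and `substitute` are hypothetical, and the real framework may treat types and errors differently.

```python
import re

# Matches ${name} placeholders only; step references such as "$load-data" are left alone.
_VAR_PATTERN = re.compile(r"\$\{(\w+)\}")


def parse_overrides(items):
    """Turn ["key=value", ...] (the shape of repeated --var arguments) into a dict."""
    overrides = {}
    for item in items:
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        overrides[key] = value
    return overrides


def substitute(text, variables):
    """Replace every ${name} in text, failing loudly on unknown names."""
    def replace(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return str(variables[name])
    return _VAR_PATTERN.sub(replace, text)


# Defaults as they would appear under `variables`, then one command-line override.
defaults = {"input_path": "data/input.shp", "buffer_dist": 500}
variables = {**defaults, **parse_overrides(["buffer_dist=300"])}

print(substitute("${input_path}", variables))
print(substitute("${buffer_dist}", variables))
print(substitute("$load-data", variables))
```

Overrides win over defaults because they are merged last, and an undefined placeholder raises an error instead of passing through silently.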

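2. Reproject before any distance-based analysis

Before any operation that depends on distance (buffering, service areas, proximity analysis), reproject the data to a projected coordinate system with meter units:

# ✅ Standard approach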
- id: reproject
  use: vector.reproject
  params:
    input: "$load-data"
    target_crs: "EPSG:3857"    # or the projected CRS best suited to the region

- id: buffer
  use: vector.buffer
  params:
    input: "$reproject"
    distance: 500              # now 500 m in map units; EPSG:3857 overstates ground distance away from the equator
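
The choice of target CRS matters as much as the reprojection itself. The Python sketch below uses a spherical approximation (an assumption; real projections work on an ellipsoid) to show two effects: one degree of longitude covers a different ground length at each latitude, and Web Mercator (EPSG:3857) stretches lengths by roughly 1 / cos(latitude). The function names are illustrative only.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius, spherical approximation


def meters_per_degree_longitude(lat_deg):
    """Ground length of one degree of longitude at a given latitude (sphere)."""
    return math.radians(1.0) * EARTH_RADIUS_M * math.cos(math.radians(lat_deg))


def web_mercator_ground_distance(map_distance, lat_deg):
    """Approximate ground distance for a length measured in EPSG:3857 units.

    Spherical Mercator stretches lengths by 1 / cos(latitude), so the ground
    distance is the map distance multiplied by cos(latitude).
    """
    return map_distance * math.cos(math.radians(lat_deg))


for lat in (0, 30, 40, 60):
    per_degree = round(meters_per_degree_longitude(lat))
    ground = round(web_mercator_ground_distance(500, lat), 1)
    print(f"lat {lat:>2}: 1 deg lon = {per_degree} m, 500-unit buffer = {ground} m on the ground")
```

At 60 degrees latitude a 500-unit buffer in EPSG:3857 covers only about 250 m on the ground, which is why a local projected CRS is usually the safer choice for distance work.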

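3. Run QC before critical steps

Before loading data into a database or running a key analysis, add geometry and attribute checks so that dirty data does not contaminate the results: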

- id: check-geometry
  use: qc.geometry_validity
  params: { input: "$load-data" }

- id: fix-geometry
  use: qc.geometry_validity
  params: { input: "$load-data", auto_fix: true }
  when: "$check-geometry.issues_count > 0"

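# Run the analysis after the check/fix. This reference assumes fix-geometry produced an output;
# if that step is skipped because no issues were found, the input may need to fall back to "$load-data".
- id: analysis-step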
  use: vector.buffer
  params:
    input: "$fix-geometry"
    distance: 500

4. Configure an appropriate `on_error` for unreliable steps

# Network operations → retry
- id: geocode
  use: network.geocode
  params: { addresses: [...] }
  on_error: retry

# Optional optimization steps → skip
- id: simplify-optional
  use: vector.simplify
  params: { input: "$data", tolerance: 5 }
  on_error: skip

# Core analysis steps → fail (the default, so problems surface quickly)
- id: core-analysis
  use: vector.buffer
  params: { ... }
  on_error: fail
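
The three policies differ only in what happens after an exception. The following Python sketch models that difference; it is not GeoPipeAgent's executor, and `run_step` and the `max_retries` parameter are hypothetical names, since the source does not say how many retries the framework performs.

```python
def run_step(func, on_error="fail", max_retries=3):
    """Run func() under a hypothetical on_error policy.

    Returns (status, result), where status is "ok" or "skipped".
    """
    if on_error not in {"fail", "skip", "retry"}:
        raise ValueError(f"unknown on_error policy: {on_error!r}")

    attempts = max_retries if on_error == "retry" else 1
    last_error = None
    for _ in range(attempts):
        try:
            return "ok", func()
        except Exception as exc:
            last_error = exc
            if on_error == "skip":
                return "skipped", None   # swallow the error, continue the pipeline
            if on_error == "fail":
                raise                    # surface the problem immediately
            # on_error == "retry": fall through and try again
    raise RuntimeError(f"step failed after {attempts} attempts") from last_error


# A stand-in for a flaky network call: fails twice, then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary network failure")
    return "geocoded"


print(run_step(flaky, on_error="retry"))
print(run_step(lambda: 1 / 0, on_error="skip"))
```

Keeping fail as the default for core steps means a broken analysis stops the run instead of producing a partial result that looks complete.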

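5. Declare key results in `outputs`

Values listed under outputs appear at the top level of the JSON report, which makes them easy for an AI to interpret and for downstream systems to extract: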

outputs:
  result_path: "$save-result"              # output file path
  feature_count: "$process.feature_count"  # number of result features
  geometry_issues: "$check.issues_count"   # number of QC issues (0 = none)
  crs: "$reproject.target_crs"             # CRS actually used

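6. Use descriptive step IDs

# ✅ Clear step IDs
- id: load-buildings           # describes the data source
- id: check-geometry-validity  # describes the operation
- id: reproject-to-cgcs2000    # describes the target CRS
- id: save-final-result        # describes the output

# ❌ Avoid meaningless names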
- id: step1
- id: s2
- id: tmp

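7. Inspect the data with `geopipe-agent info` first

# Before writing a pipeline, look at the basic properties of the data
geopipe-agent info data/input.shp

# Confirm the CRS, field names, feature count, and geometry type;
# these directly determine which pipeline parameters to choose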

8. Run `validate` before executing

# Check syntax and references before the real run
geopipe-agent validate pipeline.yaml

# Run only after validation passes
geopipe-agent run pipeline.yaml
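
When the two commands are driven from a script, the "run only after validation passes" rule can be enforced in code. The Python sketch below is one way to do that, under the assumption (not confirmed by this chapter) that `geopipe-agent validate` exits with a non-zero status on failure. The `runner` parameter and the stand-in classes are hypothetical and exist only so the example can be tried without the CLI installed.

```python
import subprocess


def validate_then_run(pipeline_path, runner=subprocess.run):
    """Run the pipeline only when validation succeeds; return the exit code.

    Assumption: `geopipe-agent validate` exits with a non-zero status on failure.
    """
    validation = runner(["geopipe-agent", "validate", pipeline_path])
    if validation.returncode != 0:
        return validation.returncode  # stop here: the pipeline is never executed
    return runner(["geopipe-agent", "run", pipeline_path]).returncode


# Dry run with a stand-in runner, so the sketch works without the CLI installed.
class FakeResult:
    def __init__(self, returncode):
        self.returncode = returncode


recorded = []


def failing_validation(command):
    """Pretend that validation fails and record which subcommands were invoked."""
    recorded.append(command[1])
    return FakeResult(1 if command[1] == "validate" else 0)


exit_code = validate_then_run("pipeline.yaml", runner=failing_validation)
print(exit_code, recorded)
```

In real use the call would simply be `validate_then_run("pipeline.yaml")`, which hands both commands to `subprocess.run`.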

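20.2 YAML Authoring Conventions

Formatting conventions

# ✅ Well-formatted
pipeline:
  name: "Concise pipeline name"
  description: "Describe the analysis goal, the inputs and outputs, and any caveats"

  variables:
    input_path: "data/input.shp"    # one variable per line, with a comment

  steps:
    # Comment before each step to state its purpose
    - id: load-data
      use: io.read_vector
      params:
        path: "${input_path}"

    # Separate the steps of different stages with a blank line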
    - id: reproject
      use: vector.reproject
      params:
        input: "$load-data"
        target_crs: "EPSG:3857"

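Recommended reference format

For clarity, consider using the explicit .output reference in important steps:

# Explicit: the reader sees at a glance what is being passed
input: "$load-data.output"

# Shorthand: more concise, but relies on knowing that $step is equivalent to $step.output
input: "$load-data"

Both forms are valid; use one consistently within a pipeline.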
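
The equivalence between $step and $step.output can be pictured with a small resolver. This is a minimal Python sketch of one way such references might be resolved, not GeoPipeAgent's actual implementation; the `resolve` function and the shape of the `results` dictionary are assumptions made for illustration.

```python
import re

# "$step-id" optionally followed by ".attribute"; step ids may contain hyphens.
_REF = re.compile(r"^\$([A-Za-z0-9_-]+)(?:\.([A-Za-z_][A-Za-z0-9_]*))?$")


def resolve(ref, results):
    """Look up a step reference in a dict of per-step results."""
    match = _REF.match(ref)
    if match is None:
        raise ValueError(f"not a step reference: {ref!r}")
    step_id = match.group(1)
    attribute = match.group(2) or "output"   # bare "$step" means "$step.output"
    if step_id not in results:
        raise KeyError(f"unknown step: {step_id}")
    if attribute not in results[step_id]:
        raise KeyError(f"step {step_id!r} has no attribute {attribute!r}")
    return results[step_id][attribute]


# Hypothetical results recorded for one finished step.
results = {"load-data": {"output": "<vector layer>", "feature_count": 120}}

print(resolve("$load-data", results))
print(resolve("$load-data.output", results))
print(resolve("$load-data.feature_count", results))
```

The first two calls return the same object, which is why mixing both forms in one pipeline is harmless but harder to read.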

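20.3 Case Study: Analysis of Newly Added Urban Residential Land

Scenario

A city planning bureau needs to:

Identify the location and area of residential land added between 2020 and 2024
Check the data quality of the newly added land
Determine how much of the newly added land falls within the buffer around main roads
Summarize the newly added land area by administrative district
Produce a QC report and the analysis results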

Data preparation

data/
├── landuse_2020.shp      # 2020 land-use data (EPSG:4326)
├── landuse_2024.shp      # 2024 land-use data (EPSG:4326)
├── main_roads.shp        # main road network (EPSG:4326)
└── districts.shp         # administrative districts (EPSG:4326)

Attribute fields:

landuse: landuse_type (R = residential, C = commercial, etc.)
roads: road_class（primary/secondary）
districts: dist_name, dist_code

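Complete pipeline

pipeline:
  name: "Newly added urban residential land analysis"
  description: >
    Analyze residential land added between 2020 and 2024:
    1. Identify newly added residential land
    2. Run data QC
    3. Overlay with the main-road buffer
    4. Summarize area by administrative district
    5. Write the results and the QC report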
  crs: "EPSG:4326"

  variables:
    lu_2020: "data/landuse_2020.shp"
    lu_2024: "data/landuse_2024.shp"
    roads_path: "data/main_roads.shp"
    districts_path: "data/districts.shp"
    target_crs: "EPSG:4549"           # CGCS2000 / 3-degree Gauss-Kruger CM 120E, meter units
    road_buffer_dist: 500             # road buffer distance (meters)

  steps:
    # ============================
    # Stage 1: Load data
    # ============================
    - id: load-lu-2020
      use: io.read_vector
      params:
        path: "${lu_2020}"
        encoding: "utf-8"

    - id: load-lu-2024
      use: io.read_vector
      params:
        path: "${lu_2024}"
        encoding: "utf-8"

    - id: load-roads
      use: io.read_vector
      params:
        path: "${roads_path}"

    - id: load-districts
      use: io.read_vector
      params:
        path: "${districts_path}"

    # ============================
    # Stage 2: Data QC (2020)
    # ============================
    - id: check-geo-2020
      use: qc.geometry_validity
      params:
        input: "$load-lu-2020"
        severity: "error"

    - id: fix-geo-2020
      use: qc.geometry_validity
      params:
        input: "$load-lu-2020"
        auto_fix: true
      when: "$check-geo-2020.issues_count > 0"
      on_error: skip

    - id: check-crs-2020
      use: qc.crs_check
      params:
        input: "$load-lu-2020"
        expected_crs: "EPSG:4326"
      on_error: skip

    # ============================
    # Stage 3: Data QC (2024)
    # ============================
    - id: check-geo-2024
      use: qc.geometry_validity
      params:
        input: "$load-lu-2024"
        severity: "error"

    - id: fix-geo-2024
      use: qc.geometry_validity
      params:
        input: "$load-lu-2024"
        auto_fix: true
      when: "$check-geo-2024.issues_count > 0"
      on_error: skip

    # ============================
    # Stage 4: Extract residential land
    # ============================
    - id: filter-residential-2020
      use: vector.query
      params:
        input: "$load-lu-2020"
        expr: "landuse_type == 'R'"

    - id: filter-residential-2024
      use: vector.query
      params:
        input: "$load-lu-2024"
        expr: "landuse_type == 'R'"

    # ============================
    # Stage 5: Identify newly added residential land (difference)
    # ============================
    # Bring both layers into the same CRS
    - id: reproject-2020
      use: vector.reproject
      params:
        input: "$filter-residential-2020"
        target_crs: "${target_crs}"

    - id: reproject-2024
      use: vector.reproject
      params:
        input: "$filter-residential-2024"
        target_crs: "${target_crs}"

    # Newly added = 2024 residential - 2020 residential
    - id: new-residential
      use: vector.overlay
      params:
        input: "$reproject-2024"
        overlay_layer: "$reproject-2020"
        how: "difference"

    # ============================
    # Stage 6: Road buffer coverage analysis
    # ============================
    - id: reproject-roads
      use: vector.reproject
      params:
        input: "$load-roads"
        target_crs: "${target_crs}"

    - id: filter-main-roads
      use: vector.query
      params:
        input: "$reproject-roads"
        expr: "road_class in ['primary', 'secondary']"

    - id: buffer-main-roads
      use: vector.buffer
      params:
        input: "$filter-main-roads"
        distance: "${road_buffer_dist}"
        cap_style: "round"

    - id: dissolve-road-buffer
      use: vector.dissolve
      params:
        input: "$buffer-main-roads"

    # The part of the newly added residential land that lies inside the road buffer
    - id: new-residential-near-road
      use: vector.overlay
      params:
        input: "$new-residential"
        overlay_layer: "$dissolve-road-buffer"
        how: "intersection"

    # ============================
    # Stage 7: Summarize by administrative district
    # ============================
    - id: reproject-districts
      use: vector.reproject
      params:
        input: "$load-districts"
        target_crs: "${target_crs}"

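    # A spatial join would attach district attributes to the new land (if a custom step is available);
    # an intersection overlay is used here instead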
    - id: assign-district
      use: vector.overlay
      params:
        input: "$new-residential"
        overlay_layer: "$reproject-districts"
        how: "intersection"

    - id: dissolve-by-district
      use: vector.dissolve
      params:
        input: "$assign-district"
        by: "dist_code"
        agg:
          area: "sum"             # assumes the layer already carries an `area` attribute
          dist_name: "first"

    # ============================
    # Stage 8: Save results
    # ============================
    - id: save-new-residential
      use: io.write_vector
      params:
        input: "$new-residential"
        path: "output/new_residential_2024.geojson"
        format: "GeoJSON"

    - id: save-near-road
      use: io.write_vector
      params:
        input: "$new-residential-near-road"
        path: "output/new_residential_near_road.geojson"
        format: "GeoJSON"

    - id: save-district-stats
      use: io.write_vector
      params:
        input: "$dissolve-by-district"
        path: "output/district_new_residential.geojson"
        format: "GeoJSON"

    # Save the problem features (only when issues were found)
    - id: save-geo-issues-2020
      use: io.write_vector
      params:
        input: "$check-geo-2020.issues_gdf"
        path: "output/qc_issues_2020.geojson"
        format: "GeoJSON"
      when: "$check-geo-2020.issues_count > 0"
      on_error: skip

    - id: save-geo-issues-2024
      use: io.write_vector
      params:
        input: "$check-geo-2024.issues_gdf"
        path: "output/qc_issues_2024.geojson"
        format: "GeoJSON"
      when: "$check-geo-2024.issues_count > 0"
      on_error: skip

  outputs:
    new_residential_path: "$save-new-residential"
    new_residential_count: "$new-residential.feature_count"
    near_road_count: "$new-residential-near-road.feature_count"
    district_stats_path: "$save-district-stats"
    geo_issues_2020: "$check-geo-2020.issues_count"
    geo_issues_2024: "$check-geo-2024.issues_count"
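
The dissolve-by-district step in the pipeline above groups features by dist_code and aggregates attributes with sum and first. The plain-Python sketch below mimics only the attribute side of that aggregation (geometry merging is left out) on made-up rows. The function name and the sample values are hypothetical, and the sketch presumes each feature already carries an area value.

```python
from collections import defaultdict


def dissolve_attributes(rows, by, agg):
    """Group rows by one field and reduce the other fields as requested."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)

    reducers = {"sum": sum, "first": lambda values: values[0]}
    result = []
    for key, members in groups.items():
        merged = {by: key}
        for field, how in agg.items():
            merged[field] = reducers[how]([member[field] for member in members])
        result.append(merged)
    return result


# Made-up features after the district overlay: two parcels in A01, one in B02.
rows = [
    {"dist_code": "A01", "dist_name": "North", "area": 1200.0},
    {"dist_code": "A01", "dist_name": "North", "area": 800.0},
    {"dist_code": "B02", "dist_name": "South", "area": 500.0},
]

stats = dissolve_attributes(
    rows, by="dist_code", agg={"area": "sum", "dist_name": "first"}
)
for record in stats:
    print(record)
```

Each district ends up as a single record whose area is the total of its parcels, which is the table a planner would read off the district_new_residential output.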

Execution

# Validate
geopipe-agent validate residential-analysis.yaml

# Run
geopipe-agent run residential-analysis.yaml

# Override parameters
geopipe-agent run residential-analysis.yaml \
    --var road_buffer_dist=300 \
    --var target_crs=EPSG:32650

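20.4 Debugging Tips

Tip 1: Validate stage by stage

Split a complex pipeline into several small ones, verify that each stage produces the expected output, and then merge them.

Tip 2: Add intermediate save steps

While debugging, add an io.write_vector step after key steps to save intermediate results and inspect the state of the data: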

- id: debug-save-after-reproject
  use: io.write_vector
  params:
    input: "$reproject"
    path: "/tmp/debug_reproject.geojson"
  on_error: skip    # a failing debug step does not affect the main flow

Tip 3: Inspect intermediate files with the `info` command

geopipe-agent run pipeline.yaml
geopipe-agent info /tmp/debug_reproject.geojson

Tip 4: DEBUG logging

geopipe-agent run pipeline.yaml --log-level DEBUG 2>&1 | head -100

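20.5 Chapter Summary

This chapter summarized best practices for GeoPipeAgent and used the newly added urban residential land case study to show the framework applied end to end in a real project:

Key best practices:

Parameterize paths and parameters to make pipelines reusable
Reproject first so that distance units are correct
Add QC steps before critical analyses
Use on_error to distinguish critical steps from optional ones
Declare key results in outputs so that AI and downstream systems can extract them
Use descriptive step IDs to improve readability

Case study highlights:

Staged design (data loading → QC → analysis → statistics → saving)
Conditional saving of QC reports (when + on_error: skip)
Multiple backends, coordinate systems, and steps working together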
