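Chapter 20: Practical Case Studies and Best Practices

This chapter works through several complete case studies that bring together everything covered so far, then summarizes best practices for designing GeoPipeAgent pipelines and integrating them into projects.

20.1 Overview

Over the previous 19 chapters you have learned all of GeoPipeAgent's core concepts. This chapter applies them to real GIS analysis scenarios through seven complete case studies, followed by best-practice guidance on pipeline design, performance tuning, and project integration.

20.1.1 Case Overview

Case	Scenario	Step categories used	Difficulty
Case 1	Urban green-space buffer analysis	IO + Vector	⭐
Case 2	Flood-risk overlay analysis	IO + Vector	⭐⭐
Case 3	Comprehensive vector data QC	IO + QC	⭐⭐⭐
Case 4	DEM raster processing and contour extraction	IO + Raster	⭐⭐
Case 5	POI spatial clustering	IO + Vector + Analysis	⭐⭐⭐
Case 6	Road-network shortest-path analysis	IO + Network	⭐⭐⭐
Case 7	Batch format conversion and simplification	IO + Vector	⭐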

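20.2 Case 1: Urban Green-Space Buffer Analysis

20.2.1 Requirements

The city planning department wants to analyze the service coverage of urban green spaces. Specifically:

Read the green-space vector data (Shapefile)
Reproject the data from WGS84 (a geographic CRS measured in degrees) to a projected CRS so that distances are expressed in meters
Generate a 500-meter buffer around each green space
Save the result as GeoJSON

20.2.2 Pipeline Design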

pipeline:
  name: "城市绿地缓冲区分析"
  description: "分析城市绿地的 500 米服务覆盖范围"

  variables:
    input_path: "data/green_spaces.shp"
    buffer_distance: 500
    target_crs: "EPSG:3857"
    output_path: "output/green_buffer_500m.geojson"

  steps:
    - id: load-green
      use: io.read_vector
      params:
        path: "${input_path}"

    - id: reproject
      use: vector.reproject
      params:
        input: "$load-green"
        target_crs: "${target_crs}"

    - id: buffer
      use: vector.buffer
      params:
        input: "$reproject"
        distance: "${buffer_distance}"
        cap_style: "round"

    - id: save-result
      use: io.write_vector
      params:
        input: "$buffer"
        path: "${output_path}"
        format: "GeoJSON"

  outputs:
    result: "$save-result"
    buffer_stats: "$buffer.stats"

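20.2.3 Pipeline Walkthrough

Key design points:

Variable extraction: the input path, buffer distance, target CRS, and output path are all declared as variables, so they are easy to reuse and to override from the command line
Reprojection: the data is reprojected to a projected CRS (EPSG:3857) before buffering so that the distance unit is meters. Note that EPSG:3857 (Web Mercator) stretches lengths by roughly 1/cos(latitude), so away from the equator a "500 m" buffer is smaller on the ground; when distances must be accurate, prefer a local projected CRS such as the appropriate UTM zone
Step references: the $step_id shorthand references the output of an earlier step
Output declarations: two outputs, result and buffer_stats, are declared to make the report easier to read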
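The size of the Web Mercator distortion is easy to estimate. The sketch below uses the spherical approximation of the Mercator scale factor, 1/cos(latitude); the helper names are illustrative and not part of GeoPipeAgent.

```python
import math


def ground_distance(projected_m: float, latitude_deg: float) -> float:
    """Approximate on-the-ground length of a distance measured in EPSG:3857 units.

    Web Mercator stretches lengths by about 1/cos(latitude) (spherical approximation),
    so the ground distance is the projected distance times cos(latitude).
    """
    return projected_m * math.cos(math.radians(latitude_deg))


def projected_distance_for(ground_m: float, latitude_deg: float) -> float:
    """EPSG:3857 distance that corresponds to roughly ground_m on the ground."""
    return ground_m / math.cos(math.radians(latitude_deg))


if __name__ == "__main__":
    # A "500 m" buffer computed in EPSG:3857 at about 40 degrees N (roughly Beijing's latitude)
    print(f"{ground_distance(500, 40.0):.1f}")         # 383.0
    print(f"{projected_distance_for(500, 40.0):.1f}")  # 652.7
```

So at Beijing's latitude a buffer_distance of 500 in EPSG:3857 covers only about 383 m on the ground. Reprojecting to a UTM zone instead (for example EPSG:32650, WGS 84 / UTM zone 50N, which covers Beijing) avoids the correction altogether.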

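Commands:

# Run with the default parameters
geopipe-agent run green_buffer.yaml

# Override the buffer distance to 1000 meters (and rename the output to match)
geopipe-agent run green_buffer.yaml \
  --var buffer_distance=1000 \
  --var output_path=output/green_buffer_1000m.geojson

# Use DEBUG logging to see detailed execution
geopipe-agent run green_buffer.yaml --log-level DEBUG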
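How --var overrides combine with ${...} placeholders is handled inside GeoPipeAgent; the sketch below is a hypothetical model of that behaviour, not the framework's code. It assumes that a value consisting of a single placeholder keeps the variable's type (so distance receives the number 500 rather than the string "500") and that command-line values are parsed as numbers when possible.

```python
import re
from typing import Any, Dict, List

# Matches ${name} placeholders as used in the pipelines above.
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def parse_cli_value(raw: str) -> Any:
    """Turn a --var value into an int or float when possible (assumption), else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def merge_variables(defaults: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Apply 'name=value' overrides, as passed with --var, on top of the YAML defaults."""
    merged = dict(defaults)
    for item in overrides:
        name, _, raw = item.partition("=")
        merged[name] = parse_cli_value(raw)
    return merged


def substitute(value: Any, variables: Dict[str, Any]) -> Any:
    """Resolve ${name} placeholders in one parameter value."""
    if not isinstance(value, str):
        return value
    whole = VAR_PATTERN.fullmatch(value)
    if whole:  # the entire value is one placeholder: keep the variable's type
        return variables[whole.group(1)]
    # placeholders inside a longer string are replaced textually
    return VAR_PATTERN.sub(lambda m: str(variables[m.group(1)]), value)


if __name__ == "__main__":
    defaults = {"buffer_distance": 500, "target_crs": "EPSG:3857"}
    variables = merge_variables(defaults, ["buffer_distance=1000"])
    print(repr(substitute("${buffer_distance}", variables)))  # 1000
    print(substitute("${target_crs}", variables))             # EPSG:3857
```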

20.2.4 Sample Execution Report

{
  "pipeline": "城市绿地缓冲区分析",
  "status": "success",
  "duration_seconds": 2.341,
  "steps": [
    {
      "id": "load-green",
      "step": "io.read_vector",
      "status": "success",
      "duration": 0.523,
      "output_summary": {
        "feature_count": 156,
        "crs": "EPSG:4326",
        "geometry_types": ["Polygon", "MultiPolygon"]
      }
    },
    {
      "id": "reproject",
      "step": "vector.reproject",
      "status": "success",
      "duration": 0.312,
      "output_summary": {
        "feature_count": 156,
        "source_crs": "EPSG:4326",
        "target_crs": "EPSG:3857"
      }
    },
    {
      "id": "buffer",
      "step": "vector.buffer",
      "status": "success",
      "duration": 0.876,
      "output_summary": {
        "feature_count": 156,
        "total_area": 245678901.5
      }
    },
    {
      "id": "save-result",
      "step": "io.write_vector",
      "status": "success",
      "duration": 0.630,
      "output_summary": {
        "output_path": "output/green_buffer_500m.geojson",
        "format": "GeoJSON"
      }
    }
  ],
  "outputs": {
    "result": "output/green_buffer_500m.geojson",
    "buffer_stats": {
      "feature_count": 156,
      "total_area": 245678901.5
    }
  }
}

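20.3 Case 2: Flood-Risk Overlay Analysis

20.3.1 Requirements

The disaster-prevention department needs to analyze land use inside the flood-inundation area:

Read the land-use data and the flood-extent data
Intersect the two layers
Save the result

20.3.2 Pipeline Design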

pipeline:
  name: "洪水风险区叠加分析"
  description: "分析洪水淹没区域内的土地利用分布"

  steps:
    - id: load-landuse
      use: io.read_vector
      params:
        path: "data/landuse.shp"

    - id: load-flood
      use: io.read_vector
      params:
        path: "data/flood_zone.shp"

    - id: overlay
      use: vector.overlay
      params:
        input: "$load-landuse"
        overlay_layer: "$load-flood"
        how: "intersection"

    - id: save-result
      use: io.write_vector
      params:
        input: "$overlay"
        path: "output/landuse_in_flood_zone.geojson"
        format: "GeoJSON"

  outputs:
    result: "$save-result"
    overlay_stats: "$overlay.stats"

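20.3.3 Design Points

Multiple data sources: a pipeline can read several data sources (load-landuse and load-flood)
Overlay operation: how: "intersection" computes the intersection. GeoPipeAgent supports five overlay operations:
- intersection: the areas covered by both layers
- union: all areas of both layers, split where they overlap
- difference: the parts of the input layer not covered by the overlay layer
- symmetric_difference: the areas covered by exactly one of the two layers
- identity: all of the input layer, split where it overlaps the overlay layer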
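To build intuition for the five modes, the following sketch models each layer as a set of grid cells and returns the pieces each mode produces. It is only an analogy for the set semantics; GeoPandas and GeoPipeAgent operate on real geometries and also carry attributes from both layers onto the result.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def overlay_pieces(a: Set[Cell], b: Set[Cell], how: str) -> List[Set[Cell]]:
    """Pieces produced when overlaying layer a (input) with layer b (overlay_layer)."""
    modes = {
        "intersection": [a & b],                 # only where both layers exist
        "union": [a & b, a - b, b - a],          # everything, split at the boundaries
        "difference": [a - b],                   # input minus overlay
        "symmetric_difference": [a - b, b - a],  # where exactly one layer exists
        "identity": [a & b, a - b],              # all of the input, split by the overlay
    }
    if how not in modes:
        raise ValueError(f"unknown overlay mode: {how!r}")
    return [piece for piece in modes[how] if piece]  # drop empty pieces


if __name__ == "__main__":
    landuse = {(0, 0), (0, 1), (1, 0), (1, 1)}
    flood = {(1, 1), (1, 2)}
    for how in ("intersection", "union", "difference", "symmetric_difference", "identity"):
        print(how, [len(p) for p in overlay_pieces(landuse, flood, how)])
```

Here identity keeps the full area of the input layer (1 + 3 = 4 cells), while intersection keeps only the single flooded land-use cell, which is what Case 2 needs.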

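20.3.4 Extension: Adding Quality Checks

The basic pipeline can be extended with quality-control steps that run before the overlay. Both layers are checked for the expected CRS, because overlaying layers in different CRSs gives wrong results, and the land-use geometries are validated and repaired (auto_fix: true). Because QC steps pass their data through, the overlay step can take the repaired output $check-geom as its input: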

pipeline:
  name: "洪水风险区叠加分析（增强版）"
  description: "含数据质检的洪水淹没区分析"

  steps:
    - id: load-landuse
      use: io.read_vector
      params:
        path: "data/landuse.shp"

    - id: load-flood
      use: io.read_vector
      params:
        path: "data/flood_zone.shp"

    - id: check-crs-landuse
      use: qc.crs_check
      params:
        input: "$load-landuse"
        expected_crs: "EPSG:4326"

    - id: check-crs-flood
      use: qc.crs_check
      params:
        input: "$load-flood"
        expected_crs: "EPSG:4326"

    - id: check-geom
      use: qc.geometry_validity
      params:
        input: "$load-landuse"
        auto_fix: true

    - id: overlay
      use: vector.overlay
      params:
        input: "$check-geom"
        overlay_layer: "$load-flood"
        how: "intersection"

    - id: save-result
      use: io.write_vector
      params:
        input: "$overlay"
        path: "output/landuse_in_flood_zone.geojson"

  outputs:
    result: "$save-result"

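20.4 Case 3: Comprehensive Vector Data QC

20.4.1 Requirements

The data-management department needs a comprehensive quality check of building footprint data, covering:

Geometry validity (self-intersections, empty geometries, etc.)
Correctness of the coordinate reference system
Topology (no overlaps)
Attribute completeness (required fields are not empty)
Attribute domains (the type field only allows specific values)
Value ranges (the height field stays within a plausible range)
Duplicate features

20.4.2 Pipeline Design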

pipeline:
  name: "矢量数据质检"
  description: "对建筑物矢量数据进行全面质量检查"

  variables:
    input_path: "data/buildings.shp"
    expected_crs: "EPSG:4326"

  steps:
    - id: load-data
      use: io.read_vector
      params:
        path: "${input_path}"

    - id: check-crs
      use: qc.crs_check
      params:
        input: "$load-data"
        expected_crs: "${expected_crs}"

    - id: check-geometry
      use: qc.geometry_validity
      params:
        input: "$load-data"
        auto_fix: false
        severity: "error"

    - id: check-topology
      use: qc.topology
      params:
        input: "$load-data"
        rules: ["no_overlaps"]
        tolerance: 0.001   # assumed to be in CRS units, i.e. degrees for EPSG:4326 (0.001° ≈ 111 m of latitude)

    - id: check-attrs
      use: qc.attribute_completeness
      params:
        input: "$load-data"
        required_fields: ["name", "height", "type"]
        severity: "warning"

    - id: check-domain
      use: qc.attribute_domain
      params:
        input: "$load-data"
        field: "type"
        allowed_values: ["residential", "commercial", "industrial", "public"]

    - id: check-height
      use: qc.value_range
      params:
        input: "$load-data"
        field: "height"
        min: 0
        max: 1000
        severity: "warning"

    - id: check-duplicates
      use: qc.duplicate_check
      params:
        input: "$load-data"
        check_geometry: true
        check_fields: ["name"]

    - id: save-geometry-issues
      use: io.write_vector
      params:
        input: "$check-geometry.issues_gdf"
        path: "output/geometry_issues.geojson"
        format: "GeoJSON"
      when: "$check-geometry.issues_count > 0"

  outputs:
    data: "$load-data"
    geometry_issues: "$check-geometry.issues_gdf"
    attribute_issues: "$check-attrs.issues_gdf"

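20.4.3 QC Pipeline Design Points

Check-and-passthrough pattern: every QC step passes its input data through to its output (repaired only when auto_fix is enabled) while collecting any problems into an issues list
Multiple independent checks: several QC steps can each examine a different aspect of the same data
Conditional output: when: "$check-geometry.issues_count > 0" saves the problem features only if problems were found
Graded severity: different checks use different severity levels (error / warning / info)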
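The check-and-passthrough pattern fits in a few lines. In this sketch, features are plain dictionaries standing in for GeoDataFrame rows, and the Issue fields mirror the rule_id, severity, feature_index, and message entries of the QC report; the class and function names are hypothetical, not GeoPipeAgent's internals.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Issue:
    rule_id: str
    severity: str
    feature_index: int
    message: str


@dataclass
class QCResult:
    data: List[Dict[str, Any]]                        # the input, passed through unchanged
    issues: List[Issue] = field(default_factory=list)

    @property
    def issues_count(self) -> int:
        return len(self.issues)


def value_range_check(features: List[Dict[str, Any]], field_name: str,
                      min_value: float, max_value: float,
                      severity: str = "warning") -> QCResult:
    """Record features whose value lies outside [min_value, max_value].

    Missing values are left to the completeness check; the data itself is never modified.
    """
    issues = []
    for index, feature in enumerate(features):
        value = feature.get(field_name)
        if value is not None and not (min_value <= value <= max_value):
            issues.append(Issue("value_range", severity, index,
                                f"{field_name}={value} outside [{min_value}, {max_value}]"))
    return QCResult(data=features, issues=issues)


if __name__ == "__main__":
    buildings = [{"name": "A", "height": 30}, {"name": "B", "height": -5},
                 {"name": "C", "height": 1200}, {"name": "D", "height": None}]
    result = value_range_check(buildings, "height", 0, 1000)
    print(result.issues_count, [i.feature_index for i in result.issues])  # 2 [1, 2]
    print(result.data is buildings)                                        # True
```

Because the step returns its input unchanged, downstream steps can reference it just like the original data, while issues_count is what a when condition inspects.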

20.4.4 Reading the QC Report

After execution, the JSON report automatically includes a qc_summary section:

{
  "pipeline": "矢量数据质检",
  "status": "success",
  "qc_summary": {
    "total_issues": 23,
    "by_severity": {
      "error": 5,
      "warning": 18
    },
    "by_rule": {
      "geometry_validity": 3,
      "crs_check": 0,
      "topology": 2,
      "attribute_completeness": 8,
      "attribute_domain": 4,
      "value_range": 3,
      "duplicate_check": 3
    },
    "issues": [...]
  }
}

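How to read the report:

total_issues: the total number of issues
by_severity: issue counts grouped by severity
by_rule: issue counts grouped by check rule
issues: the detailed list of all issues; each entry contains rule_id, severity, feature_index, and message

Both breakdowns add up to the total: 5 + 18 = 23 by severity, and 3 + 0 + 2 + 8 + 4 + 3 + 3 = 23 by rule.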
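A qc_summary of this shape can be derived from the flat issue list. The sketch below is an assumed reconstruction, not GeoPipeAgent's code; the rule_ids argument lets rules that found nothing (such as crs_check above) still appear with a count of 0, and the sample issues are invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def build_qc_summary(issues: List[Dict[str, Any]],
                     rule_ids: Iterable[str] = ()) -> Dict[str, Any]:
    """Aggregate per-issue records into total, by_severity and by_rule counts."""
    by_rule = {rule: 0 for rule in rule_ids}  # rules that ran, even with no issues
    by_rule.update(Counter(i["rule_id"] for i in issues))
    return {
        "total_issues": len(issues),
        "by_severity": dict(Counter(i["severity"] for i in issues)),
        "by_rule": by_rule,
        "issues": issues,
    }


if __name__ == "__main__":
    sample = [  # invented example records
        {"rule_id": "geometry_validity", "severity": "error", "feature_index": 4,
         "message": "Self-intersection"},
        {"rule_id": "attribute_completeness", "severity": "warning", "feature_index": 7,
         "message": "Field 'name' is empty"},
    ]
    summary = build_qc_summary(sample, ["geometry_validity", "crs_check",
                                        "attribute_completeness"])
    print(summary["by_rule"])      # {'geometry_validity': 1, 'crs_check': 0, 'attribute_completeness': 1}
    print(summary["by_severity"])  # {'error': 1, 'warning': 1}
```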

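20.5 Case 4: DEM Raster Processing and Contour Extraction

20.5.1 Requirements

The terrain-analysis task needs to:

Read DEM (digital elevation model) raster data
Clip it to the study-area boundary
Compute elevation statistics for the clipped DEM
Extract contour lines
Save the contours as a vector file

20.5.2 Pipeline Design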

pipeline:
  name: "DEM 处理与等值线提取"
  description: "读取 DEM，裁剪到研究区，提取等高线"

  variables:
    dem_path: "data/dem.tif"
    boundary_path: "data/study_area.shp"
    contour_interval: 50

  steps:
    - id: load-dem
      use: io.read_raster
      params:
        path: "${dem_path}"

    - id: load-boundary
      use: io.read_vector
      params:
        path: "${boundary_path}"

    - id: clip-dem
      use: raster.clip
      params:
        input: "$load-dem"
        mask: "$load-boundary"
        crop: true

    - id: dem-stats
      use: raster.stats
      params:
        input: "$clip-dem"

    - id: extract-contours
      use: raster.contour
      params:
        input: "$clip-dem"
        interval: "${contour_interval}"

    - id: save-contours
      use: io.write_vector
      params:
        input: "$extract-contours"
        path: "output/contours_50m.geojson"
        format: "GeoJSON"

  outputs:
    contours: "$save-contours"
    dem_stats: "$dem-stats.stats"

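20.5.3 Raster Pipeline Design Points

Raster data model: io.read_raster returns a dictionary containing data, transform, crs, and profile
Mixed data types: a single pipeline can process both vector and raster data (for example, clipping a raster with a vector boundary)
Raster-to-vector conversion: raster.contour traces contour lines from the raster and returns them as a vector GeoDataFrame
Statistics: raster.stats reports elevation statistics for the DEM (minimum, maximum, mean, standard deviation)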
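Statistics on a clipped DEM must ignore NoData, because clipping leaves NoData cells around the study area. Below is a minimal sketch, assuming the band is a list of rows and NoData is a single sentinel value; whether raster.stats reports the population or the sample standard deviation is not stated, so the population form here is an assumption.

```python
import math
from typing import Dict, List, Optional


def raster_stats(band: List[List[float]], nodata: Optional[float] = None) -> Dict[str, float]:
    """Min, max, mean and (population) standard deviation of one band, skipping NoData.

    After raster.clip with crop: true, cells outside the boundary are typically NoData;
    counting them would drag the statistics toward the NoData value.
    """
    values = [v for row in band for v in row
              if not (nodata is not None and v == nodata) and not math.isnan(v)]
    if not values:
        raise ValueError("band contains only NoData cells")
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return {"min": min(values), "max": max(values), "mean": mean, "std": std, "count": n}


if __name__ == "__main__":
    dem = [[-9999, 120, 130],
           [110, 150, -9999]]
    print(raster_stats(dem, nodata=-9999))
```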

20.5.4 NDVI Vegetation Index Example

pipeline:
  name: "NDVI 植被指数计算"
  description: "利用多光谱影像计算 NDVI"

  steps:
    - id: load-image
      use: io.read_raster
      params:
        path: "data/multispectral.tif"

    - id: calc-ndvi
      use: raster.calc
      params:
        input: "$load-image"
        expression: "(B4 - B3) / (B4 + B3)"
        output_dtype: "float32"

    - id: save-ndvi
      use: io.write_raster
      params:
        input: "$calc-ndvi"
        path: "output/ndvi.tif"

  outputs:
    result: "$save-ndvi"
    stats: "$calc-ndvi.stats"
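The NDVI formula is simple, but two details are easy to miss: if the bands are stored as unsigned integers (uint16 is common for multispectral imagery), subtracting them before converting to float wraps around instead of going negative, and pixels where NIR + Red = 0 divide by zero. Whether raster.calc casts to float before evaluating the expression is not stated here, so the plain-Python sketch below shows the safe order of operations; the function names are illustrative.

```python
from typing import List, Optional


def ndvi(nir: List[int], red: List[int]) -> List[Optional[float]]:
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), computed in floating point.

    Pixels where NIR + Red == 0 are returned as None, standing in for NoData.
    """
    result = []
    for n, r in zip(nir, red):
        n, r = float(n), float(r)  # convert first so the difference can be negative
        total = n + r
        result.append(None if total == 0 else (n - r) / total)
    return result


def uint16_difference(a: int, b: int) -> int:
    """What a - b becomes when both values are uint16 (modular arithmetic, wraps around)."""
    return (a - b) % 65536


if __name__ == "__main__":
    nir = [3000, 500, 0]   # vegetation, water, empty pixel
    red = [1000, 800, 0]
    print([None if v is None else round(v, 3) for v in ndvi(nir, red)])  # [0.5, -0.231, None]
    print(uint16_difference(500, 800))  # 65236 instead of -300
```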

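Notes on raster.calc expressions:

B1, B2, B3, B4, and so on refer to the raster's bands by their position in the file. The NDVI expression above, (NIR - Red) / (NIR + Red), therefore assumes band 4 is near-infrared and band 3 is red, as in Landsat 5/7 TM/ETM+ imagery; for Landsat 8/9 OLI use B5 and B4, and for Sentinel-2 use B8 and B4
Basic arithmetic is supported: +, -, *, /, **
Safe NumPy functions are supported: np.sqrt, np.log, np.where, np.clip, etc.
Expressions are validated against an AST whitelist to prevent arbitrary code execution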
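The whitelist idea can be illustrated with Python's standard ast module: parse the expression, walk every node, and reject anything outside a small allowed set. This is a hedged sketch of the general technique, not GeoPipeAgent's actual validator; the exact node types and functions it permits are assumptions.

```python
import ast
import re

ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Compare, ast.Call,
    ast.Attribute, ast.Name, ast.Constant, ast.Load,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.Pow, ast.USub, ast.UAdd,
    ast.Gt, ast.GtE, ast.Lt, ast.LtE, ast.Eq, ast.NotEq,
)
ALLOWED_NP_FUNCS = {"sqrt", "log", "where", "clip"}
BAND_NAME = re.compile(r"B([1-9][0-9]*)")


def _is_np_func(node: ast.AST) -> bool:
    """True for an attribute of the form np.<whitelisted function>."""
    return (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
            and node.value.id == "np" and node.attr in ALLOWED_NP_FUNCS)


def validate_expression(expr: str, band_count: int) -> None:
    """Raise ValueError unless expr uses only bands B1..Bn, numbers, operators and np calls."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
        if isinstance(node, ast.Call) and not _is_np_func(node.func):
            raise ValueError("only whitelisted np.* functions may be called")
        if isinstance(node, ast.Attribute) and not _is_np_func(node):
            raise ValueError(f"attribute access not allowed: .{node.attr}")
        if isinstance(node, ast.Constant) and not isinstance(node.value, (int, float)):
            raise ValueError("only numeric constants are allowed")
        if isinstance(node, ast.Name) and node.id != "np":
            match = BAND_NAME.fullmatch(node.id)
            if not match or int(match.group(1)) > band_count:
                raise ValueError(f"unknown name: {node.id}")
```

Only after validation passes would the expression be evaluated, with the band arrays and np as the only names in scope. For example, validate_expression("(B4 - B3) / (B4 + B3)", band_count=4) returns quietly, while "__import__('os')" raises ValueError.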

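20.6 Case 5: POI Spatial Clustering

20.6.1 Requirements

A business analysis needs to cluster urban POIs (points of interest) spatially to find areas where POIs are concentrated:

Read the POI data
Filter by type
Cluster the points spatially with the DBSCAN algorithm
Generate a heatmap
Save the results

20.6.2 Pipeline Design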

pipeline:
  name: "POI 空间聚类分析"
  description: "对餐饮类 POI 进行聚类分析和热力图生成"

  variables:
    input_path: "data/poi.shp"
    filter_type: "type == '餐饮'"
    cluster_eps: 500
    cluster_min_samples: 5

  steps:
    - id: load-poi
      use: io.read_vector
      params:
        path: "${input_path}"

    - id: reproject
      use: vector.reproject
      params:
        input: "$load-poi"
        target_crs: "EPSG:3857"

    - id: filter-type
      use: vector.query
      params:
        input: "$reproject"
        expression: "${filter_type}"

    - id: cluster
      use: analysis.cluster
      params:
        input: "$filter-type"
        method: "dbscan"
        eps: "${cluster_eps}"
        min_samples: "${cluster_min_samples}"

    - id: heatmap
      use: analysis.heatmap
      params:
        input: "$filter-type"
        resolution: 100
        radius: 1000

    - id: save-clusters
      use: io.write_vector
      params:
        input: "$cluster"
        path: "output/poi_clusters.geojson"

    - id: save-heatmap
      use: io.write_raster
      params:
        input: "$heatmap"
        path: "output/poi_heatmap.tif"

  outputs:
    clusters: "$save-clusters"
    heatmap: "$save-heatmap"
    cluster_stats: "$cluster.stats"

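20.6.3 Design Points

Reprojection: DBSCAN's eps is measured in CRS units, so after reprojecting to EPSG:3857 eps=500 means 500 map meters. Web Mercator meters overstate ground distance by roughly 1/cos(latitude) (about 1.3× at 40°N), so use a local projected CRS such as UTM if the cluster radius must be accurate
Branching: the same filtered result ($filter-type) is referenced by both the clustering and the heatmap steps
Mixed output formats: the clusters are saved as vector data (GeoJSON) and the heatmap as raster data (GeoTIFF)
analysis.cluster output: a cluster_id column is added to the input data, marking the cluster each feature belongs to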
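The cluster_id values follow from how DBSCAN labels points. The pure-Python sketch below is the textbook algorithm in scikit-learn's convention (min_samples counts the point itself, noise is labelled -1); it is meant to show what eps and min_samples control, not to reproduce analysis.cluster, whose handling of noise points is not described here.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def dbscan(points: List[Point], eps: float, min_samples: int) -> List[int]:
    """Textbook DBSCAN; returns one cluster label per point, with -1 for noise.

    A point is a core point if at least min_samples points (itself included) lie
    within eps of it. Distances use the coordinate units, so eps=500 means 500 m
    only when the points are in a metric projected CRS.
    """
    n = len(points)
    labels: List[Optional[int]] = [None] * n  # None = not visited yet

    def neighbours(i: int) -> List[int]:
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
                continue
            if labels[j] is not None:
                continue  # already assigned
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_samples:  # j is a core point: keep expanding
                queue.extend(more)
    return labels
```

This version compares every pair of points, so it only suits small examples; library implementations use spatial indexes to find neighbours.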

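20.6.4 Variant: Voronoi (Thiessen) Polygons

To build Voronoi polygons around the filtered POIs, add steps like the following to the pipeline above: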

    - id: voronoi
      use: analysis.voronoi
      params:
        input: "$filter-type"
        clip_to_bounds: true
        buffer: 1000

    - id: save-voronoi
      use: io.write_vector
      params:
        input: "$voronoi"
        path: "output/poi_voronoi.geojson"

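20.7 Case 6: Road-Network Shortest-Path Analysis

20.7.1 Requirements

Logistics planning needs the shortest path between two points and a service-area analysis based on road-network data:

Read the road-network data
Compute the shortest path from the origin to the destination
Compute service areas around the origin (network-distance bands)
Save the results

20.7.2 Pipeline Design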

pipeline:
  name: "路网最短路径分析"
  description: "基于路网数据计算最短路径和服务区"

  variables:
    road_path: "data/roads.shp"
    origin_x: 116.397
    origin_y: 39.908
    dest_x: 116.472
    dest_y: 39.955

  steps:
    - id: load-roads
      use: io.read_vector
      params:
        path: "${road_path}"

    - id: shortest-path
      use: network.shortest_path
      params:
        input: "$load-roads"
        origin: [116.397, 39.908]
        destination: [116.472, 39.955]
        weight_field: "length"

    - id: service-area
      use: network.service_area
      params:
        input: "$load-roads"
        origin: [116.397, 39.908]
        cutoffs: [1000, 2000, 5000]
        weight_field: "length"

    - id: save-path
      use: io.write_vector
      params:
        input: "$shortest-path"
        path: "output/shortest_path.geojson"

    - id: save-service-area
      use: io.write_vector
      params:
        input: "$service-area"
        path: "output/service_areas.geojson"

  outputs:
    path: "$save-path"
    service_area: "$save-service-area"
    path_stats: "$shortest-path.stats"

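20.7.3 Network Analysis Points

Road data requirements: the input vector data should contain line features (LineString / MultiLineString)
Weight field: weight_field names the per-segment cost (length, travel time, etc.) that the shortest path minimizes
NetworkX graph: the framework automatically converts the road vector data into a NetworkX graph
Service-area bands: cutoffs are expressed in the units of weight_field; with a length field in meters, cutoffs: [1000, 2000, 5000] yields the areas reachable within 1 km, 2 km, and 5 km of travel along the network (not straight-line circles)
Coordinates: origin and destination must be in the same CRS as the road data; here they are longitude/latitude, so the roads must be in EPSG:4326. The origin_x, origin_y, dest_x, and dest_y variables are declared, but the steps repeat the coordinates literally, so overriding those variables with --var has no effect on the analysis as written
Dependencies: the network-analysis steps require the optional [network] extra: pip install -e ".[network]"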
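Behind network.shortest_path and network.service_area sits a shortest-path search over the road graph. The sketch below runs Dijkstra's algorithm with Python's heapq on a toy graph whose node names and segment lengths are made up; it omits snapping the origin and destination coordinates to graph nodes and is not the framework's NetworkX-based implementation.

```python
import heapq
import math
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Optional, Tuple

Graph = Dict[Hashable, List[Tuple[Hashable, float]]]


def build_graph(segments: Iterable[Tuple[Hashable, Hashable, float]]) -> Graph:
    """Undirected graph from (start_node, end_node, weight) road segments."""
    graph: Graph = defaultdict(list)
    for a, b, weight in segments:
        graph[a].append((b, weight))
        graph[b].append((a, weight))
    return graph


def dijkstra(graph: Graph, source: Hashable):
    """Shortest network distance from source to every reachable node, plus predecessors."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, weight in graph.get(node, []):
            nd = d + weight
            if nd < dist.get(nbr, math.inf):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return dist, prev


def shortest_path(graph: Graph, source, target) -> Tuple[Optional[List], float]:
    dist, prev = dijkstra(graph, source)
    if target not in dist:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


def service_area(graph: Graph, source, cutoffs: Iterable[float]) -> Dict[float, List]:
    """Assign each reachable node to the smallest cutoff band that contains it."""
    bands = {c: [] for c in sorted(cutoffs)}
    dist, _ = dijkstra(graph, source)
    for node, d in dist.items():
        for c in bands:  # ascending order
            if d <= c:
                bands[c].append(node)
                break
    return bands


if __name__ == "__main__":
    roads = build_graph([("A", "B", 800), ("B", "C", 900), ("A", "D", 3000), ("C", "D", 700)])
    print(shortest_path(roads, "A", "D"))                # (['A', 'B', 'C', 'D'], 2400.0)
    print(service_area(roads, "A", [1000, 2000, 5000]))  # {1000: ['A', 'B'], 2000: ['C'], 5000: ['D']}
```

Note that the detour A-B-C-D (2400) beats the direct segment A-D (3000), which is exactly the kind of result a straight-line buffer would miss.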

20.7.4 Geocoding Example

    - id: geocode
      use: network.geocode
      params:
        address: "北京市朝阳区建国门外大街1号"
        provider: "nominatim"
        exactly_one: true

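20.8 Case 7: Batch Format Conversion and Simplification

20.8.1 Requirements

The data-publishing department needs to convert internal Shapefile data into web-friendly GeoJSON:

Read the Shapefile data
Filter to large parcels by an attribute condition
Simplify the geometries to reduce file size
Reproject to WGS84
Write GeoJSON

20.8.2 Pipeline Design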

pipeline:
  name: "数据筛选、简化与转换"
  description: "Shapefile → 筛选 → 简化 → GeoJSON 转换流水线"

  variables:
    input_path: "data/parcels.shp"
    filter_expression: "area_sqm > 1000"
    simplify_tolerance: 1.0
    output_path: "output/parcels_web.geojson"

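  steps:
    - id: read-data
      use: io.read_vector
      params:
        path: "${input_path}"

    # Filter first so that later steps process less data
    - id: filter
      use: vector.query
      params:
        input: "$read-data"
        expression: "${filter_expression}"

    # Simplify before reprojecting: tolerance is in CRS units. Assuming parcels.shp uses a
    # projected CRS in meters, 1.0 means 1 m here, but after reprojecting to EPSG:4326 it
    # would mean 1 degree (about 111 km) and destroy the parcel shapes.
    - id: simplify
      use: vector.simplify
      params:
        input: "$filter"
        tolerance: "${simplify_tolerance}"
        preserve_topology: true

    - id: reproject
      use: vector.reproject
      params:
        input: "$simplify"
        target_crs: "EPSG:4326"

    - id: save
      use: io.write_vector
      params:
        input: "$reproject"
        path: "${output_path}"
        format: "GeoJSON"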

  outputs:
    result: "$save"
    filter_stats: "$filter.stats"
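Why the tolerance must be applied in a metric CRS is visible in the simplification algorithm itself. Shapely's simplify is based on the Douglas-Peucker algorithm (with a topology-preserving variant when preserve_topology is true); the plain-Python version below sketches the basic algorithm and is not GeoPipeAgent's code. Vertices closer than tolerance to the simplified line are dropped, with distance measured in coordinate units, so 1.0 means one meter in a metric CRS but one degree in EPSG:4326.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def _segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the segment a-b (point distance if a == b)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.dist(p, (ax + t * dx, ay + t * dy))


def douglas_peucker(line: List[Point], tolerance: float) -> List[Point]:
    """Keep the endpoints; recursively keep the farthest vertex while it exceeds tolerance."""
    if len(line) < 3:
        return list(line)
    index, max_d = 0, -1.0
    for i in range(1, len(line) - 1):
        d = _segment_distance(line[i], line[0], line[-1])
        if d > max_d:
            index, max_d = i, d
    if max_d <= tolerance:
        return [line[0], line[-1]]
    left = douglas_peucker(line[: index + 1], tolerance)
    right = douglas_peucker(line[index:], tolerance)
    return left[:-1] + right  # the split vertex appears in both halves; keep it once


if __name__ == "__main__":
    boundary = [(0, 0), (10, 0.4), (20, -0.3), (30, 0.2), (40, 0)]  # coordinates in meters
    print(douglas_peucker(boundary, 1.0))  # [(0, 0), (40, 0)]
```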

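20.8.3 Commands

# Run the conversion
geopipe-agent run convert.yaml

# Override the input and output paths
geopipe-agent run convert.yaml \
  --var input_path=data/roads.shp \
  --var output_path=output/roads_web.geojson

# Adjust the simplification tolerance
geopipe-agent run convert.yaml --var simplify_tolerance=0.5

# Validate the pipeline file before running it
geopipe-agent validate convert.yaml

20.8.4 Variant: Dissolve

Dissolving merges all features that share a landuse_type value into a single feature; aggfunc: "first" keeps the first value of each remaining attribute.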

pipeline:
  name: "融合分析"
  description: "按类型字段融合面要素"

  steps:
    - id: read-data
      use: io.read_vector
      params:
        path: "data/landuse.shp"

    - id: dissolve
      use: vector.dissolve
      params:
        input: "$read-data"
        by: "landuse_type"
        aggfunc: "first"

    - id: save
      use: io.write_vector
      params:
        input: "$dissolve"
        path: "output/landuse_dissolved.geojson"

  outputs:
    result: "$save"
    stats: "$dissolve.stats"

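20.9 Pipeline Design Best Practices

20.9.1 Step ID Naming Conventions

Rule	Description	Example
Lowercase only	step_id may only contain `[a-z0-9_-]`	`load-roads` ✅, `LoadRoads` ❌
Meaningful names	The name should reflect what the step does	`buffer-500m` ✅, `step1` ❌
Start with a verb	A leading action verb makes the step easy to understand	`load-data`, `check-crs`, `save-result`
No dots	Dots are reserved for step references	`load-data` ✅, `load.data` ❌
Hyphen separators	Hyphens are preferred over underscores	`load-roads` ✅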
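Only the character set is a hard rule, so a pre-commit or CI lint could enforce it and merely warn about the softer conventions. The helper below is hypothetical (geopipe-agent validate may already reject invalid IDs); it checks the documented `[a-z0-9_-]` set plus two conventions from the table.

```python
import re
from typing import List

STEP_ID = re.compile(r"[a-z0-9_-]+")  # the allowed character set stated above


def check_step_id(step_id: str) -> List[str]:
    """Return convention problems for a step ID; an empty list means it looks fine."""
    problems = []
    if not STEP_ID.fullmatch(step_id):
        problems.append("use only lowercase letters, digits, '-' and '_' (no dots)")
    if "_" in step_id:
        problems.append("prefer '-' over '_' as the word separator")
    if re.fullmatch(r"step[-_]?\d*", step_id):
        problems.append("use a descriptive name instead of a generic one")
    return problems


if __name__ == "__main__":
    for candidate in ("load-roads", "LoadRoads", "load.data", "load_roads", "step1"):
        print(candidate, check_step_id(candidate))
```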

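20.9.2 Variable Design Principles

# ✅ Recommended: extract parameters that may change into variables
variables:
  input_path: "data/roads.shp"
  buffer_distance: 500
  output_format: "GeoJSON"

# ❌ Not recommended: hard-coding every parameter
steps:
  - id: load
    use: io.read_vector
    params:
      path: "data/roads.shp"  # hard-coded, hard to reuse

Guidelines for variables:

Input and output paths should always be variables
Key analysis parameters (distances, tolerances, thresholds) should be variables
Configuration such as CRS and format is best made into variables as well
Name variables in lowercase with underscores: input_path, buffer_dist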

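20.9.3 Error-Handling Strategies

steps:
  # Core step: keep the default fail strategy (stop on failure)
  - id: load-data
    use: io.read_vector
    params:
      path: "${input_path}"

  # Optional step: skip strategy (a failure does not affect the main flow);
  # assumes a step with id boundary is defined elsewhere in the pipeline
  - id: optional-clip
    use: vector.clip
    params:
      input: "$load-data"
      clip_geometry: "$boundary"
    on_error: skip

  # Network step: retry strategy (automatically retried up to 3 times)
  - id: geocode
    use: network.geocode
    params:
      address: "北京市天安门"
    on_error: retry

Choosing an error strategy:

Strategy	When to use
`fail` (default)	Core steps whose failure should stop the pipeline immediately
`skip`	Optional steps whose failure should not affect the main flow
`retry`	Steps that may fail temporarily, such as network requests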
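What retry does between attempts is not specified beyond the limit of three tries. A common approach is to wait longer after each failure (exponential backoff); the sketch below assumes that approach and treats the limit as three attempts in total, both of which are assumptions rather than documented GeoPipeAgent behaviour.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(step: Callable[[], T], max_attempts: int = 3,
                   base_delay: float = 1.0,
                   sleep: Callable[[float], None] = time.sleep) -> T:
    """Call step(); on failure wait base_delay * 2**(attempt - 1) seconds and try again.

    The last exception is re-raised once max_attempts calls have failed.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

Injecting sleep keeps the function testable: passing a list's append method records the delays instead of actually waiting.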

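20.9.4 Choosing a Backend

Backend	Pros	Cons	Best for
`native_python`	Easy to install, pure Python	Slower on large datasets	Day-to-day analysis of small to medium data
`gdal_cli`	Fast, feature-rich	Requires the GDAL command-line tools	Large-scale data processing
`gdal_python`	Fast, rich API	Requires a compiled GDAL installation	Low-level control
`qgis_process`	Largest set of algorithms	Requires QGIS	Complex spatial analysis
`pyqgis`	Full QGIS API	Requires QGIS's Python environment	QGIS plugin integration

# Specify a particular backend for a step
- id: buffer
  use: vector.buffer
  params:
    input: "$data"
    distance: 500
  backend: gdal_cli    # use the GDAL CLI backend for large data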

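20.9.5 Performance Tips

Filter early: use vector.query to reduce the data volume before the analysis steps

steps:
  - id: load
    use: io.read_vector
    params:
      path: "data/large_dataset.shp"

  # Filter first, then analyze, so later steps process less data
  - id: filter
    use: vector.query
    params:
      input: "$load"
      expression: "status == 'active'"

  - id: buffer
    use: vector.buffer
    params:
      input: "$filter"       # use the smaller, filtered dataset
      distance: 500

Choose a suitable backend: prefer gdal_cli or gdal_python for large datasets
Avoid unnecessary reprojection: if the data is already in the right CRS, skip the reproject step
Use conditional execution: when conditions skip steps that are not needed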

  - id: fix-geometry
    use: qc.geometry_validity
    params:
      input: "$data"
      auto_fix: true
    when: "$check.issues_count > 0"  # fix only when the earlier check step found problems
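The full grammar GeoPipeAgent accepts in when is not spelled out in this chapter. As a hedged illustration, the sketch below supports only the form used here, a step attribute compared with a number, and looks values up in a dictionary of earlier step results; all names in it are hypothetical.

```python
import operator
import re
from typing import Any, Dict

OPERATORS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
             "<=": operator.le, "==": operator.eq, "!=": operator.ne}

# "$<step-id>.<attribute> <operator> <number>", e.g. "$check.issues_count > 0"
CONDITION = re.compile(r"\$([a-z0-9_-]+)\.(\w+)\s*(>=|<=|==|!=|>|<)\s*(-?\d+(?:\.\d+)?)")


def evaluate_when(condition: str, results: Dict[str, Dict[str, Any]]) -> bool:
    """Evaluate a simple when condition against the outputs of earlier steps."""
    match = CONDITION.fullmatch(condition.strip())
    if not match:
        raise ValueError(f"unsupported condition: {condition!r}")
    step_id, attribute, op, number = match.groups()
    left = results[step_id][attribute]  # KeyError if the step or attribute is unknown
    return OPERATORS[op](left, float(number))


if __name__ == "__main__":
    results = {"check": {"issues_count": 3}}
    print(evaluate_when("$check.issues_count > 0", results))  # True
```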

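20.10 Project Integration

20.10.1 CI/CD Integration

GeoPipeAgent can be integrated into a CI/CD workflow to automate GIS data processing:

# Shell steps for a CI job (e.g. GitHub Actions or GitLab CI)
pip install -e ".[analysis,network]"

# 1. Validate the pipeline file
geopipe-agent validate pipelines/data-check.yaml

# 2. Run the data-quality checks
geopipe-agent run pipelines/data-check.yaml \
  --json-log \
  --var input_path="$DATA_PATH" \
  > report.json

# 3. Check the QC result
python -c "
import json, sys
with open('report.json') as f:
    report = json.load(f)
if report.get('qc_summary', {}).get('total_issues', 0) > 0:
    print('Quality check failed!')
    sys.exit(1)
print('Quality check passed!')
"

This gate fails on any issue, including warnings; to fail only on errors, test qc_summary.by_severity.error instead of total_issues.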

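20.10.2 Integration with AI Assistants

GeoPipeAgent can generate AI Skill files that let an AI assistant turn natural-language requests into GIS pipelines:

# 1. Generate the Skill files
geopipe-agent generate-skill --output-dir skills/geopipe-agent

# 2. Provide the Skill files to an AI assistant (e.g. ChatGPT, Claude)
# 3. Describe the task in natural language, for example:
#    "Create a 500-meter buffer for data/roads.shp and save the result as GeoJSON"
# 4. The AI generates a YAML pipeline
# 5. Validate, then run the generated pipeline
geopipe-agent validate ai_generated_pipeline.yaml
geopipe-agent run ai_generated_pipeline.yaml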

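20.10.3 Working with Other Tools

Tool	How it fits in
QGIS	Use the qgis_process / pyqgis backends to draw on QGIS's extensive algorithm library
PostGIS	Read PostGIS data through io.read_vector (loaded as a GeoDataFrame)
Jupyter Notebook	Call the GeoPipeAgent API from a notebook for interactive analysis
Web API	Wrap GeoPipeAgent in a REST API to offer online analysis services
Docker	Build a Docker image with all dependencies to keep environments consistent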

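20.10.4 Recommendations for Large Datasets

For large datasets, consider the following strategies:

Chunking: split the data into smaller pieces and run the pipeline on each one
Backend choice: prefer the gdal_cli or gdal_python backends
Filter early: filter right after reading to reduce memory use
Output format: GPKG (GeoPackage) is recommended for large outputs because it supports spatial indexes and transactions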
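Chunking pairs naturally with the --var overrides from Case 7. Assuming the data has already been split into separate files (the data/chunks/ paths below are hypothetical) and the pipeline exposes input_path and output_path variables as convert.yaml does, a small driver script can run the pipeline once per chunk:

```python
import subprocess
from pathlib import Path
from typing import List


def build_commands(pipeline: str, chunk_paths: List[str], output_dir: str,
                   suffix: str = ".geojson") -> List[List[str]]:
    """One geopipe-agent run per chunk, overriding input_path and output_path via --var."""
    commands = []
    for chunk in chunk_paths:
        output = Path(output_dir) / (Path(chunk).stem + suffix)
        commands.append([
            "geopipe-agent", "run", pipeline,
            "--var", f"input_path={chunk}",
            "--var", f"output_path={output.as_posix()}",
        ])
    return commands


def run_all(commands: List[List[str]]) -> List[int]:
    """Run the commands one after another and return their exit codes."""
    return [subprocess.run(cmd, check=False).returncode for cmd in commands]


if __name__ == "__main__":
    chunks = ["data/chunks/part_01.shp", "data/chunks/part_02.shp"]  # hypothetical chunk files
    for cmd in build_commands("convert.yaml", chunks, "output"):
        print(" ".join(cmd))
    # run_all(...) would execute them; it requires geopipe-agent on PATH
```

Passing arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.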

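20.11 Learning Resources and Community

20.11.1 GeoPipeAgent Resources

Resource	Location
GitHub repository	https://github.com/znlgis/GeoPipeAgent
Cookbook examples	The 7 example pipelines in the repository's `cookbook/` directory
Skill files	Reference documents generated by `geopipe-agent generate-skill`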

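20.11.2 Learning Path for the Related Technology Stack

Python GIS ecosystem:

Library	Purpose	Documentation
GeoPandas	Vector data processing	geopandas.org
Shapely	Geometry operations	shapely.readthedocs.io
Fiona	Vector data I/O	fiona.readthedocs.io
Rasterio	Raster data processing	rasterio.readthedocs.io
NetworkX	Graph theory and network analysis	networkx.org
SciPy	Scientific computing (Voronoi, interpolation)	scipy.org
scikit-learn	Machine learning (clustering)	scikit-learn.org

GIS fundamentals:

Coordinate reference systems (CRS): EPSG codes and the difference between geographic and projected coordinate systems
Vector data model: points (Point), lines (LineString), polygons (Polygon), and their multi-part variants
Raster data model: pixels, bands, resolution, NoData values
Spatial analysis methods: buffering, overlay, interpolation, network analysis, etc.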

20.11.3 Suggested Learning Order

1. Python basics     → 2. GIS fundamentals
       ↓                ↓
3. GeoPandas/Shapely → 4. Rasterio
       ↓                ↓
5. GeoPipeAgent basics (Chapters 1-7)
       ↓
6. GeoPipeAgent intermediate topics (Chapters 8-15)
       ↓
7. GeoPipeAgent advanced topics (Chapters 16-20)
       ↓
8. Real-world projects

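20.12 Chapter Summary

Through seven complete case studies, this chapter showed how GeoPipeAgent applies to different GIS analysis scenarios:

Urban green-space buffer analysis: a basic IO + Vector pipeline
Flood-risk overlay analysis: overlay analysis across multiple data sources
Comprehensive vector data QC: a complete QC workflow built by combining QC steps
DEM raster processing and contours: raster processing and raster-to-vector conversion
POI spatial clustering: advanced spatial analysis (DBSCAN + heatmap)
Road-network shortest-path analysis: network analysis (shortest path + service areas)
Batch format conversion: a conversion and simplification pipeline

The chapter also summarized the following best practices:

Step ID naming: lowercase letters and hyphens, with meaningful names
Variable design: extract changeable parameters into variables that can be overridden from the command line
Error handling: choose fail / skip / retry according to how important each step is
Backend choice: pick a backend based on data size and the functionality you need
Performance: filter early, choose efficient backends, use conditional execution
Project integration: CI/CD, AI assistants, Web APIs, Docker, and more

Congratulations on completing all 20 chapters of the GeoPipeAgent tutorial! You now know GeoPipeAgent from the basics to its advanced features and can use it to build powerful AI-driven GIS analysis pipelines. Good luck with your real-world projects!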