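Chapter 8: Vector Analysis Steps in Detail

8.1 Overview

Vector analysis steps are the most frequently used step category in GeoPipeAgent and cover the common spatial analysis operations in GIS. Every vector step runs through a Backend, and three backends are supported: gdal_python, gdal_cli, and qgis_process.

GeoPipeAgent provides 7 vector analysis steps:

Step ID	Name	Description
`vector.buffer`	Buffer analysis	Generates buffers around geometries
`vector.clip`	Vector clipping	Clips data to a clip extent
`vector.reproject`	Reprojection	Converts between coordinate reference systems (CRS)
`vector.dissolve`	Dissolve	Merges features by field value
`vector.simplify`	Geometry simplification	Douglas-Peucker simplification
`vector.query`	Attribute query	Filters features by attribute conditions
`vector.overlay`	Overlay analysis	Overlay operations between two layers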

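8.2 vector.buffer: Buffer Analysis

8.2.1 Description

Generates a buffer of a given distance around every feature in the input vector data. Buffering is one of the most basic spatial analysis operations in GIS and is commonly used for:

- Road noise impact zone analysis
- Water source protection zone delineation
- Facility service area analysis

8.2.2 Parameters

Parameter	Type	Required	Default	Description
`input`	geodataframe	✅	-	Input vector data
`distance`	number	✅	-	Buffer distance (unit depends on the CRS)
`cap_style`	string	❌	`round`	End cap style: round/flat/square

8.2.3 Output

Property	Description
`output`	Buffered result as a GeoDataFrame
`stats.feature_count`	Number of features
`stats.total_area`	Total buffer area

8.2.4 Usage Examples

Basic buffer: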

- id: buffer
  use: vector.buffer
  params:
    input: "$read.output"
    distance: 500

Specifying the end cap style:

- id: buffer
  use: vector.buffer
  params:
    input: "$read.output"
    distance: 100
    cap_style: "flat"       # flat end caps (a good fit for road buffers)

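8.2.5 Notes

The distance unit depends on the CRS of the input data. In a geographic CRS (such as EPSG:4326) the distance is in degrees; in a projected CRS (such as EPSG:3857) it is in meters. Reproject the data before buffering whenever the distance is meant in meters. Keep in mind that EPSG:3857 stretches distances away from the equator (by roughly 1/cos(latitude)), so a local projected CRS such as the matching UTM zone gives ground distances closer to the truth.
cap_style accepts three options:
- round: rounded end caps (the default and the most common choice)
- flat: flat end caps (the buffer is cut off flush at the endpoints of line features)
- square: square end caps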
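
To see why a buffer distance given in degrees is hard to reason about, the sketch below estimates the ground length of one degree at several latitudes. It assumes a spherical Earth with the mean radius, so the numbers are approximations, and `degree_lengths_m` is a hypothetical helper that is not part of GeoPipeAgent.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def degree_lengths_m(latitude_deg):
    """Return the approximate ground length in meters of one degree of
    latitude and one degree of longitude at the given latitude."""
    meters_per_deg_lat = math.pi * EARTH_RADIUS_M / 180.0
    # Meridians converge toward the poles, so a degree of longitude shrinks.
    meters_per_deg_lon = meters_per_deg_lat * math.cos(math.radians(latitude_deg))
    return meters_per_deg_lat, meters_per_deg_lon


for lat in (0, 30, 60):
    lat_m, lon_m = degree_lengths_m(lat)
    print(f"latitude {lat:>2}: 1 deg lat ~ {lat_m / 1000:.1f} km, "
          f"1 deg lon ~ {lon_m / 1000:.1f} km")
```

One degree of latitude stays near 111 km everywhere, while one degree of longitude halves by 60° latitude. A buffer of `distance: 0.01` in EPSG:4326 therefore covers a different ground distance east-west than north-south, which is the practical reason to reproject first.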

8.2.6 Typical Workflow

pipeline:
  name: "Road buffer analysis"
  steps:
    - id: read
      use: io.read_vector
      params:
        path: "data/roads.shp"
    - id: reproject
      use: vector.reproject
      params:
        input: "$read.output"
        target_crs: "EPSG:3857"    # project to a CRS with meter units
    - id: buffer
      use: vector.buffer
      params:
        input: "$reproject.output"
        distance: 500               # 500 meters
    - id: save
      use: io.write_vector
      params:
        input: "$buffer.output"
        path: "output/road_buffer.geojson"

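8.3 vector.clip: Vector Clipping

8.3.1 Description

Clips the input data with a clip geometry. Only the parts that fall inside the clip extent are kept.

8.3.2 Parameters

Parameter	Type	Required	Default	Description
`input`	geodataframe	✅	-	Vector data to clip
`clip_layer`	geodataframe	✅	-	Data that defines the clip extent

8.3.3 Usage Example

pipeline:
  name: "Vector clipping"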
  steps:
    - id: read-data
      use: io.read_vector
      params:
        path: "data/roads.shp"
    - id: read-boundary
      use: io.read_vector
      params:
        path: "data/city_boundary.shp"
    - id: clip
      use: vector.clip
      params:
        input: "$read-data.output"
        clip_layer: "$read-boundary.output"
    - id: save
      use: io.write_vector
      params:
        input: "$clip.output"
        path: "output/clipped_roads.geojson"

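8.4 vector.reproject: Reprojection

8.4.1 Description

Converts vector data from one coordinate reference system (CRS) to another. Reprojection is an essential preprocessing step in GIS analysis, because distance-based operations such as buffering and simplification depend on the units of the CRS.

8.4.2 Parameters

Parameter	Type	Required	Default	Description
`input`	geodataframe	✅	-	Input vector data
`target_crs`	string	✅	-	Target CRS (for example `EPSG:4326`)

8.4.3 Usage Examples

Converting to Web Mercator: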

- id: reproject
  use: vector.reproject
  params:
    input: "$read.output"
    target_crs: "EPSG:3857"

Converting to WGS84:

- id: reproject
  use: vector.reproject
  params:
    input: "$read.output"
    target_crs: "EPSG:4326"

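8.4.4 Common CRSs

EPSG Code	Name	Description
`EPSG:4326`	WGS 84	Geographic CRS used by GPS; global coverage
`EPSG:3857`	Web Mercator	Standard projection for web maps
`EPSG:4490`	CGCS2000	China Geodetic Coordinate System 2000
`EPSG:4547`	CGCS2000 / 3-degree Gauss-Kruger CM 114E	3-degree zone projection for China (central meridian 114°E)
`EPSG:32650`	WGS 84 / UTM zone 50N	UTM zone 50, northern hemisphere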
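
To make the difference between the first two rows concrete, here is a minimal sketch of the forward spherical Mercator formulas that map EPSG:4326 longitude/latitude to EPSG:3857 meters. It is for illustration only: `wgs84_to_web_mercator` is a hypothetical helper, and a real pipeline would leave the conversion to `vector.reproject` and its backend.

```python
import math

SPHERE_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, used as the sphere radius


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Convert EPSG:4326 degrees to EPSG:3857 meters (spherical Mercator)."""
    x = SPHERE_RADIUS_M * math.radians(lon_deg)
    y = SPHERE_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Sample coordinates (roughly central Beijing), chosen only for illustration.
x, y = wgs84_to_web_mercator(116.4, 39.9)
print(f"x = {x:.1f} m, y = {y:.1f} m")
```

The x coordinate grows linearly with longitude, while y grows faster and faster toward the poles. That nonlinearity is the source of the distance distortion that makes Web Mercator meters differ from ground meters at higher latitudes.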

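8.5 vector.dissolve: Dissolve

8.5.1 Description

Dissolves features by a given field: features that share the same attribute value are merged into one. An aggregation function can be specified to handle the remaining attributes.

8.5.2 Parameters

Parameter	Type	Required	Default	Description
`input`	geodataframe	✅	-	Input vector data
`by`	string	❌	-	Field to dissolve by (if omitted, everything is merged into a single feature)
`aggfunc`	string	❌	`first`	Aggregation function (first/sum/mean/count, etc.)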
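
The effect of `aggfunc` on the non-geometry attributes can be sketched in plain Python. The example below only groups and aggregates attribute values; it deliberately leaves out the geometry union that a real dissolve also performs. `dissolve_attributes` and the sample records are hypothetical and not part of GeoPipeAgent.

```python
from collections import defaultdict
from statistics import mean

# The aggregation functions named in the parameter table.
AGGREGATORS = {
    "first": lambda values: values[0],
    "sum": sum,
    "mean": mean,
    "count": len,
}


def dissolve_attributes(records, by, field, aggfunc="first"):
    """Group records by the `by` field and aggregate `field` within each group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[by]].append(record[field])
    aggregate = AGGREGATORS[aggfunc]
    return {key: aggregate(values) for key, values in groups.items()}


parcels = [
    {"landuse_type": "park", "area_sqm": 1200},
    {"landuse_type": "park", "area_sqm": 800},
    {"landuse_type": "residential", "area_sqm": 500},
]

print(dissolve_attributes(parcels, "landuse_type", "area_sqm", "first"))
# {'park': 1200, 'residential': 500}
print(dissolve_attributes(parcels, "landuse_type", "area_sqm", "sum"))
# {'park': 2000, 'residential': 500}
```

With the default `first`, the merged feature simply inherits the attribute of the first feature in its group, which is rarely meaningful for numeric fields such as areas. `sum` or `mean` is usually the better choice there.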

8.5.3 Usage Examples

Dissolving by land use type:

- id: dissolve
  use: vector.dissolve
  params:
    input: "$read.output"
    by: "landuse_type"
    aggfunc: "first"

Merging everything into a single feature:

- id: dissolve
  use: vector.dissolve
  params:
    input: "$read.output"

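8.6 vector.simplify: Geometry Simplification

8.6.1 Description

Simplifies geometries with the Douglas-Peucker algorithm, which reduces the number of vertices. Use it when the data volume needs to shrink.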
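
For readers unfamiliar with the algorithm, the following is a minimal pure-Python sketch of classic Douglas-Peucker on a single polyline. It is not the backend's implementation, and it does not cover what `preserve_topology` guards against: the plain algorithm can produce self-intersecting results. The function names are illustrative.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end
    (or to start itself when the two endpoints coincide)."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def douglas_peucker(points, tolerance):
    """Return a simplified copy of points that keeps both endpoints."""
    if len(points) < 3:
        return list(points)
    start, end = points[0], points[-1]
    # Find the interior vertex farthest from the start-end line.
    max_dist, index = 0.0, 0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], start, end)
        if dist > max_dist:
            max_dist, index = dist, i
    # Every interior vertex is within tolerance: drop them all.
    if max_dist <= tolerance:
        return [start, end]
    # Otherwise keep the farthest vertex and simplify both halves.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the shared vertex


wiggly = [(0, 0), (1, 0.05), (2, 0), (3, 0.05), (4, 0)]
print(douglas_peucker(wiggly, tolerance=0.1))  # [(0, 0), (4, 0)]
```

The tolerance is compared against distances in the coordinate units of the data, which is why its meaning changes with the CRS.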

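8.6.2 Parameters

Parameter	Type	Required	Default	Description
`input`	geodataframe	✅	-	Input vector data
`tolerance`	number	✅	-	Simplification tolerance (unit depends on the CRS)
`preserve_topology`	boolean	❌	`true`	Whether to preserve topology

8.6.3 Usage Example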

- id: simplify
  use: vector.simplify
  params:
    input: "$read.output"
    tolerance: 0.001         # roughly 100 m (in EPSG:4326)
    preserve_topology: true

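8.6.4 Choosing a Tolerance

CRS type	Example tolerance	Effect
Geographic CRS (degrees)	0.0001	About 10 m precision
Geographic CRS (degrees)	0.001	About 100 m precision
Projected CRS (meters)	1.0	1 m precision
Projected CRS (meters)	10.0	10 m precision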

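8.7 vector.query: Attribute Query

8.7.1 Description

Filters the features that satisfy a condition written as a pandas query expression.

8.7.2 Parameters

Parameter	Type	Required	Default	Description
`input`	geodataframe	✅	-	Input vector data
`expression`	string	✅	-	pandas query expression

8.7.3 Usage Examples

Selecting parcels with an area greater than 1000: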

- id: filter
  use: vector.query
  params:
    input: "$read.output"
    expression: "area_sqm > 1000"

Filtering on multiple conditions:

- id: filter
  use: vector.query
  params:
    input: "$read.output"
    expression: "type == 'residential' and area > 500"

Using a variable:

pipeline:
  variables:
    min_area: 1000
    filter_expression: "area_sqm > 1000"
  steps:
    - id: filter
      use: vector.query
      params:
        input: "$read.output"
        expression: "${filter_expression}"
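
How `${name}` references get replaced is not described in this chapter. As a rough mental model only, the sketch below assumes plain textual substitution of `${name}` placeholders while leaving `$step.output` references untouched. `resolve_variables` is a hypothetical helper; GeoPipeAgent's actual resolver may behave differently, for example by preserving the type of numeric variables.

```python
import re

# Matches ${name} only; step references such as $read.output have no braces.
VARIABLE_PATTERN = re.compile(r"\$\{(\w+)\}")


def resolve_variables(value, variables):
    """Replace every ${name} in a string with the matching variable's value."""
    if not isinstance(value, str):
        return value
    return VARIABLE_PATTERN.sub(lambda match: str(variables[match.group(1)]), value)


variables = {"min_area": 1000, "filter_expression": "area_sqm > 1000"}

print(resolve_variables("${filter_expression}", variables))  # area_sqm > 1000
print(resolve_variables("$read.output", variables))          # $read.output
```

Under this assumption an unknown variable name raises `KeyError`, which surfaces typos early instead of passing a literal `${...}` through to the query.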

8.7.4 Expression Syntax

vector.query relies on the query() method of the pandas DataFrame and supports the following syntax:

Operation	Example
Comparison	`area > 1000`
Equality	`type == 'residential'`
Logical AND	`area > 100 and type == 'park'`
Logical OR	`type == 'road' or type == 'highway'`
Membership	`type in ['park', 'garden']`
String methods	`name.str.contains('road')`

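8.8 vector.overlay: Overlay Analysis

8.8.1 Description

Performs overlay analysis on two vector layers. Four operations are supported: intersection, union, difference, and symmetric difference.

8.8.2 Parameters

Parameter	Type	Required	Default	Description
`input`	geodataframe	✅	-	Input layer
`overlay_layer`	geodataframe	✅	-	Overlay layer
`how`	string	❌	`intersection`	Overlay operation

8.8.3 Overlay Operations

how value	Name	Description
`intersection`	Intersection	The parts where the two layers overlap
`union`	Union	Everything covered by either layer
`difference`	Difference	The parts of the input layer that are not in the overlay layer
`symmetric_difference`	Symmetric difference	The parts of the two layers that do not overlap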
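
The four `how` values follow ordinary set semantics applied to area. As an analogy only, the sketch below represents each layer as the set of unit grid cells it covers and applies Python's set operators. A real overlay also splits geometries and combines attributes, which this toy model ignores; `rasterize_box` is a hypothetical helper.

```python
def rasterize_box(min_x, min_y, max_x, max_y):
    """Represent an axis-aligned box as the set of unit grid cells it covers."""
    return {(x, y) for x in range(min_x, max_x) for y in range(min_y, max_y)}


input_layer = rasterize_box(0, 0, 4, 4)    # 4 x 4 square, 16 cells
overlay_layer = rasterize_box(2, 2, 6, 6)  # 4 x 4 square sharing a 2 x 2 corner

results = {
    "intersection": input_layer & overlay_layer,
    "union": input_layer | overlay_layer,
    "difference": input_layer - overlay_layer,
    "symmetric_difference": input_layer ^ overlay_layer,
}

for how, cells in results.items():
    print(f"{how}: {len(cells)} cells")
# intersection: 4 cells
# union: 28 cells
# difference: 12 cells
# symmetric_difference: 24 cells
```

Note that `difference` is the only operation where the order of the layers matters: swapping `input` and `overlay_layer` returns the other layer's leftover area instead.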

8.8.4 Usage Examples

Intersecting two layers:

pipeline:
  name: "Overlay analysis"
  steps:
    - id: load-landuse
      use: io.read_vector
      params:
        path: "data/landuse.shp"
    - id: load-flood
      use: io.read_vector
      params:
        path: "data/flood_zone.shp"
    - id: overlay
      use: vector.overlay
      params:
        input: "$load-landuse.output"
        overlay_layer: "$load-flood.output"
        how: "intersection"
    - id: save
      use: io.write_vector
      params:
        input: "$overlay.output"
        path: "output/landuse_in_flood.geojson"

Computing a difference:

- id: difference
  use: vector.overlay
  params:
    input: "$parks.output"
    overlay_layer: "$buildings.output"
    how: "difference"

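8.9 How Vector Steps Relate to the Backend

Every vector step performs its actual GIS operation through a Backend. The step function handles parameter retrieval and result packaging, and delegates the spatial computation itself to the Backend:

# Step function pattern
def vector_buffer(ctx: StepContext) -> StepResult:
    gdf = ctx.input("input")          # 1. Read the parameters
    distance = ctx.param("distance")

    result = ctx.backend.buffer(gdf, distance)  # 2. Delegate to the Backend

    return StepResult(output=result, stats={...})  # 3. Package the result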
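
The snippet above leaves `StepContext`, `StepResult`, and the backend undefined. The self-contained sketch below fills them in with minimal stand-ins so the delegation can be run and inspected. All class definitions here are assumptions made for illustration and are not GeoPipeAgent's real classes; the fake backends return strings instead of doing GIS work.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


class Backend(Protocol):
    """Hypothetical interface: anything that offers a buffer() method."""

    def buffer(self, data: Any, distance: float) -> Any: ...


@dataclass
class StepResult:
    output: Any
    stats: dict = field(default_factory=dict)


@dataclass
class StepContext:
    params: dict
    backend: Backend

    def input(self, name: str) -> Any:
        return self.params[name]

    def param(self, name: str, default: Any = None) -> Any:
        return self.params.get(name, default)


class FakeGdalPython:
    def buffer(self, data, distance):
        return f"{data} buffered by {distance} via gdal_python"


class FakeQgisProcess:
    def buffer(self, data, distance):
        return f"{data} buffered by {distance} via qgis_process"


def vector_buffer(ctx: StepContext) -> StepResult:
    gdf = ctx.input("input")                    # 1. read the parameters
    distance = ctx.param("distance")
    result = ctx.backend.buffer(gdf, distance)  # 2. delegate to the backend
    return StepResult(output=result, stats={"distance": distance})  # 3. package


params = {"input": "roads", "distance": 500}
for backend in (FakeGdalPython(), FakeQgisProcess()):
    print(vector_buffer(StepContext(params, backend)).output)
# roads buffered by 500 via gdal_python
# roads buffered by 500 via qgis_process
```

The step function never checks which backend it received. It relies only on the `buffer()` method being present, so the same step code works unchanged when the backend is swapped.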

This means users can change a step's internal implementation by specifying a different Backend:

# Use the default backend (gdal_python)
- id: buffer
  use: vector.buffer
  params:
    input: "$read.output"
    distance: 500

# Use the GDAL CLI backend (suited to large files)
- id: buffer
  use: vector.buffer
  params:
    input: "$read.output"
    distance: 500
  backend: gdal_cli

# Use the QGIS backend
- id: buffer
  use: vector.buffer
  params:
    input: "$read.output"
    distance: 500
  backend: qgis_process

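8.10 Case Study: A Complete Vector Analysis Pipeline

pipeline:
  name: "Urban park accessibility analysis"
  description: "Analyze the coverage of the 500 m service area around city parks"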
  variables:
    parks_path: "data/parks.shp"
    buildings_path: "data/buildings.shp"
    buffer_distance: 500

  steps:
    # Read the data
    - id: read-parks
      use: io.read_vector
      params:
        path: "${parks_path}"

    - id: read-buildings
      use: io.read_vector
      params:
        path: "${buildings_path}"

    # Reproject
    - id: reproject-parks
      use: vector.reproject
      params:
        input: "$read-parks.output"
        target_crs: "EPSG:3857"

    - id: reproject-buildings
      use: vector.reproject
      params:
        input: "$read-buildings.output"
        target_crs: "EPSG:3857"

    # Buffer the parks
    - id: park-buffer
      use: vector.buffer
      params:
        input: "$reproject-parks.output"
        distance: "${buffer_distance}"

    # Dissolve the buffers
    - id: dissolve-buffer
      use: vector.dissolve
      params:
        input: "$park-buffer.output"

    # Find the buildings inside the buffer
    - id: covered-buildings
      use: vector.overlay
      params:
        input: "$reproject-buildings.output"
        overlay_layer: "$dissolve-buffer.output"
        how: "intersection"

    # Save the result
    - id: save
      use: io.write_vector
      params:
        input: "$covered-buildings.output"
        path: "output/park_coverage.geojson"

  outputs:
    result: "$save.output"
    total_parks: "$read-parks.feature_count"
    covered_buildings: "$covered-buildings.feature_count"
    total_buildings: "$read-buildings.feature_count"