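Chapter 7: IO Steps for Reading and Writing Data

7.1 Overview

IO (input/output) steps are the start and end points of a GeoPipeAgent pipeline and handle reading and writing data. They are the only step category in the framework that does not use a Backend: they call GeoPandas (vector) and Rasterio (raster) directly for file operations.

GeoPipeAgent provides four IO steps:

Step ID	Name	Description
`io.read_vector`	Read vector data	Reads vector data from a file into a GeoDataFrame
`io.write_vector`	Write vector data	Writes a GeoDataFrame to a file
`io.read_raster`	Read raster data	Reads raster data from a file into a dictionary
`io.write_raster`	Write raster data	Writes raster data to a file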

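7.2 io.read_vector: Reading Vector Data

7.2.1 Description

Reads vector data from a file and returns a GeoDataFrame. Supports Shapefile, GeoJSON, GeoPackage (GPKG), and any other vector format that the installed GDAL/OGR build supports.

7.2.2 Parameters

Parameter	Type	Required	Default	Description
`path`	string	✅	-	Path to the vector data file
`layer`	string	❌	-	Layer name (for multi-layer files)
`encoding`	string	❌	`utf-8`	File encoding

7.2.3 Output

Attribute	Type	Description
`output`	GeoDataFrame	The vector data that was read
`stats.feature_count`	int	Number of features
`stats.crs`	string	Coordinate reference system
`stats.geometry_types`	list	List of geometry types
`stats.columns`	list	List of column names

7.2.4 Usage Examples

Basic usage, reading a Shapefile: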

- id: read
  use: io.read_vector
  params:
    path: "data/roads.shp"

Reading GeoJSON:

- id: read
  use: io.read_vector
  params:
    path: "data/buildings.geojson"

Reading a specific layer from a GeoPackage:

- id: read
  use: io.read_vector
  params:
    path: "data/city.gpkg"
    layer: "buildings"

Parameterizing the path with a variable:

pipeline:
  variables:
    input_path: "data/roads.shp"
  steps:
    - id: read
      use: io.read_vector
      params:
        path: "${input_path}"
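
The `${input_path}` placeholder above can be pictured as plain string substitution that happens before the step runs. The following is a minimal sketch of that idea, not GeoPipeAgent's actual resolver: the helper name `substitute_variables` and the choice to raise `KeyError` for undefined names are assumptions made for illustration.

```python
import re

# Matches ${name} placeholders, e.g. "${input_path}".
_VAR_PATTERN = re.compile(r"\$\{(\w+)\}")


def substitute_variables(value: str, variables: dict) -> str:
    """Replace every ${name} in `value` with str(variables[name]).

    Hypothetical helper; undefined names raise KeyError in this sketch.
    """
    def _replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined pipeline variable: {name}")
        return str(variables[name])

    return _VAR_PATTERN.sub(_replace, value)


variables = {"input_path": "data/roads.shp"}
params = {"path": "${input_path}"}
resolved = {key: substitute_variables(val, variables) for key, val in params.items()}
print(resolved)  # {'path': 'data/roads.shp'}
```

Failing loudly on an undefined variable is a design choice of this sketch; it surfaces typos in variable names before any file is opened.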

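7.2.5 Supported Formats

Format	File extension	Notes
Shapefile	`.shp`	The most common vector format
GeoJSON	`.geojson`, `.json`	Web-friendly format
GeoPackage	`.gpkg`	OGC standard format
KML	`.kml`	Google Earth format
GML	`.gml`	OGC standard exchange format
CSV	`.csv`	Must contain coordinate columns
MapInfo	`.tab`	MapInfo format

7.2.6 Implementation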

def io_read_vector(ctx: StepContext) -> StepResult:
    # Imported inside the function so geopandas is only loaded when the step runs.
    import geopandas as gpd

    path = ctx.param("path")
    layer = ctx.param("layer")  # optional
    encoding = ctx.param("encoding", "utf-8")

    # Forward only the options that were actually provided.
    kwargs = {}
    if layer:
        kwargs["layer"] = layer
    if encoding:
        kwargs["encoding"] = encoding

    gdf = gpd.read_file(path, **kwargs)

    # Summary values exposed to the pipeline as stats.*
    stats = {
        "feature_count": len(gdf),
        "crs": str(gdf.crs) if gdf.crs else None,
        "geometry_types": list(gdf.geometry.geom_type.unique()),
        "columns": list(gdf.columns),
    }

    return StepResult(output=gdf, stats=stats)

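7.3 io.write_vector: Writing Vector Data

7.3.1 Description

Writes a GeoDataFrame to a file. Supports multiple output formats and creates the output directory automatically.

7.3.2 Parameters

Parameter	Type	Required	Default	Description
`input`	geodataframe	✅	-	Input vector data
`path`	string	✅	-	Output file path
`format`	string	❌	`GeoJSON`	Output format
`encoding`	string	❌	`utf-8`	File encoding

7.3.3 Output

Attribute	Type	Description
`output`	string	Output file path
`stats.feature_count`	int	Number of features
`stats.output_path`	string	Output path
`stats.format`	string	Name of the GDAL driver that was used

7.3.4 Format Name Mapping

The framework has a built-in mapping for common format names. Matching is case-insensitive:

User input	GDAL driver name
`geojson`	`GeoJSON`
`shapefile` / `shp`	`ESRI Shapefile`
`gpkg` / `geopackage`	`GPKG`
Any other value	Used directly as the driver name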
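
The mapping in the table can be sketched as a lower-cased dictionary lookup with a pass-through fallback. This is an illustration of the rule described above, not the framework's source; the names `_FORMAT_ALIASES` and `resolve_driver` are hypothetical.

```python
# Aliases from the table above, keyed in lower case.
_FORMAT_ALIASES = {
    "geojson": "GeoJSON",
    "shapefile": "ESRI Shapefile",
    "shp": "ESRI Shapefile",
    "gpkg": "GPKG",
    "geopackage": "GPKG",
}


def resolve_driver(format_name: str = "GeoJSON") -> str:
    """Map a user-supplied format name to a GDAL driver name (hypothetical helper)."""
    # Known aliases match case-insensitively; any other value passes through unchanged.
    return _FORMAT_ALIASES.get(format_name.lower(), format_name)


print(resolve_driver("Shapefile"))   # ESRI Shapefile
print(resolve_driver("gpkg"))        # GPKG
print(resolve_driver("FlatGeobuf"))  # FlatGeobuf
```

Because unknown values pass through untouched, a driver name that is not in the alias table has to be spelled exactly as GDAL expects it.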

7.3.5 Usage Examples

Writing GeoJSON:

- id: save
  use: io.write_vector
  params:
    input: "$buffer.output"
    path: "output/result.geojson"
    format: "GeoJSON"

Writing a Shapefile:

- id: save
  use: io.write_vector
  params:
    input: "$buffer.output"
    path: "output/result.shp"
    format: "Shapefile"

Writing a GeoPackage:

- id: save
  use: io.write_vector
  params:
    input: "$buffer.output"
    path: "output/result.gpkg"
    format: "GPKG"

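7.3.6 Automatic Directory Creation

io.write_vector automatically creates any directories in the output path that do not yet exist:

# output/subdir/ is created automatically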
- id: save
  use: io.write_vector
  params:
    input: "$buffer.output"
    path: "output/subdir/result.geojson"
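
One common way to get this behavior in Python is `pathlib.Path.mkdir` with `parents=True` and `exist_ok=True`. The sketch below shows that approach under the assumption that the step does something equivalent; `ensure_parent_dir` is a hypothetical helper, not a GeoPipeAgent function.

```python
import tempfile
from pathlib import Path


def ensure_parent_dir(path: str) -> Path:
    """Create the parent directory of `path` if it does not exist yet."""
    target = Path(path)
    # parents=True creates intermediate directories; exist_ok=True makes reruns safe.
    target.parent.mkdir(parents=True, exist_ok=True)
    return target


with tempfile.TemporaryDirectory() as tmp:
    out = ensure_parent_dir(f"{tmp}/output/subdir/result.geojson")
    print(out.parent.is_dir())  # True
    print(out.exists())         # False: only the directory is created, not the file
```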

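7.4 io.read_raster: Reading Raster Data

7.4.1 Description

Reads raster data (for example GeoTIFF) from a file and returns a dictionary containing the data array and its metadata.

7.4.2 Parameters

Parameter	Type	Required	Default	Description
`path`	string	✅	-	Path to the raster data file

7.4.3 Output

Attribute	Type	Description
`output`	dict	Raster data dictionary
`output.data`	numpy.ndarray	Raster data array (shape: bands × height × width)
`output.transform`	Affine	Affine transform matrix
`output.crs`	CRS	Coordinate reference system
`output.profile`	dict	rasterio metadata
`output.path`	string	File path
`stats.width`	int	Width in pixels
`stats.height`	int	Height in pixels
`stats.crs`	string	CRS string
`stats.band_count`	int	Number of bands
`stats.dtype`	string	Data type
`stats.bounds`	list	Geographic extent

7.4.4 Usage Examples

Reading a GeoTIFF: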

- id: read-dem
  use: io.read_raster
  params:
    path: "data/dem.tif"

Reading satellite imagery for an NDVI calculation:

pipeline:
  name: "NDVI calculation"
  steps:
    - id: read-satellite
      use: io.read_raster
      params:
        path: "data/landsat.tif"
    - id: calc-ndvi
      use: raster.calc
      params:
        input: "$read-satellite.output"
        expression: "(B4-B3)/(B4+B3)"
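
The expression in this pipeline is the NDVI formula, (NIR - Red) / (NIR + Red). As a rough illustration of the arithmetic on a (bands, height, width) array, here is a NumPy sketch. It assumes that `B3` and `B4` refer to the third and fourth bands (red and near infrared on Landsat 5/7); how `raster.calc` actually maps band names and handles zero denominators is not stated in this chapter, and the pixel values are made up.

```python
import numpy as np

# Synthetic 4-band raster, shape (bands, height, width); values are made up.
data = np.zeros((4, 2, 2), dtype="float32")
data[2] = [[0.1, 0.2], [0.3, 0.0]]  # B3 (red), assuming B<n> means band n, 1-based
data[3] = [[0.5, 0.6], [0.3, 0.0]]  # B4 (near infrared)

b3, b4 = data[2], data[3]
denominator = b4 + b3

# Guard against division by zero: pixels where B4 + B3 == 0 stay NaN.
ndvi = np.full(denominator.shape, np.nan, dtype="float32")
np.divide(b4 - b3, denominator, out=ndvi, where=denominator != 0)

print(ndvi)
```

Marking zero-denominator pixels as NaN is a choice made in this sketch so that empty pixels are not mistaken for an NDVI of 0.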

7.4.5 Raster Data Dictionary Structure

The output returned by io.read_raster is a dictionary with the following keys:

{
    "data": numpy_array,       # shape: (bands, height, width)
    "transform": affine,       # affine transform
    "crs": crs_object,         # CRS object
    "profile": {               # rasterio profile (metadata)
        "driver": "GTiff",
        "dtype": "float32",
        "width": 1000,
        "height": 800,
        "count": 4,
        "crs": CRS.from_epsg(4326),
        "transform": Affine(...),
    },
    "path": "data/dem.tif",    # original file path
}

This dictionary is the framework's standard internal format for passing raster data; all raster steps use it.
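
To show how a downstream step might consume this dictionary, the sketch below derives the width, height, band count, and dtype from the `data` array alone. `summarize_raster` is a hypothetical helper, and the `transform` and `crs` entries are set to `None` only to keep the example free of rasterio; in the real dictionary they hold an Affine and a CRS object.

```python
import numpy as np


def summarize_raster(raster: dict) -> dict:
    """Derive basic stats from a raster dictionary (hypothetical helper)."""
    data = raster["data"]
    if data.ndim != 3:
        raise ValueError(f"expected (bands, height, width), got shape {data.shape}")
    band_count, height, width = data.shape
    return {
        "width": width,
        "height": height,
        "band_count": band_count,
        "dtype": str(data.dtype),
    }


raster = {
    "data": np.zeros((4, 800, 1000), dtype="float32"),
    "transform": None,  # placeholder for an Affine object
    "crs": None,        # placeholder for a CRS object
    "profile": {"driver": "GTiff", "dtype": "float32", "width": 1000, "height": 800, "count": 4},
    "path": "data/dem.tif",
}
print(summarize_raster(raster))
# {'width': 1000, 'height': 800, 'band_count': 4, 'dtype': 'float32'}
```

Note the axis order: the band axis comes first, so `data[0]` is the first band as a (height, width) array.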

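7.5 io.write_raster: Writing Raster Data

7.5.1 Description

Writes a raster data dictionary to a file, usually in GeoTIFF format.

7.5.2 Parameters

Parameter	Type	Required	Default	Description
`input`	raster_info	✅	-	Input raster data dictionary
`path`	string	✅	-	Output file path

7.5.3 Output

Attribute	Type	Description
`output`	string	Output file path
`stats.output_path`	string	Output path
`stats.width`	int	Width
`stats.height`	int	Height

7.5.4 Usage Example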

- id: save-raster
  use: io.write_raster
  params:
    input: "$calc-ndvi.output"
    path: "output/ndvi.tif"

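7.6 Design Characteristics of IO Steps

7.6.1 No Backend

IO steps call the underlying libraries (GeoPandas and Rasterio) directly and bypass the Backend abstraction layer, for the following reasons:

File reading and writing is fairly standardized, unlike spatial analysis, which can be implemented in many ways
The file IO capabilities of GeoPandas and Rasterio are already sufficient
Skipping the Backend layer avoids its extra overhead

7.6.2 Data-Passing Conventions for Vector and Raster Data

Data type	Python type	How it is passed
Vector data	`GeoDataFrame`	The object reference is passed directly
Raster data	`dict`	A dictionary containing data, transform, crs, and profile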
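
A reference such as `$read.output` hands the upstream object to the next step without copying it. The sketch below illustrates that convention with plain Python objects. It is not the framework's resolver: `StepResultStub` and `resolve_reference` are hypothetical stand-ins, and a list is used in place of a GeoDataFrame.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StepResultStub:
    """Stand-in for the framework's result type; only the two fields used here."""
    output: Any = None
    stats: dict = field(default_factory=dict)


def resolve_reference(ref: str, results: dict) -> Any:
    """Resolve "$step_id.attribute" against finished step results (hypothetical helper)."""
    # Plain literals and ${variable} placeholders are not step references.
    if not ref.startswith("$") or ref.startswith("${"):
        return ref
    step_id, _, attribute = ref[1:].partition(".")
    return getattr(results[step_id], attribute)


vector = ["feature-1", "feature-2"]  # stand-in for a GeoDataFrame
results = {"read": StepResultStub(output=vector, stats={"feature_count": 2})}

resolved = resolve_reference("$read.output", results)
print(resolved is vector)  # True: the same object is handed on, not a copy
print(resolve_reference("$read.stats", results))  # {'feature_count': 2}
```

Because the object is shared, a step that modifies its input in place would also change what every other consumer of the same reference sees; working on a copy avoids that.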

7.6.3 Encoding

Shapefiles produced in Chinese-language environments often have encoding problems. Set the encoding parameter to handle them:

- id: read
  use: io.read_vector
  params:
    path: "data/chinese_data.shp"
    encoding: "gbk"      # or "gb2312"
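
The reason the encoding has to match can be seen with the standard library alone. In this small sketch, attribute text stored as GBK bytes fails to decode as UTF-8 but decodes cleanly as GBK. The sample string is made up for illustration.

```python
# Attribute text as a legacy tool might store it in a Shapefile's .dbf, encoded as GBK.
raw = "道路".encode("gbk")

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # The GBK bytes are not valid UTF-8, so a strict decode fails.
    print(f"utf-8 failed: {exc.reason}")

print(raw.decode("gbk"))  # 道路
```

If text is decoded with a wrong but more permissive encoding, it may not fail at all and produce garbled characters instead, which is why specifying the correct encoding up front is the safer option.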

7.7 Complete IO Pipeline Examples

7.7.1 Vector Format Conversion

pipeline:
  name: "Shapefile to GeoJSON"
  variables:
    input_path: "data/buildings.shp"
    output_path: "output/buildings.geojson"
  steps:
    - id: read
      use: io.read_vector
      params:
        path: "${input_path}"
        encoding: "utf-8"
    - id: write
      use: io.write_vector
      params:
        input: "$read.output"
        path: "${output_path}"
        format: "GeoJSON"
  outputs:
    result: "$write.output"
    feature_count: "$read.feature_count"

7.7.2 Raster Reading and Statistics

pipeline:
  name: "Raster statistics"
  steps:
    - id: read
      use: io.read_raster
      params:
        path: "data/elevation.tif"
    - id: stats
      use: raster.stats
      params:
        input: "$read.output"
  outputs:
    statistics: "$stats.stats"

7.7.3 Multi-Format Output

pipeline:
  name: "Multi-format output"
  steps:
    - id: read
      use: io.read_vector
      params:
        path: "data/roads.shp"
    - id: save-geojson
      use: io.write_vector
      params:
        input: "$read.output"
        path: "output/roads.geojson"
        format: "GeoJSON"
    - id: save-gpkg
      use: io.write_vector
      params:
        input: "$read.output"
        path: "output/roads.gpkg"
        format: "GPKG"
  outputs:
    geojson: "$save-geojson.output"
    gpkg: "$save-gpkg.output"