How do I load a zip file (containing a shp) from an S3 bucket into GeoPandas?



I zipped the name.shp, name.shx and name.dbf files and uploaded them to an AWS S3 bucket. Now I want to load this zip file and convert the contained shapefile into a GeoDataFrame with GeoPandas.

I can do this perfectly well if the file is a zipped geojson instead of a zipped shapefile.

import io
import zipfile

import boto3
import geopandas as gpd

cliente = boto3.client("s3", aws_access_key_id=ak, aws_secret_access_key=sk)
bucket_name = 'bucketname'
object_key = 'myfolder/locations.zip'
bytes_buffer = io.BytesIO()
cliente.download_fileobj(Bucket=bucket_name, Key=object_key, Fileobj=bytes_buffer)
with zipfile.ZipFile(bytes_buffer) as zi:
    with zi.open("locations.shp") as file:
        print(gpd.read_file(file.read().decode('ISO-8859-9')))

I get this error:

ç¤íEÀÁËÆ3À: No such file or directory

Basically, the geopandas package allows reading files directly from S3. As mentioned in the answer above, it can also read zip files. So the code below will read the zip file from S3 without downloading it. You need to prefix the path with zip+s3:// and then append the path within S3.

geopandas.read_file(f'zip+s3://bucket-name/file.zip')
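As a sketch, that URI can be forged from the bucket name and object key used in the question (bucketname and myfolder/locations.zip are the question's hypothetical values); the actual read is commented out because it needs geopandas and valid S3 credentials:

```python
bucket_name = "bucketname"             # hypothetical bucket from the question
object_key = "myfolder/locations.zip"  # hypothetical key from the question

# Build the zip+s3 URI that geopandas.read_file understands
uri = f"zip+s3://{bucket_name}/{object_key}"
print(uri)  # zip+s3://bucketname/myfolder/locations.zip

# gdf = geopandas.read_file(uri)  # requires geopandas and S3 credentials
```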

You can read the zip directly, without using zipfile. You need all the parts of the shapefile, not just the .shp itself; that is why it works for geojson. You just have to pass it with zip:///. So instead of

gpd.read_file('path/file.shp')

you go

gpd.read_file('zip:///path/file.zip')
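The point about needing every part can be checked with the standard library alone. This sketch (member names are made up) builds a zip in memory and verifies that the .shp travels together with its sidecar files, which is what a reader behind zip:/// expects to find:

```python
import io
import zipfile

# Hypothetical shapefile parts; the .shp alone is not enough for the reader
parts = ["locations.shp", "locations.shx", "locations.dbf"]

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    for name in parts:
        zf.writestr(name, b"placeholder")  # placeholder bytes for illustration

# List the members the reader would see inside the archive
with zipfile.ZipFile(buf) as zf:
    members = zf.namelist()

print(sorted(members))  # ['locations.dbf', 'locations.shp', 'locations.shx']
```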

I'm not familiar enough with boto3 to know at which point you actually have that path, but I think it will help.

I don't know if it is of any help, but I recently ran into a similar issue, although I only wanted to read the .shp with fiona. Like the others, I ended up zipping the relevant .shp, .dbf, .cpg and .shx on the bucket.

To read from the bucket, I did something like:

from io import BytesIO
from pathlib import Path
from typing import List
from typing import Union

import boto3
import fiona
from fiona.io import ZipMemoryFile
from pydantic import BaseSettings
from shapely.geometry import Point
from shapely.geometry import Polygon


class S3Configuration(BaseSettings):
    """
    S3 configuration class
    """
    s3_access_key_id: str = ''
    s3_secret_access_key: str = ''
    s3_region_name: str = ''
    s3_endpoint_url: str = ''
    s3_bucket_name: str = ''
    s3_use: bool = False


S3_CONF = S3Configuration()
S3_STR = 's3'
S3_SESSION = boto3.session.Session()
S3 = S3_SESSION.resource(
    service_name=S3_STR,
    aws_access_key_id=S3_CONF.s3_access_key_id,
    aws_secret_access_key=S3_CONF.s3_secret_access_key,
    endpoint_url=S3_CONF.s3_endpoint_url,
    region_name=S3_CONF.s3_region_name,
    use_ssl=True,
    verify=True,
)
BUCKET = S3_CONF.s3_bucket_name
CordexShape = Union[Polygon, List[Polygon], List[Point]]
ZIP_EXT = '.zip'

def get_shapefile_data(file_path: Path, s3_use: bool = S3_CONF.s3_use) -> CordexShape:
    """
    Retrieve the shapefile content associated with the passed file_path (either on disk or on S3).
    file_path is a .shp file.
    """
    if s3_use:
        return load_zipped_shp(get_s3_object(file_path.with_suffix(ZIP_EXT)), file_path)
    return load_shp(file_path)

def get_s3_object(file_path: Path) -> bytes:
    """
    Retrieve as bytes the content associated with the passed file_path
    """
    return S3.Object(bucket_name=BUCKET, key=forge_key(file_path)).get()['Body'].read()

def forge_key(file_path: Path) -> str:
    """
    Edit this code at your convenience to forge the bucket key out of the passed file_path
    """
    return str(file_path.relative_to(*file_path.parts[:2]))

def load_shp(file_path: Path) -> CordexShape:
    """
    Retrieve a list of Polygons stored at file_path location
    """
    with fiona.open(file_path) as shape:
        parsed_shape = list(shape)
    return parsed_shape

def load_zipped_shp(zipped_data: bytes, file_path: Path) -> CordexShape:
    """
    Retrieve a list of Polygons stored at file_path location
    """
    with ZipMemoryFile(BytesIO(zipped_data)) as zip_memory_file:
        with zip_memory_file.open(file_path.name) as shape:
            parsed_shape = list(shape)
    return parsed_shape
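To make the path handling above concrete, here is a stdlib-only sketch of what with_suffix and forge_key do (the local path is made up for illustration):

```python
from pathlib import Path


def forge_key(file_path: Path) -> str:
    # Drop the first two path components, as in the snippet above
    return str(file_path.relative_to(*file_path.parts[:2]))


shp_path = Path("data/my-project/shapes/regions.shp")  # hypothetical local path
zip_path = shp_path.with_suffix(".zip")                # data/my-project/shapes/regions.zip
key = forge_key(zip_path)
print(key)  # shapes/regions.zip
```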

Quite a lot of code, but the first part is very helpful for easily using a minio proxy for local development (just change the .env).

The key to solving the problem for me was fiona's ZipMemoryFile, which is not that well documented (in my opinion) but was a lifesaver (in my case :)).
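The ZipMemoryFile pattern opens a member by name from raw zip bytes. A stdlib-only sketch of that idea (names are hypothetical; for real shapefiles fiona's ZipMemoryFile takes the place of zipfile here):

```python
import io
import zipfile
from pathlib import Path

# Hypothetical zipped shapefile parts, as produced before uploading to the bucket
raw = io.BytesIO()
with zipfile.ZipFile(raw, "w") as zf:
    for ext in (".shp", ".shx", ".dbf", ".cpg"):
        zf.writestr(f"regions{ext}", b"placeholder")
zipped_data = raw.getvalue()  # analogous to what get_s3_object returns

file_path = Path("data/project/regions.shp")
# ZipMemoryFile(zipped_data).open(file_path.name) would open "regions.shp";
# with plain zipfile, the equivalent member lookup is:
with zipfile.ZipFile(io.BytesIO(zipped_data)) as zf:
    member = zf.getinfo(file_path.name).filename
print(member)  # regions.shp
```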
