Uploading a large CSV file to Cloud Storage using Python

Hi, I am trying to upload a large CSV file, but I get the following error:

HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /upload/storage/v1/b/de-bucket-my-stg/o?uploadType=resumable&upload_id=ADPycdsyu6gSlyfklixvDgL7RLpAQAg6REm9j1ICarKvmdif3tASOl9MaqjQIZ5dHWpTeWqs2HCsL4hoqfrtVQAH1WpfYrp4sFRn (Caused by SSLError(SSLWantWriteError(3, 'The operation did not complete (write) (_ssl.c:2396)')))

Can someone help me with this? Here is my code:

import os
import pandas as pd
import io
import requests
from google.cloud import storage

try:
    url = "https://cb-test-dataset.s3.ap-south-1.amazonaws.com/analytics/analytics.csv"
    cont = requests.get(url).content
    file_to_upload = pd.read_csv(io.StringIO(cont.decode('utf-8')))
except Exception as e:
    print('Error getting file: ' + str(e))

try:
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'C:/Users/haris/Desktop/de-project/xxx.json'  # xxx is replaced here
    storage_client = storage.Client()
    bucket = storage_client.get_bucket('de-bucket-my-stg')
    blob = bucket.blob('analytics.csv')
    blob.upload_from_string(file_to_upload.to_csv(), 'text/csv')
except Exception as e:
    print('Error uploading file: ' + str(e))

As mentioned in the documentation,

My advice is to gzip your file before sending it. Text files have a high compression ratio (up to 100x), and the gzipped file can be ingested directly into BigQuery without being decompressed.
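
For example, the upload step from the question could compress the CSV in memory first. This is a minimal sketch reusing file_to_upload and the bucket handle from the code above; the analytics.csv.gz object name is just illustrative:

import gzip

# Compress the CSV text in memory before sending it over the wire.
csv_bytes = file_to_upload.to_csv().encode('utf-8')
gz_bytes = gzip.compress(csv_bytes)

# Mark the object as gzip-encoded so GCS knows how the payload is stored.
gz_blob = bucket.blob('analytics.csv.gz')
gz_blob.content_encoding = 'gzip'
gz_blob.upload_from_string(gz_bytes, content_type='text/csv')

A payload that is up to 100x smaller also means far fewer bytes going through the TLS connection that was failing mid-write.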

The fastest way to upload to Cloud Storage is to use the compose API and composite objects, as sketched below.
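
Here is a sketch of that approach, assuming the large file has already been split and uploaded as smaller part objects (the part names and count are hypothetical):

from google.cloud import storage

client = storage.Client()
bucket = client.bucket('de-bucket-my-stg')

# Assume the file was uploaded as three part objects, e.g. in parallel.
parts = [bucket.blob(f'parts/analytics-{i:03d}.csv') for i in range(3)]

# compose() concatenates the parts server-side; no data is re-sent.
combined = bucket.blob('analytics.csv')
combined.compose(parts)

# Optionally remove the intermediate parts.
for part in parts:
    part.delete()

Note that a single compose() call accepts at most 32 source objects, so very large files may need more than one round of composition.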

For more information, you can refer to this Stack Overflow thread, where the OP faced a similar error.
