How to access a CSV file from Google Cloud Storage in a Google Cloud Function via pandas



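I'm new to Cloud Functions, so I followed the default GCP Cloud Function "hello world" tutorial. It worked fine and returned "hello world" as expected. I only changed the requirements.txt file to include pandas and google-cloud-storage. Likewise, all of my edits to the main.py script are in the imports section before the function definition and in the else branch of the function.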

requirements.txt

pandas 
google-cloud-storage

main.py:

import pandas as pd
from google.cloud import storage
def hello_world(request):
    """Responds to any HTTP request.
    Args:
        request (flask.Request): HTTP request object.
    Returns:
        The response text or any set of values that can be turned into a
        Response object using
        `make_response <http://flask.pocoo.org/docs/1.0/api/#flask.Flask.make_response>`.
    """
    request_json = request.get_json()
    if request.args and 'message' in request.args:
        return request.args.get('message')
    elif request_json and 'message' in request_json:
        return request_json['message']
    else:
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        blob.download_to_filename('temp.csv')
        with open('temp.csv','rb') as f:
            df = pd.read_csv(f)

        return str(df.columns)

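When I test the function in GCP's "test cloud function" area, the following error is captured in the logs. The first seven frames look like boilerplate, while the last two relate to my actual program. The failure itself is OSError: [Errno 30] Read-only file system: 'temp.csv'. I don't know why this error is being triggered.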

Error:

Traceback (most recent call last):
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 2447, in wsgi_app
    response = self.full_dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1952, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1821, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/_compat.py", line 39, in reraise
    raise value
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1950, in full_dispatch_request
    rv = self.dispatch_request()
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/flask/app.py", line 1936, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/functions_framework/__init__.py", line 87, in view_func
    return function(request._get_current_object())
  File "/workspace/main.py", line 25, in hello_world
    blob.download_to_filename('temp.csv')
  File "/layers/google.python.pip/pip/lib/python3.8/site-packages/google/cloud/storage/blob.py", line 1183, in download_to_filename
    with open(filename, "wb") as file_obj:
OSError: [Errno 30] Read-only file system: 'temp.csv'

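For context, I have added credentials to the appropriate service account, which the Cloud Function uses according to the configuration I set up. So, authorization aside, I don't know why the function fails or what I should change.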

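To be clear about the goal, I am just trying to open an arbitrary CSV file from Cloud Storage in pandas and return the column names as a string. This has no practical value; it is only a functional test before building something useful.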

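Edit 1: As far as I can tell, the IAM role granted to the service account for the Cloud Function in question is "roles/editor", which should be sufficient.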

Edit 2: GCP Cloud Functions appear to run in a read-only environment. So there must be some other way to open the file without using blob.download_to_filename.

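You are new to Cloud Functions, so there are a few things to know and some pitfalls to avoid. One of them: Cloud Functions are stateless, and you can't write to the file system.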

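The one exception is the /tmp directory. It is an in-memory file system, so size your Cloud Function's memory correctly: your application's memory footprint plus the size of the files stored in /tmp.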
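Because files in /tmp use up the function's memory, it can help to delete a temporary file as soon as you are done with it. Here is a minimal sketch using only the standard library. The helper name `write_and_measure` is hypothetical, and it assumes `tempfile` resolves to /tmp in the runtime (which holds on typical Linux environments unless TMPDIR is overridden):

```python
import os
import tempfile


def write_and_measure(data: bytes) -> int:
    """Write data to a uniquely named temp file, return its size, then delete it."""
    # mkstemp creates the file in the default temp directory
    # (assumed to be /tmp here) and returns an open file descriptor.
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        return os.path.getsize(path)
    finally:
        # The file lives in memory, so remove it to free that memory.
        os.remove(path)
```

The try/finally block means the file is removed even if the work in between raises.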

Update your Cloud Function like this:

    ....
    else:
        storage_client = storage.Client()
        bucket = storage_client.bucket('my_bucket')
        model_filename = "my_file.csv"
        blob = bucket.blob(model_filename)
        blob.download_to_filename('/tmp/temp.csv')
        with open('/tmp/temp.csv','rb') as f:
            df = pd.read_csv(f)

        return str(df.columns)
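Since /tmp is held in memory anyway, another option is to skip the temporary file and hand the bytes straight to pandas. This is only a sketch, not the one right way to do it. It assumes a google-cloud-storage version that provides `Blob.download_as_bytes()`, and the helper name `csv_columns_from_blob` is made up for illustration:

```python
import io

import pandas as pd


def csv_columns_from_blob(blob):
    """Return the CSV column names of a blob without writing to disk."""
    # download_as_bytes() loads the whole object into memory as bytes.
    data = blob.download_as_bytes()
    # BytesIO wraps the bytes in a file-like object that read_csv accepts.
    df = pd.read_csv(io.BytesIO(data))
    return list(df.columns)


def hello_world(request):
    # Imported inside the function so the helper above can be tried out
    # locally even where the client library is not installed.
    from google.cloud import storage

    storage_client = storage.Client()
    blob = storage_client.bucket('my_bucket').blob('my_file.csv')
    return str(csv_columns_from_blob(blob))
```

The memory caveat still applies: the raw bytes and the DataFrame both sit in memory, so very large files may need a different approach.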
