GCS loads all the data from a CSV into a single column

I use an API to fetch some data, from which I extract 2 columns and upload them to a GCS bucket:

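import csv
import requests

# url, headers and bucket are assumed to be defined earlier.
response = requests.get(url, headers=headers)
response.encoding = "UTF-8"
reader = csv.reader(response.text.splitlines(), delimiter=';', quotechar='"')

tabResult = []
for r in reader:
    try:
        # Keep only the first and seventh columns.
        tabResult.append(f"{r[0]};{r[6]}")
    except Exception:
        print(r)
# Join rows with a newline so that each record is on its own line.
stringResult = "\n".join(tabResult)
blob = bucket.blob("users.csv")
blob.upload_from_string(stringResult)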

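Then I want to import this data into a BigQuery table, but the problem is that GCS thinks my CSV file has only 1 column, so the data in the table looks like this: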

col1
1; 2022-01-01
2; 2022-01-02

Don't convert the file, just use a custom field_delimiter!

from google.cloud import bigquery

client = bigquery.Client()
# Placeholder: set this to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"
job_config = bigquery.LoadJobConfig(
    autodetect=True,
    source_format=bigquery.SourceFormat.CSV,
    field_delimiter=";",  # tell BigQuery that fields are separated by semicolons
)
# Placeholder: replace with the gs:// URI of your own semicolon-delimited file.
uri = "gs://your-bucket/users.csv"
load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.
load_job.result()  # Waits for the job to complete.
destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

I found the answer hidden in the documentation:

Schema auto-detection for CSV data
CSV delimiter
BigQuery detects the following delimiters:
comma ( , )
pipe ( | )
tab ( \t )

Since the semicolon is not among the auto-detected delimiters, I just needed to replace

for r in reader:
    try:
        tabResult.append(f"{r[0]};{r[6]}")

with

for r in reader:
    try:
        tabResult.append(f"{r[0]},{r[6]}")
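One caveat with joining fields by hand: if a value ever contains a comma, the row would be split into too many columns. A minimal sketch of an alternative, assuming the same two columns (indexes 0 and 6) are wanted, is to let Python's csv.writer produce the output so that quoting is handled automatically. The function name extract_columns and the sample data are hypothetical and only serve as illustration.

```python
import csv
import io


def extract_columns(text, indexes=(0, 6), in_delimiter=";"):
    """Return comma-delimited CSV text containing only the selected columns."""
    reader = csv.reader(text.splitlines(), delimiter=in_delimiter, quotechar='"')
    out = io.StringIO()
    # csv.writer uses "," as its default delimiter and quotes fields when needed.
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        try:
            writer.writerow([row[i] for i in indexes])
        except IndexError:
            # The row has too few fields; report it, as the original loop does.
            print(row)
    return out.getvalue()


# Hypothetical sample standing in for response.text.
sample = "1;a;b;c;d;e;2022-01-01\n2;a;b;c;d;e;2022-01-02"
print(extract_columns(sample))
```

The returned string could then be passed to blob.upload_from_string in place of stringResult.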
