I need to use Python on my local laptop to pull data from multiple BigQuery tables, one table at a time, into a single DataFrame



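I can already fetch data from one table at a time. Only the date in the table name changes, and I need to pull 300 days of data. Can I pass the table names in a list/array, loop over them, and store all the data in one DataFrame?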

Code for one table at a time:

import os
import pandas as pd
from google.cloud import bigquery

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:/Users/GA/credentials.json"
client = bigquery.Client()
dflist = []

def query_data(QUERY, client):
    # Run the query, wait for it to finish, and load the rows into a DataFrame
    query_job = client.query(QUERY)
    query_result = query_job.result()
    df = query_result.to_dataframe()
    print(df.shape)
    dflist.append(df)
    return df, dflist

# Standard SQL needs the table path in backticks with dots (not the legacy
# "project:dataset.table" form); hitNumber is not a top-level column, so it
# cannot be used in ORDER BY here
QUERY = """
SELECT
  fullVisitorId,
  visitNumber,
  visitId,
  visitStartTime,
  date,
  totals.bounces,
  totals.hits,
  totals.newVisits,
  totals.pageviews,
  totals.screenviews,
  totals.sessionQualityDim,
  totals.timeOnScreen,
  totals.timeOnSite,
  totals.totalTransactionRevenue,
  totals.transactionRevenue,
  totals.transactions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_20170715`
ORDER BY fullVisitorId, date, visitNumber
"""

datafr, datals = query_data(QUERY, client)
finaldf = pd.concat(datals, axis=0)

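Sure. You can use string formatting to insert the table name into the query. The code below uses `str.format` with a `{date}` placeholder; an f-string works just as well: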

import os
import pandas as pd
from google.cloud import bigquery

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "C:/Users/GA/credentials.json"
client = bigquery.Client()

# {date} is filled in by str.format for each daily table
QUERY = """
SELECT
  fullVisitorId,
  visitNumber,
  visitId,
  visitStartTime,
  date,
  totals.bounces,
  totals.hits,
  totals.newVisits,
  totals.pageviews,
  totals.screenviews,
  totals.sessionQualityDim,
  totals.timeOnScreen,
  totals.timeOnSite,
  totals.totalTransactionRevenue,
  totals.transactionRevenue,
  totals.transactions
FROM
  `bigquery-public-data.google_analytics_sample.ga_sessions_{date}`
ORDER BY fullVisitorId, date, visitNumber
"""

def query_data(date_: str) -> pd.DataFrame:
    # client and QUERY come from the enclosing module scope
    query_job = client.query(QUERY.format(date=date_))
    query_result = query_job.result()
    return query_result.to_dataframe()

# Table suffixes use the YYYYMMDD form of the table names; ... add the rest
dates = ["20170715", "20170716"]
finaldf = pd.concat((query_data(date) for date in dates), axis=0)
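For 300 days, the list of table suffixes does not have to be typed by hand. Below is a minimal sketch, assuming the tables are consecutive daily tables named `ga_sessions_YYYYMMDD` as in the query above. `table_suffixes` is a hypothetical helper, and the start date `2016-08-01` is only an example; substitute the first day of your own range. The f-string shows how each fully qualified table name is built:

```python
from datetime import date, timedelta
from typing import List


def table_suffixes(start: date, days: int) -> List[str]:
    # One YYYYMMDD suffix per consecutive day, matching ga_sessions_YYYYMMDD
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


start = date(2016, 8, 1)  # example start date (assumption); use your own
dates = table_suffixes(start, 300)
tables = [f"bigquery-public-data.google_analytics_sample.ga_sessions_{d}" for d in dates]
print(dates[0], dates[-1], len(tables))  # 20160801 20170527 300
```

Passing this `dates` list to `query_data` in the loop above issues one query per daily table.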

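Note that you don't have to pass `client` and `QUERY` to the function explicitly, because it can read them from the enclosing (module-level) scope.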
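The scoping behaviour and the loop-and-concat pattern can be checked without BigQuery. In this sketch `fake_query_data` is a hypothetical stand-in that returns a small DataFrame instead of calling `client.query(...)`; like `query_data`, it reads the module-level `QUERY` template without receiving it as an argument. It also passes `ignore_index=True` to `pd.concat`. This is optional, but it gives the combined frame a clean 0..n-1 index instead of repeating each table's row numbers:

```python
import pandas as pd

# Module-level template; the function below reads it without a parameter
QUERY = "SELECT fullVisitorId FROM `bigquery-public-data.google_analytics_sample.ga_sessions_{date}`"


def fake_query_data(date_: str) -> pd.DataFrame:
    # Hypothetical stand-in for client.query(sql).result().to_dataframe():
    # it only records which table each row would have come from
    sql = QUERY.format(date=date_)
    return pd.DataFrame({"table_date": [date_, date_], "sql": [sql, sql]})


dates = ["20170715", "20170716", "20170717"]
finaldf = pd.concat((fake_query_data(d) for d in dates), axis=0, ignore_index=True)
print(finaldf.shape)  # (6, 2)
```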