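I need to build a small API that returns data from a Google Cloud Platform public table. Right now I'm trying to fetch the data with Python, but querying the table takes forever because it has millions of rows. Is there a way to query the table while excluding nulls and the like, and return a response to the client in under 2 minutes? If so, how should I do it? My current code is: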
import google.auth
from google.cloud import bigquery
from google.cloud import bigquery_storage
import pandas
import pandas_gbq
from google.oauth2 import service_account
credentials = service_account.Credentials.from_service_account_file(
    "<path to your service account key file>"  # you need service account credentials
)
sql = """SELECT passenger_count, trip_distance, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2016`
WHERE pickup_longitude IS NOT NULL AND pickup_latitude IS NOT NULL
AND dropoff_longitude IS NOT NULL AND dropoff_latitude IS NOT NULL
AND passenger_count < 7 AND passenger_count > 0
AND pickup_longitude != 0.0 AND pickup_latitude != 0.0
AND dropoff_longitude != 0.0 AND dropoff_latitude != 0.0"""
And I'm using pandas:
df = pandas_gbq.read_gbq(sql,project_id='disco-parsec-303220',credentials=credentials)
Use the BigQuery Storage API to download large results faster.
I got your results in under a minute.
import pandas
sql = """SELECT passenger_count, trip_distance, pickup_longitude, pickup_latitude, dropoff_longitude, dropoff_latitude
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2016`
WHERE pickup_longitude IS NOT NULL
AND pickup_latitude IS NOT NULL
AND dropoff_longitude IS NOT NULL
AND dropoff_latitude IS NOT NULL
AND passenger_count < 7 AND passenger_count > 0
AND pickup_longitude != 0.0 AND pickup_latitude != 0.0
AND dropoff_longitude != 0.0 AND dropoff_latitude != 0.0"""
df = pandas.read_gbq(sql, dialect='standard', use_bqstorage_api=True)
You should be able to get even faster results with the google-cloud-bigquery library and Apache Arrow (reference).
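For what it's worth, here is a rough sketch of what the google-cloud-bigquery plus Arrow route could look like. It is not taken from the linked reference. It assumes google-cloud-bigquery, google-cloud-bigquery-storage, pyarrow and pandas are installed. The helper name fetch_dataframe, the main() wrapper and the key file path are made up for illustration, and the project ID is just the one from the question.

```python
# Sketch: run the query with google-cloud-bigquery, download the result as an
# Arrow table, then convert it to a pandas DataFrame.
SQL = """
SELECT passenger_count, trip_distance,
       pickup_longitude, pickup_latitude,
       dropoff_longitude, dropoff_latitude
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2016`
WHERE pickup_longitude IS NOT NULL AND pickup_latitude IS NOT NULL
  AND dropoff_longitude IS NOT NULL AND dropoff_latitude IS NOT NULL
  AND passenger_count < 7 AND passenger_count > 0
  AND pickup_longitude != 0.0 AND pickup_latitude != 0.0
  AND dropoff_longitude != 0.0 AND dropoff_latitude != 0.0
"""


def fetch_dataframe(client, sql):
    """Run `sql` on `client` and return the rows as a pandas DataFrame.

    `client` is expected to be a google.cloud.bigquery.Client.
    (fetch_dataframe is a hypothetical helper, not a library function.)
    """
    job = client.query(sql)  # submits the query job
    # to_arrow() returns a pyarrow.Table; create_bqstorage_client=True asks the
    # library to download through the BigQuery Storage API where available.
    table = job.to_arrow(create_bqstorage_client=True)
    return table.to_pandas()


def main():
    # Imported here so the helper above does not need the Google client
    # libraries just to be defined.
    from google.cloud import bigquery
    from google.oauth2 import service_account

    credentials = service_account.Credentials.from_service_account_file(
        "service_account.json"  # hypothetical path; use your own key file
    )
    client = bigquery.Client(
        credentials=credentials,
        project="disco-parsec-303220",  # project ID taken from the question
    )
    df = fetch_dataframe(client, SQL)
    print(df.shape)
```

Call main() once the key file path points at real credentials. Keeping the client as a parameter of fetch_dataframe means the same client can be reused across API requests instead of being rebuilt each time.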