IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment



I am trying to connect a BigQuery dataset to Databricks and run a script with PySpark.

Steps I have done:

  • I uploaded the BigQuery JSON credentials file to DBFS in Databricks so the connector can authenticate.

  • Then I added spark-bigquery-latest.jar to the cluster libraries and ran my script.

When I run this script, I get no errors:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
        .appName('bq')
        .master('local[4]')
        .config('parentProject', 'google-project-ID')
        .config('spark.jars', 'dbfs:/FileStore/jars/jarlocation.jar')
        .getOrCreate()
)

# Read a single table through the spark-bigquery connector
df = (
    spark.read.format("bigquery")
        .option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json")
        .option("parentProject", "google-project-ID")
        .option("project", "Dataset-Name")
        .option("table", "dataset.schema.tablename")
        .load()
)
df.show()

But instead of reading a single table from that schema, I tried to read all the tables under it with the following query:

from pyspark.sql import SparkSession
from google.cloud import bigquery

spark = (
    SparkSession.builder
        .appName('bq')
        .master('local[4]')
        .config('parentProject', 'google-project-ID')
        .config('spark.jars', 'dbfs:/FileStore/jars/jarlocation.jar')
        .getOrCreate()
)

# Collect the names of all tables in the dataset
client = bigquery.Client()
table_list = 'dataset.schema'
tlist = []
for table in client.list_tables(table_list):
    tlist.append(table.table_id)  # table_id is the bare table name

for i in tlist:
    sql_query = "select * from `dataset.schema." + i + "`"
    df = (
        spark.read.format("bigquery")
            .option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json")
            .option("parentProject", "google-project-ID")
            .option("project", "Dataset-Name")
            .option("query", sql_query)
            .load()
    )
    df.show()

And with this script:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
        .appName('bq')
        .master('local[4]')
        .config('parentProject', 'google-project-ID')
        .config('spark.jars', 'dbfs:/FileStore/jars/jarlocation.jar')
        .getOrCreate()
)

sql_query = """select * from `dataset.schema.tablename`"""
df = (
    spark.read.format("bigquery")
        .option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json")
        .option("parentProject", "google-project-ID")
        .option("project", "Dataset-Name")
        .option("query", sql_query)
        .load()
)
df.show()

I get this unusual error:

IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment.  Please set a project ID using the builder.
---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<command-131090852> in <module>
35   .option("parentProject", "google-project-ID") 
36   .option("project", "Dataset-Name") 
---> 37   .option("query", sql_query).load()
38 #df.show()
39 
/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
182             return self._df(self._jreader.load(self._spark._sc._jvm.PythonUtils.toSeq(path)))
183         else:
--> 184             return self._df(self._jreader.load())
185 
186     @since(1.4)
/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
1303         answer = self.gateway_client.send_command(command)
1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
1306 
1307         for temp_arg in temp_args:
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
131                 # Hide where the exception came from that shows a non-Pythonic
132                 # JVM exception message.
--> 133                 raise_from(converted)
134             else:
135                 raise
/databricks/spark/python/pyspark/sql/utils.py in raise_from(e)
IllegalArgumentException: A project ID is required for this service but could not be determined from the builder or the environment.  Please set a project ID using the builder.

When I read it as a table, it does recognize my project ID, but when I run it as a query I get this error.

I have tried to figure it out and searched many sites for an answer, but could not get a definitive one.

Any help is much appreciated... Thanks in advance...

Can you avoid using the query and use just the table option instead?

from pyspark.sql import SparkSession
from google.cloud import bigquery

spark = (
    SparkSession.builder
        .appName('bq')
        .master('local[4]')
        .config('parentProject', 'google-project-ID')
        .config('spark.jars', 'dbfs:/FileStore/jars/jarlocation.jar')
        .getOrCreate()
)

client = bigquery.Client()
table_list = 'dataset.schema'
tlist = []
for table in client.list_tables(table_list):
    tlist.append(table.table_id)  # table_id is the bare table name

for i in tlist:
    df = (
        spark.read.format("bigquery")
            .option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json")
            .option("parentProject", "google-project-ID")
            .option("project", "Dataset-Name")
            .option("table", "dataset.schema." + i)
            .load()
    )
    df.show()
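
If you do need the query option, note that the connector's query path runs the SQL as a BigQuery job and materializes the result into a temporary table, so in connector versions that support it you typically also have to set viewsEnabled and point materializationDataset at a dataset the service account may write to. A minimal sketch under those assumptions (the materialization dataset name here is a placeholder):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
        .appName('bq')
        .config('parentProject', 'google-project-ID')
        .getOrCreate()
)

# Sketch: the query path materializes results into a temporary table,
# so the connector needs a dataset it is allowed to write to.
# 'dataset' below is a placeholder for a dataset you own.
df = (
    spark.read.format("bigquery")
        .option("credentialsFile", "/dbfs/FileStore/tables/bigqueryapi.json")
        .option("parentProject", "google-project-ID")
        .option("viewsEnabled", "true")
        .option("materializationDataset", "dataset")
        .option("query", "select * from `dataset.schema.tablename`")
        .load()
)
df.show()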

In my case I hit the same exception, but it was because I had not specified the config value parentProject, which is the ID of the BigQuery project I connect to.
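
For reference, a minimal sketch of that fix (my-gcp-project is a placeholder for the real BigQuery project ID):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('bq').getOrCreate()

# parentProject tells the connector which GCP project to run against;
# without it the underlying client tries to infer a project ID from the
# environment, which is what raises the IllegalArgumentException above.
df = (
    spark.read.format("bigquery")
        .option("parentProject", "my-gcp-project")  # placeholder project ID
        .option("table", "dataset.schema.tablename")
        .load()
)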
