Ambiguous reference to fields StructField in Databricks Delta Live Tables



I have set up Auto Loader to periodically read JSON files and store them in a "bronze" table called fixture_raw, using Delta Live Tables in Databricks. This works fine and the JSON data lands in the specified table, but when I add a "silver" table called fixture_prepared and try to extract some JSON elements from the bronze table, I get an error:

org.apache.spark.sql.AnalysisException: Ambiguous reference to fields StructField(id,LongType,true), StructField(id,LongType,true)

How can I work around this?

Delta Live Tables code:

CREATE OR REFRESH STREAMING LIVE TABLE fixture_raw AS
SELECT *, input_file_name() AS InputFile, now() AS LoadTime
FROM cloud_files(
  "/mnt/input/fixtures/",
  "json",
  map(
    "cloudFiles.inferColumnTypes", "true",
    "cloudFiles.schemaLocation", "/mnt/dlt/schema/fixture",
    "cloudFiles.schemaEvolutionMode", "addNewColumns"
  )
);

CREATE OR REFRESH LIVE TABLE fixture_prepared AS
WITH FixtureData AS (
  SELECT
    explode(response) AS FixtureJson
  FROM live.fixture_raw
)
SELECT
  FixtureJson.fixture.id AS FixtureID,
  FixtureJson.fixture.date AS StartTime,
  FixtureJson.fixture.venue.name AS Venue,
  FixtureJson.teams.home.id AS HomeTeamID,
  FixtureJson.teams.home.name AS HomeTeamName,
  FixtureJson.teams.away.id AS AwayTeamID,
  FixtureJson.teams.away.name AS AwayTeamName
FROM FixtureData;

JSON data:

{
  "get": "fixtures",
  "parameters": {
    "league": "39",
    "season": "2022"
  },
  "response": [
    {
      "fixture": {
        "id": 867946,
        "date": "2022-08-05T19:00:00+00:00",
        "venue": {
          "id": 525,
          "name": "Selhurst Park"
        }
      },
      "teams": {
        "home": {
          "id": 52,
          "name": "Crystal Palace"
        },
        "away": {
          "id": 42,
          "name": "Arsenal"
        }
      }
    },
    {
      "fixture": {
        "id": 867947,
        "date": "2022-08-06T11:30:00+00:00",
        "venue": {
          "id": 535,
          "name": "Craven Cottage"
        }
      },
      "teams": {
        "home": {
          "id": 36,
          "name": "Fulham"
        },
        "away": {
          "id": 40,
          "name": "Liverpool"
        }
      }
    }
  ]
}
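Outside of Spark, the extraction the silver table is supposed to perform can be sanity-checked against the sample payload with plain Python (a sketch using an abbreviated copy of the JSON above; no Spark involved):

```python
import json

# Abbreviated copy of the sample payload above (one fixture kept).
payload = json.loads("""
{
  "get": "fixtures",
  "response": [
    {
      "fixture": {"id": 867946,
                  "date": "2022-08-05T19:00:00+00:00",
                  "venue": {"id": 525, "name": "Selhurst Park"}},
      "teams": {"home": {"id": 52, "name": "Crystal Palace"},
                "away": {"id": 42, "name": "Arsenal"}}
    }
  ]
}
""")

def prepare_fixtures(doc):
    """Mirror the fixture_prepared SELECT: iterate over `response`
    (like explode) and pull out the nested fields."""
    rows = []
    for item in doc["response"]:
        rows.append({
            "FixtureID":    item["fixture"]["id"],
            "StartTime":    item["fixture"]["date"],
            "Venue":        item["fixture"]["venue"]["name"],
            "HomeTeamID":   item["teams"]["home"]["id"],
            "HomeTeamName": item["teams"]["home"]["name"],
            "AwayTeamID":   item["teams"]["away"]["id"],
            "AwayTeamName": item["teams"]["away"]["name"],
        })
    return rows

rows = prepare_fixtures(payload)
print(rows[0]["FixtureID"], rows[0]["Venue"])  # 867946 Selhurst Park
```

The paths resolve cleanly here, which suggests the ambiguity comes from the schema Auto Loader inferred rather than from the field paths themselves.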

There is a difference between the size of the assigned DataFrame and the calling DataFrame. Please check the sizes of the assigned DataFrame and the calling DataFrame before joining, and read the official documentation carefully. I followed the same scenario with sample code in my own environment: I added a silver table, and it worked fine for me with no errors. Follow this GitHub reference; it has detailed information.
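For what it's worth, this error typically appears when the inferred struct contains two fields that resolve to the same name under Spark's default case-insensitive analysis, either literal duplicates (as in the quoted message, two `id` fields) or spellings differing only in case introduced by schema evolution. A stdlib sketch for spotting such collisions in a list of inferred field names (the field names below are hypothetical, not taken from the question):

```python
from collections import defaultdict

def find_case_collisions(field_names):
    """Group field names case-insensitively and return the groups
    with more than one entry -- these are the fields that trigger
    Spark's 'Ambiguous reference to fields' error."""
    groups = defaultdict(list)
    for name in field_names:
        groups[name.lower()].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}

# Hypothetical fields inferred for the `fixture` struct after evolution.
inferred = ["id", "Id", "date", "venue"]
print(find_case_collisions(inferred))  # {'id': ['id', 'Id']}
```

If a collision like this shows up in the inferred schema, clearing the `cloudFiles.schemaLocation` directory and letting Auto Loader re-infer (or declaring the schema explicitly) is one way to get a clean struct, though that is a suggestion rather than something the question confirms.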

Reference:

https://learn.microsoft.com/en-us/azure/databricks/data-engineering/delta-live-tables/delta-live-tables-quickstart (SQL) — Delta Live Tables Demo: Modern software engineering for ETL processing.
