使用datform增量更新表时出错



我在BigQuery中有一个表,每当新数据进入另一个表时,我想每天更新这个表。现在,我还需要对原始数据框中嵌套列中的变量进行一些类型转换。因此,我做了下面的事情:


config {
type: "incremental",
bigquery: {
partitionBy: "DATE(ts)",
clusterBy: ["IN","SN", "CB","B"]
}
}

SELECT distinct 
timestamp as ts,
storeName,
DATE(timestamp) as Date,
itemId,
runId,
source,
storeUuid,
CASE WHEN (P.value IS NOT NULL AND SAFE_CAST(REPLACE(P.value, ",", ".") AS FLOAT64) IS NOT NULL)
THEN CAST(REPLACE(P.value, ",", ".") AS FLOAT64)/100
ELSE NULL END as P,
CASE WHEN (BUP.value IS NOT NULL AND SAFE_CAST(REPLACE(BUP.value, ",", ".") AS FLOAT64) IS NOT NULL)
THEN CAST(REPLACE(BUP.value, ",", ".") AS FLOAT64)/100
ELSE NULL END as BUP,
CASE WHEN (NP.value IS NOT NULL AND SAFE_CAST(REPLACE(NP.value, ",", ".") AS FLOAT64) IS NOT NULL)
THEN CAST(REGEXP_REPLACE(NP.value, ",", ".") AS FLOAT64)/100
ELSE NULL END as NP,
CASE WHEN (Quantity.value IS NOT NULL AND SAFE_CAST(REPLACE(Quantity.value, ",", ".") AS INT64) IS NOT NULL)
THEN CAST(REPLACE(Quantity.value, ",", ".") AS INT64)
ELSE NULL END as Quantity,
CASE WHEN (CQ.value IS NOT NULL AND SAFE_CAST(REPLACE(CQ.value, ",", ".") AS INT64) IS NOT NULL)
THEN CAST(REGEXP_REPLACE(CQ.value, ",", ".") AS INT64)
ELSE NULL END as CQ, 

CAST(AverageSalesOverAWeek.value AS FLOAT64) AS AverageSalesOverAWeek,
CAST(CentralMainBarcode.value AS STRING) AS CentralMainBarcode,
CAST(MainBarcode.value AS STRING) AS MainBarcode,
CASE WHEN CAST(MB.value AS STRING) IS NULL
THEN CAST(CMB.value AS STRING)
ELSE CAST(MB.value AS STRING) END as B,
itemName,
itemGroup.value as itemGroup,
CAST(PT.value as String) as  PT,
department.value as Department,

UC.value as UC,
KP.value as IKP,
NOCD.value as NOCD,
NPCEPEON.value as NPCEPEON ,
CCP.value as CCP,
LDWA.value as LDWA,
IOOS.value as IOOS,
CASE
WHEN ID.value = 'true' THEN 1
WHEN ID.value = 'false' THEN 0
ELSE NULL
END as ID,
CASE
WHEN DT.value = 'true' THEN 1
WHEN DT.value = 'false' THEN 0
ELSE NULL
END as DT,
CASE
WHEN CID.value = 'true' THEN 1
WHEN CID.value = 'false' THEN 0
ELSE NULL
END as CID,
CASE
WHEN HP.value = 'true' THEN 1
WHEN HP.value = 'false' THEN 0
ELSE NULL
END as HP,
CAST(presentation.value AS STRING) AS presentation,
CASE WHEN presentation.value = 'promo' THEN 1 ELSE 0 END as PF,
CAST(InStock.value AS INT64) AS InStock,
CAST(Brand.value AS STRING) AS Brand,
CAST(CB.value AS STRING) AS CentralBrand,
CPD.value as CPD,
CPPD.value as CPPD,
..........
5 more like these 
FROM
`table`
LEFT OUTER JOIN UNNEST(itemProperties) AS ReceiptProductDescription ON ReceiptProductDescription.name = 'ReceiptProductDescription'
LEFT OUTER JOIN UNNEST(itemProperties) AS GD ON GD.name = 'GD'
LEFT OUTER JOIN UNNEST(itemProperties) AS MB ON MD.name = 'MB'
LEFT OUTER JOIN UNNEST(itemProperties) AS B ON B.name = 'B'
LEFT OUTER JOIN UNNEST(itemProperties) AS CB ON CB.name = 'CB'
LEFT OUTER JOIN UNNEST(itemProperties) AS IG ON IG.name = 'IG'
LEFT OUTER JOIN UNNEST(itemProperties) AS PT ON PT.name = 'PT'
LEFT OUTER JOIN UNNEST(itemProperties) AS CMB ON CMB .name = 'CMB '
LEFT OUTER JOIN UNNEST(itemProperties) AS CPD ON CPD.name = 'CPD '
LEFT OUTER JOIN UNNEST(itemProperties) AS CPPD ON CPPD .name = 'CPPD '
LEFT OUTER JOIN UNNEST(itemProperties) AS IS ON IS.name = 'IS'
LEFT OUTER JOIN UNNEST(itemProperties) AS Quantity ON Quantity.name = 'Quantity'
LEFT OUTER JOIN UNNEST(itemProperties) AS CCQ ON CCQ.name = 'CCQ '
LEFT OUTER JOIN UNNEST(itemProperties) AS ISD ON ISD.name = 'ISD '
LEFT OUTER JOIN UNNEST(itemProperties) AS DeliveredToday ON DeliveredToday.name = 'DeliveredToday'
LEFT OUTER JOIN UNNEST(itemProperties) AS IOOS ON IOOS.name = 'IOOS '
LEFT OUTER JOIN UNNEST(itemProperties) AS LDWA ON LDWA.name = 'LDWA '
LEFT OUTER JOIN UNNEST(itemProperties) AS NPCEPEON NPCEPEON.name = 'NPCEPEON'
LEFT OUTER JOIN UNNEST(itemProperties) AS NP ON NP.name = 'NP'
LEFT OUTER JOIN UNNEST(itemProperties) AS BUP ON BUP.name = 'BUP'
LEFT OUTER JOIN UNNEST(itemProperties) AS HP ON HP.name = 'HP'
LEFT OUTER JOIN UNNEST(itemProperties) AS CCP ON CCP.name = 'CCP '
LEFT OUTER JOIN UNNEST(itemProperties) AS NPD ON NPD.name = 'NPD '
LEFT OUTER JOIN UNNEST(itemProperties) AS UnitCode ON UnitCode.name = 'UnitCode'
LEFT OUTER JOIN UNNEST(itemProperties) AS P ON P.name = 'P'
.........
5 more like that
${when(incremental(), `where timestamp > (select max(timestamp) from ${self()})`)}

我所面临的问题与查询无关,因为它在BQ中运行良好。问题是,我不能从Dataform运行增量更新。

我得到以下错误信息:

Error: Resources exceeded during query execution: The query could not be executed in the allotted memory. Peak usage: 162% of limit. Top memory consumer(s): JOIN operations: 99% other/unattributed: 1%

我已经看到了解决方案,如果groupby不用于连接。

我该如何解决这个问题?

我应该指出(对于大多数读这篇文章的人来说很明显),我很少使用SQL。如果有其他比左外连接更有效地嵌套数据的方法,我将洗耳恭听。

有些操作必须在单台机器上执行[1]。当数据溢出插槽的内存时,就会遇到出现的错误。我的方法是将插槽类型更改为更高内存的机器[2],或者切换到固定费率定价以保留资源[3]。

否则,您将不得不遵循查询最佳实践,以避免对太多数据进行单个槽操作。如果您还没有迁移到STANDARD SQL,那么迁移到STANDARD SQL通常会有所帮助。另外,我附加了一些关于BigQuery在执行过程中内存如何工作的信息[4]。

[1] https://cloud.google.com/bigquery/docs/best-practices-performance-output use_a_limit_clause_with_large_sorts

[2] https://cloud.google.com/bigquery/docs/slots

[3] https://cloud.google.com/bigquery/docs/reservations-intro

[4] https://cloud.google.com/blog/products/bigquery/in-memory-query-execution-in-google-bigquery

相关内容

  • 没有找到相关文章

最新更新