BigQuery-选择序列中最后一个非null值



我正试图带回用户旅程中的最后一个非null值。

就上下文而言,我在我的网站上查看用户在旅途中的每一个动作,最后一个非null值将为我提供用户将产品添加到购物车之前的最后一个产品列表名称。

在我的示例中,"ProductList"列中的最后一个非null值将是"EventAction"列中"Add to Cart"行之前的"Face Masks">

这是我在子查询中使用first_value之前的结果:

**fullVisitorId|visitStartTime|visitId|visitNumber|hitNumber|ProductList|ProductSKU|EventCat|EventAction**
12345XX | 1608202 | 56789 | 50 | 161 | Face Masks | ABC1401 | Ecommerce | Product Impression
12345XX | 1608202 | 56789 | 50 | 161 | Face Masks | ABC1501 | Ecommerce | Product Impression
12345XX | 1608202 | 56789 | 50 | 161 | Face Masks | ABC1601 | Ecommerce | Product Impression 
12345XX | 1608202 | 56789 | 50 | 161 | Face Masks | ABC1701 | Ecommerce | Product Impression
12345XX | 1608202 | 56789 | 50 | 162 | Face Masks | ABC1801 | Ecommerce | Product Click
12345XX | 1608202 | 56789 | 50 | 163 | NULL | ABC1801 | Ecommerce | Product View
12345XX | 1608202 | 56789 | 50 | 164 | NULL | ABC1801 | Ecommerce | Add to Cart

然而,在使用"first_value"函数后的代码中,我仍然得到null值。我哪里错了?有更简单的方法吗?

SELECT
FIRST_VALUE(ProductList IGNORE NULLS) 
OVER (PARTITION BY EventAction ORDER BY VisitNumber ASC, hitNumber ASC) AS NewProductList
FROM(
SELECT
fullVisitorId,
visitStartTime,
visitId,
visitNumber,
hits.hitNumber AS hitNumber,
CASE WHEN product.productListName = '(not set)' THEN NULL ELSE product.productListName END AS ProductList,
product.productSKU AS ProductSKU,
hits.eventInfo.eventCategory AS EventCategory,
hits.eventInfo.eventAction AS EventAction,
FROM
`tablename.ga_sessions_20200914`,
UNNEST(hits) AS hits,
UNNEST(hits.product) product
WHERE
geoNetwork.country = 'United Kingdom'
AND fullVisitorId = '1000104589833493743'
ORDER BY
1,
4 ASC,
5 ASC
LIMIT 10000) WHERE EventAction = 'Add to Cart'

我发现您的查询有一些问题

  1. 您按EventAction进行分区-这是不正确的-您应该使用一些列来真正显示您要在其中标识last non-null value的组-这样您就可以使用fullVisitorId
  2. 您的WHERE EventAction = 'Add to Cart'子句在应用LAST_VALUE之前应用,因此NULL

因此,考虑到以上内容(应用正确的分区并将WHERE子句移到外部(-以下内容应该有效(BigQuery Standard SQL(

#standardSQL
SELECT NewProductList
FROM (
SELECT *, 
LAST_VALUE(ProductList IGNORE NULLS) 
OVER (PARTITION BY fullVisitorId ORDER BY VisitNumber, hitNumber) AS NewProductList
FROM (
SELECT
fullVisitorId,
visitStartTime,
visitId,
visitNumber,
hits.hitNumber AS hitNumber,
CASE WHEN product.productListName = '(not set)' THEN NULL ELSE product.productListName END AS ProductList,
product.productSKU AS ProductSKU,
hits.eventInfo.eventCategory AS EventCategory,
hits.eventInfo.eventAction AS EventAction,
FROM
`tablename.ga_sessions_20200914`,
UNNEST(hits) AS hits,
UNNEST(hits.product) product
WHERE
geoNetwork.country = 'United Kingdom'
AND fullVisitorId = '1000104589833493743'
ORDER BY
1,
4 ASC,
5 ASC
LIMIT 10000
)
)
WHERE EventAction = 'Add to Cart'   

如果想要最后一个值,则需要降序或last_value()。事实上,对于您想要的,first_value()是更简单的解决方案:

FIRST_VALUE(ProductList IGNORE NULLS) OVER
(PARTITION BY VisitNumber ORDER BY hitNumber DESC) AS NewProductList

我从表达式中删除了eventaction。这似乎与你想要什么无关。

最新更新