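I recently started learning SQL and have no prior programming experience, so this may just be a silly mistake (in which case, sorry for the long post :)). It would be great if you could help me with my current problem.
I have a table that looks like this:
id / n (name of the specific event) / utc (timestamp) / json_data (a JSON string containing multiple parameters).
My goal is simple: I am trying to get the sum of the value parameter in json_data, grouped by n. Unfortunately, a few issues make this more complicated.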
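- We have a spam problem that causes the same event to be sent hundreds or thousands of times, and these duplicates need to be filtered out. I usually solve this by putting utc (the timestamp) in the GROUP BY clause together with the other selected columns, which leaves one instance of each specific event.
- Some events return negative values in their "value field"; these need to be ignored in all counts and sums.
- Since nothing is ever easy, the name of the value field in the json_data column differs depending on the type of event sent. However, I have mostly solved this with the various string operations you can see in the query.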
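To make the deduplication idea from the first point concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the real engine, and the `events` table, its `value` column, and all rows are made up for illustration; the real table stores the value inside json_data instead.

```python
import sqlite3

# In-memory database standing in for the real engine (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, n TEXT, utc INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?)",
    [
        (1, "buy_alpha", 100, 5),
        (1, "buy_alpha", 100, 5),   # spam duplicate of the row above
        (1, "buy_alpha", 100, 5),   # spam duplicate of the row above
        (1, "buy_alpha", 200, 7),   # a genuinely separate event (different utc)
        (2, "buy_beta", 100, -3),   # negative value, still present at this stage
    ],
)

# Grouping by (id, n, utc) collapses the spam rows into one row per event.
# MAX() is only there so that value may appear next to the grouped columns.
deduped = conn.execute(
    """
    SELECT id, n, utc, MAX(value) AS value
    FROM events
    GROUP BY id, n, utc
    ORDER BY id, utc
    """
).fetchall()

print(deduped)
```

The three identical rows come back as a single row, while the event with a different timestamp is kept separately. Negative values are not filtered at this stage.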
Here is what I have so far:
SELECT
    b.Event_Name as Event_Name
    , COUNT(b.Event_Name) as event_occurrences
    , SUM(b.item_value) as user_spendings
FROM
    (SELECT
        a.id as Player_ID
        , a.n as Event_Name
        , a.utc as timing
        , CASE
            WHEN
                MAX(a.ALPHA_Value
                    + a.BETA_Value
                    + a.GAMMA_Value
                    + a.DELTA_Value
                    + a.EPSILON_Value
                    + a.BETAUPGRADE_Value
                    + a.ZETA_Value
                    + a.ALPHASKIN_Value
                    + a.UPGRADEALPHA_Value) <= 0
            THEN 0
            ELSE
                MAX(a.ALPHA_Value
                    + a.BETA_Value
                    + a.GAMMA_Value
                    + a.DELTA_Value
                    + a.EPSILON_Value
                    + a.BETAUPGRADE_Value
                    + a.ZETA_Value
                    + a.ALPHASKIN_Value
                    + a.UPGRADEALPHA_Value) END as item_value
    FROM
        (SELECT
            id
            , n
            , utc
            , MAX(TRIM(get_json_object(json_data, '$. ALPHA_Value '))) as ALPHA_Value
            , MAX(TRIM(get_json_object(json_data, '$. BETA_Value '))) as BETA_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. GAMMA_Value ')), 6,
                (LOCATE(' resource 2',
                    SUBSTR
                    (TRIM(get_json_object(json_data, '$. GAMMA_Value ')), 6))-1))) as GAMMA_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. DELTA_Value ')), 6)) as DELTA_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. EPSILON_Value ')), 6)) as EPSILON_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. BETAUPGRADE_Value ')), 6)) as BETAUPGRADE_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. ZETA_Value ')), 6)) as ZETA_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. ALPHASKIN_Value ')), 6)) as ALPHASKIN_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. UPGRADEALPHA_Value ')), 6,
                (LOCATE(' resource 2',
                    SUBSTR
                    (TRIM(get_json_object(json_data, '$. UPGRADEALPHA_Value ')), 6))-1))) as UPGRADEALPHA_Value
        FROM application_events
        WHERE
            month = 201409
            AND FROM_UNIXTIME(utc_timestamp) > '2014-09-04 12:00:00'
        GROUP BY id, n, utc
        ORDER BY id, n
        ) a
    GROUP BY a.id, a.n, a.utc
    ORDER BY timing, Event_Name
    ) b
WHERE b.item_value > 0
GROUP BY b.Event_Name
ORDER BY user_spendings
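My reasoning is as follows:
I extract the values from json_data and, at the same time, remove the spam with GROUP BY id, n, utc. I wrap get_json_object in MAX so that it can be selected alongside the grouped columns. Since the combination of id, name, and timestamp is unique (except for the spam, of course), MAX simply returns that same value. Since each event has only one value field (named differently depending on the event type), I end up with all the columns, but only one of them holds a value (the others are NULL).
I remove the negative values: since I cannot put the sum in the WHERE clause, the only way I could think of was to create another table (b) that checks whether the sum of all value columns in a is negative (as I said, all but one of them are NULL, so if there is a negative value the sum will be negative too) and, if it is not, returns the sum (aliased as item_value).
The third table finally counts the events and sums the values.
My current problem is the second step. When I run subquery a on its own, it looks fine and I get results. When I run it inside the original query (the one that counts the events and sums the values), I also get results. So I think something is wrong with the way I set up the condition, because the full query returns no rows.
I tried putting the sum in the WHERE clause, but that did not work. Any ideas are welcome, especially if you know a simpler way.
Thanks a lot, everyone.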
Your query looks correct; I removed some extra parts (though that was not strictly necessary):
SELECT
    b.Event_Name as Event_Name
    , COUNT(b.Event_Name) as event_occurrences
    , SUM(b.item_value) as user_spendings
FROM (SELECT
        a.id as Player_ID
        , a.n as Event_Name
        , a.utc as timing
        -- treat a missing (NULL) value field as 0 so the sum is never NULL
        , COALESCE(a.ALPHA_Value, CAST(0 AS BIGINT))
            + COALESCE(a.BETA_Value, CAST(0 AS BIGINT))
            + COALESCE(a.GAMMA_Value, CAST(0 AS BIGINT))
            + COALESCE(a.DELTA_Value, CAST(0 AS BIGINT))
            + COALESCE(a.EPSILON_Value, CAST(0 AS BIGINT))
            + COALESCE(a.BETAUPGRADE_Value, CAST(0 AS BIGINT))
            + COALESCE(a.ZETA_Value, CAST(0 AS BIGINT))
            + COALESCE(a.ALPHASKIN_Value, CAST(0 AS BIGINT))
            + COALESCE(a.UPGRADEALPHA_Value, CAST(0 AS BIGINT)) as item_value
    FROM (SELECT
            id
            , n
            , utc
            , MAX(TRIM(get_json_object(json_data, '$. ALPHA_Value '))) as ALPHA_Value
            , MAX(TRIM(get_json_object(json_data, '$. BETA_Value '))) as BETA_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. GAMMA_Value ')), 6,
                (LOCATE(' resource 2',
                    SUBSTR
                    (TRIM(get_json_object(json_data, '$. GAMMA_Value ')), 6))-1))) as GAMMA_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. DELTA_Value ')), 6)) as DELTA_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. EPSILON_Value ')), 6)) as EPSILON_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. BETAUPGRADE_Value ')), 6)) as BETAUPGRADE_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. ZETA_Value ')), 6)) as ZETA_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. ALPHASKIN_Value ')), 6)) as ALPHASKIN_Value
            , MAX(SUBSTR
                (TRIM(get_json_object(json_data, '$. UPGRADEALPHA_Value ')), 6,
                (LOCATE(' resource 2',
                    SUBSTR
                    (TRIM(get_json_object(json_data, '$. UPGRADEALPHA_Value ')), 6))-1))) as UPGRADEALPHA_Value
        FROM application_events
        WHERE
            month = 201409
            AND FROM_UNIXTIME(utc_timestamp) > '2014-09-04 12:00:00'
        -- one row per (id, n, utc) removes the spam duplicates
        GROUP BY id, n, utc
        ) a
    ) b
WHERE b.item_value > 0
GROUP BY b.Event_Name
ORDER BY user_spendings
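I think some of the values you are trying to sum are NULL. In SQL, adding NULL to anything gives NULL, so item_value ends up NULL and the condition b.item_value > 0 is never true, which is why the full query returns no rows. That is why I added COALESCE.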
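The NULL behaviour can be checked in isolation. The snippet below is only a sketch: it uses Python's built-in sqlite3 module rather than the engine from the question, and the literal 5 is an arbitrary stand-in for the single non-NULL value column. The arithmetic and comparison rules for NULL shown here are standard SQL behaviour.

```python
import sqlite3

# In-memory SQLite stands in for the real engine (an assumption for this sketch).
conn = sqlite3.connect(":memory:")

# Adding NULL to a number yields NULL, not the number.
plain_sum = conn.execute("SELECT 5 + NULL").fetchone()[0]

# Same shape as the CASE in the question: NULL <= 0 is not true,
# so the ELSE branch runs and returns the NULL sum again.
case_result = conn.execute(
    "SELECT CASE WHEN (5 + NULL) <= 0 THEN 0 ELSE (5 + NULL) END"
).fetchone()[0]

# COALESCE replaces each NULL with 0 before the addition.
safe_sum = conn.execute("SELECT COALESCE(5, 0) + COALESCE(NULL, 0)").fetchone()[0]

# NULL > 0 is never true, so a WHERE filter on the NULL sum keeps no rows.
rows_kept = conn.execute(
    "SELECT COUNT(*) FROM (SELECT 5 + NULL AS item_value) WHERE item_value > 0"
).fetchone()[0]

print(plain_sum, case_result, safe_sum, rows_kept)
```

Here `plain_sum` and `case_result` come back as None (NULL), `safe_sum` is 5, and `rows_kept` is 0, which matches the empty result described in the question.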
Also, you do not need subquery "b"; you can do the same thing in subquery "a", but I left it untouched for better readability.
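One possible way to fold the two inner levels together is to compute item_value directly in the grouped subquery and filter it with HAVING. This is a sketch under assumptions, not necessarily what was meant above: it runs on Python's built-in sqlite3 module, the `events` table has only two made-up value columns (`alpha_value`, `beta_value`) that are already plain integers, and all rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite stands in for the real engine (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, n TEXT, utc INTEGER, "
    "alpha_value INTEGER, beta_value INTEGER)"
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?)",
    [
        (1, "buy_alpha", 100, 5, None),
        (1, "buy_alpha", 100, 5, None),   # spam duplicate
        (1, "buy_alpha", 200, 7, None),
        (2, "buy_beta", 100, None, -3),   # negative value, must be ignored
        (2, "buy_beta", 300, None, 4),
    ],
)

query = """
SELECT n AS event_name
     , COUNT(*) AS event_occurrences
     , SUM(item_value) AS user_spendings
FROM (
    SELECT id, n, utc
         , COALESCE(MAX(alpha_value), 0) + COALESCE(MAX(beta_value), 0) AS item_value
    FROM events
    GROUP BY id, n, utc                      -- one row per event, spam removed
    HAVING COALESCE(MAX(alpha_value), 0)
         + COALESCE(MAX(beta_value), 0) > 0  -- drop zero and negative values
) AS a
GROUP BY n
ORDER BY user_spendings
"""

result = conn.execute(query).fetchall()
print(result)
```

With these rows the spam duplicate is counted once and the negative event is dropped, so buy_beta ends up with one occurrence and buy_alpha with two. HAVING is used because the filter applies to an aggregated expression, which a WHERE clause at the same level cannot reference.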