MySQL: ignoring outliers



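I have to provide some data to a colleague, and I'm having trouble analysing it in MySQL.

I have a table called "payments". Each payment has the following columns:

  1. Client (our customer, e.g. a bank)
  2. Amount_gbp (the transaction value converted to GBP)
  3. Currency
  4. Origin_country
  5. Client_type (individual or company)

I've written some very simple queries, for example: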

SELECT  
AVG(amount_GBP), 
COUNT(client) AS '#Of Results'
FROM payments
WHERE client_type = 'individual'
AND amount_gbp IS NOT NULL
AND currency = 'TRY'
AND country_origin = 'GB'
AND date_time BETWEEN '2017/1/1' AND '2017/9/1'

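But what I really need to do is eliminate outliers from the average, and/or include only results that fall within a given number of standard deviations of the mean.

For example: ignore the top/bottom 10 results, or the top/bottom 2% of results, and/or ignore any result more than 2 standard deviations from the mean.

Can anyone help?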

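--- Edited answer: try it and let me know ---

The best approach is to create a temporary table holding the average and standard deviation, and then compare against those values. Let me know if this isn't feasible: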

CREATE TEMPORARY TABLE payment_stats AS
SELECT
  AVG(p.amount_gbp) AS avg_gbp,
  STDDEV(p.amount_gbp) AS std_gbp,
  -- Smallest of the N largest amounts: the upper cut-off.
  (SELECT MIN(srt.amount_gbp) AS max_gbp
   FROM (SELECT amount_gbp
         FROM payments
         <... repeat the WHERE conditions, without the p. prefix ...>
         ORDER BY amount_gbp DESC
         LIMIT <number of top results to ignore>
        ) srt
  ) AS max_g,
  -- Largest of the N smallest amounts: the lower cut-off.
  (SELECT MAX(srt.amount_gbp) AS min_gbp
   FROM (SELECT amount_gbp
         FROM payments
         <... repeat the WHERE conditions, without the p. prefix ...>
         ORDER BY amount_gbp ASC
         LIMIT <number of bottom results to ignore>
        ) srt
  ) AS min_g
FROM payments p  -- alias p is required because the columns above are qualified with p.
WHERE p.client_type = 'individual'
  AND p.amount_gbp IS NOT NULL
  AND p.currency = 'TRY'
  AND p.country_origin = 'GB'
  AND p.date_time BETWEEN '2017/1/1' AND '2017/9/1';

You can then compare against the temporary table:

SELECT
  AVG(p.amount_gbp) AS avg_gbp,
  COUNT(p.client) AS `#Of Results`
FROM payments p
-- Join the one-row stats table once: MySQL cannot refer to a
-- TEMPORARY table more than once in the same query.
CROSS JOIN payment_stats s
WHERE p.amount_gbp >= s.avg_gbp - s.std_gbp * 2
  AND p.amount_gbp <= s.avg_gbp + s.std_gbp * 2
  AND p.amount_gbp > s.min_g
  AND p.amount_gbp < s.max_g
  AND p.client_type = 'individual'
  AND p.amount_gbp IS NOT NULL
  AND p.currency = 'TRY'
  AND p.country_origin = 'GB'
  AND p.date_time BETWEEN '2017/1/1' AND '2017/9/1';

-- Later, when you are done

DROP TEMPORARY TABLE payment_stats;

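Note that I had to repeat the WHERE conditions. You can also change the *2 to whatever <factor> you need.

Still, phew!

Each comparison in the WHERE clause checks a different statistic: the first two check the standard-deviation bounds, and the last two check the top and bottom cut-offs.

Let me know if this works better.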
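
If creating a temporary table is not an option, the standard-deviation filter can probably also be written as a single statement by joining against a one-row derived table. This is only a sketch: it reuses the table and column names from the queries in this thread, the alias s is made up for illustration, and it covers the 2-standard-deviation rule only, not the top/bottom cut-offs.

```sql
-- Sketch: 2-standard-deviation filter without a temporary table.
-- Assumes the payments table and columns used elsewhere in this thread.
SELECT
  AVG(p.amount_gbp) AS avg_gbp,
  COUNT(p.client)   AS `#Of Results`
FROM payments p
CROSS JOIN (
  -- One-row derived table: mean and standard deviation of the unfiltered set.
  SELECT AVG(amount_gbp)    AS avg_gbp,
         STDDEV(amount_gbp) AS std_gbp
  FROM payments
  WHERE client_type = 'individual'
    AND amount_gbp IS NOT NULL
    AND currency = 'TRY'
    AND country_origin = 'GB'
    AND date_time BETWEEN '2017/1/1' AND '2017/9/1'
) s
-- Keep only rows within 2 standard deviations of the mean;
-- change the factor 2 to widen or narrow the band.
WHERE p.amount_gbp BETWEEN s.avg_gbp - 2 * s.std_gbp
                       AND s.avg_gbp + 2 * s.std_gbp
  AND p.client_type = 'individual'
  AND p.currency = 'TRY'
  AND p.country_origin = 'GB'
  AND p.date_time BETWEEN '2017/1/1' AND '2017/9/1';
```

The WHERE conditions still have to be repeated, once for the statistics and once for the filtered rows. Rows with a NULL amount_gbp drop out on their own, because BETWEEN never matches NULL.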
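
For the "ignore the top/bottom 2%" part of the question, one possible approach on MySQL 8.0 or later is the PERCENT_RANK() window function, which is not available in older versions. Treat the following as a rough sketch under that assumption; the aliases ranked and pct are invented for illustration, and because tied amounts share a rank, the share of rows removed is only approximately 2% at each end.

```sql
-- Sketch (MySQL 8.0+ only): average after dropping roughly the lowest
-- and highest 2% of payments by amount.
SELECT
  AVG(ranked.amount_gbp) AS avg_gbp,
  COUNT(ranked.client)   AS `#Of Results`
FROM (
  SELECT client,
         amount_gbp,
         -- Relative position of each row by amount: 0 for the smallest
         -- amount, rising to at most 1; tied amounts share a value.
         PERCENT_RANK() OVER (ORDER BY amount_gbp) AS pct
  FROM payments
  WHERE client_type = 'individual'
    AND amount_gbp IS NOT NULL
    AND currency = 'TRY'
    AND country_origin = 'GB'
    AND date_time BETWEEN '2017/1/1' AND '2017/9/1'
) ranked
-- Keep the middle 96% of the distribution.
WHERE ranked.pct BETWEEN 0.02 AND 0.98;
```

With this form the WHERE conditions appear only once, since the ranking and the filtered set come from the same subquery.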
