如何在 BigQuery 中运行配对 t 检验.是否有任何可用的 UDF



我正在尝试使用 JavaScript 统计.js库在 bigquery 中运行配对的 t 检验,但我得到的数字不一致。

我的代码如下;

 CREATE TEMPORARY FUNCTION ttest(a ARRAY<FLOAT64>, b ARRAY<FLOAT64>)
 RETURNS FLOAT64
 LANGUAGE js AS
"""
   return statistical.methods.tTestTwoSample(a, b); 
   //statistical.methods.tTestTwoSample(sample1: Array, sample2: Array) 
"""
OPTIONS (
library="gs://my_bucket/statistical.js"
);

SELECT 
 ttest([9.96,   3.76,   1.17,   8.66,   5.25,   7.61,   5.8,    1.84,   7.06, 9.4,  2.99,   9.3,    9.01,   4.24,   3.52,   9.6,    7.59,   6.99,   9.62,   2.18] , 
    [3.96,  5.76,   7.17,   7.66,   9.25,   3.61,   4.8,    8.84,   6.06,   4.4,    1.99,   8.3,    9.01,   1.24,   5.52,   8.6,    5.59,   1.99,   7.62,   3.18]  ) ttest 

gs://my_bucket/statistical.js作为JS包保存在谷歌存储中的位置。

该软件包可在此处获得https://github.com/adrien2p/statistical-js

两个样本 t 检验statistical.methods.tTestTwoSample(sample1: Array, sample2: Array)应返回 p 值。我无法确定测试是否配对或其他方式。当我运行代码时,我得到 4.099,据我所知这不是一个 p 值。 我在 Excel 中运行了测试以使用 =T.TEST() 验证结果,并且在配对的双侧检验中获得了 0.4871 的 p 值。我还尝试了除配对以外的其他 t 检验,并尝试将 4.099 结果视为 T 统计量,而没有任何数字匹配的运气。

所以我的问题是,如何在 BigQuery 中运行配对的 t 检验,并得到一个结果 p 值?我应该使用其他软件包吗?

提前感谢任何帮助。

没有太多工作,我决定手动计算 t 统计量并使用 jStat 库(github.com/jstat/jstat 提供(将统计量转换为 p 值。

我的代码如下;

CREATE TEMPORARY FUNCTION tscore_to_p(a FLOAT64, b FLOAT64, c FLOAT64)
 RETURNS FLOAT64
 LANGUAGE js AS
"""
  return jStat.ttest(a,b,c); //jStat.ttest( tscore, n, sides)
"""
OPTIONS (
 library="gs://my_bucket/jstat.min.js"
);

WITH data AS ( 
 SELECT * FROM 
 (SELECT 9.96 AS A, 3.96 AS B) UNION ALL
 (SELECT 3.76 AS A, 5.76 AS B) UNION ALL
 (SELECT 1.17 AS A, 7.17 AS B) UNION ALL
 (SELECT 8.66 AS A, 7.66 AS B) UNION ALL
 (SELECT 5.25 AS A, 9.25 AS B) UNION ALL
 (SELECT 7.61 AS A, 3.61 AS B) UNION ALL
 (SELECT 5.80 AS A, 4.80 AS B) UNION ALL
 (SELECT 1.84 AS A, 8.84 AS B) UNION ALL
 (SELECT 7.06 AS A, 6.06 AS B) UNION ALL
 (SELECT 9.40 AS A, 4.40 AS B) UNION ALL
 (SELECT 2.99 AS A, 1.99 AS B) UNION ALL
 (SELECT 9.30 AS A, 8.30 AS B) UNION ALL
 (SELECT 9.01 AS A, 9.01 AS B) UNION ALL
 (SELECT 4.24 AS A, 1.24 AS B) UNION ALL
 (SELECT 3.52 AS A, 5.52 AS B) UNION ALL
 (SELECT 9.60 AS A, 8.60 AS B) UNION ALL
 (SELECT 7.59 AS A, 5.59 AS B) UNION ALL
 (SELECT 6.99 AS A, 1.99 AS B) UNION ALL
 (SELECT 9.62 AS A, 7.62 AS B) UNION ALL
 (SELECT 2.18 AS A, 3.18 AS B)
 ) 

SELECT 
   COUNT(*) n
 , COUNT(*)-1 dof
 , AVG(difference) mean
 , STDDEV_SAMP(difference) SD 
 , STDDEV_SAMP(difference)/SQRT(COUNT(*)) SE
 , AVG(difference)/ (STDDEV_SAMP(difference)/SQRT(COUNT(*))) t 
 , tscore_to_p((AVG(difference)/ (STDDEV_SAMP(difference)/SQRT(COUNT(*)))) COUNT(*), 2) p_value 
FROM 
  (SELECT *, (A-B) difference FROM data)

这导致 p 值为 0.4871,与 Excel 中配对 t 检验运行的 p 值相同。

感谢任何考虑过这个问题的人。

最新更新