Python, calculating a binomial p-value: does this code look right?

I have this dataset:

ItemNumber  Successes   Trials    Prob
15          14           95       0.047
9625        20           135      0.047
19          14           147      0.047
24          12           120      0.047
20          15           133      0.047
22          8            91       0.047
9619        16           131      0.047
10006       8            132      0.047
25          15           127      0.047

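I want to determine the cumulative binomial p-value for each item, i.e. the probability of observing an equal or higher number of successes.

I used this code: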

import sys
from scipy.stats import binom

for line in open(sys.argv[1], 'r'):
    line = line.strip().split()
    Item, num_succ, num_trials, prob = line[0], int(line[1]), int(line[2]), float(line[3])
    # 1 - cdf(k) is the upper tail P(X > k)
    print(Item + "\t" + str(num_succ) + "\t" + str(num_trials) + "\t" + str(prob) + "\t" + str(1 - binom.cdf(num_succ, num_trials, prob)))

The output looks like this:

Item    NumSucc NumTrials   Prob    Binomial
15      14      95         0.047    3.73e-05
9625    20      135        0.047    1.48e-06
19      14      147        0.047    0.004
24      12      120        0.047    0.0043
20      15      133        0.047    0.00054
22      8       91         0.047    0.027
9619    16      131        0.047    0.0001
10006   8       132        0.047    0.169
25      15      127        0.047    0.0003

Problem: when I pick a line and check the resulting cumulative binomial p-value against an online tool such as http://stattrek.com/online-calculator/binomial.aspx, the results differ.

For example,

Item 20 (# successes = 15, # trials = 133, prob = 0.047):

My Binomial P Val = 0.00054
StatTrek P Val = 0.0015

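However, I can see from StatTrek that the value I am computing is the tail probability P(X > 15). Since I want "equal to or greater than", what I actually need is P(X >= 15), which is 0.0015.

I am struggling to edit the code above so that the returned p-value changes from "probability of more than this many successes" to "probability of this many successes or more". If someone could demonstrate, I would be grateful. If you look at this question, I am trying to follow Volodymyr's comment.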

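The binomial distribution is a discrete distribution, so the following holds: P(X > 14) = P(X >= 15).

binom.cdf(N, n, p) returns P(X <= N), so 1 - binom.cdf(N, n, p) is P(X > N). If you want to test for P(X >= N), you have to compute P(X > N - 1) instead, i.e. pass N - 1 to binom.cdf.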
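
A minimal sketch of that adjustment, using item 20 from the question. The helper name p_at_least is hypothetical (it is not part of SciPy); binom.sf(k, n, p) is SciPy's survival function, P(X > k), which is equivalent to 1 - binom.cdf(k, n, p) and tends to be more accurate for very small tails.

```python
from scipy.stats import binom

def p_at_least(successes, trials, prob):
    """Return P(X >= successes) for X ~ Binomial(trials, prob)."""
    # sf(k) is P(X > k), so shifting k down by one gives P(X >= successes).
    return binom.sf(successes - 1, trials, prob)

# Item 20 from the question: 15 successes in 133 trials, prob = 0.047
p_greater = 1 - binom.cdf(15, 133, 0.047)   # P(X > 15), what the original script printed
p_at_least_15 = p_at_least(15, 133, 0.047)  # P(X >= 15), what StatTrek reported

print(p_greater, p_at_least_15)
```

Going by the figures quoted in the question, the first value should come out near 0.00054 and the second near 0.0015. In the original loop, the same change amounts to passing num_succ - 1 instead of num_succ.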

If you want to calculate the p-value for each record, use this code, which is easy:

# alternative: {'two-sided', 'greater', 'less'}
from scipy.stats import binom_test
binom_test(x=number_of_successes, n=number_of_trials, p=probability, alternative='greater')
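
On newer SciPy versions binom_test may not be available: to my knowledge it was deprecated in favour of binomtest (added in SciPy 1.7), which returns a result object rather than a bare number. A hedged sketch of the equivalent call for item 20, assuming a SciPy version that ships binomtest:

```python
from scipy.stats import binom, binomtest

# One-sided test: probability of 15 or more successes in 133 trials at p = 0.047
result = binomtest(k=15, n=133, p=0.047, alternative='greater')
p_value = result.pvalue

# Cross-check against the upper tail computed directly: P(X >= 15) = P(X > 14)
tail = binom.sf(14, 133, 0.047)

print(p_value, tail)
```

Both numbers should agree, since the 'greater' alternative is the upper-tail probability P(X >= k).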
