Sending data from Python to R with rpy2 to run a statistical test



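I would like to use R's Fisher exact test (specifically its Monte Carlo simulation feature) from Python. I am trying to do this with rpy2, but it is proving much harder than I expected.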

I can get a handle to the Fisher test function with the following code:

import rpy2.robjects as robjects
fisher = robjects.r['fisher.test']

But how do I pass a 2xN matrix to the function and retrieve the p-value?

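Consider importing R's stats package and running the Fisher test as a Python function. Note that the result object is of type <class 'rpy2.robjects.vectors.ListVector'>, so it has to be converted to a Python dictionary, as shown below.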

import rpy2
from rpy2.robjects.numpy2ri import numpy2ri
from rpy2.robjects.packages import importr
import numpy as np
# 2x2 contingency table [[0, 1], [2, 3]]
cont = np.reshape(np.arange(0, 4), (2, 2))
# Import R's stats package, renaming format_perc to avoid a name clash
statspackage = importr('stats', robject_translations={'format_perc': '_format_perc'})
# Request a simulated p-value with B = 100 replicates
# (R only simulates for tables larger than 2x2)
result = statspackage.fisher_test(numpy2ri(cont), simulate_p_value=True, B=100)
# DEPRECATED CONVERSION
import pandas.rpy.common as com
pyresultdict = com.convert_robj(result)
for k, v in pyresultdict.items():
    print(k, v)
# data.name ['structure(c(0L, 2L, 1L, 3L), .Dim = c(2L, 2L))']
# p.value [1.0]
# estimate odds ratio    0.0
# dtype: float64
# null.value odds ratio    1.0
# dtype: float64
# conf.int [0.0, 77.90626902008512]
# alternative ['two.sided']
# method ["Fisher's Exact Test for Count Data"]

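Also note that you may receive a deprecation warning for com.convert_to_r_dataframe and com.convert_robj(rdf), which should be replaced with their pandas2ri counterparts. However, on my end that conversion did not work with ListVector objects. Ideally, the conversion above would be replaced with the following: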

# CURRENT CONVERSION
from rpy2.robjects import pandas2ri
pandas2ri.activate()
pyresultdict = pandas2ri.ri2py(result)
for k, v in pyresultdict.items():
    print(k, v)
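
If the pandas2ri conversion does not handle the ListVector, one possible workaround is to build the dictionary by hand from the list's names and elements. The following is a minimal sketch, not part of the original answer: the helper name listvector_to_dict is hypothetical, and it assumes only that the object exposes .names and can be iterated element by element, as an rpy2 ListVector does, and that every element is a named, non-NULL vector.

```python
def listvector_to_dict(rlist):
    """Turn a named R list (such as an rpy2 ListVector) into a plain dict.

    Hypothetical helper: it relies only on `.names` and on iteration, and
    assumes every element is named and is itself a vector, so each R vector
    becomes an ordinary Python list.
    """
    return {str(name): list(value) for name, value in zip(rlist.names, rlist)}
```

With the result object from the snippet above, listvector_to_dict(result)['p.value'][0] should then give the p-value as a Python float.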

Here is one way to do it:

from rpy2.robjects import r
from rpy2.robjects.numpy2ri import numpy2ri
import numpy as np
# 2x2 contingency table [[0, 1], [2, 3]]
cont = np.reshape(np.arange(0, 4), (2, 2))
print(cont)
# Convert the NumPy array to an R matrix and bind it to the name "cont" in R
r_cont = numpy2ri(cont)
r.assign("cont", r_cont)
# Run the test inside R, then fetch the result object back into Python
r("res <- fisher.test(cont, simulate.p.value = TRUE, B = 100)")
r_result = r("res")
# p.value is the first element of the list returned by fisher.test
p_value = r_result[0][0]
print(r_result)
print(p_value)
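
For a general 2xN table, the matrix can also be built with R's own matrix() function through rpy2, without any NumPy conversion. The sketch below is an illustration under stated assumptions rather than the method of either answer: the helper names column_major and fisher_mc_p_value are hypothetical, and the second function needs a working R installation plus rpy2. R fills matrices column by column, so the table is flattened in that order first.

```python
def column_major(table):
    """Flatten a list of equal-length rows into R's column-major order."""
    return [row[j] for j in range(len(table[0])) for row in table]


def fisher_mc_p_value(table, B=2000):
    """Run R's fisher.test with a simulated p-value on a 2xN table.

    Hypothetical helper; rpy2 is imported lazily so that column_major()
    stays usable on a machine without R.
    """
    import rpy2.robjects as robjects

    fisher = robjects.r['fisher.test']
    r_matrix = robjects.r['matrix'](
        robjects.IntVector(column_major(table)), nrow=len(table)
    )
    # Dotted R argument names are passed through a dict, so no
    # Python-to-R name translation is involved.
    result = fisher(r_matrix, **{'simulate.p.value': True, 'B': B})
    # rx2 extracts a list element by name, like result[["p.value"]] in R.
    return result.rx2('p.value')[0]
```

For example, fisher_mc_p_value([[3, 1, 4], [1, 5, 9]]) should return a float between 0 and 1. Because the p-value is simulated, it varies from run to run unless R's random seed is fixed.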
