Using rpy2 in Python to access data from an R package



I want to use the Auto data from the R package ISLR (library(ISLR)) in Python. Inspired by the introduction to rpy2, I ran some tests, as shown below:

from rpy2 import robjects
from rpy2.robjects.packages import importr, data
from rpy2.robjects import pandas2ri
pandas2ri.activate()
datasets = importr('datasets') # data(mtcars) in library(datasets)
mtcars = data(datasets).fetch('mtcars')['mtcars']
ISLR = importr('ISLR') # data(Auto) in library(ISLR)
Auto = data(ISLR).fetch('Auto')['Auto']
#r_df = mtcars # success!!!
r_df = Auto # fail???
df = pandas2ri.ri2py(robjects.DataFrame(r_df))
df.info()

With this code I can successfully load data(mtcars) from library(datasets), but loading data(Auto) from library(ISLR) fails with the error:

Parameter 'categories' must be list-like

How can I fix this?

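Which version of rpy2 are you using? I am using rpy2-3.3.6, installed with pip in a Conda environment, with R-4.0.3 (from conda-forge) and Python-3.6.11 (from conda-forge), and I was able to read mtcars from datasets as well as Auto from ISLR. Please check the results I got below.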

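I think the error you are seeing may be a bug, or a side effect of your configuration or dependencies. You should upgrade rpy2 to the latest version (>=3.3.0) and double-check its dependencies.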
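To see which rpy2 release is installed before upgrading, something like the following may help. It is a minimal sketch that assumes Python 3.8 or later (for importlib.metadata); the helper names installed_version and meets_minimum are made up for illustration, and the comparison only looks at the leading numeric part of each version component, so it is a rough check rather than a full version parser.

```python
import re
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def meets_minimum(version, minimum=(3, 3, 0)):
    """Roughly compare a dotted version string against a minimum tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. "dev1"
        parts.append(int(match.group()))
    # Pad short versions such as "3.3" so they compare as (3, 3, 0).
    parts += [0] * (len(minimum) - len(parts))
    return tuple(parts) >= tuple(minimum)


found = installed_version("rpy2")
if found is None:
    print("rpy2 is not installed in this environment")
elif meets_minimum(found):
    print(f"rpy2 {found} is at least 3.3.0")
else:
    print(f"rpy2 {found} is older than 3.3.0; consider upgrading")
```

If the reported version is older than 3.3.0, upgrading with pip inside the same Conda environment is the step suggested above.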

See this post for how rpy2's pandas functionality has changed over time: "How to convert R dataframe back to Pandas?"

Here is the entire sequence from my command line:

Python 3.6.11 | packaged by conda-forge | (default, Aug 5 2020, 20:09:42)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

Importing relevant libraries
>>> import rpy2.robjects as ro
>>> import rpy2.robjects.packages as rpackages
>>> from rpy2.robjects.vectors import StrVector
>>> from rpy2.robjects.packages import importr, data
Importing packages and reading in the data
>>> datasets = importr('datasets')
>>> mtcars = data(datasets).fetch('mtcars')['mtcars']
>>> ISLR = importr('ISLR')
>>> Auto = data(ISLR).fetch('Auto')['Auto']
>>> r_df_mtcars = mtcars  # using labels to clarify the origin of the data
>>> r_df_Auto = Auto
Converting R Data frames into Pandas Data frames
Note: this uses the function conversion.rpy2py, available in rpy2 version 3.3.0 and later
>>> pd_df_mtcars = ro.conversion.rpy2py(r_df_mtcars)
>>> pd_df_Auto = ro.conversion.rpy2py(r_df_Auto)
Examine the data using the Pandas head() for both
>>> pd_df_mtcars.head()
                    mpg  cyl   disp     hp  drat     wt   qsec   vs   am  gear  carb
Mazda RX4          21.0  6.0  160.0  110.0  3.90  2.620  16.46  0.0  1.0   4.0   4.0
Mazda RX4 Wag      21.0  6.0  160.0  110.0  3.90  2.875  17.02  0.0  1.0   4.0   4.0
Datsun 710         22.8  4.0  108.0   93.0  3.85  2.320  18.61  1.0  1.0   4.0   1.0
Hornet 4 Drive     21.4  6.0  258.0  110.0  3.08  3.215  19.44  1.0  0.0   3.0   1.0
Hornet Sportabout  18.7  8.0  360.0  175.0  3.15  3.440  17.02  0.0  0.0   3.0   2.0
>>> pd_df_Auto.head()
    mpg  cylinders  displacement  horsepower  weight  acceleration  year  origin                       name
1  18.0        8.0         307.0       130.0  3504.0          12.0  70.0     1.0  chevrolet chevelle malibu
2  15.0        8.0         350.0       165.0  3693.0          11.5  70.0     1.0          buick skylark 320
3  18.0        8.0         318.0       150.0  3436.0          11.0  70.0     1.0         plymouth satellite
4  16.0        8.0         304.0       150.0  3433.0          12.0  70.0     1.0              amc rebel sst
5  17.0        8.0         302.0       140.0  3449.0          10.5  70.0     1.0                ford torino
To convert Pandas df to R df you can use:
>>> r_mtcars_df = ro.conversion.py2rpy(pd_df_mtcars)
>>> r_Auto_df = ro.conversion.py2rpy(pd_df_Auto)
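
The conversions in the session above can also be wrapped in a local converter, which spells out the pandas conversion rules instead of relying on global state such as pandas2ri.activate(). The following is a sketch only, assuming rpy2 3.3.x with pandas, R, and the ISLR package installed. The function names r_to_pandas, pandas_to_r, and load_auto are hypothetical helpers, and the imports are deferred into the functions so the file can still be loaded on a machine without rpy2. Newer rpy2 releases may document a different spelling for these calls, so check the documentation for your version.

```python
def r_to_pandas(r_df):
    """Convert an R data.frame (an rpy2 object) to a pandas DataFrame."""
    # Deferred imports: assumption made so this file loads without rpy2/R.
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    # Apply the pandas conversion rules only inside this block.
    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.rpy2py(r_df)


def pandas_to_r(pd_df):
    """Convert a pandas DataFrame to an R data.frame (an rpy2 object)."""
    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter

    with localconverter(ro.default_converter + pandas2ri.converter):
        return ro.conversion.py2rpy(pd_df)


def load_auto():
    """Fetch the Auto data set from the R package ISLR, as in the session."""
    from rpy2.robjects.packages import importr, data

    islr = importr("ISLR")
    return data(islr).fetch("Auto")["Auto"]
```

With these helpers, pd_df_Auto = r_to_pandas(load_auto()) would be the equivalent of the Auto conversion shown in the session, and pandas_to_r(pd_df_Auto) would go back the other way.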
