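How to test all possible combinations of predictors in multiple linear regression and return the best combination of R-squared and p-values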

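I am trying to find the combination of predictors that gives the best R-squared and p-values. I have six columns to use as predictors, but so far I only have the R-squared and p-values for one combination: [col0, col1, col2, col3, col4, col5] vs. [col6]. I want to test all possible combinations, such as:

[col0] vs. [col6]

[col0 + col1] vs. [col6]

[col0 + col1 + col2] vs. [col6] ...

Is there a way to automate this, so that I don't have to write out every possible combination by hand?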

import statsmodels.api as sm

# Predictors and target from the normalized DataFrame
X = df_norm[["col0", "col1", "col2", "col3", "col4", "col5"]]
y = df_norm["col6"]

# with statsmodels: add an intercept column, since OLS does not add one by default
X = sm.add_constant(X)

model = sm.OLS(y, X).fit()
print_model = model.summary()

What you want is the powerset function shown in the itertools documentation:

from itertools import chain, combinations

def powerset(iterable):
    # powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))
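
As a quick check of what the function yields, here is a minimal sketch that applies it to the six predictor names from the question. The `columns` and `subsets` names are only illustrative. Six columns give 2^6 = 64 subsets, of which 63 are non-empty, so that is the number of models to fit.

```python
from itertools import chain, combinations


def powerset(iterable):
    # powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))


# Predictor names from the question (illustrative list, not a DataFrame)
columns = ["col0", "col1", "col2", "col3", "col4", "col5"]

# Drop the empty tuple, since a regression needs at least one predictor
subsets = [subset for subset in powerset(columns) if subset]

print(len(subsets))   # 63
print(subsets[:3])    # the first three single-column subsets
```

The count doubles with every added column, so this exhaustive approach is only practical for a small number of predictors.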

You can then iterate over each subset of columns and handle the results as needed. Your loop would look like this:

for subset in powerset(X.columns):
    if len(subset) > 0:  # skip the empty subset
        model = sm.OLS(y, X[list(subset)]).fit()
