Dynamically running ML algorithms in Python



I have a DataFrame like this:

id    Alg
--------------
1     RandomForestClassifier(max_features="sqrt", n_jobs=3) 
2     LogisticRegression(C=0.1, solver="liblinear")
3     RandomForestClassifier(n_estimators=1000) 
4     LogisticRegression(C=1.0, solver="liblinear")
.
.

I want to build these models dynamically, so I have code like this:

for i in df["id"]:
    print("i = " + str(i))
    Alg = str(df[df["id"]==i]["Alg"]).strip()
    clf = eval(Alg)

But I get this error:

Traceback (most recent call last):
File "<stdin>", line 9, in <module>
File "<string>", line 1
1    RandomForestClassifier()
^
SyntaxError: invalid syntax  

Is there a good way to do this?

Here is what I have tried:

1- Prepending 4 spaces to the algorithm name (did not work)

Alg =  "    " + str(df[df["id"]==i]["Alg"]).strip()

2- Replacing the algorithm with 1+1 (works; clf ends up as 2)

Alg =  "1+1" #+ str(df[df["id"]==i]["Alg"]).strip()
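A note on the traceback: the string passed to eval begins with "1    ", which looks like the row's index label. That is consistent with str() being applied to a pandas Series, which gives the printed form of the Series (index, value, and a name/dtype footer) instead of the cell's text. Below is a minimal sketch of that behavior, using a made-up two-row frame and not the asker's actual data:

```python
import pandas as pd

# Illustrative data only; the column names mirror the question.
df = pd.DataFrame({
    "id": [1, 2],
    "Alg": ["RandomForestClassifier()", "LogisticRegression(C=0.1)"],
})

# Filtering rows and then picking a column yields a Series, not a string.
selected = df[df["id"] == 2]["Alg"]

# str() of a Series is its printed form: index label, value, and a footer.
as_text = str(selected)

# .iloc[0] pulls out the single cell, which is the plain string.
cell = selected.iloc[0]

print(repr(as_text))
print(repr(cell))
```

Extracting the cell with .iloc[0] would give eval a valid expression, though evaluating strings from a data source is risky if their origin is not fully trusted.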

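Consider this solution. Instead of storing function calls, I store a list containing the function name and a dictionary holding all the arguments. I then parse it with the safe ast.literal_eval (though you could also parse it as JSON) and pass the arguments to the appropriate function: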

import ast

import pandas as pd

# Each entry is a string holding [name, kwargs] as a Python literal.
names = [
    "['RandomForestClassifier',{'max_features':'sqrt', 'n_jobs':3}]",
    "['LogisticRegression',{'C':0.1, 'solver':'liblinear'}]",
    "['RandomForestClassifier',{'n_estimators':1000}]",
    "['LogisticRegression',{'C':1.0,'solver':'liblinear'}]"
]
df = pd.DataFrame(names, columns=['Alg'])

# Stand-ins for the real classifiers; they only echo their arguments.
def RandomForestClassifier(**kwargs):
    print("Received", kwargs)

def LogisticRegression(**kwargs):
    print("Received", kwargs)

for row in df['Alg']:
    # literal_eval only accepts literals, so no arbitrary code is executed.
    info = ast.literal_eval(row)
    print(info)
    if info[0] == 'RandomForestClassifier':
        clf = RandomForestClassifier(**info[1])
    elif info[0] == 'LogisticRegression':
        clf = LogisticRegression(**info[1])

Output:

['RandomForestClassifier', {'max_features': 'sqrt', 'n_jobs': 3}]
Received {'max_features': 'sqrt', 'n_jobs': 3}
['LogisticRegression', {'C': 0.1, 'solver': 'liblinear'}]
Received {'C': 0.1, 'solver': 'liblinear'}
['RandomForestClassifier', {'n_estimators': 1000}]
Received {'n_estimators': 1000}
['LogisticRegression', {'C': 1.0, 'solver': 'liblinear'}]
Received {'C': 1.0, 'solver': 'liblinear'}
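The JSON alternative mentioned above could look roughly like the sketch below. It also swaps the if/elif chain for a dictionary lookup, which is a variation of mine and not part of the original answer. The stand-in functions, the REGISTRY name, and the build helper are all hypothetical; with scikit-learn installed, the registry values would presumably be the real estimator classes.

```python
import json

# Hypothetical stand-ins for the real estimator classes; each returns
# a (name, kwargs) tuple so the result is easy to inspect.
def RandomForestClassifier(**kwargs):
    return ("RandomForestClassifier", kwargs)

def LogisticRegression(**kwargs):
    return ("LogisticRegression", kwargs)

# Registry mapping each allowed name to its constructor.
REGISTRY = {
    "RandomForestClassifier": RandomForestClassifier,
    "LogisticRegression": LogisticRegression,
}

def build(spec):
    """Parse a JSON '[name, kwargs]' string and call the registered constructor."""
    name, kwargs = json.loads(spec)
    try:
        factory = REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown algorithm: {name!r}") from None
    return factory(**kwargs)

specs = [
    '["RandomForestClassifier", {"max_features": "sqrt", "n_jobs": 3}]',
    '["LogisticRegression", {"C": 0.1, "solver": "liblinear"}]',
]
models = [build(s) for s in specs]
print(models)
```

With a registry, supporting another algorithm means adding one dictionary entry, and any name not listed is rejected with an error.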
