Isolation Forest - TypeError: invalid type promotion



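I'm trying to apply Isolation Forest to data converted from an event log, but I get "TypeError: invalid type promotion". Is it because of the datetime columns? I don't understand what I'm doing wrong!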

Part of my table (after processing):

+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
| org:resource | lifecycle:transition | concept:name |   time:timestamp   |   case:REG_DATE    | case:concept:name | case:AMOUNT_REQ |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
|           52 |                    0 |            9 | 2011 10-01 38:44.5 | 2011 10-01 38:44.5 |                 0 |           20000 |
|           52 |                    0 |            6 | 2011 10-01 38:44.9 | 2011 10-01 38:44.5 |                 2 |           20000 |
|           52 |                    0 |            7 | 2011 10-01 39:37.9 | 2011 10-01 38:44.5 |                 0 |           20000 |
|           52 |                    1 |           19 | 2011 10-01 39:38.9 | 2011 10-01 38:44.5 |                 1 |           20000 |
|           68 |                    2 |           19 | 2011 10-01 36:46.4 | 2011 10-01 38:44.5 |                 3 |           20000 |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+

When printing:

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 262200 entries, 0 to 262199
Data columns (total 7 columns):
#   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
0   org:resource          262200 non-null  int64         
1   lifecycle:transition  262200 non-null  int64         
2   concept:name          262200 non-null  int64         
3   time:timestamp        262200 non-null  datetime64[ns]
4   case:REG_DATE         262200 non-null  datetime64[ns]
5   case:concept:name     262200 non-null  int64         
6   case:AMOUNT_REQ       262200 non-null  int32         
dtypes: datetime64[ns](2), int32(1), int64(4)
memory usage: 13.0 MB

My code is:

from sklearn.ensemble import IsolationForest
contamination = 0.05
model = IsolationForest(contamination=contamination, n_estimators=10000)
model.fit(df)
df["iforest"] = pd.Series(model.predict(df))
df["iforest"] = df["iforest"].map({1: 0, -1: 1})
df["score"] = model.decision_function(df)
df.sort_values("score")

However, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-5edb86351ac8> in <module>
4 
5 model = IsolationForest(contamination=contamination, n_estimators=10000)
----> 6 model.fit(df)
7 
8 df["iforest"] = pd.Series(model.predict(df))
~\.conda\envs\process_mining\lib\site-packages\sklearn\ensemble\_iforest.py in fit(self, X, y, sample_weight)
261                 )
262 
--> 263         X = check_array(X, accept_sparse=['csc'])
264         if issparse(X):
265             # Pre-sort indices to avoid that each individual tree of the
~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70                           FutureWarning)
71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
73     return inner_f
74 
~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
531 
532         if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 533             dtype_orig = np.result_type(*dtypes_orig)
534 
535     if dtype_numeric:
<__array_function__ internals> in result_type(*args, **kwargs)
TypeError: invalid type promotion
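
The last frame suggests the failure happens when NumPy is asked for one common dtype across all columns. As a minimal sketch independent of scikit-learn, the snippet below passes the three dtypes listed by df.info() to np.result_type. The variable names are made up for illustration, and the exact error message may differ between NumPy versions.

```python
import numpy as np

# The three distinct column dtypes reported by df.info() above.
column_dtypes = [
    np.dtype("int64"),
    np.dtype("datetime64[ns]"),
    np.dtype("int32"),
]

# Asking for a single common dtype fails when a datetime is mixed with integers.
try:
    common = np.result_type(*column_dtypes)
    promotion_error = None
except TypeError as exc:
    common = None
    promotion_error = exc

# Without the datetime dtype, the integer dtypes promote without trouble.
numeric_only = np.result_type(np.dtype("int64"), np.dtype("int32"))

print("mixed with datetime:", repr(promotion_error))
print("integers only:", numeric_only)
```

If this reproduces the TypeError, the datetime columns are the likely cause rather than anything specific to IsolationForest.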

I found a solution with the help of this answer: Python - linear regression TypeError: invalid type promotion

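Technically, you need to convert the timestamps to ordinals and then it works. As the traceback shows, check_array asks NumPy for one common dtype for all columns, and the datetime64[ns] columns cannot be promoted together with the int64/int32 ones. I used the following for the conversion: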

import datetime as dt
df['time:timestamp'] = df['time:timestamp'].map(dt.datetime.toordinal)
df['case:REG_DATE'] = df['case:REG_DATE'].map(dt.datetime.toordinal)
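
One caveat: toordinal() keeps only the calendar date, so events on the same day all map to the same number. If the time of day matters, an alternative is to convert to seconds since the Unix epoch. The snippet below is only a sketch under assumptions: the toy frame, its values, and the helper datetimes_to_seconds are hypothetical and not taken from the real event log, and n_estimators is reduced to keep the run short.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical toy frame standing in for the event log (values are made up).
toy = pd.DataFrame({
    "org:resource": [52, 52, 52, 52, 68, 11],
    "time:timestamp": pd.to_datetime([
        "2011-10-01 00:38:44.5",
        "2011-10-01 00:38:44.9",
        "2011-10-01 00:39:37.9",
        "2011-10-01 00:39:38.9",
        "2011-10-01 00:36:46.4",
        "2011-12-24 23:59:59.0",
    ]),
    "case:AMOUNT_REQ": [20000, 20000, 20000, 20000, 20000, 500],
})


def datetimes_to_seconds(frame):
    """Return a copy with each datetime64 column as float seconds since 1970-01-01."""
    out = frame.copy()
    epoch = pd.Timestamp("1970-01-01")
    for col in out.select_dtypes(include=["datetime64"]).columns:
        out[col] = (out[col] - epoch).dt.total_seconds()
    return out


# Keep the model input in its own frame so result columns never leak into it.
features = datetimes_to_seconds(toy)

model = IsolationForest(contamination=0.05, n_estimators=100, random_state=0)
model.fit(features)

# predict() returns 1 for inliers and -1 for outliers; store outliers as 1.
toy["iforest"] = (model.predict(features) == -1).astype(int)
toy["score"] = model.decision_function(features)
print(toy.sort_values("score"))
```

Keeping a separate features frame also sidesteps a second problem in the code from the question: once df["iforest"] has been added, model.decision_function(df) would receive one more column than the model was fitted on.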
