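I am trying to apply Isolation Forest to data converted from an event log, but I get "TypeError: invalid type promotion". Is it because of the datetime columns? I don't understand what I'm doing wrong!
Part of my table (after processing):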
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
| org:resource | lifecycle:transition | concept:name | time:timestamp | case:REG_DATE | case:concept:name | case:AMOUNT_REQ |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
| 52 | 0 | 9 | 2011 10-01 38:44.5 | 2011 10-01 38:44.5 | 0 | 20000 |
| 52 | 0 | 6 | 2011 10-01 38:44.9 | 2011 10-01 38:44.5 | 2 | 20000 |
| 52 | 0 | 7 | 2011 10-01 39:37.9 | 2011 10-01 38:44.5 | 0 | 20000 |
| 52 | 1 | 19 | 2011 10-01 39:38.9 | 2011 10-01 38:44.5 | 1 | 20000 |
| 68 | 2 | 19 | 2011 10-01 36:46.4 | 2011 10-01 38:44.5 | 3 | 20000 |
+--------------+----------------------+--------------+--------------------+--------------------+-------------------+-----------------+
When printing:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 262200 entries, 0 to 262199
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 org:resource 262200 non-null int64
1 lifecycle:transition 262200 non-null int64
2 concept:name 262200 non-null int64
3 time:timestamp 262200 non-null datetime64[ns]
4 case:REG_DATE 262200 non-null datetime64[ns]
5 case:concept:name 262200 non-null int64
6 case:AMOUNT_REQ 262200 non-null int32
dtypes: datetime64[ns](2), int32(1), int64(4)
memory usage: 13.0 MB
My code is:
import pandas as pd
from sklearn.ensemble import IsolationForest
contamination = 0.05
model = IsolationForest(contamination=contamination, n_estimators=10000)
model.fit(df)
df["iforest"] = pd.Series(model.predict(df))
df["iforest"] = df["iforest"].map({1: 0, -1: 1})
df["score"] = model.decision_function(df)
df.sort_values("score")
However, I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-5edb86351ac8> in <module>
4
5 model = IsolationForest(contamination=contamination, n_estimators=10000)
----> 6 model.fit(df)
7
8 df["iforest"] = pd.Series(model.predict(df))
~\.conda\envs\process_mining\lib\site-packages\sklearn\ensemble\_iforest.py in fit(self, X, y, sample_weight)
261 )
262
--> 263 X = check_array(X, accept_sparse=['csc'])
264 if issparse(X):
265 # Pre-sort indices to avoid that each individual tree of the
~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
~\.conda\envs\process_mining\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
531
532 if all(isinstance(dtype, np.dtype) for dtype in dtypes_orig):
--> 533 dtype_orig = np.result_type(*dtypes_orig)
534
535 if dtype_numeric:
<__array_function__ internals> in result_type(*args, **kwargs)
TypeError: invalid type promotion
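The last frame of the traceback shows `np.result_type` being called on the column dtypes. As a minimal sketch of why that fails, the snippet below feeds it the dtypes that `df.info()` reports above. The variable names are illustrative only, and the exact wording of the exception message depends on the installed NumPy version.

```python
import numpy as np

# The column dtypes reported by df.info() in the question.
column_dtypes = [np.dtype("int64"), np.dtype("datetime64[ns]"), np.dtype("int32")]

try:
    # NumPy has no common dtype for integers and datetime64.
    np.result_type(*column_dtypes)
    promotion_failed = False
except TypeError as exc:
    promotion_failed = True
    print(f"{type(exc).__name__}: {exc}")

# The integer columns on their own promote without trouble.
int_only = np.result_type(np.dtype("int64"), np.dtype("int32"))
print(int_only)
```

So the datetime columns are indeed the cause: they have to be turned into plain numbers before the frame is handed to `fit`.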
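I found the solution with the help of this answer: Python - linear regression TypeError: invalid type promotion
Technically, you need to convert the timestamps to ordinals, and then it works. I did the conversion with:
import datetime as dt
df['time:timestamp'] = df['time:timestamp'].map(dt.datetime.toordinal)
df['case:REG_DATE'] = df['case:REG_DATE'].map(dt.datetime.toordinal)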
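One thing to keep in mind is that `toordinal()` counts whole days, so events that happen on the same day end up with the same value. If the time of day matters for the anomaly detection, a possible alternative is to convert to seconds since the Unix epoch instead. The sketch below uses made-up toy data and the hypothetical names `features` and `epoch`; it is not the method from the linked answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two events on the same day, 53 seconds apart.
df = pd.DataFrame({
    "time:timestamp": pd.to_datetime(["2011-10-01 00:38:44", "2011-10-01 00:39:37"]),
    "case:AMOUNT_REQ": np.array([20000, 20000], dtype="int32"),
})

# toordinal() keeps only the date, so both rows get the same value.
ordinals = df["time:timestamp"].map(lambda ts: ts.toordinal())

# Seconds since the Unix epoch keep the time of day.
epoch = pd.Timestamp("1970-01-01")
features = df.copy()
features["time:timestamp"] = (df["time:timestamp"] - epoch) / pd.Timedelta(seconds=1)

# Every column is numeric now, so NumPy can find a common dtype.
common_dtype = np.result_type(*features.dtypes)
print(ordinals.nunique(), common_dtype)
```

A frame prepared like `features` could then be passed to `model.fit` in place of the original `df`.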