Python - ValueError: Found array with 0 sample(s) (the scale function)



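I have been trying to fix an error for a long time.

I think it could be solved by deleting the first row (or possibly the first two rows) of the DataFrame. By the way, I'm working in Google Colab.

Does anyone know how to solve this?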

def preprocess_df(df):
    df = df.drop("future", 1)
    for col in df.columns:
        if col != "target":
            df[col] = df[col].pct_change()
            df.dropna(inplace=True)
            df[col] = preprocessing.scale(df[col].values)
    df.dropna(inplace=True)

main_df = pd.DataFrame()
ratios = ["EURCZK=X"]
for ratio in ratios:
    dataset = f'EURCZK=X/{ratio}.csv'
    df = pd.read_csv('EURCZK=X.csv', names=['Date', 'High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close'], skiprows=2)

    df.rename(columns={"close": f"{ratio}_close", "volume": f"{ratio}_volume"}, inplace=True)

    df.set_index("Date", inplace=True)
    df = df[[f"Close", f"Volume"]]
    if len(main_df)==0:
        main_df = df
    else:
        main_df = main_df.join(df)
main_df.fillna(method="ffill", inplace=True)
main_df.dropna(inplace=True)
#print(main_df.head())
main_df['future'] = main_df[f'{RATIO_TO_PREDICT}'].shift(-FUTURE_PERIOD_PREDICT)
main_df['target'] = list(map(classify, main_df[f'Close'], main_df['future']))
main_df.dropna(inplace=True)
#print(main_df.tail(10))
Date = sorted(main_df.index.values)
last_5pct = sorted(main_df.index.values)[-int(0.05*len(Date))]
validation_main_df = main_df[(main_df.index >= last_5pct)]
main_df = main_df[(main_df.index < last_5pct)]
print(preprocess_df)
print(df.head)
imputer = imputer(missing_values="NaN", strategy="mean", axis=0)
train_x, train_y = preprocess_df(main_df)
validation_x, validation_y = preprocess_df(validation_main_df) #Preprocess dat
#print(f"train data: {len(train_x)} validation: {len(validation_x)}")
#print(f"Dont buys: {train_y.count(0)}, buys: {train_y.count(1)}")
#print(f"VALIDATION Dont buys: {validation_y.count(0)}, buys: {validation_y.count(1)}")

The output is:

<function preprocess_df at 0x7fc2568ceb70>
<bound method NDFrame.head of                 Close  Volume     future  target
Date                                            
2003-12-02  32.337502     0.0  32.580002       1
2003-12-03  32.410000     0.0  32.349998       0
2003-12-04  32.580002     0.0  32.020000       0
2003-12-05  32.349998     0.0  32.060001       0
2003-12-08  32.020000     0.0  32.099998       1
...               ...     ...        ...     ...
2020-07-28  26.263800     0.0  26.212500       0
2020-07-29  26.196301     0.0  26.238400       1
2020-07-30  26.212500     0.0  26.258400       1
2020-08-02  26.238400     0.0  26.105101       0
2020-08-03  26.258400     0.0  26.228500       0
[4302 rows x 4 columns]>
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-49204f0a12cf> in <module>()
80 
81 #imputer = imputer(missing_values="NaN", strategy="mean", axis=0)
---> 82 train_x, train_y = preprocess_df(main_df)
83 validation_x, validation_y = preprocess_df(validation_main_df)
84 
2 frames
<ipython-input-10-49204f0a12cf> in preprocess_df(df)
28             df[col] = df[col].pct_change() 
29             df.dropna(inplace=True) 
---> 30             df[col] = preprocessing.scale(df[col].values) 
31 
32     df.dropna(inplace=True)
/usr/local/lib/python3.6/dist-packages/sklearn/preprocessing/_data.py in scale(X, axis, with_mean, with_std, copy)
140     X = check_array(X, accept_sparse='csc', copy=copy, ensure_2d=False,
141                     estimator='the scale function', dtype=FLOAT_DTYPES,
--> 142                     force_all_finite='allow-nan')
143     if sparse.issparse(X):
144         if with_mean:
/usr/local/lib/python3.6/dist-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
584                              " minimum of %d is required%s."
585                              % (n_samples, array.shape, ensure_min_samples,
--> 586                                 context))
587 
588     if ensure_min_features > 0 and array.ndim == 2:
ValueError: Found array with 0 sample(s) (shape=(0,)) while a minimum of 1 is required by the scale function.

When I remove the "#" on line 84 (imputer = imputer(missing_values="NaN", strategy="mean", axis=0)), I get: name 'imputer' is not defined. The problem is that I don't know how to define this "imputer".

As Joe said above, judging by the arguments passed to the imputer call, my guess is that it is meant to be an instance of this scikit-learn class:

https://scikit-learn.org/0.16/modules/generated/sklearn.preprocessing.Imputer.html

As of scikit-learn version 0.20, this class has been replaced by the SimpleImputer class that Joe found:

https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html

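So if you got this code from somewhere else, that source may have imported the old preprocessing.Imputer class under the lowercase name imputer. You can do the same by adding from sklearn.preprocessing import Imputer as imputer at the top of your code, assuming your scikit-learn version still ships that class (it was deprecated in 0.20 and removed in 0.22). However, the instance does not appear to be used for anything in the code above; fit is never called, so I don't think commenting it out where you did will cause a problem. (Again, I'm going only by the code that was shared.)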
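If the imputer does turn out to be needed on a current scikit-learn, a rough equivalent of that line might look like the sketch below. The toy array is made up for illustration. Note two differences from the old call: SimpleImputer takes no axis argument (it always works column-wise), and missing values are given as np.nan rather than the string "NaN".

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data, not the asker's DataFrame.
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# Replace each NaN with the mean of its column.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
```

Unlike the line in the question, this actually calls fit_transform; creating the imputer without fitting it has no effect on the data.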

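Instead, I suggest you look closely at the contents of main_df when you hand it to preprocess_df. Some column (a pandas.Series) in that data has no values left after it goes through the pct_change and dropna transformations, and that is what makes the scale function give up. Judging by the printed output, Volume is a likely candidate: it is 0.0 in every row shown, and pct_change over a run of zeros gives 0/0 = NaN, so the dropna(inplace=True) that follows would remove every row of the DataFrame.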
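One way to check this is to count, per column, how many values survive pct_change followed by dropna. The following is a minimal sketch on a made-up stand-in for main_df (the numbers are illustrative, and the helper name rows_left_after_pct_change is hypothetical); it assumes Volume really is zero throughout, as the printed rows suggest.

```python
import pandas as pd

# Hypothetical stand-in for main_df: Volume is all zeros, as in the printed output.
df = pd.DataFrame({
    "Close": [32.3375, 32.41, 32.58, 32.35, 32.02],
    "Volume": [0.0, 0.0, 0.0, 0.0, 0.0],
    "target": [1, 0, 0, 0, 1],
})

def rows_left_after_pct_change(frame, skip=("target",)):
    """Count, for each feature column, the values that survive pct_change + dropna."""
    return {
        col: int(frame[col].pct_change().notna().sum())
        for col in frame.columns
        if col not in skip
    }

survivors = rows_left_after_pct_change(df)
print(survivors)

# Columns with no surviving values would empty the whole frame once
# df.dropna(inplace=True) runs, so they cannot be passed to scale().
usable = [col for col, n in survivors.items() if n > 0]
print(usable)
```

If a column shows zero survivors on the real data, one option (an assumption on my part, not something from the original code) is to drop that column before preprocessing, or to call dropna once after the loop rather than after each column.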
