IOError when using datasets.fetch_mldata() in sklearn

I imported fetch_mldata (from sklearn.datasets import fetch_mldata) and called it like this:

dataset = fetch_mldata('MNIST original')

But what I get is:

> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 540, in runfile
>     execfile(filename, namespace)
>   File "C:/Users/Jacob/Documents/Dropbox/Technion/Semester 8/Machine learning/Demo3/Demo3.py", line 75, in <module>
>     dataset = fetch_mldata('MNIST original')
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\sklearn\datasets\mldata.py", line 158, in fetch_mldata
>     matlab_dict = io.loadmat(matlab_file, struct_as_record=True)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio.py", line 126, in loadmat
>     matfile_dict = MR.get_variables(variable_names)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 288, in get_variables
>     res = self.read_var_array(hdr, process)
>   File "C:\Users\Jacob\Development\Anaconda\lib\site-packages\scipy\io\matlab\mio5.py", line 248, in read_var_array
>     return self._matrix_reader.array_from_header(header, process)
>   File "mio5_utils.pyx", line 616, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5903)
>   File "mio5_utils.pyx", line 645, in scipy.io.matlab.mio5_utils.VarReader5.array_from_header (scipy\io\matlab\mio5_utils.c:5332)
>   File "mio5_utils.pyx", line 713, in scipy.io.matlab.mio5_utils.VarReader5.read_real_complex (scipy\io\matlab\mio5_utils.c:6323)
>   File "mio5_utils.pyx", line 417, in scipy.io.matlab.mio5_utils.VarReader5.read_numeric (scipy\io\matlab\mio5_utils.c:3873)
>   File "mio5_utils.pyx", line 353, in scipy.io.matlab.mio5_utils.VarReader5.read_element (scipy\io\matlab\mio5_utils.c:3595)
>   File "streams.pyx", line 324, in scipy.io.matlab.streams.FileStream.read_string (scipy\io\matlab\streams.c:4343)
> IOError: could not read bytes

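I tried installing a newer version of sklearn, but it did not help. There is another post about this problem, "How to use datasets.fetch_mldata() in sklearn?", but the solution offered there did not help me.

Any ideas?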

For your reference and that of others: I got almost the same error (on Ubuntu), including the "IOError: could not read bytes" message.

I just posted a solution at:

How to use datasets.fetch_mldata() in sklearn?

Short answer: use the following:

from sklearn.datasets.mldata import fetch_mldata
dataset = fetch_mldata('mnist-original', data_home='***')

Replace *** (keep the quotes) with your preferred location (a data directory).

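In my case, the root cause was a corrupted mnist-original.mat file. It was corrupted because I killed Python before the file had finished downloading, which left a partially downloaded mnist-original.mat at C:\user\Taimi\scikit_learn_data\mldata.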

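The solution above worked for me because it simply fetched a fresh copy into a new location. A more direct fix is to find the corrupted mnist-original.mat file, delete it, and run the code again; doing so downloads mnist-original.mat afresh. The complete mnist-original.mat is 54,142 KB, so on a slow connection fetch_mldata() will take a few minutes to finish.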
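
If you would rather script the "find it and delete it" step, here is a minimal sketch using only the standard library. The function name remove_partial_download and the 54,000 KB threshold are my own assumptions (the threshold is just a bit below the 54,142 KB size quoted above). The default location is also an assumption: it takes the cache to be the mldata folder under ~/scikit_learn_data, or under SCIKIT_LEARN_DATA if that environment variable is set. Adjust the path if your cache lives elsewhere.

```python
import os


def remove_partial_download(data_home=None,
                            filename="mnist-original.mat",
                            min_size_kb=54000):
    """Delete the cached .mat file if it looks truncated.

    Hypothetical helper, not part of scikit-learn.
    Returns True if a file was deleted, False otherwise.
    """
    if data_home is None:
        # Assumed default cache root; override via data_home if yours differs.
        data_home = os.environ.get(
            "SCIKIT_LEARN_DATA",
            os.path.join("~", "scikit_learn_data"),
        )
    data_home = os.path.expanduser(data_home)

    # Assumed layout: the file sits in an "mldata" subfolder of the cache root.
    path = os.path.join(data_home, "mldata", filename)
    if not os.path.isfile(path):
        return False  # nothing cached, nothing to clean up

    size_kb = os.path.getsize(path) / 1024.0
    if size_kb < min_size_kb:
        # Smaller than a complete download: treat it as a partial file.
        os.remove(path)
        return True
    return False
```

Call remove_partial_download() once and then rerun your fetch_mldata() line so the file is downloaded again. A size check only catches truncated files; if the file is the right size but still unreadable, delete it by hand.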
