How to load my own data or an online dataset in Python for training a CNN or autoencoder



I have a simple problem with loading a dataset in Python. I want to define a function called load_dataset() and use it when training an autoencoder. My code is:

import matplotlib
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from urllib import urlretrieve
import cPickle as pickle
import os
import gzip
import matplotlib.cm as cm
import theano
import lasagne
from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
from nolearn.lasagne import visualize
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
# I tried to load data from an open source
def load_dataset():
    url = 'ftp://ftp nrg.wustl.edu/data/oasis_cross-sectional_disc2.tar.gz'
    filename = 'oasis_cross-sectional_disc2.tar.gz'
    if not os.path.exists(filename):
        print("Downloading MNIST dataset...")
        urlretrieve(url, filename)
    with gzip.open(filename, 'rb') as f:
        data = pickle.load(f)
    X_train, y_train = data[0]
    X_val, y_val = data[1]
    X_test, y_test = data[2]
    X_train = X_train.reshape((-1, 1, 28, 28))
    X_val = X_val.reshape((-1, 1, 28, 28))
    X_test = X_test.reshape((-1, 1, 28, 28))
    y_train = y_train.astype(np.uint8)
    y_val = y_val.astype(np.uint8)
    y_test = y_test.astype(np.uint8)
    return X_train, y_train, X_val, y_val, X_test, y_test

X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()

Downloading MNIST dataset...

Traceback (most recent call last):
  File "<pyshell#46>", line 1, in <module>
    X_train, y_train, X_val, y_val, X_test, y_test = load_dataset()
  File "<pyshell#45>", line 6, in load_dataset
    urlretrieve(url, filename)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 98, in urlretrieve
    return opener.retrieve(url, filename, reporthook, data)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 245, in retrieve
    fp = self.open(url, data)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 213, in open
    return getattr(self, name)(url)
  File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib.py", line 526, in open_ftp
    host = socket.gethostbyname(host)
IOError: [Errno socket error] [Errno 8] nodename nor servname provided, or not known

This is the error I get.

I also tried to load the data from my desktop using this code:

for path, dirs, files in os.walk(pat):
    for filename in files:
        fullpath = os.path.join(path, filename)
        with open(fullpath, 'r') as f:
            s = np.load(f)
            data = f.read()
            print data

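But I could not load the data as the values of X_train, y_train, X_val, y_val, X_test, and y_test. I don't know whether I should compress the dataset into a .pkl.gz file or use a different function to load the data. Can you help me?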
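A note on the loader in the question: judging by its name, oasis_cross-sectional_disc2.tar.gz is a gzip-compressed tar archive rather than a gzipped pickle, so gzip.open followed by pickle.load would not be expected to read it even once the download succeeds. Below is a minimal Python 3 sketch of unpacking such an archive and listing the files it contains. The helper name extract_and_list and the target directory are hypothetical, and how the extracted files are turned into arrays depends on their format, which the question does not specify.

```python
import os
import tarfile


def extract_and_list(archive_path, target_dir):
    """Unpack a .tar.gz archive and return the sorted paths of the files inside.

    Hypothetical helper for illustration; only extract archives you trust.
    """
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target_dir)

    # Walk the extracted tree and collect every regular file.
    found = []
    for dirpath, _dirnames, filenames in os.walk(target_dir):
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return sorted(found)
```

The returned paths can then be passed to whatever reader matches the file format, and the resulting arrays split into training, validation, and test sets.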

If you can use Keras to build your network, this is how to load the MNIST dataset:

import keras
from keras.datasets import mnist
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.models import Sequential

Load the MNIST dataset, which is already split into training and test sets for us:

(x_train, y_train), (x_test, y_test) = mnist.load_data()
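mnist.load_data() returns only training and test arrays, while the question asks for a validation set as well. The sketch below shows one possible way to get all six arrays: scale the images to [0, 1], add a channel axis, and hold out the last part of the training data for validation. The function name prepare_splits and the val_fraction parameter are hypothetical. The channel axis goes last here, which is the default Keras layout with the TensorFlow backend, whereas the question's (-1, 1, 28, 28) reshape is the channels-first layout that Lasagne expects.

```python
import numpy as np


def prepare_splits(x_train, y_train, x_test, y_test, val_fraction=0.1):
    """Scale images to [0, 1], add a channel axis, and hold out a validation set."""

    def to_float_images(x):
        # (N, H, W) uint8 -> (N, H, W, 1) float32 in [0, 1]
        return x.astype("float32")[..., np.newaxis] / 255.0

    x_train, x_test = to_float_images(x_train), to_float_images(x_test)

    # Keep the last val_fraction of the training samples for validation.
    split = len(x_train) - int(len(x_train) * val_fraction)
    x_val, y_val = x_train[split:], y_train[split:]
    x_train, y_train = x_train[:split], y_train[:split]
    return x_train, y_train, x_val, y_val, x_test, y_test


# Stand-in arrays with MNIST-like shapes, so the sketch runs without a download.
demo = prepare_splits(
    np.zeros((20, 28, 28), dtype=np.uint8), np.zeros(20, dtype=np.uint8),
    np.zeros((5, 28, 28), dtype=np.uint8), np.zeros(5, dtype=np.uint8),
)
print([a.shape for a in demo])
```

With real data, the four arrays returned by mnist.load_data() would be passed in place of the stand-in arrays. For an autoencoder the labels are not needed as targets; the images themselves serve as both input and target.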

If you get any error while downloading the dataset, download it manually from https://s3.amazonaws.com/img-datasets/mnist.npz and put it in the folder ~/.keras/datasets
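A manually downloaded mnist.npz can also be read directly with NumPy, without going through Keras. The sketch below assumes the archive stores its arrays under the keys x_train, y_train, x_test, and y_test, which is my understanding of how the Keras loader reads this file. The function name load_mnist_npz is hypothetical.

```python
import os

import numpy as np


def load_mnist_npz(path):
    """Read the four MNIST arrays from a local .npz file.

    Assumes the keys x_train, y_train, x_test, and y_test are present.
    """
    with np.load(os.path.expanduser(path)) as data:
        x_train, y_train = data["x_train"], data["y_train"]
        x_test, y_test = data["x_test"], data["y_test"]
    return (x_train, y_train), (x_test, y_test)
```

For example, load_mnist_npz("~/.keras/datasets/mnist.npz") would return the same nested tuple structure as mnist.load_data(), provided the file has been placed at that path.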
