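I have to build a machine learning model in Python with TensorFlow and Keras in a Jupyter notebook. I have a dataset of 1000 images: I want to use 800 of them to train the model and 200 for testing and validation. It is a gender and age prediction model. How do I import my dataset, or how do I write the path in a Jupyter notebook or Google Colab so that the dataset can be imported?
What I have done so far is import the packages for my project.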
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import BatchNormalization, Conv2D, MaxPooling2D, Activation, Flatten, Dropout, Dense
from tensorflow.keras import backend as K
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
import numpy as np
import random
import cv2
import os
import glob
import pandas as pd
Kind regards.
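If the dataset is on your local system, there are two ways to get it into Google Colab.
- You can upload the dataset to Google Drive
The easiest way to share files is to mount Google Drive in the Google Colab notebook.
To do so, run the following in a code cell: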
from google.colab import drive
drive.mount('/content/drive')
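It will ask you to visit a link and ALLOW "Google File Stream" to access your Drive. After that, a long alphanumeric authentication code is shown, which you need to enter in your Colab notebook.
Afterwards, your Drive files are mounted and you can browse them with the file browser in the side panel.
- You can upload files manually by browsing the local file system
Uploading this way takes longer.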
from google.colab import files
uploaded = files.upload()
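Once the Drive is mounted or the files are uploaded, the images still have to be listed and split. Below is a minimal sketch of the 800/200 split from the question, assuming all images sit under one folder. The helper names `list_images` and `split_paths`, the fixed seed, and the example folder name are hypothetical, not part of the question or of any Colab API.

```python
import os
import random

# File extensions treated as images (an assumption; adjust to your dataset).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(root):
    """Return the sorted paths of all image files found under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(IMAGE_EXTENSIONS):
                paths.append(os.path.join(dirpath, name))
    return sorted(paths)


def split_paths(paths, train_fraction=0.8, seed=42):
    """Shuffle a copy of paths and split it into (train, test) lists."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical usage in Colab after drive.mount('/content/drive'):
# all_paths = list_images("/content/drive/MyDrive/my_dataset")
# train_paths, test_paths = split_paths(all_paths)
```

With 1000 images and `train_fraction=0.8` this gives 800 training paths and 200 test paths. `train_test_split` from scikit-learn, which the question already imports, is an alternative for the split step.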
Here are two examples for your reference:
- https://colab.research.google.com/drive/1srw_HFWQ2SMgmWIawucXfusGzrj1_U0q
- Importing a dataset with jpg images into a Jupyter notebook
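Here I explain a simple way to load images and labels directly from a TXT file in TensorFlow. I hope it helps. The code below shows how I did it. That does not mean it is the best way to do it, or that it will carry over to later steps.
For instance, I load the labels as single integer values {0,1}, whereas the documentation uses one-hot vectors [0,1].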
#Learning how to import images and labels from a TXT file
#
#TXT file format
#
#path/to/imagefile_1 label_1
#path/to/imagefile_2 label_2
#... ...
#where label_X is either {0,1}
#Importing Libraries
import os
import tensorflow as tf
import matplotlib.pyplot as plt
from tensorflow.python.framework import ops
from tensorflow.python.framework import dtypes
#File containing the path to images and the labels [path/to/images label]
filename = '/path/to/List.txt'
#Lists where to store the paths and labels
filenames = []
labels = []
#Reading file and extracting paths and labels
with open(filename, 'r') as File:
    infoFile = File.readlines() #Reading all the lines from File
    for line in infoFile: #Reading line-by-line
        words = line.split() #Splitting lines in words using space character as separator
        filenames.append(words[0])
        labels.append(int(words[1]))
NumFiles = len(filenames)
#Converting filenames and labels into tensors
tfilenames = ops.convert_to_tensor(filenames, dtype=dtypes.string)
tlabels = ops.convert_to_tensor(labels, dtype=dtypes.int32)
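#Creating a queue which contains the list of files to read and the value of the labels
#(this is the TensorFlow 1.x queue API; under TensorFlow 2.x these calls live in tf.compat.v1 and need graph mode)
filename_queue = tf.train.slice_input_producer([tfilenames, tlabels], num_epochs=10, shuffle=True, capacity=NumFiles)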
#Reading the image files and decoding them
rawIm= tf.read_file(filename_queue[0])
decodedIm = tf.image.decode_png(rawIm) # png or jpg decoder
#Extracting the labels queue
label_queue = filename_queue[1]
#Initializing Global and Local Variables so we avoid warnings and errors
init_op = tf.group(tf.local_variables_initializer() ,tf.global_variables_initializer())
#Creating an InteractiveSession so we can run in iPython
sess = tf.InteractiveSession()
with sess.as_default():
    sess.run(init_op)
# Start populating the filename queue.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(coord=coord)
for i in range(NumFiles): #length of your filenames list
    nm, image, lb = sess.run([filename_queue[0], decodedIm, label_queue])
    print(image.shape)
    print(nm)
    print(lb)
    #Showing the current image
    plt.imshow(image)
    plt.show()
coord.request_stop()
coord.join(threads)
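The file-parsing step above does not depend on TensorFlow, so it can be pulled out and checked on its own. Here is a minimal sketch that also shows the integer-to-one-hot conversion mentioned earlier. The function names `parse_list_file` and `to_one_hot` are hypothetical, and the parser assumes the image paths contain no spaces.

```python
def parse_list_file(list_path):
    """Read 'path label' lines and return (filenames, labels).

    Assumes paths contain no whitespace, as in the TXT format above.
    """
    filenames, labels = [], []
    with open(list_path, "r") as handle:
        for line in handle:
            words = line.split()
            if len(words) < 2:
                continue  # skip blank or malformed lines
            filenames.append(words[0])
            labels.append(int(words[1]))
    return filenames, labels


def to_one_hot(labels, num_classes=2):
    """Turn integer labels such as 1 into one-hot lists such as [0, 1]."""
    return [
        [1 if index == label else 0 for index in range(num_classes)]
        for label in labels
    ]
```

`to_categorical` from `tensorflow.keras.utils`, which the question already imports, performs the same one-hot conversion and returns a NumPy array.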
If you are using pandas to locate a CSV file, try giving the full path:
df = pd.read_csv(r"C:\Users\mahe\Desktop\homeprices.csv")
Do it this way, or:
import matplotlib.pyplot as plt
import os
import cv2
from tqdm import tqdm
DATADIR = "X:/Datasets/PetImages" #(give your full path)
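The snippet above stops after defining `DATADIR`. One possible continuation is sketched here, assuming each class has its own subfolder directly under `DATADIR` and that every file in those subfolders is an image. The function name `index_by_subfolder` and this folder layout are assumptions for illustration, not the answerer's code.

```python
import os


def index_by_subfolder(datadir):
    """Return (class_names, samples) for a folder with one subfolder per class.

    samples is a list of (image_path, class_index) pairs, where class_index
    is the position of the subfolder name in the sorted class_names list.
    """
    class_names = sorted(
        entry for entry in os.listdir(datadir)
        if os.path.isdir(os.path.join(datadir, entry))
    )
    samples = []
    for class_index, class_name in enumerate(class_names):
        class_dir = os.path.join(datadir, class_name)
        for file_name in sorted(os.listdir(class_dir)):
            samples.append((os.path.join(class_dir, file_name), class_index))
    return class_names, samples


# Hypothetical usage with the DATADIR defined above:
# class_names, samples = index_by_subfolder(DATADIR)
```

Each path in `samples` can then be read with `cv2.imread(path)` and resized before being stacked into a training array.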