Feature normalization on the MNIST dataset



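I am working with a subset of the MNIST dataset and want to normalize the features of its samples. The dataset comes as .mat files. Can someone show me how to convert a .mat file into a numpy array, so that I can run basic operations such as mean and standard deviation on the feature vectors?

Here is my code for loading the .mat file and converting it to a numpy array: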

import scipy.io
import numpy as np
train_0 = scipy.io.loadmat('data/training_data_0.mat')
train_1 = scipy.io.loadmat('data/training_data_1.mat')
test_0 = scipy.io.loadmat('data/testing_data_0.mat')
test_1 = scipy.io.loadmat('data/testing_data_1.mat')
# to return a group of the key-value
# pairs in the dictionary
result = train_0.items()
# Convert object to a list
data = list(result)
# Convert list to an array
numpyArray = np.array(data)
print(numpyArray.mean())

However, when I run it I get this error:

numpyArray = np.array(data)
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_bundle/pydev_umd.py", line 197, in runfile
pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
File "/Applications/PyCharm CE.app/Contents/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/Users/mish/Work/ASU/Fall20/CSE 569/main.py", line 20, in <module>
print(numpyArray.mean())
File "/usr/local/lib/python3.8/site-packages/numpy/core/_methods.py", line 160, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
TypeError: can only concatenate str (not "bytes") to str

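You are passing a list of (key, value) tuples to numpy.array, which is not what you want. The numpy array is already available as train_0['<some variable name here>'].

To get the variable names, just use: print(train_0.keys())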
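As a minimal sketch of the point above: the helper name mat_arrays, the variable name 'features', and the temporary demo file are all hypothetical, introduced only so the example runs without the original data files. It skips the metadata entries that loadmat adds ('__header__', '__version__', '__globals__') and picks the array out by key.

```python
import os
import tempfile

import numpy as np
import scipy.io


def mat_arrays(mat_dict):
    """Return only the ndarray variables from a loadmat dictionary.

    Keys starting with '__' are loadmat metadata, not data, so they
    are skipped. (Hypothetical helper, not part of scipy.)
    """
    return {key: value for key, value in mat_dict.items()
            if not key.startswith("__") and isinstance(value, np.ndarray)}


# Write a small stand-in .mat file; 'features' is an assumed variable name.
# With real data, check train_0.keys() to find the actual name.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mat")
scipy.io.savemat(demo_path, {"features": np.arange(12, dtype=float).reshape(4, 3)})

loaded = scipy.io.loadmat(demo_path)
arrays = mat_arrays(loaded)

X = arrays["features"]  # already a numpy.ndarray, no conversion needed
print(sorted(arrays.keys()))
print(X.shape, X.mean(), X.std())
```

With the real files, the same pattern would be train_0['<some variable name here>'] followed by .mean() or .std() on the result.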

This may answer your question: Converting a loaded mat file back to a numpy array

scipy.io.loadmat returns a dictionary:

Returns
mat_dict : dict
    dictionary with variable names as keys, and loaded matrices as values.

https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
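Once the array has been pulled out of the dictionary, the normalization asked about in the question could look roughly like the sketch below. It assumes rows are samples and columns are features, which the question does not state. The function name standardize and the random demo data are hypothetical. Statistics are computed on the training set only and reused for the test set, and constant features get a divisor of 1 to avoid dividing by zero.

```python
import numpy as np


def standardize(train, test, eps=1e-8):
    """Scale each feature to zero mean and unit variance.

    Assumes shape (n_samples, n_features). Mean and standard deviation
    come from the training data and are applied to both sets.
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    # Features that never vary have std 0; divide those by 1 instead.
    std = np.where(std < eps, 1.0, std)
    return (train - mean) / std, (test - mean) / std


# Stand-in data: 4 features, the last one constant across all samples.
rng = np.random.default_rng(0)
train = rng.normal(5.0, 2.0, size=(100, 4))
train[:, 3] = 7.0
test = rng.normal(5.0, 2.0, size=(20, 4))

train_n, test_n = standardize(train, test)
print(train_n.mean(axis=0))
print(train_n.std(axis=0))
```

Reusing the training statistics keeps the test set on the same scale as the data a model was fitted on, which is a common choice but not the only one.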
