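Many books and blog posts download the MNIST dataset directly from within their example code. That way the code runs as-is, whatever the reader's machine setup. The downside is that the download can fail, and fetching and processing the data can be slow.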
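This post therefore gives code that loads an MNIST dataset already downloaded to a local folder, reading the compressed .gz files directly. Parameters control whether each image is flattened into a 1-D array, whether pixel values are normalized, and whether labels are one-hot encoded, which makes the data convenient for training.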
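One-hot encoding represents a categorical variable as a binary vector. The categorical values are first mapped to integers. Each integer is then represented as a binary vector that is all zeros except for a single 1 at the index equal to that integer.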
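A minimal NumPy sketch of the encoding just described. The function name `one_hot` and the default `num_classes=10` (digits 0-9) are assumptions chosen for illustration, not part of the original post. Unlike the row-by-row loop in `_change_one_hot_label`, it places all the 1s in a single fancy-indexing assignment.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Map integer class labels to one-hot rows.

    Hypothetical helper; num_classes=10 assumes MNIST digits 0-9.
    """
    labels = np.asarray(labels, dtype=np.int64)
    # All-zero matrix: one row per label, one column per class.
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Row i gets a 1 in column labels[i]; every other entry stays 0.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


if __name__ == "__main__":
    encoded = one_hot([2, 0, 9])
    print(encoded)
    # argmax along each row recovers the original integer labels.
    print(encoded.argmax(axis=1))
```

For `one_hot([2, 0, 9])` the result is a 3×10 array whose first row is `[0,0,1,0,0,0,0,0,0,0]`, and `argmax(axis=1)` turns it back into `[2, 0, 9]`.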
import gzip
import os

import numpy as np

img_size = 784  # 28 * 28 pixels per image


def _change_one_hot_label(X):
    """Convert a 1-D array of integer labels (0-9) into a one-hot matrix of shape (N, 10)."""
    T = np.zeros((X.size, 10))
    for idx, row in enumerate(T):
        row[X[idx]] = 1
    return T


# Loads the data. data_folder is the folder holding the .gz files;
# it must contain these 4 files:
# 'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
# 't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz'
def load_data(data_folder, normalize=True, flatten=True, one_hot_label=False):
    """Load the MNIST dataset.

    Parameters
    ----------
    normalize : normalize image pixel values to the range 0.0-1.0
    one_hot_label :
        if True, labels are returned as one-hot arrays,
        i.e. arrays like [0,0,1,0,0,0,0,0,0,0]
    flatten : whether to flatten each image into a 1-D array

    Returns
    -------
    (training images, training labels), (test images, test labels)
    """
    files = [
        'train-labels-idx1-ubyte.gz', 'train-images-idx3-ubyte.gz',
        't10k-labels-idx1-ubyte.gz', 't10k-images-idx3-ubyte.gz'
    ]
    paths = [os.path.join(data_folder, fname) for fname in files]

    # Label files: skip the 8-byte header (magic number, item count).
    with gzip.open(paths[0], 'rb') as lbpath:
        y_train = np.frombuffer(lbpath.read(), np.uint8, offset=8)
    # Image files: skip the 16-byte header (magic number, image count, rows, cols).
    with gzip.open(paths[1], 'rb') as imgpath:
        x_train = np.frombuffer(
            imgpath.read(), np.uint8, offset=16).reshape(len(y_train), 28, 28)
    with gzip.open(paths[2], 'rb') as lbpath:
        y_test = np.frombuffer(lbpath.read(), np.uint8, offset=8)
    with gzip.open(paths[3], 'rb') as imgpath:
        x_test = np.frombuffer(
            imgpath.read(), np.uint8, offset=16).reshape(len(y_test), 28, 28)

    if normalize:
        # Normalization: scale uint8 pixels from 0-255 to 0.0-1.0
        x_train = x_train.astype(np.float32) / 255.0
        x_test = x_test.astype(np.float32) / 255.0
    if one_hot_label:
        y_train = _change_one_hot_label(y_train)
        y_test = _change_one_hot_label(y_test)
    if flatten:
        # (N, 28, 28) -> (N, 784); use the actual N instead of hard-coding 60000/10000
        x_train = x_train.reshape(len(x_train), img_size)
        x_test = x_test.reshape(len(x_test), img_size)
    return (x_train, y_train), (x_test, y_test)
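The loader above skips the file headers with fixed offsets (8 bytes for label files, 16 for image files) and assumes 28×28 images. The sketch below reads those headers instead of skipping them, so a wrong or truncated file fails loudly. It assumes the IDX layout described on the MNIST download page: a big-endian uint32 magic number (2049 for label files, 2051 for image files), then the item count and, for image files, the row and column counts. The names `read_idx_labels` and `read_idx_images` are hypothetical and not part of the original post. The demo writes tiny synthetic files because the real archives may not be present.

```python
import gzip
import os
import struct
import tempfile

import numpy as np

# Assumed IDX magic numbers: 0x00000801 for labels, 0x00000803 for images.
LABEL_MAGIC = 2049
IMAGE_MAGIC = 2051


def read_idx_labels(path):
    """Read a gzipped IDX1 label file after validating its 8-byte header."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    magic, count = struct.unpack(">II", data[:8])  # two big-endian uint32 values
    if magic != LABEL_MAGIC:
        raise ValueError(f"{path}: bad magic {magic}, expected {LABEL_MAGIC}")
    labels = np.frombuffer(data, dtype=np.uint8, offset=8)
    if labels.size != count:
        raise ValueError(f"{path}: header says {count} labels, found {labels.size}")
    return labels


def read_idx_images(path):
    """Read a gzipped IDX3 image file after validating its 16-byte header."""
    with gzip.open(path, "rb") as f:
        data = f.read()
    magic, count, rows, cols = struct.unpack(">IIII", data[:16])
    if magic != IMAGE_MAGIC:
        raise ValueError(f"{path}: bad magic {magic}, expected {IMAGE_MAGIC}")
    pixels = np.frombuffer(data, dtype=np.uint8, offset=16)
    # reshape raises ValueError if the payload size != count * rows * cols
    return pixels.reshape(count, rows, cols)


if __name__ == "__main__":
    # Synthetic stand-ins for the real archives: 3 deterministic 28x28 images.
    imgs = (np.arange(3 * 28 * 28) % 256).astype(np.uint8).reshape(3, 28, 28)
    labs = np.array([7, 2, 1], dtype=np.uint8)
    with tempfile.TemporaryDirectory() as tmp:
        lab_path = os.path.join(tmp, "demo-labels-idx1-ubyte.gz")
        img_path = os.path.join(tmp, "demo-images-idx3-ubyte.gz")
        with gzip.open(lab_path, "wb") as f:
            f.write(struct.pack(">II", LABEL_MAGIC, len(labs)) + labs.tobytes())
        with gzip.open(img_path, "wb") as f:
            f.write(struct.pack(">IIII", IMAGE_MAGIC, *imgs.shape) + imgs.tobytes())
        x = read_idx_images(img_path)
        y = read_idx_labels(lab_path)
        print(x.shape, y.tolist())  # prints: (3, 28, 28) [7, 2, 1]
```

Reading the header costs nothing extra, since the whole file is already in memory, and it turns a mismatched or corrupted download into an explicit error instead of a silently wrong reshape.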
Koki Saito (斋藤康毅). Deep Learning from Scratch: Theory and Implementation with Python (Chinese edition: 《深度学习入门：基于Python的理论与实现》) [M]. 2016.
https://blog.csdn.net/AugustMe/article/details/90604473