PyTorch TensorBoard Support
Created On: Apr 01, 2025 | Last Updated: Apr 01, 2025 | Last Verified: Nov 05, 2024
Follow along with the video below or on YouTube.
Before You Start
To run this tutorial, you'll need to install PyTorch, TorchVision, Matplotlib, and TensorBoard.
With conda:
conda install pytorch torchvision -c pytorch
conda install matplotlib tensorboard
With pip:
pip install torch torchvision matplotlib tensorboard
Once the dependencies are installed, restart this notebook in the Python environment where you installed them.
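As a quick sanity check (a minimal sketch, not part of the original setup steps), you can confirm that the packages import correctly and print their versions before continuing:

# Sanity check: confirm the required packages are importable and report versions.
import torch
import torchvision
import matplotlib
import tensorboard

print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("matplotlib:", matplotlib.__version__)
print("tensorboard:", tensorboard.__version__)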
Introduction
In this notebook, we'll be training a variant of LeNet-5 on the Fashion-MNIST dataset. Fashion-MNIST is a set of image tiles depicting various garments, with ten class labels indicating the type of garment depicted.
# PyTorch model and training necessities
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
# Image datasets and image manipulation
import torchvision
import torchvision.transforms as transforms
# Image display
import matplotlib.pyplot as plt
import numpy as np
# PyTorch TensorBoard support
from torch.utils.tensorboard import SummaryWriter
# In case you are using an environment that has TensorFlow installed,
# such as Google Colab, uncomment the following code to avoid
# a bug with saving embeddings to your TensorBoard directory
# import tensorflow as tf
# import tensorboard as tb
# tf.io.gfile = tb.compat.tensorflow_stub.io.gfile
Showing Images in TensorBoard
Let's start by adding sample images from our dataset to TensorBoard:
# Gather datasets and prepare them for consumption
transform = transforms.Compose(
[transforms.ToTensor(),
transforms.Normalize((0.5,), (0.5,))])
# Store separate training and validations splits in ./data
training_set = torchvision.datasets.FashionMNIST('./data',
download=True,
train=True,
transform=transform)
validation_set = torchvision.datasets.FashionMNIST('./data',
download=True,
train=False,
transform=transform)
training_loader = torch.utils.data.DataLoader(training_set,
batch_size=4,
shuffle=True,
num_workers=2)
validation_loader = torch.utils.data.DataLoader(validation_set,
batch_size=4,
shuffle=False,
num_workers=2)
# Class labels
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')
# Helper function for inline image display
def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))
# Extract a batch of 4 images
dataiter = iter(training_loader)
images, labels = next(dataiter)
# Create a grid from the images and show them
img_grid = torchvision.utils.make_grid(images)
matplotlib_imshow(img_grid, one_channel=True)
Above, we used TorchVision and Matplotlib to create a visual grid of a minibatch of our input data. Below, we call add_image() on our SummaryWriter to log the image for consumption by TensorBoard, and we also call flush() to make sure it's written to disk right away.
# Default log_dir argument is "runs" - but it's good to be specific
# torch.utils.tensorboard.SummaryWriter is imported above
writer = SummaryWriter('runs/fashion_mnist_experiment_1')
# Write image data to TensorBoard log dir
writer.add_image('Four Fashion-MNIST Images', img_grid)
writer.flush()
# To view, start TensorBoard on the command line with:
# tensorboard --logdir=runs
# ...and open a browser tab to http://localhost:6006/
If you start TensorBoard at the command line and open it in a new browser tab (usually at localhost:6006), you should see the image grid under the IMAGES tab.
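If you're working in a Jupyter notebook or Colab rather than at a terminal, you can instead load TensorBoard inline with the notebook extension (an aside, assuming an IPython-based environment; it isn't part of the original workflow):

%load_ext tensorboard
%tensorboard --logdir=runs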
Graphing Scalars to Visualize Training
TensorBoard is useful for tracking the progress and efficacy of your training. Below, we'll run a training loop, track some metrics, and save the data for TensorBoard's consumption.
Let's define a model to categorize our image tiles, along with an optimizer and loss function for training:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 4 * 4, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
Now let's train a single epoch, evaluating the training vs. validation set losses every 1000 batches:
print(len(validation_loader))
for epoch in range(1):  # loop over the dataset multiple times
    running_loss = 0.0

    for i, data in enumerate(training_loader, 0):
        # basic training loop
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 1000 == 999:  # Every 1000 mini-batches...
            print('Batch {}'.format(i + 1))
            # Check against the validation set
            running_vloss = 0.0

            # In evaluation mode some model specific operations can be omitted eg. dropout layer
            net.train(False)  # Switching to evaluation mode, eg. turning off regularisation
            for j, vdata in enumerate(validation_loader, 0):
                vinputs, vlabels = vdata
                voutputs = net(vinputs)
                vloss = criterion(voutputs, vlabels)
                running_vloss += vloss.item()
            net.train(True)  # Switching back to training mode, eg. turning on regularisation

            avg_loss = running_loss / 1000
            avg_vloss = running_vloss / len(validation_loader)

            # Log the running loss averaged per batch
            writer.add_scalars('Training vs. Validation Loss',
                               {'Training': avg_loss, 'Validation': avg_vloss},
                               epoch * len(training_loader) + i)

            running_loss = 0.0
print('Finished Training')
writer.flush()
Switch to your open TensorBoard and have a look at the SCALARS tab.
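The loop above uses add_scalars() to draw the training and validation losses on a single chart. SummaryWriter also has a singular add_scalar() method for logging one value per tag. Here is a minimal, self-contained sketch; the 'runs/scalar_demo' log directory and the dummy loss values are purely illustrative:

# Illustrative only: add_scalar() logs a single value per call under one tag;
# TensorBoard groups tags sharing a prefix (here "Loss/") into one section.
from torch.utils.tensorboard import SummaryWriter

demo_writer = SummaryWriter('runs/scalar_demo')  # hypothetical log dir
for step in range(100):
    demo_writer.add_scalar('Loss/train', 1.0 / (step + 1), step)       # dummy value
    demo_writer.add_scalar('Loss/validation', 1.2 / (step + 1), step)  # dummy value
demo_writer.flush()
demo_writer.close()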
Visualizing Your Model
TensorBoard can also be used to examine the data flow within your model. To do this, call the add_graph() method with a model and sample input:
# Again, grab a single mini-batch of images
dataiter = iter(training_loader)
images, labels = next(dataiter)
# add_graph() will trace the sample input through your model,
# and render it as a graph.
writer.add_graph(net, images)
writer.flush()
When you switch over to TensorBoard, you should see a GRAPHS tab. Double-click the "NET" node to see the layers and data flow within your model.
Visualizing Your Dataset with Embeddings
The 28-by-28 image tiles we're using can be modeled as 784-dimensional vectors (28 * 28 = 784). It can be instructive to project this down to a lower-dimensional representation. The add_embedding() method does this automatically: it projects a set of data onto the three dimensions with highest variance and displays them as an interactive 3D chart.
Below, we'll take a sample of our data and generate such an embedding:
# Select a random subset of data and corresponding labels
def select_n_random(data, labels, n=100):
    assert len(data) == len(labels)
    perm = torch.randperm(len(data))
    return data[perm][:n], labels[perm][:n]
# Extract a random subset of data
images, labels = select_n_random(training_set.data, training_set.targets)
# get the class labels for each image
class_labels = [classes[label] for label in labels]
# log embeddings
features = images.view(-1, 28 * 28)
writer.add_embedding(features,
metadata=class_labels,
label_img=images.unsqueeze(1))
writer.flush()
writer.close()
Now if you switch to TensorBoard and select the PROJECTOR tab, you should see a 3D representation of the projection. You can rotate and zoom the model. Examine it at large and small scales, and see whether you can spot patterns in the projected data and the clustering of labels.
For better visibility, it's recommended to:
Select "label" from the "Color by" drop-down on the left.
Toggle the Night Mode icon along the top to place the light-colored images on a dark background.
Other Resources
For more information, have a look at:
PyTorch documentation on torch.utils.tensorboard.SummaryWriter
Tensorboard tutorial content in the PyTorch.org Tutorials
For more information about TensorBoard, see the TensorBoard documentation