[Image Recognition] Training a neural network to recognize Vtubers, with visualization

Published: 2023-01-22 ⋅ Views: (13) ⋅ Likes: (0) ⋅ Comments: (0)

Preface: Over the past few days I read a bit more, reworked the previous network, and added some visualization. I ran into quite a few problems along the way, so I'm recording them here.

Link to the previous post: [Image Recognition] Training the simplest possible AI to recognize Vtubers

I'm only a first-year undergraduate in a non-CS major who has picked up a little machine learning and computer vision in my spare time, so I learn slowly and there may well be mistakes and gaps in understanding.

Changes since last time:

1. Replaced the simple fully-connected network with a hand-written convolutional network, then switched to the VGG16 bundled with PyTorch (to make visualization easier)
2. The training set is no longer reused for validation
3. Added visualization: the loss and accuracy of each training batch are plotted
4. Using the bundled VGG16, visualized the feature maps of layer k

Modifying the neural network

A convolutional network modeled on VGG16

Convolution extracts the feature maps, and a fully-connected head then does the classification. The main module here is Conv2d (2-D convolution); its common positional parameters are, in order: the number of input channels (for an image tensor, the size of the channel dimension, i.e. the dimension other than height and width), the number of output channels (how many channels each convolution produces, or equivalently how many sets of kernels are applied), the kernel size, the stride, and the padding.
Beyond that, max pooling (MaxPool2d) and dropout (nn.Dropout) are also used.

Function references:
Conv2d
MaxPool2d
Dropout
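As a quick sanity check on those parameters, here is a minimal sketch (sizes chosen only for illustration) of how Conv2d and MaxPool2d change tensor shapes:

```python
import torch
import torch.nn as nn

# A batch of one 3-channel 224x224 "image".
x = torch.randn(1, 3, 224, 224)

# 3 input channels -> 64 output channels, 3x3 kernel, stride 1, no padding:
# with no padding each spatial dimension shrinks by kernel_size - 1 = 2.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1)
print(conv(x).shape)        # torch.Size([1, 64, 222, 222])

# MaxPool2d(2, 2) halves both spatial dimensions.
pool = nn.MaxPool2d(2, 2)
print(pool(conv(x)).shape)  # torch.Size([1, 64, 111, 111])
```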

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1_1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1)
        self.conv1_2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1)
        self.maxpool1 = nn.MaxPool2d(2, 2)
        self.conv2_1 = nn.Conv2d(64, 128, 3, 1)
        self.conv2_2 = nn.Conv2d(128, 128, 3, 1)
        self.maxpool2 = nn.MaxPool2d(2, 2)
        self.conv3_1 = nn.Conv2d(128, 256, 3, 1)
        self.conv3_2 = nn.Conv2d(256, 256, 3, 1)
        self.conv3_3 = nn.Conv2d(256, 256, 3, 1)
        self.maxpool3 = nn.MaxPool2d(2, 2)
        self.conv4_1 = nn.Conv2d(256, 512, 3, 1)
        self.conv4_2 = nn.Conv2d(512, 512, 3, 1)
        self.conv4_3 = nn.Conv2d(512, 512, 3, 1)
        self.maxpool4 = nn.MaxPool2d(2, 2)
        self.conv5_1 = nn.Conv2d(512, 512, 3, 1)
        self.conv5_2 = nn.Conv2d(512, 512, 3, 1)
        self.conv5_3 = nn.Conv2d(512, 4096, 3, 1)
        self.maxpool5 = nn.MaxPool2d(2, 2)
        # dropout layers
        self.Dropout1 = nn.Dropout(0.25)
        self.Dropout2 = nn.Dropout(0.5)
        # fully-connected classifier head
        self.fc1 = nn.Linear(4096, 4096)
        self.relu1 = nn.ReLU()
        self.fc2 = nn.Linear(4096, 1000)
        self.relu2 = nn.ReLU()
        self.fc3 = nn.Linear(1000, 5)

    def forward(self, x):
        # This part defines the network structure. The dataset yields integer
        # tensors, so cast to float once here instead of after every layer.
        x = x.float()
        # Like VGG16, apply ReLU after every convolution (without the
        # nonlinearity, the stacked convolutions would collapse into one
        # linear map).
        x = F.relu(self.conv1_1(x))
        x = F.relu(self.conv1_2(x))
        x = self.maxpool1(x)
        x = F.relu(self.conv2_1(x))
        x = F.relu(self.conv2_2(x))
        x = self.maxpool2(x)
        x = F.relu(self.conv3_1(x))
        x = F.relu(self.conv3_2(x))
        x = F.relu(self.conv3_3(x))
        x = self.maxpool3(x)
        x = F.relu(self.conv4_1(x))
        x = F.relu(self.conv4_2(x))
        x = F.relu(self.conv4_3(x))
        x = self.maxpool4(x)
        x = F.relu(self.conv5_1(x))
        x = F.relu(self.conv5_2(x))
        x = F.relu(self.conv5_3(x))
        x = self.maxpool5(x)
        x = self.Dropout1(x)
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = self.relu1(x)
        x = self.Dropout2(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        return x

Using the VGG16 bundled with PyTorch

Initialize the model with:

model = models.vgg16(num_classes=5)

The required import is:

import torchvision.models as models

Tips: vgg16 accepts custom arguments; my num_classes is the number of classes to predict. Note that this builds a randomly initialized VGG16 — weights are only downloaded if you explicitly request pretrained ones.
For the full parameter list, Ctrl-click vgg16 in VS Code to jump straight to its definition.

Visualizing loss and accuracy

Split the figure with subplot, then draw the points with plot.
I put this part in the test loop (number1 and number2 are counters initialized outside and passed in to drive the x axis).

def test_loop(testset_dataloader, model, loss_fn, number1, number2):
    size = len(testset_dataloader.dataset)   # number of samples
    num_batches = len(testset_dataloader)    # number of batches
    test_loss, correct = 0, 0

    with torch.no_grad():
        # iterate the test loader (the original mistakenly iterated the train loader here)
        for batch, (X, y, i) in enumerate(testset_dataloader):
            pred = model(X.float())
            test_loss += loss_fn(pred, y.to(device)).item()
            correct += (pred.argmax(1) == y.to(device)).type(torch.float).sum().item()

            number1 += 1
            loss = loss_fn(pred, y.to(device)).item()
            # visualize this batch's loss
            plt.subplot(1, 2, 1)
            plt.plot(number1, loss, 'o', lw=5)

            # softmax_0 = nn.Softmax(dim=1)
            # print(f"Softmax scores per sample:\n{softmax_0(pred)}")
            # print(softmax_0(pred)[1][1])
            print(f"Ground truth:\n{y}\nModel prediction:\n{pred.argmax(1)}")

    test_loss /= num_batches
    correct /= size
    plt.subplot(1, 2, 2)
    number2 += 1
    plt.plot(number2, correct, 'o', lw=5)
    plt.show()
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")
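A side note on the plotting pattern: calling plot once per point works, but an arguably simpler alternative (a sketch, not from the original code) is to accumulate values in lists and plot each curve in one call, which also draws connecting lines:

```python
import matplotlib.pyplot as plt

# Illustrative values only; in practice, append one entry per batch / per epoch.
losses = [1.6, 1.2, 0.9, 0.7, 0.6]
accs = [0.30, 0.50, 0.60, 0.70, 0.75]

plt.subplot(1, 2, 1)
plt.plot(losses)   # x defaults to 0, 1, 2, ...
plt.title("loss")
plt.subplot(1, 2, 2)
plt.plot(accs)
plt.title("accuracy")
plt.show()
```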

Result (using the bundled vgg16):
[figure]
Left: the loss; right: the accuracy of each epoch

Result (my hand-written VGG16-style network):
[figure]

Visualizing the feature maps of layer k

To find out what layer k of vgg16 actually is, print(model), or inspect the vgg16 structure in your IDE's variable view after instantiation.
As shown:
[figure]
This part is adapted from someone else's code.
The value of k can be changed as needed,
as can the image path (image_dir = r"D:\Py workprojects\the new funny file\data\train\AzumaSeren\0.png"),
in this part:

if __name__ == '__main__':

    image_dir = r"D:\Py workprojects\the new funny file\data\train\AzumaSeren\0.png"
    # which layer's feature map to extract
    k = 1
    image_info = get_image_info(image_dir)

    model_layer = list(model.children())
    model_layer = model_layer[0]  # take the model's first Sequential() (the conv features)

    feature_map = get_k_layer_feature_map(model_layer, k, image_info)
    show_feature_map(feature_map)

The model cannot be used without loading its parameters first;
here I load the parameters saved after 30 epochs of training.

Reference: feature map visualization

Full code:
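The save/load round trip itself follows the standard PyTorch state_dict pattern; a minimal sketch with a stand-in model (demo_ckpt.pt is just an illustrative filename):

```python
import torch
import torch.nn as nn

net = nn.Linear(4, 2)                  # stand-in for the real model
torch.save(net.state_dict(), 'demo_ckpt.pt')

net2 = nn.Linear(4, 2)                 # must be built with the same architecture
net2.load_state_dict(torch.load('demo_ckpt.pt'))
net2.eval()                            # disables Dropout for inference
assert torch.equal(net.weight, net2.weight)
```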

import torch
from torchvision import models, transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

model = models.vgg16(num_classes=5)
model.load_state_dict(torch.load('test_2'))  # parameters saved by the training script
model.eval()

# 1. Inspect the model
# print(model)  # the network has 3 top-level children: two Sequential() blocks + avgpool
# model_features = list(model.children())
# print(model_features[0][3])  # the 4th layer inside the first Sequential()
# for index, layer in enumerate(model_features[0]):
#     print(layer)

# 2. Load the data
# Open the image in RGB; PIL is the image format PyTorch's DataLoader itself uses,
# so reading images this way is recommended (use convert('L') for grayscale input).
def get_image_info(image_dir):
    image_info = Image.open(image_dir).convert('RGB')  # a PIL image
    # preprocessing pipeline
    image_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    image_info = image_transform(image_info)  # torch.Size([3, 224, 224])
    image_info = image_info.unsqueeze(0)  # torch.Size([1, 3, 224, 224]); the model expects a 4-D batch
    return image_info  # now a tensor


# 3. Extract the feature map of layer k
'''
args:
k: which layer's feature map to extract
x: the image tensor
model_layer: a Sequential() of feature layers
'''
def get_k_layer_feature_map(model_layer, k, x):
    with torch.no_grad():
        for index, layer in enumerate(model_layer):  # the first Sequential() holds many layers, so iterate
            x = layer(x)  # e.g. torch.Size([1, 64, 224, 224]) after the first conv: 64 channels
            if k == index:
                return x


# 4. Visualize the feature maps
def show_feature_map(feature_map):  # feature_map has shape (1, C, H, W)
    feature_map = feature_map.squeeze(0)  # -> (C, H, W)

    # The next four lines resize each channel to 256x256 by bilinear interpolation
    feature_map = feature_map.view(1, feature_map.shape[0], feature_map.shape[1], feature_map.shape[2])
    upsample = torch.nn.UpsamplingBilinear2d(size=(256, 256))
    feature_map = upsample(feature_map)
    feature_map = feature_map.view(feature_map.shape[1], feature_map.shape[2], feature_map.shape[3])

    feature_map_num = feature_map.shape[0]  # number of channels
    row_num = int(np.ceil(np.sqrt(feature_map_num)))  # grid side length, e.g. 8 for 64 channels
    plt.figure()
    for index in range(1, feature_map_num + 1):  # draw each channel as its own subplot
        plt.subplot(row_num, row_num, index)
        plt.imshow(feature_map[index - 1], cmap='gray')
        # replace the line above with the following for a colored rendering:
        # plt.imshow(transforms.ToPILImage()(feature_map[index - 1]))
        plt.axis('off')
        # scipy.misc.imsave was removed from SciPy; plt.imsave does the same job:
        # plt.imsave('feature_map_save/' + str(index) + '.png', feature_map[index - 1].numpy())
    plt.show()


if __name__ == '__main__':

    image_dir = r"D:\Py workprojects\the new funny file\data\train\AzumaSeren\0.png"
    # which layer's feature map to extract
    k = 1
    image_info = get_image_info(image_dir)

    model_layer = list(model.children())
    model_layer = model_layer[0]  # take the model's first Sequential() (the conv features)

    feature_map = get_k_layer_feature_map(model_layer, k, image_info)
    show_feature_map(feature_map)

Layer 1 feature response maps:
[figure]
Layer 10 feature response maps:
[figure]

Layer 30 feature response maps:
[figure]
Original image:
[figure]

The new validation set and accuracy

After 30 epochs the accuracy is roughly 80%, but the model has real trouble telling Aibai (艾白) and AzumaSeren (东雪莲) apart, presumably because they both have white hair.
Results (these are correct):
[figure]
[figure]
[figure]
[figure]
And here is one it got wrong (rather absurdly):
[figure]
That wraps up the notes; the full code follows.
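To quantify exactly which pairs the model mixes up (e.g. Aibai vs AzumaSeren), a per-class confusion matrix is handy; a minimal sketch with hypothetical labels and predictions (the real ones would come from the test loop):

```python
import numpy as np

num_classes = 5  # Diana, Wenjing, Taffy, Aibai, AzumaSeren

# Hypothetical values for illustration only.
y_true = np.array([0, 1, 2, 3, 3, 4, 4, 4])
y_pred = np.array([0, 1, 2, 3, 4, 4, 3, 4])

cm = np.zeros((num_classes, num_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1  # row = true class, column = predicted class

# Off-diagonal entries in rows/columns 3 and 4 reveal
# the Aibai <-> AzumaSeren confusion.
print(cm)
```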

Source code

Training and loss/accuracy visualization

import numpy as np
from torch import nn
from skimage import io
import matplotlib.pyplot as plt
import os
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision.models as models

class MyDataset(Dataset):
    def __init__(self, root_dir, names_file, transform=None):
        self.root_dir = root_dir
        self.names_file = names_file
        self.transform = transform
        self.size = 0
        self.names_list = []

        if not os.path.isfile(self.names_file):
            print(self.names_file + " does not exist!")
        with open(self.names_file) as file:
            for f in file:
                self.names_list.append(f)
                self.size += 1

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        image_path = self.root_dir + self.names_list[idx].split(" ")[0]
        if not os.path.isfile(image_path):
            print(image_path + " does not exist!")
            return None
        image = io.imread(image_path)  # read with skimage
        label = int(self.names_list[idx].split(" ")[1])
        # Newly added: the image is read as (H, W, C), but the model wants
        # channels first, so rearrange the axes before wrapping in a tensor.
        image_new = np.transpose(image, (2, 1, 0))
        image_last = torch.from_numpy(image_new)
        sample = {"image": image_last, "label": label}
        # if self.transform:
        #     sample = self.transform(sample)

        return image_last, label, sample



transformed_trainset = MyDataset(root_dir='./data/train',
                          names_file='./data/train/labels_train.txt',
                          transform=True)
trainset_dataloader = DataLoader(dataset=transformed_trainset,
                                 batch_size=4,
                                 shuffle=True,
                                 num_workers=0)
testset_dataloader = DataLoader(dataset=transformed_trainset,
                                 batch_size=4,
                                 shuffle=True,
                                 num_workers=0)
# note: during training this "test" loader still draws from the training set;
# the real validation set is only used in the separate validation script below

device = 'cpu'
model = models.vgg16(num_classes=5)
# model.load_state_dict(torch.load('data_Vtuber'))
# model.eval()
# 
learning_rate = 1e-5

plt.ion()
# 
def train_loop(trainset_dataloader, model, loss_fn, optimizer):
    size = len(trainset_dataloader.dataset)  # number of training samples
    for batch, (X, y, i) in enumerate(trainset_dataloader):
        # Compute prediction and loss
        pred = model(X.float())
        loss = loss_fn(pred, y.to(device))

        # Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        loss, current = loss.item(), (batch + 1) * len(X)
        print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
        # TODO: visualize the feature maps


def test_loop(testset_dataloader, model, loss_fn, number1, number2):
    size = len(testset_dataloader.dataset)   # number of samples
    num_batches = len(testset_dataloader)    # number of batches
    test_loss, correct = 0, 0

    with torch.no_grad():
        # iterate the test loader (the original mistakenly iterated the train loader here)
        for batch, (X, y, i) in enumerate(testset_dataloader):
            pred = model(X.float())
            test_loss += loss_fn(pred, y.to(device)).item()
            correct += (pred.argmax(1) == y.to(device)).type(torch.float).sum().item()

            number1 += 1
            loss = loss_fn(pred, y.to(device)).item()
            # visualize this batch's loss
            plt.subplot(1, 2, 1)
            plt.plot(number1, loss, 'o', lw=5)

            # softmax_0 = nn.Softmax(dim=1)
            # print(f"Softmax scores per sample:\n{softmax_0(pred)}")
            # print(softmax_0(pred)[1][1])
            print(f"Ground truth:\n{y}\nModel prediction:\n{pred.argmax(1)}")

    test_loss /= num_batches
    correct /= size
    plt.subplot(1, 2, 2)
    number2 += 1
    plt.plot(number2, correct, 'o', lw=5)
    plt.show()
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

epochs = 30
number1=0
number2=0
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(trainset_dataloader, model, loss_fn, optimizer)
    test_loop(testset_dataloader, model, loss_fn,number1,number2)
    number1+=10
    number2+=1
torch.save(model.state_dict(),'test_2')
plt.pause(1000)
print("Done!")

The validation code

import numpy as np
from torch import nn
from skimage import io
import matplotlib.pyplot as plt
import os
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision.utils import make_grid
import torchvision.models as models

class MyDataset(Dataset):
    def __init__(self, root_dir, names_file, transform=None):
        self.root_dir = root_dir
        self.names_file = names_file
        self.transform = transform
        self.size = 0  # was -1, which made __len__ undercount by one sample
        self.names_list = []

        if not os.path.isfile(self.names_file):
            print(self.names_file + " does not exist!")
        with open(self.names_file) as file:
            for f in file:
                self.names_list.append(f)
                self.size += 1

    def __len__(self):
        return self.size

    def __getitem__(self, idx):
        image_path = self.root_dir + self.names_list[idx].split(" ")[0]
        if not os.path.isfile(image_path):
            print(image_path + " does not exist!")
            return None
        image = io.imread(image_path)  # read with skimage
        label = int(self.names_list[idx].split(" ")[1])
        # The image is read as (H, W, C), but the model wants channels first,
        # so rearrange the axes before wrapping in a tensor.
        image_new = np.transpose(image, (2, 1, 0))
        image_last = torch.from_numpy(image_new)
        sample = {"image": image_last, "label": label}
        # if self.transform:
        #     sample = self.transform(sample)

        return image_last, label, sample





transformed_testset = MyDataset(root_dir='./last_data/test',
                          names_file='./last_data/test/test.txt',
                          transform=True)
testset_dataloader = DataLoader(dataset=transformed_testset,
                                 batch_size=1,
                                 shuffle=True,
                                 num_workers=0)

device = 'cpu'
model = models.vgg16(num_classes=5)
model.load_state_dict(torch.load('test_2'))
model.eval()
# 
learning_rate = 1e-5


            



def test_loop(testset_dataloader, model, loss_fn):
    size = len(testset_dataloader.dataset)  # number of samples
    num_batches = len(testset_dataloader)
    test_loss, correct = 0, 0

    with torch.no_grad():
        plt.figure()
        for X, y, i in testset_dataloader:
            pred = model(X.float())
            test_loss += loss_fn(pred, y.to(device)).item()
            correct += (pred.argmax(1) == y.to(device)).type(torch.float).sum().item()

            softmax_0 = nn.Softmax(dim=1)
            result_tenor = softmax_0(pred)
            result_np = np.array(result_tenor)
            y_np = np.array(y)
            pred_np = np.array(pred.argmax(1))
            # print(result_np)

            # map the predicted class index to a name
            names = ["Diana", "Wenjing", "Taffy", "Aibai", "AzumaSeren"]
            name_Vtuber_pre = names[pred_np[0]]

            def show_images_batch(sample_batched):
                images_batch, labels_batch = \
                    sample_batched['image'], sample_batched['label']
                grid = make_grid(images_batch)
                plt.title('The prediction result of AI is:  {}'.format(name_Vtuber_pre))
                # transpose (2, 1, 0) turns the channels-first tensor back into
                # (H, W, C) for imshow, undoing the axis swap done in MyDataset
                plt.imshow(grid.numpy().transpose(2, 1, 0))

            show_images_batch(i)
            plt.axis('off')
            plt.ioff()
            plt.show()

    test_loss /= num_batches
    correct /= size
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")


loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

epochs = 1
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    test_loop(testset_dataloader, model, loss_fn)
# torch.save(model.state_dict(),'test_2')
plt.pause(1000)
print("Done!")

The feature-map visualization code

import torch
from torchvision import models, transforms
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np

model = models.vgg16(num_classes=5)
model.load_state_dict(torch.load('test_2'))  # parameters saved by the training script
model.eval()

# 1. Inspect the model
# print(model)  # the network has 3 top-level children: two Sequential() blocks + avgpool
# model_features = list(model.children())
# print(model_features[0][3])  # the 4th layer inside the first Sequential()
# for index, layer in enumerate(model_features[0]):
#     print(layer)


# 2. Load the data
# Open the image in RGB; PIL is the image format PyTorch's DataLoader itself uses,
# so reading images this way is recommended (use convert('L') for grayscale input).
def get_image_info(image_dir):
    image_info = Image.open(image_dir).convert('RGB')  # a PIL image
    # preprocessing pipeline
    image_transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    image_info = image_transform(image_info)  # torch.Size([3, 224, 224])
    image_info = image_info.unsqueeze(0)  # torch.Size([1, 3, 224, 224]); the model expects a 4-D batch
    return image_info  # now a tensor


# 3. Extract the feature map of layer k
'''
args:
k: which layer's feature map to extract
x: the image tensor
model_layer: a Sequential() of feature layers
'''
def get_k_layer_feature_map(model_layer, k, x):
    with torch.no_grad():
        for index, layer in enumerate(model_layer):  # the first Sequential() holds many layers, so iterate
            x = layer(x)  # e.g. torch.Size([1, 64, 224, 224]) after the first conv: 64 channels
            if k == index:
                return x


# 4. Visualize the feature maps
def show_feature_map(feature_map):  # feature_map has shape (1, C, H, W)
    feature_map = feature_map.squeeze(0)  # -> (C, H, W)

    # The next four lines resize each channel to 256x256 by bilinear interpolation
    feature_map = feature_map.view(1, feature_map.shape[0], feature_map.shape[1], feature_map.shape[2])
    upsample = torch.nn.UpsamplingBilinear2d(size=(256, 256))
    feature_map = upsample(feature_map)
    feature_map = feature_map.view(feature_map.shape[1], feature_map.shape[2], feature_map.shape[3])

    feature_map_num = feature_map.shape[0]  # number of channels
    row_num = int(np.ceil(np.sqrt(feature_map_num)))  # grid side length, e.g. 8 for 64 channels
    plt.figure()
    for index in range(1, feature_map_num + 1):  # draw each channel as its own subplot
        plt.subplot(row_num, row_num, index)
        plt.imshow(feature_map[index - 1], cmap='gray')
        # replace the line above with the following for a colored rendering:
        # plt.imshow(transforms.ToPILImage()(feature_map[index - 1]))
        plt.axis('off')
        # scipy.misc.imsave was removed from SciPy; plt.imsave does the same job:
        # plt.imsave('feature_map_save/' + str(index) + '.png', feature_map[index - 1].numpy())
    plt.show()



if __name__ == '__main__':

    image_dir = r"D:\Py workprojects\the new funny file\data\train\Wenjing\0.png"
    # which layer's feature map to extract
    k = 10
    image_info = get_image_info(image_dir)

    model_layer = list(model.children())
    model_layer = model_layer[0]  # take the model's first Sequential() (the conv features)

    feature_map = get_k_layer_feature_map(model_layer, k, image_info)
    show_feature_map(feature_map)