Visualizing Class Activation Heatmaps (Keras Implementation)
Preface
Grad-CAM stands for Gradient-weighted Class Activation Mapping. In plain terms, when a CNN classifies an image, Grad-CAM explicitly maps out which regions of the image the network relied on to decide that the image belongs to that class.
Paper: Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Download link: https://arxiv.org/abs/1610.02391
Implementation Pipeline
1. Take the feature maps produced by the last convolutional layer of the feature extractor (for VGG16, the conv5_3 feature maps, 7x7x512 for a 224x224 input; with the 512x512 input used below they are 32x32x512).
2. The 512 feature maps carry different weights in the fully connected classifier, so use backpropagation to compute a weight for each feature map. Note that the only difference between CAM and Grad-CAM is how these per-feature-map weights are computed; every other step is identical.
3. Multiply each feature map by its weight to get a weighted stack (7x7x512), average over the channel dimension to get a 7x7 map (np.mean(axis=-1)), apply ReLU, then normalize (so that no values fall outside the 0-255 range).
The most important part of this step is the ReLU activation (ReLU keeps only values greater than 0): after ReLU, only the features useful for the target class remain. Positive values are treated as evidence for that class; negative values belong to other classes (or are irrelevant). Suppose the weighted sum for some class comes out to 0.8965, as in $w_1 x_1 + w_2 x_2 + \dots + w_n x_n = 0.8965$; the larger this value, the higher the probability of the class, so the features belonging to the class are exactly the $w_i x_i$ terms greater than 0, while the negative terms may be features of other classes. Intuitively, if a cat's head appears in the image, that feature is positive evidence for the cat class and negative evidence for the dog class: it should raise the cat confidence and lower the dog confidence.
Without ReLU, the heatmap would mix features of many classes. The paper puts it this way: without ReLU, the localization map highlights not only the features of the class of interest but the features of all classes.
4. Resize the processed heatmap to the image size so it can be blended with the original image.
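The weighting, ReLU, and normalization in steps 2-3 can be sketched in NumPy. The feature maps and per-channel weights below are random stand-ins, not values from a real network:

```python
import numpy as np

rng = np.random.default_rng(0)
feature_maps = rng.standard_normal((7, 7, 512))  # last-conv feature maps (stand-in)
weights = rng.standard_normal(512)               # gradient-pooled weight per channel

cam = np.mean(feature_maps * weights, axis=-1)   # channel-weighted average -> 7x7
cam = np.maximum(cam, 0)                         # ReLU: keep class-positive evidence
cam = cam / cam.max() if cam.max() > 0 else cam  # normalize to [0, 1]
print(cam.shape, float(cam.min()), float(cam.max()))
```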
Code Walkthrough
Import dependencies
import os, cv2, random
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from keras.models import Sequential
from keras.layers import Input, Dense, Conv2D, MaxPool2D, GlobalAveragePooling2D, GlobalMaxPooling2D
from keras.optimizers import Adam
from keras.callbacks import Callback, EarlyStopping, TensorBoard, ModelCheckpoint, ReduceLROnPlateau
from keras.applications.vgg16 import VGG16
from keras.models import Model
from keras.models import load_model
from keras.utils import np_utils
from keras.preprocessing.image import ImageDataGenerator
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from keras.preprocessing import image
from keras import backend as K
K.set_image_data_format('channels_last')
from PIL import Image
Load the model
saved_model = load_model("./output/vgg16_1.h5")
saved_model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 512, 512, 64) 1792
_________________________________________________________________
conv2d_2 (Conv2D) (None, 512, 512, 64) 36928
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 256, 256, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 256, 256, 128) 73856
_________________________________________________________________
conv2d_4 (Conv2D) (None, 256, 256, 128) 147584
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 128, 128, 128) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 128, 128, 256) 295168
_________________________________________________________________
conv2d_6 (Conv2D) (None, 128, 128, 256) 590080
_________________________________________________________________
conv2d_7 (Conv2D) (None, 128, 128, 256) 590080
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 64, 64, 256) 0
_________________________________________________________________
conv2d_8 (Conv2D) (None, 64, 64, 512) 1180160
_________________________________________________________________
conv2d_9 (Conv2D) (None, 64, 64, 512) 2359808
_________________________________________________________________
conv2d_10 (Conv2D) (None, 64, 64, 512) 2359808
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 32, 32, 512) 0
_________________________________________________________________
conv2d_11 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
conv2d_12 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
conv2d_13 (Conv2D) (None, 32, 32, 512) 2359808
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 16, 16, 512) 0
_________________________________________________________________
global_average_pooling2d_1 ( (None, 512) 0
_________________________________________________________________
dense_1 (Dense) (None, 7) 3591
=================================================================
Total params: 14,718,279
Trainable params: 14,718,279
Non-trainable params: 0
_________________________________________________________________
Define the source and target paths
name = '299-37-type7.jpg'
img_path = './Clip_sample/' + name
save_path = './heatmaps/' + name
Load the image
img = image.load_img(img_path, target_size=(512, 512))
img = np.asarray(img)
plt.imshow(img)
img = np.expand_dims(img, axis=0)
Predict the image class and obtain the feature maps
output = saved_model.predict(img)
predict = np.array(output[0])
heisemei_output = saved_model.output[:, np.argmax(predict)]
last_conv_layer = saved_model.get_layer('conv2d_13')
print(np.argmax(predict))
6
grads = K.gradients(heisemei_output, last_conv_layer.output)[0]
pooled_grads = K.mean(grads, axis=(0, 1, 2))
iterate = K.function([saved_model.input], [pooled_grads, last_conv_layer.output[0]])
pooled_grads_value, conv_layer_output_value = iterate([img])
for i in range(512):
    conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
heatmap = np.mean(conv_layer_output_value, axis=-1)
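The K.gradients / K.function API above only works under TF1-style graph execution; in TensorFlow 2's eager mode it raises an error. A GradientTape-based equivalent might look as follows. This is a minimal sketch that uses a tiny stand-in network (not the author's trained VGG16) so that it runs standalone:

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Tiny stand-in network just to demonstrate the API (7 classes, like above).
inp = layers.Input((32, 32, 3))
x = layers.Conv2D(8, 3, activation="relu", name="last_conv")(inp)
x = layers.GlobalAveragePooling2D()(x)
out = layers.Dense(7, activation="softmax")(x)
model = Model(inp, out)

# One model that returns both the last-conv activations and the predictions.
grad_model = Model(model.input, [model.get_layer("last_conv").output, model.output])

img = np.random.rand(1, 32, 32, 3).astype("float32")
with tf.GradientTape() as tape:
    conv_out, preds = grad_model(img)
    class_channel = preds[:, tf.argmax(preds[0])]      # score of the top class

grads = tape.gradient(class_channel, conv_out)         # d(score)/d(feature maps)
pooled_grads = tf.reduce_mean(grads, axis=(0, 1, 2))   # one weight per channel
heatmap = tf.reduce_mean(conv_out[0] * pooled_grads, axis=-1)
heatmap = tf.maximum(heatmap, 0) / (tf.reduce_max(heatmap) + 1e-8)
print(heatmap.shape)
```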
Visualize the heatmap
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
plt.imshow(heatmap)
Overlay the heatmap on the original image
img = cv2.imread(img_path)
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
heatmap = np.uint8(255 * heatmap)
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
superimposed_img = heatmap * 0.2 + img
print(superimposed_img.shape)
(512, 512, 3)
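One caveat: heatmap * 0.2 + img is a float array whose values can exceed 255, and how that is handled on write depends on the OpenCV version. If you convert to uint8 yourself, clip first to avoid overflow. A small illustrative sketch with made-up pixel values (not taken from the real image):

```python
import numpy as np

img = np.full((4, 4, 3), 240.0)          # bright original pixels (illustrative)
heatmap_col = np.full((4, 4, 3), 200.0)  # colorized heatmap (illustrative)

# 240 + 0.2 * 200 = 280 would overflow under a naive uint8 cast;
# clipping first saturates it at 255 instead.
blended = np.clip(heatmap_col * 0.2 + img, 0, 255).astype(np.uint8)
print(blended.dtype, int(blended.max()))
```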
Save the heatmap
cv2.imwrite(save_path, superimposed_img)
print("success!")
success!
Final result
Reflections
If CAM can help us understand what a neural network attends to, perhaps it can help people locate the important regions faster and notice things that would otherwise be easy to miss. Perhaps so?