A Primer on Atrous Convolutions and Depth-wise Separable Convolutions

What are atrous (dilated) and depth-wise separable convolutions? How are they different from standard convolutions? What are their uses?
With properties such as weight sharing and translation invariance, convolutional layers and CNNs have become ubiquitous in computer vision and image processing tasks that use deep learning methods. With that in mind, this article discusses some of the developments we’ve seen in convolutional networks. Specifically, we focus on two: atrous (dilated) convolutions and depth-wise separable convolutions. We will see how these two types of convolution work, how they differ from standard convolutions, and why we may want to use them.
Convolutional Layer
Before we get into the topic, let’s quickly remind ourselves how a convolutional layer works. At their core, convolutional filters are simply feature extractors: what were once hand-crafted feature filters are now learned through the “magic” of back-propagation. We have a kernel (the weights of the conv layer) that is slid over the input feature map; at each location, an element-wise multiplication followed by a summation of the products is performed to obtain a scalar value. The same operation is performed at each location. Fig. 1 shows this in action.
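To make this concrete, here is a minimal NumPy sketch of the sliding-window operation described above (a simplified illustration assuming no padding and a stride of 1, not a reference implementation):

```python
import numpy as np

def conv2d_naive(x, kernel):
    """Slide `kernel` over `x`; at each location, multiply element-wise
    and sum the products to produce one scalar output value."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)  # toy 5x5 input feature map
k = np.ones((3, 3))                           # toy 3x3 kernel
print(conv2d_naive(x, k).shape)               # (3, 3) output feature map
```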
The convolutional filter detects a particular feature by sliding over the input feature map, i.e., it looks for that feature at each location. This intuitively explains the translation invariance property of convolutions.
Atrous (Dilated) Convolution
To understand how atrous convolution differs from standard convolution, we first need to know what a receptive field is. The receptive field is defined as the size of the region of the input feature map that produces each output element. In the case of Fig. 1, the receptive field is 3x3, as each element in the output feature map sees (uses) 3x3 input elements.
Deep CNNs use a combination of convolutions and max-pooling. This has the disadvantage that, at each step, the spatial resolution of the feature map is halved. Projecting the resulting feature map onto the original image results in sparse feature extraction. This effect can be seen in Fig. 2: the conv. filter downsamples the input image by a factor of two, and upsampling and imposing the feature map on the image shows that the responses correspond to only 1/4th of the image locations (sparse feature extraction).
Atrous (dilated) convolution fixes this problem and allows for dense feature extraction. This is achieved through a new parameter called the rate (r). Put simply, atrous convolution is akin to standard convolution except that the weights of an atrous convolution kernel are spaced r locations apart, i.e., the kernels of dilated convolution layers are sparse.
Fig. 3(a) shows a standard kernel and Fig. 3(b) a dilated 3x3 kernel with a rate r = 2. By controlling the rate parameter, we can arbitrarily control the receptive field of the conv. layer: a kxk kernel dilated with rate r has an effective size of k + (k-1)(r-1), so the 3x3 kernel in Fig. 3(b) covers a 5x5 region while still using only nine weights. This allows the conv. filter to look at larger areas of the input (receptive field) without a decrease in spatial resolution or an increase in kernel size. Fig. 4 shows a dilated convolutional filter in action.
Compared to the standard convolution used in Fig. 2, it can be seen in Fig. 5 that dense features are extracted by using a dilated kernel with rate r = 2. In PyTorch, dilated convolutions can be trivially implemented by setting the dilation parameter to the required dilation rate.
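A minimal sketch of this (the channel counts and input size below are illustrative assumptions, not values taken from the figures):

```python
import torch
import torch.nn as nn

# A 3x3 kernel with dilation=2 has an effective size of
# 3 + (3 - 1) * (2 - 1) = 5, i.e., each output element sees a
# 5x5 region of the input while the layer learns only 3*3 weights.
dilated = nn.Conv2d(in_channels=3, out_channels=8,
                    kernel_size=3, dilation=2)

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)
y = dilated(x)
print(y.shape)                 # torch.Size([1, 8, 28, 28]); 32 - 5 + 1 = 28
```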
Depth-wise Separable Convolution
Depth-wise separable convolution was introduced in the Xception net [3]. Fig. 6 shows a standard convolution operation, where the convolution acts on all channels. For the configuration shown in Fig. 6, we have 256 5x5x3 kernels.
Fig. 7(a) shows depth-wise convolution, where the filters are applied to each channel individually. This is what differentiates a depth-wise separable convolution from a standard convolution. The output of the depth-wise convolution has the same number of channels as the input. For the configuration shown in Fig. 7(a), we have 3 5x5x1 kernels, one for each channel. Inter-channel mixing is then achieved by convolving the output of the depth-wise convolution with a 1x1 kernel with the required number of output channels (Fig. 7(b)).
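Here is a minimal PyTorch sketch of this two-step operation using the Fig. 7 configuration; the 12x12 spatial input size is an assumption chosen so that the output has the 8x8 locations used in the cost comparison below:

```python
import torch
import torch.nn as nn

# Depth-wise step: one 5x5x1 kernel per input channel (groups=3),
# so the output keeps the same number of channels as the input.
depthwise = nn.Conv2d(in_channels=3, out_channels=3,
                      kernel_size=5, groups=3)

# Point-wise step: 256 kernels of size 1x1x3 mix the channels.
pointwise = nn.Conv2d(in_channels=3, out_channels=256, kernel_size=1)

x = torch.randn(1, 3, 12, 12)  # 12x12x3 input -> 8x8 output locations
y = pointwise(depthwise(x))
print(y.shape)                 # torch.Size([1, 256, 8, 8])
```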
Why choose Depth-wise Separable Convolution?
To answer this, we take a look at the number of multiplications required to perform a standard convolution and a depth-wise separable convolution.
Standard Convolution
For the configuration specified in Fig. 6, we have 256 kernels of size 5x5x3. The total number of multiplications required to compute the convolution is:
256*5*5*3*(8*8 locations) = 1228800
Depth-wise Separable Convolution
For the configuration specified in Fig. 7, we have 2 convolution operations:
1) 3 kernels of size 5x5x1. Here, the number of multiplications required is: 5*5*3*(8*8 locations) = 4800
2) 256 kernels of size 1x1x3 for the 1x1 convolution. The number of multiplications required: 256*1*1*3*(8*8 locations) = 49152
Total multiplications required for the depth-wise separable convolution: 4800 + 49152 = 53952.
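These counts can be double-checked with a few lines of arithmetic:

```python
# Standard convolution (Fig. 6): 256 kernels of size 5x5x3 over 8x8 locations.
standard = 256 * 5 * 5 * 3 * (8 * 8)       # 1,228,800

# Depth-wise step: 3 kernels of size 5x5x1 over 8x8 locations.
depthwise = 5 * 5 * 3 * (8 * 8)            # 4,800

# Point-wise step: 256 kernels of size 1x1x3 over 8x8 locations.
pointwise = 256 * 1 * 1 * 3 * (8 * 8)      # 49,152

print(standard, depthwise + pointwise)     # 1228800 53952
print(standard / (depthwise + pointwise))  # roughly a 22.8x reduction
```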
We can quite clearly see that the depth-wise separable convolution requires far fewer computations than the standard convolution, roughly 23x fewer multiplications in this example.
In PyTorch, the depth-wise convolution can be implemented by setting the groups parameter to the number of input channels, as in the sketch above; following it with a 1x1 convolution completes the depth-wise separable convolution.
Note: In PyTorch, the in_channels parameter must be divisible by the groups parameter (as must out_channels). This is because PyTorch applies the convolution by dividing the input features into groups=g groups. More details can be found in the PyTorch Conv2d documentation.
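A quick illustration of that constraint (the layer sizes here are arbitrary):

```python
import torch.nn as nn

# Both in_channels and out_channels must be divisible by groups.
nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)  # OK
nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=4)  # OK: depth-wise with a channel multiplier of 2
# nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=3)
# -> ValueError: in_channels must be divisible by groups
```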
Conclusion
This post delved into two popular types of convolution: atrous (dilated) convolutions and depth-wise separable convolutions. We saw what they are, how they differ from the standard convolution operation, and the advantages they offer over it. Finally, we also saw how atrous (dilated) and depth-wise separable convolutions can be implemented using PyTorch.
References
[1] Convolution arithmetic (https://github.com/vdumoulin/conv_arithmetic)
[2] Chen, Liang-Chieh, et al. “DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs.” IEEE Transactions on Pattern Analysis and Machine Intelligence 40.4 (2017): 834–848.
[3] Chollet, François. “Xception: Deep learning with depthwise separable convolutions.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
