终于有人把各路StyleGAN做了个大汇总 | Reddit超热
Finally, someone has put together a comprehensive roundup of StyleGAN's many uses | Trending on Reddit

梁雯    Xidian University
Date: 2022-01-18   Language pair: Chinese-English   Topic: Artificial Intelligence   Word count: 2141
  • 终于有人把各路StyleGAN做了个大汇总 | Reddit超热
Finally, someone has put together a comprehensive roundup of StyleGAN's many uses | Trending on Reddit
  • All You Need:预训练+一点空间操作
All You Need: pre-training plus a few simple spatial operations
  • StyleGAN在各种图像处理和编辑任务上,表现很惊艳。
StyleGAN delivers impressive results on all kinds of image processing and editing tasks.
  • 然而,“干一种活”就得换个体系重新“培训”一次,太麻烦。
However, each new "job" means switching to a new framework and "retraining" from scratch, which is far too troublesome.
  • 终于,有人细细研究了一下,发现:
Finally, someone took a close look and discovered:
  • 其实只通过预训练和潜空间上的一点小操作,就可以让StyleGAN直接上手各种“活儿”,包括全景图生成、从单张图像生成、特征插值、图像到图像翻译等等。
In fact, with nothing more than pre-training and a few small operations in the latent space, StyleGAN can take on all sorts of "jobs" directly, including panorama generation, generation from a single image, feature interpolation, image-to-image translation, and more.
  • 更厉害的是,它在这些“活儿”上的表现还完全不输每一位单项SOTA选手。
What's more, its performance on these tasks is in no way inferior to that of the dedicated SOTA model for each one.
  • 作者顺势做了个全面整理写成了一篇论文,相关讨论在reddit上直接收获了700+的热度:
The author compiled all of this into a paper, and the discussion around it quickly racked up 700+ upvotes on Reddit:
  • 网友纷纷感叹:这总结真的是太酷了!
Commenters were full of praise: this summary is really cool!
  • All You Need:预训练+一点空间操作
All You Need: pre-training plus a few simple spatial operations
  • 方法都非常简单,我们一个一个来。
The methods are all very simple; let's go through them one by one.
• 前提:f_i ∈ R^{B×C×H×W} 表示StyleGAN第i层的中间特征(intermediate features)。
    Notation: f_i ∈ R^{B×C×H×W} denotes the intermediate features of the i-th layer of StyleGAN.
  • 1、空间操作实现直观和逼真的图像
1. Spatial operations for intuitive and realistic images
  • 由于StyleGAN是全卷积的,我们可以调整fi的空间维度,从而在输出图像中引起相应的空间变化。
Since StyleGAN is fully convolutional, we can adjust the spatial dimensions of f_i to induce corresponding spatial changes in the output image.
  • 用简单的空间操作(如padding和resize),可以生成更直观和真实的图像。
With simple spatial operations (such as padding and resizing), more intuitive and realistic images can be generated.
  • 比如下图通过复制灌木和树丛来扩展背景,与导致纹理模糊等瑕疵的原始resize相比,在特征空间中可以保持更真实的纹理。
For example, the figure below expands the background by duplicating the shrubs and trees; compared with a naive resize of the output image, which causes blurred textures and other artifacts, operating in feature space preserves far more realistic texture.
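To make the two operations above concrete, here is a minimal PyTorch sketch (not the authors' code): it applies resize and replicate-padding to a dummy intermediate feature tensor; in practice f_i would come from a pre-trained StyleGAN synthesis network and be fed back into the remaining layers.

```python
import torch
import torch.nn.functional as F

# Hypothetical intermediate feature map f_i from some StyleGAN layer
# (batch B=1, C=512 channels, 16x16 spatial resolution).
f_i = torch.randn(1, 512, 16, 16)

# Resize: stretch the feature map horizontally before the next convolution;
# because the generator is fully convolutional, the output image is
# stretched correspondingly.
f_wide = F.interpolate(f_i, size=(16, 32), mode="bilinear", align_corners=False)

# Padding: replicate-pad the feature map on the right, which extends the
# background content (e.g. duplicated shrubs/trees) in the output image.
f_padded = F.pad(f_i, pad=(0, 8, 0, 0), mode="replicate")  # (left, right, top, bottom)

print(f_wide.shape, f_padded.shape)  # [1, 512, 16, 32] and [1, 512, 16, 24]
```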
  • 2、特征插值
2. Feature interpolation
  • 对StyleGAN中间层进行拼贴可以实现图像信息混合,但要拼接的两张图差异太大时效果往往不好。
Collaging StyleGAN's intermediate-layer features can blend information from two images, but the results are often poor when the two images differ too much.
  • 但采用特征插值就没问题。
Feature interpolation, however, handles this without trouble.
  • 具体操作方法:在每个StyleGAN层,分别使用不同的潜噪声生成fAi和fBi。然后用下面这个公式将它俩进行平滑地混合,然后再传递到下一个卷积层进行同样的操作。
Specific procedure: at each StyleGAN layer, generate fAi and fBi from two different latent noises, blend them smoothly with the formula below, and then pass the result to the next convolutional layer, where the same operation is repeated.
• 其中α ∈ [0, 1]^{B×C×H×W}是一个mask,如果用于水平混合,则mask将从左到右变大。
    where α ∈ [0, 1]^{B×C×H×W} is a mask; for horizontal blending, the mask increases from left to right.
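The formula referenced above appears only as an image in the original article; judging from the description of α as a per-element mask, it is presumably the standard element-wise blend (a reconstruction, not a quotation from the paper):

```latex
f_i = \alpha \odot f_i^{A} + (1 - \alpha) \odot f_i^{B},
\qquad \alpha \in [0, 1]^{B \times C \times H \times W}
```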
  • 和对应模型的定性和定量比较:
Qualitative and quantitative comparison with the corresponding models:
  • 该特征插值法能够无缝地混合两幅图像,而Suzuki等人的结果存在明显的伪影。
The feature interpolation method blends the two images seamlessly, while the results of Suzuki et al. show obvious artifacts.
  • 用户研究中,与Suzuki等人相比,87.6%的人也更喜欢该方法。
In the user study, 87.6% of participants also preferred this method over that of Suzuki et al.
  • 用户研究包含40人,每人需比较不同方法下的25对图像。
The user study involved 40 participants, each of whom compared 25 pairs of images produced by the different methods.
  • 3、从单个图像生成
3. Generation from a single image
  • 除了在不同图像之间进行特征插值,我们还可以在单个图像中应用它。
Besides interpolating features between different images, we can also apply feature interpolation within a single image.
  • 具体操作方法:在一些特征层中,选择相关的patches,并将其与其他区域混合,在空间上进行复制。使用移位运算符Shift(·):
Specific procedure: in some feature layers, select the relevant patches and blend them with other regions, replicating them spatially with the shift operator Shift(·):
  • 这和SinGAN的功能相同,不过SinGAN涉及采样,而该方法只需要手动选择用于特征插值的patches.
This achieves the same functionality as SinGAN, except that SinGAN involves sampling, whereas this method only requires manually selecting the patches used for feature interpolation.
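As an illustration only (the exact form of Shift(·) and the patch selection follow the paper's textual description; the mask and shift amount below are made-up values), a rough sketch of replicating a selected region within a single feature map could look like this:

```python
import torch

def shift(f, dx):
    """Shift a feature map horizontally by dx positions (circularly, for simplicity)."""
    return torch.roll(f, shifts=dx, dims=-1)

# Dummy intermediate feature map (B=1, C=512, 16x16).
f_i = torch.randn(1, 512, 16, 16)

# Hypothetical mask selecting the region to be filled: here the right half
# of the feature map receives a shifted copy of the left half.
alpha = torch.zeros(1, 1, 16, 16)
alpha[..., :, 8:] = 1.0

# Blend the shifted features into the selected region; the result is then
# passed on to the next convolutional layer as before.
f_blend = alpha * shift(f_i, 8) + (1 - alpha) * f_i
print(f_blend.shape)  # [1, 512, 16, 16]
```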
  • 和SinGAN的定性和定量比较:
Qualitative and quantitative comparison with SinGAN:
  • 该方法生成的图像更加多样化和真实;SinGAN则未能以“有意义”的方式改变教堂结构,并产生不够真实的云彩和风景。
The images generated by this method are more diverse and realistic; SinGAN fails to alter the church structure in a "meaningful" way and produces less realistic clouds and scenery.
  • 用户研究中,83.3%的人更喜欢该方法生成的新图像。
In the user study, 83.3% of participants preferred the new images generated by this method.
  • 4、改进GAN反演
    4. Improved GAN inversion
  • GAN反演的目的是在W+空间中定位一个样式码(style code),通过该样式码合成与给定目标图像相似的图像。
GAN inversion aims to locate a style code in W+ space through which an image similar to a given target image can be synthesized.
  • Wulff等人的模型认为,在简单的非线性变换下,W+空间可以用高斯分布建模。然而,在属性转移设置中,需要反转源图像和参考图像,效果并不令人满意。
The model of Wulff et al. holds that, under a simple nonlinear transformation, W+ space can be modeled with a Gaussian distribution. In the attribute-transfer setting, however, where both the source and the reference image must be inverted, the results are unsatisfactory.
  • 最近的研究表明,与W+相比,利用σ进行面部操作的性能更好。
Recent studies have shown that, compared with W+, face manipulation performs better when carried out in σ space.
  • 但作者发现,没有任何变换的σ空间也可以建模为高斯分布。
However, the author found that the σ space, even without any transformation, can also be modeled as a Gaussian distribution.
  • 然后在这个空间而不是在GAN反转期间,施加相同的高斯先验。
The same Gaussian prior is then imposed in this σ space, rather than in W+, during GAN inversion.
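A minimal sketch of what such an inversion objective might look like; the function and variable names, the L2 reconstruction loss, and the way the Gaussian prior is turned into a penalty are all illustrative assumptions, not the authors' exact formulation:

```python
import torch

def invert(generator, target, mu, cov_inv, steps=500, lr=0.05, prior_weight=1e-3):
    """Optimize a style code so the generated image matches `target`, with a
    Gaussian prior (mean `mu`, inverse covariance `cov_inv`) regularizing the code.
    `generator` is assumed to be a pre-trained StyleGAN synthesis network that
    maps a style code to an image."""
    code = mu.clone().requires_grad_(True)
    opt = torch.optim.Adam([code], lr=lr)
    for _ in range(steps):
        img = generator(code)
        recon = torch.nn.functional.mse_loss(img, target)   # reconstruction term
        diff = (code - mu).unsqueeze(0)
        prior = (diff @ cov_inv @ diff.T).squeeze()          # Mahalanobis-style Gaussian prior
        loss = recon + prior_weight * prior
        opt.zero_grad()
        loss.backward()
        opt.step()
    return code.detach()
```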
  • 效果比较:
    Effect Comparison:
  • 该方法在图像重建和可编辑性方面获得了显著改进。
This method achieves significant improvements in both image reconstruction and editability.
  • 5、图像到图像翻译
5. Image-to-image translation
  • 得益于上部分σ空间的效果,作者建议在图像到图像翻译时freeze产生σ的仿射变换层(affine transformation layer),这一简单的变化能够更好地保留图像翻译的语义(注意下图d中嘴的形状)。
Building on the σ-space result from the previous section, the author suggests freezing the affine transformation layers that produce σ during image-to-image translation. This simple change better preserves the semantics of the translation (note the shape of the mouth in panel (d) of the figure below).
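As a rough illustration (the attribute name "affine" follows the convention of common StyleGAN2 PyTorch ports and is an assumption, not necessarily the authors' code), freezing those layers before fine-tuning might look like this:

```python
import torch

def freeze_affine_layers(generator):
    """Freeze the affine (style-modulation) layers of a StyleGAN2-style generator
    before fine-tuning it on the target domain, so that the modulation
    parameters (sigma) stay fixed during image-to-image translation."""
    for name, module in generator.named_modules():
        if "affine" in name:  # assumed naming convention of common PyTorch ports
            for p in module.parameters():
                p.requires_grad = False
    # Only the remaining (unfrozen) weights are handed to the optimizer.
    trainable = [p for p in generator.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=2e-3)
```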
  • 此外,作者发现:
    The author also found that:
  • (1)可以在所有空间维度上使用常数α来执行连续翻译;(2)通过选择要执行特征插值的区域来执行局部图像翻译;(3)以及使用改进的GAN反演在真实人脸上执行人脸编辑和翻译;
    (1) continuous translation can be performed in all spatial dimensions with a constant α; (2) local image translation can be performed by selecting regions to perform feature interpolation; (3) and face editing and translation on real faces can be performed by using improved GAN inversion;
  • 这样获得的效果也更佳。
The results obtained this way are also better.
  • 6、全景生成
6. Panorama generation
  • 作者通过“编织”两幅图像的混合(span)生成全景图,方法如图所示:
The author generates panoramas by "weaving" together blended spans of two images, as illustrated in the figure:
  • 重复这个过程可以生成任意长度的全景图像。
Panoramic images of arbitrary length can be generated by repeating this process.
  • 而且该方法不仅限于一次混合两个图像、也不限于只在水平方向生成。
Moreover, the method is neither limited to blending two images at a time nor to generating in the horizontal direction only.
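A toy, feature-level sketch of the "weaving" step under the assumptions above (the function name, the overlap width, and the ramp mask are illustrative, not the authors' implementation):

```python
import torch

def weave(f_a, f_b, overlap):
    """Blend the right `overlap` columns of feature map f_a with the left
    `overlap` columns of f_b using a left-to-right ramp mask, then concatenate:
    a toy illustration of weaving two spans into a wider panorama."""
    ramp = torch.linspace(0, 1, overlap).view(1, 1, 1, overlap)
    blended = (1 - ramp) * f_a[..., -overlap:] + ramp * f_b[..., :overlap]
    return torch.cat([f_a[..., :-overlap], blended, f_b[..., overlap:]], dim=-1)

f_a = torch.randn(1, 512, 16, 16)
f_b = torch.randn(1, 512, 16, 16)
pano = weave(f_a, f_b, overlap=8)
print(pano.shape)  # [1, 512, 16, 24]

# Repeating the process with further feature maps extends the panorama indefinitely.
```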
  • 一些示例:
    Some examples:
  • 7、属性转移
    7. Attribute Transfer
  • 为了使特征插值能够更好地用于任意人物姿势的图像的属性转移,作者选择在源图像和参考图像之间执行姿势对齐,具体就是对齐W+空间样式代码的前2048个维度。
To make feature interpolation work better for attribute transfer between images of people in arbitrary poses, the author first performs pose alignment between the source and reference images, specifically by aligning the first 2048 dimensions of their W+ style codes.
  • 然后就可以应用特征插值将所选特征进行源图到目标图的转移了。
    Then feature interpolation can be applied to transfer the selected features from source to target.
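A minimal sketch of the pose-alignment step, under the assumption that the W+ code is stored as a flat tensor (so the first 2048 dimensions correspond to the first four 512-dimensional style vectors); the function name and shapes are illustrative:

```python
import torch

def align_pose(w_source, w_reference, n_dims=2048):
    """Copy the first `n_dims` dimensions of the source W+ code into the
    reference code, aligning the reference image's pose with the source
    before feature interpolation is applied."""
    w_aligned = w_reference.clone()
    w_aligned[..., :n_dims] = w_source[..., :n_dims]
    return w_aligned

# Toy W+ codes: 18 layers x 512 dims, flattened into a single vector.
w_src = torch.randn(1, 18 * 512)
w_ref = torch.randn(1, 18 * 512)
w_ref_aligned = align_pose(w_src, w_ref)
```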
  • 与现有方法比较:
Comparison with existing methods:
  • Collins等人的方法没有准确地转移细节属性,Suzuki等人在姿势不匹配时产生的图像不够真实。
    The method of Collins et al. does not transfer detail attributes accurately, and the images produced by Suzuki et al. are not realistic enough when the poses do not match.
  • 而作者的方法既准确又真实。
The author's method, by contrast, is both accurate and realistic.
  • 用户根据真实感和准确性进行选择的结果也进一步验证了该方法的优越性。
The superiority of the method is further confirmed by users' choices based on realism and accuracy.
  • ps. 此外还可以在任意区域执行转移,比如无缝融合两边眼睛明显不同的两半脸:
ps. The transfer can also be performed on arbitrary regions, for example seamlessly fusing two half-faces whose eyes look noticeably different:
  • 以上就是无需特定架构或训练范式、在StyleGAN模型潜空间中执行一些操作和微调,就能与其他图像处理任务达到同等或更佳性能的具体方法。
That sums up how, without any task-specific architecture or training paradigm, a few operations and light fine-tuning in the latent space of a pre-trained StyleGAN model can match or exceed the performance of dedicated models on these image processing tasks.
  • 你觉得如何?还有什么需要补充的吗?欢迎在评论区留言。
What do you think? Anything to add? Feel free to leave a comment.
• 论文地址:https://arxiv.org/abs/2111.01619
    Paper: https://arxiv.org/abs/2111.01619
  • 项目地址:https://github.com/mchong6/SOAT
    Project Address: https://github.com/mchong6/SOAT
