Recent studies have shown impressive progress in universal style transfer (UST), which can integrate arbitrary styles into content images. However, existing approaches struggle with low aesthetics and disharmonious patterns in their results due to the following problems: (1) Aesthetic Assumption Bias: during training, the aesthetic discriminator of AesUST lacks explicit supervisory signals that define aesthetics; instead, it presumes that images from the style training dataset are inherently aesthetic, a presumption that is not guaranteed to hold. As a result, the discriminator may act more as a style feature extractor and overlook true aesthetic elements. (2) Style-Constrained Aesthetic Extraction: AesUST restricts aesthetic feature extraction to style images, yielding a narrow, style-specific aesthetic perspective. However, aesthetics generally has universal qualities and should not be confined to specific styles. (3) Indiscriminate Feature Fusion: attention scores recalibrate the high-layer feature maps of aesthetic and style features, but the recalibrated features are then merged with content features without adequately considering the differences in feature distributions or the information from lower layers.
To this end, we propose AesStyler, a novel Aesthetic-Guided Universal Style Transfer method. First, we adopt TANet as the aesthetic feature extractor in AesStyler. Second, we build a Universal Aesthetic Codebook (UAC) to harness universal aesthetic features that encapsulate the global aspects of aesthetics. Third, we propose the Universal and Style-specific Aesthetic-Guided Attention (USAesA) module, which enables our model to adaptively and progressively fuse both universal and style-specific aesthetic features with the style feature, and to incorporate the aes-enhanced style feature into the content feature. Extensive experiments and user studies demonstrate the superiority of our approach: compared to previous methods, AesStyler not only yields results with superior aesthetics but also achieves better style transfer quality.
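To make the pipeline concrete, below is a minimal PyTorch sketch of how a universal aesthetic codebook lookup and a USAesA-style two-stage attention fusion could work. Since the method details are withheld, every module name, dimension, and fusion step here is an illustrative assumption, not the released implementation.

```python
# Speculative sketch of the UAC and USAesA components described above.
# All dimensions, names, and fusion choices are hypothetical assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UniversalAestheticCodebook(nn.Module):
    """Hypothetical UAC: a learnable bank of universal aesthetic codes.

    A style-specific aesthetic feature queries the codebook, and the
    returned code is a similarity-weighted mixture of the entries.
    """
    def __init__(self, num_codes=512, code_dim=512):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, code_dim))

    def forward(self, aes_feat):           # aes_feat: (B, code_dim)
        sim = aes_feat @ self.codes.t()    # (B, num_codes) similarities
        weights = F.softmax(sim, dim=-1)   # soft assignment over codes
        return weights @ self.codes        # (B, code_dim) universal code

class USAesA(nn.Module):
    """Hypothetical USAesA: cross-attention that first enhances the style
    feature with universal + style-specific aesthetic cues, then injects
    the aes-enhanced style feature into the content feature."""
    def __init__(self, dim=512, heads=8):
        super().__init__()
        self.aes_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cs_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, content, style, aes_specific, aes_universal):
        # content/style: (B, N, dim) flattened feature maps;
        # aes_*: (B, 1, dim) pooled aesthetic tokens.
        aes = torch.cat([aes_universal, aes_specific], dim=1)   # (B, 2, dim)
        aes_style, _ = self.aes_attn(style, aes, aes)           # aes-enhanced style
        style = style + aes_style                               # residual fusion
        out, _ = self.cs_attn(content, style, style)            # style -> content
        return content + out
```

The residual connections here reflect the "adaptive and progressive" integration described above: each stage adds aesthetic or style information on top of the existing feature rather than replacing it, which is one plausible reading of the design.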
We will release the details of the method after the CVPR 2024 review process.
We compare our proposed AesStyler against 10 state-of-the-art arbitrary style transfer methods: aesthetic-aware UST methods (AesUST and AAST) and aesthetic-free UST methods (AdaAttN, Avatar, ArtFlow, IECAST, MAST, AdaIN, SANet, and StyleFormer).
| Metric | Ours | AesUST | AAST | AdaAttN | Avatar | ArtFlow | IECAST | MAST | AdaIN | StyleFormer |
|---|---|---|---|---|---|---|---|---|---|---|
| Gram Loss ↓ | 0.1710 | 0.2192 | 0.1756 | 0.2088 | 0.2614 | 0.2046 | 0.2641 | 0.1916 | 0.1913 | 0.1713 |
| SSIM ↑ | 0.3971 | 0.3330 | 0.2780 | 0.4311 | 0.2449 | 0.3966 | 0.3392 | 0.2945 | 0.2668 | 0.3354 |
| Aes Score ↑ | 0.4597 | 0.4102 | 0.4020 | 0.4180 | 0.4100 | 0.4056 | 0.4137 | 0.4065 | 0.4046 | 0.4109 |
| Deception Rate ↑ | 0.2857 | 0.1885 | 0.2176 | 0.2761 | 0.2620 | 0.1846 | 0.1811 | 0.2730 | 0.1363 | 0.2330 |
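For reference, the Gram loss reported above is conventionally computed as the mean squared error between Gram matrices of VGG-19 features of the stylized result and the style image. The sketch below assumes this standard formulation; the layer set and averaging scheme are our assumptions, as the paper's exact evaluation protocol is not released.

```python
# Illustrative Gram loss metric under the standard formulation; the layer
# indices and uniform averaging are assumptions, not the paper's protocol.
import torch
import torchvision.models as models

vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
LAYERS = {1, 6, 11, 20, 29}  # relu1_1 ... relu5_1 in torchvision's VGG-19

def gram(feat):                        # feat: (B, C, H, W)
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

@torch.no_grad()
def gram_loss(stylized, style):        # (B, 3, H, W), ImageNet-normalized
    loss, x, y = 0.0, stylized, style
    for i, layer in enumerate(vgg):
        x, y = layer(x), layer(y)
        if i in LAYERS:
            loss = loss + torch.mean((gram(x) - gram(y)) ** 2)
    return loss / len(LAYERS)
```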
We will show more experimental results after the CVPR 2024 review process.