Haokun Zhu

Hi! I am a first-year student in the MSR program at Carnegie Mellon University. Before that, I completed my undergraduate studies at Shanghai Jiao Tong University (SJTU), where I worked with Ph.D. student Teng Hu under the supervision of Prof. Ran Yi. I also received remote supervision from Prof. Yu-Kun Lai and Prof. Paul L. Rosin of Cardiff University. My research interests lie mainly in computer vision and generative models.

Email / Resume / Google Scholar / GitHub

Research

I was previously an undergraduate researcher in the Digital Media & Computer Vision Laboratory (DMCV) at SJTU, advised on-site by Prof. Ran Yi and supervised remotely by Prof. Yu-Kun Lai and Prof. Paul L. Rosin of Cardiff University. I also interned at Youtu Lab, Tencent Technology (Shanghai) Co., Ltd., where I was advised by Jinlong Peng. I am interested in deep generative models such as GANs and diffusion models. My goal is to build highly capable generative models and apply them in everyday life to make things more convenient for everyone.

Over the past year of research, I have explored a wide range of directions, including:

  • Few-shot Image Generation with Diffusion Models: how to use diffusion models to produce high-quality, diverse images in a new domain from only a small amount of training data.
  • Aesthetic Guided Universal Style Transfer: how to transfer the style of an arbitrary image onto a content image while balancing aesthetic quality, style transformation, and content preservation.
  • Stroke-based Neural Painting: how to recreate a pixel-based image with a sequence of brushstrokes, as a human painter would, achieving both faithful reconstruction and a convincing stroke style.
  • Image Vectorization: how to convert raster images into scalable vector graphics, which scale without loss of quality and offer a more detailed representation.
  • Multimodal Industrial Anomaly Detection: how to fuse features from 3D point clouds and RGB images more effectively and use this multimodality to improve industrial anomaly detection.
News

    [2024.07.15]: Our paper AesStyler: Aesthetic Guided Universal Style Transfer is accepted by ACM MM 2024!

    [2023.12.12]: Our paper SAMVG: A Multi-stage Image Vectorization Model with the Segment-Anything Model is accepted by ICASSP 2024!

    [2023.11.09]: Our paper SAMVG: A Multi-stage Image Vectorization Model with the Segment-Anything Model is on arXiv!

    [2023.07.26]: Our paper Stroke-based Neural Painting and Stylization with Dynamically Predicted Painting Region is accepted by ACM MM 2023!

    [2023.07.14]: Our paper Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption is accepted by ICCV 2023!

    Publications & Preprints (* means equal contribution, # means corresponding author)
    AesStyler: Aesthetic Guided Universal Style Transfer
    Ran Yi*,#, Haokun Zhu*, Teng Hu, Yu-Kun Lai, Paul L. Rosin
    Accepted by ACM MM 2024
    [website]

    We propose AesStyler, a novel Aesthetic Guided Universal Style Transfer method that uses a pre-trained aesthetic assessment model, a novel Universal Aesthetic Codebook, and a novel Universal and Specific Aesthetic-Guided Attention (USAesA) module. Extensive experiments and user studies demonstrate that our approach generates more aesthetically harmonious and pleasing results than state-of-the-art methods.

    SAMVG: A Multi-stage Image Vectorization Model with the Segment-Anything Model
    Haokun Zhu*, Juang Ian Chong*, Teng Hu, Ran Yi, Yu-Kun Lai, Paul L. Rosin
    Accepted by ICASSP 2024
    [pdf] [arXiv]

    We propose SAMVG, a multi-stage model that vectorizes raster images into SVG (Scalable Vector Graphics). Extensive experiments demonstrate that SAMVG produces high-quality SVGs in any domain while requiring less computation time and lower complexity than previous state-of-the-art methods.

    M3DM-NR: RGB-3D Noisy-Resistant Industrial Anomaly Detection via Multimodal Denoising
    Chengjie Wang, Haokun Zhu*, Jinlong Peng, Yue Wang, Ran Yi, Yunsheng Wu, Lizhuang Ma, Jiangning Zhang
    Under Review
    [pdf] [arXiv]

    We propose M3DM-NR, a novel noise-resistant framework that leverages the strong multimodal (image and point cloud) discriminative capabilities of CLIP. Extensive experiments show that M3DM-NR outperforms state-of-the-art methods in 3D-RGB multimodal noisy anomaly detection.

    Stroke-based Neural Painting and Stylization with Dynamically Predicted Painting Region
    Teng Hu, Ran Yi, Haokun Zhu, Liang Liu, Jinlong Peng, Yabiao Wang, Chengjie Wang, Lizhuang Ma
    Accepted by ACM MM 2023
    [pdf] [code] [arXiv]

    We propose Compositional Neural Painter, a novel stroke-based rendering framework that dynamically predicts the next painting region from the current canvas instead of dividing the image plane uniformly into painting regions. Extensive experiments show that our model outperforms existing models in stroke-based neural painting.

    Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption
    Teng Hu, Jiangning Zhang, Liang Liu, Ran Yi, Siqi Kou, Haokun Zhu, Xu Chen, Yabiao Wang, Chengjie Wang, Lizhuang Ma
    Accepted by ICCV 2023
    [pdf] [supp] [code] [arXiv]

    We propose a novel phasic content fusing few-shot diffusion model with a directional distribution consistency loss, which targets different learning objectives at distinct training stages of the diffusion model. Theoretical analysis and experiments demonstrate the superiority of our approach on few-shot generative model adaption tasks.

    Course Projects
    Image-to-Image Translation: From Line to Sketch (CV)
    Haokun Zhu*, Ning Li*
    [pdf] [code]

    This is the CS3511 course project. We use two frameworks, pix2pix and pixel2style2pixel, to solve an image-to-image translation task: generating sketches from line drawings. Both methods perform well on this task. With this project, we also ranked 3rd in avg_FID and 2nd in avg_SSIM in the CGI-PSG2023 workshop.

    Real-time Ray Tracing with OpenGL (CG)
    Haokun Zhu*, Ning Li*
    [pdf] [code]

    This is the CS3310 Computer Graphics course project, which implements real-time ray tracing with OpenGL and uses the graphics rendering pipeline to create visual effects such as shadows, reflections, and refractions. It incorporates the SMAA algorithm for efficient anti-aliasing. The project also accelerates ray tracing with techniques such as a Bounding Volume Hierarchy (BVH) to keep rendering efficient.

    EEG-based Emotion Recognition (Transfer Learning)
    Haokun Zhu
    [pdf] [code]

    This is the CS3507 course project. EEG-based emotion recognition is an important branch of affective computing. In this project, I implement baseline, domain generalization, and domain adaptation methods for the EEG-based emotion recognition task. The baselines include SVM, MLP, and ResNet. For domain generalization, I implement Invariant Risk Minimization. For domain adaptation, I implement four methods: Transfer Component Analysis, Domain Adversarial Neural Network, Adversarial Discriminative Domain Adaptation, and Prototypical Representation based Pairwise Learning.

    Sentiment Analysis with BERT (NLP)
    Haokun Zhu*, Hang Zheng*, Quanling Yan
    [pdf] [code]

    This is the CS3307 Internet Information Extraction course project. It focuses on binary (positive/negative) sentiment analysis. Text sentiment analysis has a wide range of applications in fields such as social media and public opinion monitoring, including analyzing the positive and negative aspects of product reviews and monitoring online evaluations of companies. Efficient and accurate sentiment analysis is therefore of practical significance.