Deep Generative Models

Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models

Multi-Source Diffusion Models (MSDM) support compositional music generation tasks: generating a set of coherent sources, creating accompaniments, and performing source separation. Despite their versatility, they require estimating the joint distribution over the sources, which in turn requires pre-separated musical data that is rarely available and fixes the number and type of sources at training time. This paper generalizes MSDM to arbitrary time-domain diffusion models conditioned on text embeddings.

gmsdi.pdf

Categories: Machine Learning for Signal Processing

25 Views

Fast Personalized Text to Image Synthesis with Attention Injection

Current personalized image generation methods mostly require considerable time to fine-tune and often overfit to the concept, producing images that resemble the custom concept but are difficult to edit through prompts. We propose an effective and fast approach that balances text-image consistency with identity consistency between the generated image and the reference image. Our method generates personalized images without any fine-tuning while preserving the inherent text-to-image generation ability of diffusion models.

Fast_Personalized.pptx

Categories: Other

42 Views