Diffusion model for image generation

tomato13 2024. 11. 27. 18:13

A diffusion model has a U-Net architecture. The architecture's input and output are both the same image: the image is transformed into a noise image and then restored to the original image. Additionally, the diffusion model connects a layer for the image's text description into the image restoration process of the U-Net. Is this correct?
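
To make the question concrete, here is a minimal NumPy sketch of the two ideas, under stated assumptions. It assumes the common DDPM-style setup, where a clean image is mixed with Gaussian noise and the network is usually trained to predict that noise from the noisy image (so input and output share a shape, but are not literally the same image). It also assumes the text description enters through cross-attention, as in typical text-to-image models. All names (`add_noise`, `recover_x0`, `cross_attention`), the schedule values, and the array sizes are hypothetical and chosen only for illustration. No real U-Net is trained here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear noise schedule; the endpoint values are illustrative assumptions.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)  # fraction of signal remaining at each step


def add_noise(x0, t, eps):
    """Forward process: mix the clean image x0 with Gaussian noise eps at step t."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def recover_x0(xt, t, eps_pred):
    """Invert the mixing formula, given a prediction of the noise."""
    return (xt - np.sqrt(1.0 - alpha_bars[t]) * eps_pred) / np.sqrt(alpha_bars[t])


def cross_attention(image_tokens, text_tokens):
    """Single-head cross-attention without learned projections (a simplification).

    Image tokens act as queries; text tokens act as keys and values.
    """
    d = image_tokens.shape[-1]
    scores = image_tokens @ text_tokens.T / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ text_tokens


# Stand-in "image" and noise; a trained U-Net would have to predict eps.
x0 = rng.standard_normal((8, 8))
eps = rng.standard_normal((8, 8))
xt = add_noise(x0, 500, eps)

# With a perfect noise prediction, the original image comes back exactly.
x0_hat = recover_x0(xt, 500, eps)

# Text conditioning: 64 image tokens attend over 5 text tokens of width 16.
image_tokens = rng.standard_normal((64, 16))
text_tokens = rng.standard_normal((5, 16))
attended = cross_attention(image_tokens, text_tokens)
```

In this sketch the true noise stands in for the network's prediction, which is why the recovery is exact. In a real model the U-Net only approximates the noise, and the restoration is done over many small steps, with the text features injected at several layers of the U-Net.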