Google’s new diffusion models, image super-resolution (SR3) and cascaded diffusion models (CDM), use AI to generate high-fidelity images.
HIGHLIGHTS
1. Google’s SR3 can achieve strong results on the super-resolution task
2. These models can be used to restore old family portraits
3. SR3 can also be used to improve medical imaging systems
Google has introduced new AI-based diffusion models to improve the quality of low-resolution images. The two models, image super-resolution (SR3) and cascaded diffusion models (CDM), generate high-fidelity images. Their applications range from restoring old family portraits and improving medical imaging systems to enhancing the performance of downstream models for image classification, segmentation, and more. The SR3 model, for instance, is trained to transform a low-resolution image into a detailed high-resolution image, and in human evaluations its results surpass those of current deep generative models such as generative adversarial networks (GANs).
Researchers from Google Research's Brain Team have published a post on Google's AI blog, detailing both SR3 and CDM diffusion models. SR3 is said to be a super-resolution diffusion model that takes as input a low-resolution image and builds a corresponding high-resolution image from pure noise. The model is trained on an image corruption process that adds noise to a high-resolution image until only pure noise remains. The SR3 model then reverses the process “beginning from pure noise and progressively removing noise to reach a target distribution through the guidance of the input low-resolution image.”
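The noising and denoising steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not Google's implementation: it uses the generic DDPM-style formulation with a hypothetical linear noise schedule, and `toy_denoiser` is a made-up stand-in for the trained neural network that simply treats the upsampled low-resolution image as its guess for the clean image. The names `make_schedule`, `forward_noise`, `toy_denoiser`, and `reverse_sample` are invented for this example.

```python
import numpy as np

def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear noise schedule (DDPM-style), not SR3's actual one."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)

def forward_noise(x0, t, alpha_bars, rng):
    """Corruption process: jump from a clean image x0 straight to noise step t."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps

def toy_denoiser(x_t, cond, t, alpha_bars):
    """Stand-in for a trained network (assumption): estimates the noise in x_t
    by treating the upsampled low-res image `cond` as the clean image."""
    return (x_t - np.sqrt(alpha_bars[t]) * cond) / np.sqrt(1.0 - alpha_bars[t])

def reverse_sample(cond, schedule, rng):
    """Reverse process: start from pure noise and remove it step by step,
    guided at every step by the conditioning image."""
    betas, alphas, alpha_bars = schedule
    x = rng.standard_normal(cond.shape)  # pure Gaussian noise
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = toy_denoiser(x, cond, t, alpha_bars)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        # No fresh noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x

rng = np.random.default_rng(0)
schedule = make_schedule()

high_res = rng.uniform(-1.0, 1.0, size=(32, 32, 3))  # random stand-in for a photo
low_res = high_res[::4, ::4]                         # crude 4x downsample to 8x8
cond = low_res.repeat(4, axis=0).repeat(4, axis=1)   # nearest-neighbour upsample to 32x32

x_T = forward_noise(high_res, len(schedule[0]) - 1, schedule[2], rng)  # almost pure noise
sample = reverse_sample(cond, schedule, rng)
print(x_T.shape, sample.shape)
```

Because the toy denoiser simply trusts the upsampled input, the final sample collapses to that upsampled image. A trained network would instead fill in plausible high-resolution detail at each step, which is where the quality gain comes from.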
Google has shared a few examples of a 64x64-pixel image being upscaled to a 1,024x1,024-pixel photo using SR3. The results, especially for faces and natural images, are impressive. The tech giant says that SR3 achieves strong benchmark results on the super-resolution task for face and natural images when scaling to resolutions 4x to 8x higher than the input.
The CDM diffusion model is trained on ImageNet data to generate high-resolution natural images. Since ImageNet is a difficult, high-entropy dataset, Google built CDM as a cascade of multiple diffusion models. This cascade approach involves chaining together multiple generative models over several spatial resolutions. The chain includes one diffusion model that generates data at a low resolution, followed by a sequence of SR3 super-resolution diffusion models that gradually increase the resolution of the generated image to the highest resolution. Google says it applies Gaussian noise and Gaussian blur to the low-resolution input image of each super-resolution model in the cascading pipeline. It calls this process conditioning augmentation and says it enables better sample quality at higher resolutions for CDM.
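A rough sketch of how such a cascade with conditioning augmentation might be wired together is shown below. Everything here is hypothetical: `base_model` and `sr_model` are placeholders for trained diffusion models (random pixels and nearest-neighbour upsampling respectively), the 32 to 64 to 256 resolution chain is only illustrative, and the blur and noise strengths are arbitrary. For simplicity the sketch applies the augmentation while sampling; how and when Google applies it in practice is not specified in this article.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, normalised to sum to 1."""
    radius = max(1, int(round(3 * sigma)))
    xs = np.arange(-radius, radius + 1)
    k = np.exp(-(xs ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur over height and width (zero-padded edges)."""
    k = gaussian_kernel(sigma)
    for axis in (0, 1):
        img = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), axis, img)
    return img

def augment_conditioning(low_res, rng, noise_std=0.1, blur_sigma=1.0):
    """Conditioning augmentation: blur the low-res input, then add Gaussian noise.
    The strengths are arbitrary illustrative values."""
    blurred = gaussian_blur(low_res, blur_sigma)
    return blurred + noise_std * rng.standard_normal(low_res.shape)

def base_model(rng, size=32):
    """Placeholder for the low-resolution diffusion model."""
    return rng.uniform(-1.0, 1.0, size=(size, size, 3))

def sr_model(cond, factor):
    """Placeholder for an SR3 stage: plain nearest-neighbour upsampling."""
    return cond.repeat(factor, axis=0).repeat(factor, axis=1)

def cascade(rng, factors=(2, 4)):
    """Chain the base model with one super-resolution stage per factor."""
    img = base_model(rng)
    for factor in factors:
        cond = augment_conditioning(img, rng)  # corrupt the input to this stage
        img = sr_model(cond, factor)
    return img

rng = np.random.default_rng(0)
out = cascade(rng)
print(out.shape)
```

The intuition usually given for this design is that each super-resolution stage learns to cope with imperfect inputs, so small errors made by an earlier stage are less likely to compound further down the chain.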
With SR3 and CDM, Google says it has “pushed the performance of diffusion models to state-of-the-art on super-resolution and class-conditional ImageNet generation benchmarks.”