Generative AI

Generative AI refers to machine learning systems that produce new content — text, images, video, audio, 3D models, code, and more — rather than simply classifying or analyzing existing data. These systems are trained on large datasets and learn to generate plausible new outputs by modelling the statistical patterns in that data.
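As a deliberately tiny illustration of "learning statistical patterns and generating from them", the sketch below trains a word-level bigram model and samples new word sequences from it. It is a toy stand-in and not how modern systems work internally; the function names `train_bigram_model` and `generate` are hypothetical and chosen only for this example.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Record, for every word, the words observed to follow it."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, length, seed=0):
    """Sample a word sequence by repeatedly drawing an observed follower."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        options = followers.get(output[-1])
        if not options:  # dead end: this word was never followed by anything
            break
        output.append(rng.choice(options))
    return " ".join(output)


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the dog sat on the rug"
    model = train_bigram_model(corpus)
    print(generate(model, start="the", length=8))
```

Every adjacent word pair in the output was seen in the training text, yet the sequence as a whole may be new. That is the basic sense in which a generative model produces plausible outputs rather than copying its data.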

Large language models (LLMs), which generate text and code, are the most prominent example, but generative AI spans other modalities too.

This section covers that broader landscape: image, video, and audio generation.

Image generation

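Image generation models produce photorealistic or stylized images from text prompts, reference images, or other inputs. Most modern image generators are based on diffusion models: during training, noise is gradually added to real images and the model learns to reverse that corruption, so at generation time it can start from pure random noise and denoise it step by step into a coherent image.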
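The iterative denoising idea can be sketched with a one-dimensional toy, assuming a DDPM-style sampler. Here each "image" is a single number drawn from a Gaussian, which allows the trained neural network to be replaced by an exact formula for the expected noise. The schedule values, constants, and function names are all illustrative assumptions and are not taken from any of the tools listed in this section.

```python
import math
import random

# Toy target: each "image" is one number drawn from N(DATA_MEAN, DATA_STD^2).
DATA_MEAN, DATA_STD = 3.0, 0.5
NUM_STEPS = 200

# Linear noise schedule (an assumed, illustrative choice).
BETAS = [1e-4 + (0.05 - 1e-4) * i / (NUM_STEPS - 1) for i in range(NUM_STEPS)]
ALPHA_BARS = []
_running = 1.0
for _beta in BETAS:
    _running *= 1.0 - _beta
    ALPHA_BARS.append(_running)  # cumulative product of (1 - beta)


def predict_noise(x_t, step):
    """Stand-in for the trained network: exact E[noise | x_t] for Gaussian data."""
    alpha_bar = ALPHA_BARS[step]
    variance = alpha_bar * DATA_STD ** 2 + (1.0 - alpha_bar)
    return math.sqrt(1.0 - alpha_bar) * (x_t - math.sqrt(alpha_bar) * DATA_MEAN) / variance


def sample(rng):
    """Start from pure noise and denoise step by step (ancestral sampling)."""
    x = rng.gauss(0.0, 1.0)
    for step in reversed(range(NUM_STEPS)):
        beta = BETAS[step]
        alpha_bar = ALPHA_BARS[step]
        eps_hat = predict_noise(x, step)
        # Remove the predicted noise, then rescale to the previous noise level.
        mean = (x - beta / math.sqrt(1.0 - alpha_bar) * eps_hat) / math.sqrt(1.0 - beta)
        # Re-inject a little fresh noise on every step except the last.
        fresh = rng.gauss(0.0, 1.0) if step > 0 else 0.0
        x = mean + math.sqrt(beta) * fresh
    return x


def sample_many(count, seed=0):
    rng = random.Random(seed)
    return [sample(rng) for _ in range(count)]


if __name__ == "__main__":
    samples = sample_many(1000)
    print("sample mean:", sum(samples) / len(samples))
```

In a real image generator, `predict_noise` would be a large neural network conditioned on the text prompt, and `x` would be an image or latent tensor instead of a single number. The loop structure is the part this sketch is meant to convey.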

Notable tools and platforms include:

  • Midjourney — High-quality image generation accessible via Discord and web interface.

  • GPT Image — OpenAI’s current, natively integrated image generation model (gpt-image-1), noted for accurate rendering of text within images. It is the successor to DALL·E.

  • DALL·E — OpenAI’s earlier image generation line, integrated into ChatGPT.

  • Stable Diffusion — Open-weight diffusion model from Stability AI, widely used as a base for fine-tuned variants.

  • Flux — Family of high-resolution image models from Black Forest Labs, several of them open-weight, with support for prompt-based image editing.

  • Ideogram — Image generation noted for rendering precise, legible text within images.

  • Adobe Firefly — Adobe’s generative image models, integrated into Creative Cloud and designed to be safe for commercial use.

  • ComfyUI — Open-source, node-based visual workflow engine for running diffusion models locally, supporting image, video, 3D, and audio generation pipelines.

Video generation

Video generation models extend image generation into the temporal dimension, producing short clips from text or image prompts.

Notable tools and platforms include:

  • Sora — OpenAI’s text-to-video model.

  • Google Veo — Google DeepMind’s text-to-video model, with synchronized audio generation.

  • Kling — Video generation model from Kuaishou, noted for cinematic, realistic output.

  • Runway — Commercial video generation and editing platform.

  • Luma Dream Machine — Text-to-video and image-to-video generation.

Audio generation

Audio generation covers music, sound effects, and voice synthesis from text or other audio inputs.

Notable tools and platforms include:

  • Suno — AI music generation from text prompts.

  • Udio — Music generation platform.

  • ElevenLabs — Voice synthesis and cloning, widely used for narration and dubbing.

