Real-ESRGAN Explained: How AI Image Upscaling Actually Works

The short answer: Real-ESRGAN is an open-source neural network (released by Tencent ARC Lab in 2021) that upscales images by 2× or 4× while generating plausible new detail. It uses a GAN architecture (two competing networks: generator + discriminator) trained on millions of image pairs with synthetic JPEG and noise degradations. Unlike bicubic interpolation, it hallucinates realistic textures. Upscale Free runs it in your browser via TensorFlow.js: 2-15 seconds per image, no upload.

When you see an AI upscaler in 2026, chances are it’s using Real-ESRGAN or one of its variants. It’s the open-source backbone of dozens of commercial products (including most “free” upscalers) and understanding how it works helps you use it better.

The Problem It Solves

Traditional upscaling is mathematical. Bicubic interpolation averages nearby pixels. Lanczos uses a windowed sinc kernel. Both follow deterministic formulas: they can't invent detail that isn't in the source.

The result: upscaled images look soft, blurry, and obviously enlarged. An 8 MP photo bicubic-upscaled to 32 MP doesn’t have 32 MP of detail; it has 8 MP of detail spread across 32 MP of space.
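You can see the "averaging, never inventing" behaviour in a few lines. Below is a minimal sketch of bilinear 2× upscaling (bicubic is the same idea with a wider kernel); every output pixel is a weighted average of source pixels, so no output value can fall outside the range of the input.

```python
def bilinear_upscale_2x(img):
    """img: 2-D list of grayscale values. Returns a 2x larger image where
    every new pixel is a weighted average of source pixels -- purely
    deterministic, no new information is created."""
    h, w = len(img), len(img[0])
    out = [[0.0] * (w * 2) for _ in range(h * 2)]
    for y in range(h * 2):
        for x in range(w * 2):
            # Map the output pixel back to fractional source coordinates.
            sy, sx = y / 2, x / 2
            y0, x0 = int(sy), int(sx)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            fy, fx = sy - y0, sx - x0
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[y][x] = top * (1 - fy) + bot * fy
    return out

tiny = [[0, 100], [100, 0]]
big = bilinear_upscale_2x(tiny)
# Every output value lies between the source min and max -- the
# interpolator can blend existing detail but never add any.
```

A neural upscaler has no such constraint: it can emit texture that was never in the input, because it draws on patterns learned from training data rather than on the input pixels alone.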

Real-ESRGAN changed this by using a neural network that learned what high-resolution images look like.

The GAN Architecture

Real-ESRGAN uses a Generative Adversarial Network — two neural networks competing:

The Generator takes a low-resolution image and produces a high-resolution version. It starts randomly and improves through training.

The Discriminator judges whether a given high-resolution image is real or generator-produced. It learns from a dataset of actual high-resolution photos.

During training, both networks improve simultaneously. The generator gets better at producing convincing output. The discriminator gets better at spotting fakes. After millions of iterations, the generator produces images the discriminator can barely distinguish from real photos.
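The adversarial loop can be sketched with a deliberately tiny stand-in: a one-parameter "generator" that learns to emit values near the real data, and a one-parameter "discriminator" that scores samples. Everything here (the parameters `g` and `w`, the learning rate, the constant `REAL`) is illustrative; the real training uses deep CNNs, perceptual losses, and millions of images.

```python
import math

# Toy sketch of adversarial training. The generator emits the value g;
# the discriminator scores a sample x as sigmoid(w * x).
REAL = 2.0    # stand-in for "real high-resolution data"
g = 0.0       # generator parameter, starts clueless
w = 0.0       # discriminator parameter, starts clueless
lr = 0.05

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for _ in range(1500):
    fake = g
    # Discriminator step: ascend log score(real) + log(1 - score(fake)),
    # i.e. push the real sample's score up and the fake's score down.
    w += lr * ((1 - sigmoid(w * REAL)) * REAL - sigmoid(w * fake) * fake)
    # Generator step: ascend log score(fake), i.e. move toward whatever
    # currently fools the critic.
    g += lr * (1 - sigmoid(w * fake)) * w

# After training, g has drifted close to REAL and the discriminator's
# parameter has collapsed toward zero: it can no longer tell the two apart.
```

The same push-pull happens in Real-ESRGAN, just in a much larger space: instead of matching a scalar, the generator learns to match the statistics of real high-resolution textures.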

Why “Real” in Real-ESRGAN

The original ESRGAN (2018) was trained on a single, simple degradation: clean bicubic downsampling. But real-world images have messier degradations: JPEG compression, sensor noise, motion blur, lens softness.

Real-ESRGAN’s innovation: train on synthetic degradations that mimic real-world damage. The authors built a pipeline that randomly applies JPEG compression at various qualities, simulates camera noise, adds blur, and chains these in multiple rounds (the paper calls this "high-order" degradation). Training on this degraded-to-clean pairing taught the model to remove real-world artifacts while upscaling.
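A toy version of that pipeline looks like the sketch below. It is illustrative only: the real pipeline uses anisotropic blur kernels, actual JPEG encoding, resizing steps, and applies the whole chain twice, while this version just shows the randomize-and-apply idea on a grayscale image stored as a 2-D list.

```python
import random

def box_blur(img):
    """3x3 mean filter with clamped borders -- a crude stand-in for
    lens softness / defocus blur."""
    h, w = len(img), len(img[0])
    return [[sum(img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
             for x in range(w)] for y in range(h)]

def add_noise(img, sigma, rng):
    """Additive Gaussian noise -- a stand-in for sensor noise."""
    return [[v + rng.gauss(0, sigma) for v in row] for row in img]

def quantize(img, levels):
    """Coarse value quantisation -- a very crude stand-in for JPEG loss
    (real JPEG quantises DCT coefficients, not raw pixels)."""
    step = 256.0 / levels
    return [[round(v / step) * step for v in row] for row in img]

def degrade(img, rng):
    """Apply the degradations in a random order with random strengths,
    producing one synthetic 'damaged' training input."""
    ops = [box_blur,
           lambda im: add_noise(im, rng.uniform(1, 8), rng),
           lambda im: quantize(im, rng.choice([8, 16, 32]))]
    rng.shuffle(ops)
    out = [row[:] for row in img]
    for op in ops:
        out = op(out)
    return out
```

Each training pair is then `(degrade(hr_patch), hr_patch)`: the network sees the damaged version as input and the clean original as the target, so it learns to undo the damage while upscaling.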

This is why Real-ESRGAN often makes images look better than the original source — it’s simultaneously upscaling AND denoising AND deblocking AND deblurring.

How It Runs in Your Browser

Running AI models in the browser used to be impractical. Three things changed that:

TensorFlow.js — Google’s library for running ML models in JavaScript, using WebGL or WebGPU for GPU acceleration.

Model quantization — Real-ESRGAN weights converted from 32-bit floats to 8-bit integers, cutting the download to roughly a quarter of its float32 size (~28MB for the thick model) with minimal quality loss.
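The core of 8-bit quantization is a simple affine mapping, sketched below. This is illustrative of the general technique, not TensorFlow.js's exact implementation (which quantizes per tensor and stores metadata alongside the weights).

```python
def quantize_weights(weights):
    """Map a list of float weights to integers 0..255 plus the
    (scale, offset) pair needed to approximately invert the mapping.
    Storage drops from 4 bytes to 1 byte per weight."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0   # avoid div-by-zero for constant tensors
    q = [round((v - lo) / scale) for v in weights]
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate floats; error is at most half a quantisation step."""
    return [v * scale + lo for v in q]

weights = [-0.81, -0.02, 0.0, 0.33, 1.27]
q, scale, lo = quantize_weights(weights)
restored = dequantize(q, scale, lo)
```

Because each weight only moves by at most half a step (`scale / 2`), the network's outputs barely change, which is why quantization is nearly free in quality terms.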

Tile-based inference — Instead of processing the whole image at once (which would need 4-8 GB GPU memory for a 1024×1024 image), the tool processes 64×64 or 128×128 tiles with overlap, then stitches them. This runs in under 1 GB of memory.
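The tiling mechanics can be sketched as follows. Nearest-neighbour upscaling stands in for the neural network so the stitching can be checked exactly; the tile size, overlap, and blending strategy here are illustrative, not the tool's actual values.

```python
def nn_upscale_2x(img):
    """Stand-in 'model': nearest-neighbour 2x upscale of a 2-D list."""
    out = []
    for row in img:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(wide[:])
    return out

def tiled_upscale(img, tile=4, overlap=1, scale=2, upscale=nn_upscale_2x):
    """Upscale in small overlapping windows, keeping only each tile's
    core region so border artifacts from the overlap are discarded.
    `scale` must match the factor of `upscale`."""
    h, w = len(img), len(img[0])
    out = [[0] * (w * scale) for _ in range(h * scale)]
    step = tile - 2 * overlap                 # core size each tile owns
    for y in range(0, h, step):
        for x in range(0, w, step):
            cy1, cx1 = min(y + step, h), min(x + step, w)     # core bounds
            py0, px0 = max(y - overlap, 0), max(x - overlap, 0)
            py1, px1 = min(cy1 + overlap, h), min(cx1 + overlap, w)
            window = [row[px0:px1] for row in img[py0:py1]]   # core + context
            up = upscale(window)
            # Paste back only the upscaled core, dropping the overlap border.
            for yy in range(y * scale, cy1 * scale):
                for xx in range(x * scale, cx1 * scale):
                    out[yy][xx] = up[yy - py0 * scale][xx - px0 * scale]
    return out
```

Peak memory now depends on the tile size, not the image size, which is what lets a 1 GB budget handle images that would otherwise need several gigabytes.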

The result: your consumer laptop GPU can run Real-ESRGAN inference in 2-15 seconds, depending on input size.

Quality Tiers

Real-ESRGAN has several model variants with different size/quality trade-offs:

| Model | Size | Quality | Speed |
| --- | --- | --- | --- |
| ESRGAN-slim | 2 MB | Baseline | Fast |
| ESRGAN-medium | 5 MB | Good | Fast |
| ESRGAN-thick | 28 MB | Excellent | Medium |
| ESRGAN-psnr | 67 MB | Sharpest | Slow |

Upscale Free uses ESRGAN-thick for the best balance of quality and browser performance.

What It Can’t Do

Understanding failure modes prevents disappointment:

Can’t recover out-of-focus content. If your subject was blurry in the original, Real-ESRGAN guesses at detail that isn’t there. Faces become uncanny; text becomes gibberish that merely looks like text.

Can’t read small text. Text in the source becomes “text-like” shapes that aren’t actually readable. For documents or screenshots with text, use OCR tools instead.

Can’t distinguish important from noise. If dust spots or scratches are in the source, the AI may enhance them too. Clean the source first.

Can’t generate beyond training data style. Trained mostly on photos and some art. For niche styles (technical drawings, medical imaging, infrared photography), quality varies.

The Future

Real-ESRGAN is already 5 years old. Newer models push further:

  • Diffusion-based upscaling (2024+): Stable Diffusion and Flux can upscale with much more detail, but require 5-10× more compute.
  • WebGPU adoption: 3-5× speed improvement when TF.js WebGPU compatibility stabilizes (2026-2027).
  • Real-ESRGAN v3: rumored for 2026, expected to handle diffusion-generated images natively.

For now, Real-ESRGAN-thick running via TensorFlow.js is the best free browser-based option, and likely to remain so for at least another year.

Try It Yourself

Upscale Free runs Real-ESRGAN-thick entirely in your browser. Drop any image, get a 4× result in 10-15 seconds, no upload, no signup. Source stays on your machine.

Frequently asked questions

What is Real-ESRGAN?

Real-ESRGAN (Real-world Enhanced Super-Resolution Generative Adversarial Network) is an open-source AI model released in 2021 by Tencent ARC Lab. It upscales images 2-4× while adding realistic detail, specifically trained to handle real-world image degradations like JPEG compression and sensor noise.

How is it different from bicubic or Lanczos upscaling?

Bicubic/Lanczos use mathematical interpolation — they average nearby pixels but can't invent detail. Real-ESRGAN uses a neural network trained on millions of image pairs to hallucinate plausible high-frequency detail based on learned patterns.

Why do some tools advertise 8× or 16× upscaling?

These typically chain multiple 2× or 4× passes. Quality degrades with each pass — compounding hallucinations create artifacts, since each pass amplifies the invented detail of the one before. A single 4× pass is generally higher quality than chaining passes to reach 8× or 16×.

Does Real-ESRGAN work better on photos or AI-generated images?

It handles both well. Real-ESRGAN was originally trained on photos but copes with AI-generated images too, because the training included a wide range of synthetic degradations. For anime and cartoon content, the Real-ESRGAN-anime variant performs better.

What's a GAN and why does it matter?

Generative Adversarial Network: two neural networks compete. One generates upscaled images, the other judges if they look real. This adversarial training produces sharper, more realistic output than traditional CNN-only models.

Can Real-ESRGAN make a bad photo actually sharper?

It enhances existing detail but cannot invent detail that isn't there. Mild softness improves, but AI cannot recover out-of-focus subjects or motion-blurred text. It works best on technically sound photos that just need more pixels.

Ready to try it?

Upscale your images 4× for free — no signup, no upload.

Upscale Image Now →