Real-ESRGAN Explained: How AI Image Upscaling Actually Works
The short answer: Real-ESRGAN is an open-source neural network (released by Tencent ARC Lab in 2021) that upscales images by 2× or 4× while generating plausible new detail. It uses a GAN architecture (two competing networks: generator + discriminator) trained on millions of image pairs with synthetic JPEG and noise degradations. Unlike bicubic interpolation, it hallucinates realistic textures. Upscale Free runs it in your browser via TensorFlow.js — 2-5 seconds per image, no upload.
When you see an AI upscaler in 2026, chances are it’s using Real-ESRGAN or one of its variants. It’s the open-source backbone of dozens of commercial products (including most “free” upscalers) and understanding how it works helps you use it better.
The Problem It Solves
Traditional upscaling is mathematical. Bicubic interpolation averages nearby pixels. Lanczos uses a sinc function. Both follow deterministic formulas — they can’t invent detail that isn’t in the source.
The result: upscaled images look soft, blurry, and obviously enlarged. An 8 MP photo bicubic-upscaled to 32 MP doesn’t have 32 MP of detail; it has 8 MP of detail spread across 32 MP of space.
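To see why deterministic filters can't invent detail, here is a minimal bilinear upscaler (bicubic's simpler cousin) in plain JavaScript. Every output pixel is a weighted average of at most four source pixels, so no value outside the source's range can ever appear:

```javascript
// Bilinear upscaling of a single-channel (grayscale) image.
// Each output pixel blends up to 4 neighboring source pixels.
function bilinearUpscale(src, w, h, scale) {
  const W = w * scale, H = h * scale;
  const out = new Float32Array(W * H);
  for (let y = 0; y < H; y++) {
    for (let x = 0; x < W; x++) {
      const sx = Math.min(x / scale, w - 1), sy = Math.min(y / scale, h - 1);
      const x0 = Math.floor(sx), y0 = Math.floor(sy);
      const x1 = Math.min(x0 + 1, w - 1), y1 = Math.min(y0 + 1, h - 1);
      const fx = sx - x0, fy = sy - y0;
      // Weights sum to 1: the result is always a convex combination
      // of existing pixels -- no new detail, only blending.
      out[y * W + x] =
        src[y0 * w + x0] * (1 - fx) * (1 - fy) +
        src[y0 * w + x1] * fx * (1 - fy) +
        src[y1 * w + x0] * (1 - fx) * fy +
        src[y1 * w + x1] * fx * fy;
    }
  }
  return out;
}

// Upscale a 2x2 checkerboard 2x: every output value stays within [0, 100].
const up = bilinearUpscale(new Float32Array([0, 100, 100, 0]), 2, 2, 2);
```

This is the mathematical ceiling the article describes: the 16 output pixels contain no more information than the 4 input pixels.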
Real-ESRGAN changed this by using a neural network that learned what high-resolution images look like.
The GAN Architecture
Real-ESRGAN uses a Generative Adversarial Network — two neural networks competing:
The Generator takes a low-resolution image and produces a high-resolution version. It starts randomly and improves through training.
The Discriminator judges whether a given high-resolution image is real or generator-produced. It learns from a dataset of actual high-resolution photos.
During training, both networks improve simultaneously. The generator gets better at producing convincing output. The discriminator gets better at spotting fakes. After millions of iterations, the generator produces images the discriminator can barely distinguish from real photos.
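The training loop described above can be sketched in pseudocode (simplified: the real Real-ESRGAN recipe combines the adversarial loss with pixel and perceptual losses, and the weighting shown here is illustrative):

```
for each training step:
    lr_batch, hr_batch = sample low-res / high-res image pairs
    fake_hr = generator(lr_batch)

    # discriminator learns to output 1 for real images, 0 for generated ones
    d_loss = bce(discriminator(hr_batch), 1) + bce(discriminator(fake_hr), 0)
    update discriminator to minimize d_loss

    # generator learns to fool the discriminator, anchored by a pixel loss
    g_loss = bce(discriminator(fake_hr), 1) + lambda * l1(fake_hr, hr_batch)
    update generator to minimize g_loss
```

The two losses pull in opposite directions, which is what "adversarial" means: each network's improvement raises the bar for the other.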
Why the “Real” in Real-ESRGAN
The original ESRGAN (2018) was trained on a single, clean synthetic degradation: smooth bicubic downsampling. But real-world images have messier degradations: JPEG compression, sensor noise, motion blur, lens softness.
Real-ESRGAN’s innovation: train on synthetic degradations that mimic real-world issues. The authors created a pipeline that randomly applies JPEG compression at various qualities, simulates camera noise, adds motion blur, and combines these. Training on this “degraded-to-clean” pairing taught the model to remove real-world artifacts while upscaling.
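As a rough illustration of the idea (the operations and parameter ranges below are invented for this sketch, not Real-ESRGAN's published values), a random degradation plan for one training image might be sampled like this:

```javascript
// Hypothetical sketch of a "second-order" degradation pipeline in the spirit
// of Real-ESRGAN's training-data synthesis. All parameter ranges are
// illustrative assumptions, not the paper's actual settings.
function randomDegradationPlan(rng = Math.random) {
  const pass = () => ([
    { op: 'blur',   kernelSize: [7, 9, 11, 13][Math.floor(rng() * 4)] },
    { op: 'resize', scale: 0.25 + rng() * 0.75 },          // random downscale
    { op: 'noise',  sigma: 1 + rng() * 29 },               // sensor-noise stand-in
    { op: 'jpeg',   quality: Math.round(30 + rng() * 65) } // quality 30-95
  ]);
  // Apply the whole degradation block twice ("second-order" degradation),
  // so compounded artifacts like re-compressed JPEGs appear in training data.
  return [...pass(), ...pass()];
}

const plan = randomDegradationPlan();
```

Because the clean original is known, each degraded image comes with a perfect ground-truth target, which is what makes the "degraded-to-clean" pairing possible at scale.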
This is why Real-ESRGAN often makes images look better than the original source: it's simultaneously upscaling AND denoising AND deblocking AND sharpening.
How It Runs in Your Browser
Running AI in-browser used to be impossible. Three things changed:
TensorFlow.js — Google’s library for running ML models in JavaScript, using WebGL or WebGPU for GPU acceleration.
Model quantization — the Real-ESRGAN weights are converted from 32-bit floats to 8-bit integers, reducing file size from 300 MB to ~28 MB with minimal quality loss.
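The core of int8 quantization fits in a few lines. This is a toy per-tensor scheme (real converters such as the TF.js tooling use calibration and often per-channel scales, so treat it as a sketch of the idea, not the actual conversion):

```javascript
// Toy symmetric int8 quantization: map float weights to 8-bit integers
// with a single per-tensor scale factor. Storage drops 4x (32 -> 8 bits).
function quantizeInt8(weights) {
  const maxAbs = Math.max(...weights.map(Math.abs), 1e-8);
  const scale = maxAbs / 127; // largest weight maps to +/-127
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

// At inference time the integers are scaled back to approximate floats.
function dequantize({ q, scale }) {
  return Array.from(q, (v) => v * scale);
}

const { q, scale } = quantizeInt8([0.3, -1.2, 0.01]);
const back = dequantize({ q, scale }); // each weight recovered within scale/2
```

The rounding error per weight is at most half the scale step, which is why quality loss stays small when the weight distribution is well behaved.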
Tile-based inference — Instead of processing the whole image at once (which would need 4-8 GB GPU memory for a 1024×1024 image), the tool processes 64×64 or 128×128 tiles with overlap, then stitches them. This runs in under 1 GB of memory.
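The tiling step reduces to generating overlapping tile coordinates and running the model on each one. A hedged sketch (tile size and overlap are illustrative, and real implementations also blend the overlapping seams):

```javascript
// Generate overlapping tile coordinates for tiled inference.
// Overlap lets neighboring tiles share context so seams can be blended away.
function tileGrid(width, height, tile = 128, overlap = 16) {
  const step = tile - overlap;
  const tiles = [];
  for (let y = 0; y < height; y += step) {
    for (let x = 0; x < width; x += step) {
      tiles.push({
        x, y,
        w: Math.min(tile, width - x),
        h: Math.min(tile, height - y),
      });
      if (x + tile >= width) break; // this tile already reaches the right edge
    }
    if (y + tile >= height) break; // this row already reaches the bottom edge
  }
  return tiles;
}

// A 1024x1024 image becomes a 9x9 grid of 128x128 tiles (81 small inferences)
// instead of one memory-hungry full-image tensor.
const tiles = tileGrid(1024, 1024);
```

Peak memory now depends on the tile size, not the image size, which is what makes 1 GB budgets workable in a browser tab.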
The result: your consumer laptop GPU can run Real-ESRGAN inference in 2-15 seconds, depending on input size.
Quality Tiers
Real-ESRGAN has several model variants with different size/quality trade-offs:
| Model | Size | Quality | Speed |
|---|---|---|---|
| ESRGAN-slim | 2 MB | Baseline | Fast |
| ESRGAN-medium | 5 MB | Good | Fast |
| ESRGAN-thick | 28 MB | Excellent | Medium |
| ESRGAN-psnr | 67 MB | Sharpest | Slow |
Upscale Free uses ESRGAN-thick for the best balance of quality and browser performance.
What It Can’t Do
Understanding failure modes prevents disappointment:
Can’t recover out-of-focus content. If your subject was blurry in the original, Real-ESRGAN guesses at detail that isn’t there. Faces become uncanny; text becomes gibberish that merely looks like text.
Can’t read small text. Text in the source becomes “text-like” shapes that aren’t actually readable. For documents or screenshots with text, use OCR tools instead.
Can’t distinguish important detail from noise. If dust spots or scratches are in the source, the AI may enhance them too. Clean up the source first.
Can’t generate beyond training data style. Trained mostly on photos and some art. For niche styles (technical drawings, medical imaging, infrared photography), quality varies.
The Future
Real-ESRGAN is already 5 years old. Newer models push further:
- Diffusion-based upscaling (2024+): Stable Diffusion and Flux can upscale with much more detail, but require 5-10× more compute.
- WebGPU adoption: 3-5× speed improvement when TF.js WebGPU compatibility stabilizes (2026-2027).
- Real-ESRGAN v3: rumored for 2026, expected to handle diffusion-generated images natively.
For now, Real-ESRGAN-thick running via TensorFlow.js is the best free browser-based option, and it is likely to remain so for at least another year.
Try It Yourself
Upscale Free runs Real-ESRGAN-thick entirely in your browser. Drop any image, get a 4× result in 10-15 seconds, no upload, no signup. Source stays on your machine.