From text to two-dimensional barcode, step by step.
by gui dávid
In 1994, an engineer named Masahiro Hara at Denso Wave, a subsidiary of Toyota, had a problem. The one-dimensional barcodes used to track car parts on the assembly line could only hold about 20 characters. As production systems grew more complex, with more parts, more suppliers, and more metadata per component, those 20 characters became a bottleneck. Workers had to scan multiple barcodes per part, slowing everything down.
Hara's insight was geometric. A traditional barcode encodes information along a single axis: the varying widths of vertical lines. What if you used both axes? A two-dimensional grid of black and white squares could hold orders of magnitude more data in the same physical space.
His team spent two years developing what they called the QR code, short for "Quick Response," named for its design goal: fast, reliable scanning even in a noisy factory environment. The three large squares in the corners weren't decorative; they were a deliberate engineering choice to allow scanners to find and orient the code at any angle, under any rotation, in under a second.
Denso Wave made an unusual decision: they released the QR code specification publicly and chose not to enforce their patent rights. This openness is a major reason QR codes are everywhere today, from restaurant menus to airline boarding passes to cryptocurrency wallets.
A QR code is a matrix of black and white squares called modules. Each module is a single bit of visual information: dark or light, on or off. The arrangement of these modules encodes binary data using a combination of efficient encoding schemes and polynomial error correction.
QR codes come in 40 versions. Version 1, the smallest, is a 21×21 grid. Each successive version adds 4 modules per side:
$$\text{size} = 4v + 17$$
where $v$ is the version number. So Version 2 is 25×25, Version 3 is 29×29, and Version 40, the maximum, is a dense 177×177 grid. In practice, most QR codes you encounter are Version 1 through 7. A Version 3 code (29×29 modules) with low error correction can hold 77 alphanumeric characters, or 53 bytes of mixed-case text, which covers most short links you'd want to share.
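The size formula is easy to check in code. A minimal sketch (the function name `qr_size` is just an illustrative choice):

```python
def qr_size(version: int) -> int:
    """Side length, in modules, of a QR code of the given version (1 through 40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR versions run from 1 to 40")
    return 4 * version + 17


for v in (1, 2, 3, 7, 40):
    side = qr_size(v)
    print(f"Version {v}: {side}x{side} modules")
```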
Here's how the grid scales with version number. Move the slider to see the structural patterns at each size:
The three large corner squares (finder patterns) stay the same size in every version. Small squares (alignment patterns) appear at Version 2 and multiply as the code grows.
A QR code looks like noise at first glance, a scrambled grid of black and white. But it has a precise internal anatomy, and every region has a specific purpose. Understanding the anatomy is the key to understanding how the whole system works.
Finder patterns are the three large squares in the top-left, top-right, and bottom-left corners. Each is a 7×7 module pattern: a black outer border, a white ring, and a solid black 3×3 core. These give the scanner an immediate fix on the code's position, size, and rotation.
The key design insight is the 1:1:3:1:1 ratio. If you scan a horizontal or vertical line through a finder pattern, you'll always encounter this ratio of dark-light-dark-light-dark modules. This ratio is unique enough that a scanner can distinguish it from random noise and detect the code even at an angle or partially obscured. The absence of a fourth finder pattern in the bottom-right corner tells the scanner which way is "up."
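To see where the ratio comes from, here is a small sketch that builds the 7×7 finder pattern and measures the runs along its center row and column. The helper names are illustrative, and a real scanner works on camera pixels rather than a clean grid, so treat this as a picture of the idea rather than a detector.

```python
from itertools import groupby


def finder_pattern():
    """7x7 finder pattern as rows of 1 (dark) and 0 (light)."""
    rows = []
    for r in range(7):
        row = []
        for c in range(7):
            ring = max(abs(r - 3), abs(c - 3))  # 0-1: core, 2: light ring, 3: border
            row.append(0 if ring == 2 else 1)
        rows.append(row)
    return rows


def run_lengths(line):
    """Lengths of consecutive runs of equal modules along a line."""
    return [len(list(group)) for _, group in groupby(line)]


pattern = finder_pattern()
center_row = pattern[3]
center_col = [row[3] for row in pattern]
print(run_lengths(center_row))  # [1, 1, 3, 1, 1]
print(run_lengths(center_col))  # [1, 1, 3, 1, 1]
```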
Each finder pattern is surrounded by a one-module-wide separator of white modules. These ensure the scanner doesn't confuse the finder pattern's black border with adjacent dark data modules.
Timing patterns are strips of alternating black and white modules running between the finder patterns: one horizontal strip along row 6, one vertical strip along column 6 (counting from 0). Think of them as ruler markings: they help the scanner figure out the exact grid spacing, which is especially important for larger codes where lens distortion could make the grid ambiguous. Without timing patterns, a scanner might miscalculate cell boundaries and read the wrong data.
Alignment patterns are small 5×5 squares that first appear in Version 2. Larger versions have more of them, arranged in a grid pattern across the code's surface. These provide additional reference points that help the scanner correct for perspective distortion, the warping that happens when you photograph a flat QR code from an angle. Each alignment pattern is a miniature target: a black center, white ring, black border.
Format information is stored as two copies of a 15-bit string placed near the finder patterns. It encodes two critical parameters: the error correction level (L, M, Q, or H) and the mask pattern (0 through 7). It's protected by its own BCH error correction code, and it's duplicated for redundancy. The scanner reads format info first, because it needs to know the EC level and mask before it can decode anything else.
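Here is a hedged sketch of how those 15 bits can be computed and recovered. The constants are not in the text above and are quoted from the QR specification: the 2-bit level indicators (L=01, M=00, Q=11, H=10), the BCH generator 0x537, and the fixed XOR mask 0x5412. The function names are illustrative, and the brute-force decoder is a simplification of what a production scanner would do.

```python
EC_BITS = {"L": 0b01, "M": 0b00, "Q": 0b11, "H": 0b10}  # indicators per the QR spec


def format_bits(ec_level: str, mask: int) -> int:
    """Return the 15-bit format string for an EC level and a mask number (0-7)."""
    data = (EC_BITS[ec_level] << 3) | mask       # 5 data bits
    rem = data
    for _ in range(10):                          # bitwise polynomial division
        rem = (rem << 1) ^ ((rem >> 9) * 0x537)  # subtract the generator when bit 10 is set
    return ((data << 10) | rem) ^ 0x5412         # fixed XOR mask from the spec


def decode_format(received: int):
    """Pick the valid format string closest (in Hamming distance) to the received bits."""
    candidates = [(level, mask) for level in EC_BITS for mask in range(8)]
    return min(candidates, key=lambda lm: bin(format_bits(*lm) ^ received).count("1"))


word = format_bits("H", 5)
damaged = word ^ 0b100000100000001               # flip three of the fifteen bits
print(decode_format(damaged))                    # ('H', 5)
```

Because any two valid format strings differ in at least 7 bits, up to 3 flipped bits still leave the original as the unique nearest candidate.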
For Version 7 and above, an 18-bit version information string is placed in two locations: a block 6 modules wide and 3 tall directly above the bottom-left finder pattern, and a block 3 modules wide and 6 tall directly to the left of the top-right finder pattern. This tells the scanner which version (grid size) it's dealing with before it tries to decode anything. Smaller versions don't need this because the scanner can infer the version from the grid size alone.
The dark module is a single module that is always black, located at a fixed position (row $(4v + 9)$, column 8, where $v$ is the version number). It exists for historical reasons in the spec and helps the scanner confirm that it's reading a valid QR code. Every QR code has exactly one.
Everything else is the data region. The encoded message and its Reed-Solomon error correction codewords fill the remaining modules in a specific zigzag pattern. This is where the actual information lives: the URL, the text, the Wi-Fi password, whatever you encoded.
The quiet zone is a 4-module-wide border of empty space around the entire code. It's not part of the code itself, but it's essential: without it, a scanner can't distinguish where the QR code ends and the surrounding surface begins. This is why QR codes sometimes fail when printed too close to the edge of a sticker or next to high-contrast graphics.
In the All view above, each region is color-coded: finder patterns, timing patterns, alignment patterns, format info, and data (black/white). Click individual buttons to isolate each region. Notice how much of the code is actually structural overhead versus actual data.
Now for the interesting part: how do we turn human-readable text into a grid of black and white squares? The process starts with converting the input into a stream of bits, and the QR specification is surprisingly clever about how it does this.
QR codes support four encoding modes, each optimized for different types of data. The encoder analyzes the input and automatically selects the most efficient mode:
Numeric mode (mode indicator 0001) encodes only digits 0–9. Groups of three digits are packed into 10-bit binary numbers, a leftover pair into 7 bits, and a leftover single digit into 4 bits. This gives roughly 3.33 bits per character, remarkably efficient. A 45-digit number in numeric mode takes about 150 bits; in byte mode, it would take 360.
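The packing of digits into 10-, 7-, and 4-bit groups can be sketched in a few lines. This covers only the data bits (no mode indicator or character count), and `encode_numeric` is an illustrative name:

```python
def encode_numeric(digits: str) -> str:
    """Pack a digit string into numeric-mode data bits, returned as a '0'/'1' string."""
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("numeric mode only accepts the digits 0-9")
    widths = {3: 10, 2: 7, 1: 4}                 # bits used by a group of that many digits
    chunks = []
    for i in range(0, len(digits), 3):
        group = digits[i:i + 3]
        chunks.append(format(int(group), f"0{widths[len(group)]}b"))
    return "".join(chunks)


bits = encode_numeric("8675309")                 # groups: 867, 530, 9
print(bits, len(bits))                           # 110110001110000100101001 24
```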
Alphanumeric mode (mode indicator 0010) encodes a 45-character set: digits 0–9, uppercase letters A–Z, and nine symbols: space $ % * + - . / :. Each character maps to a value 0–44. Pairs of characters are encoded as 11-bit numbers using the formula $v_1 \times 45 + v_2$, and a leftover single character takes 6 bits, giving about 5.5 bits per character. A URL can use this mode only if it is written entirely in uppercase, which is why some QR codes encode their URLs that way.
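A sketch of the pairing rule, again producing only the data bits. The character order in `ALPHANUMERIC` follows the QR specification's table (digits, then letters, then the nine symbols); a trailing unpaired character is written as 6 bits.

```python
ALPHANUMERIC = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:"  # index = character value


def encode_alphanumeric(text: str) -> str:
    """Pack text into alphanumeric-mode data bits, returned as a '0'/'1' string."""
    values = [ALPHANUMERIC.index(ch) for ch in text]  # ValueError on unsupported characters
    chunks = []
    for i in range(0, len(values) - 1, 2):
        chunks.append(format(values[i] * 45 + values[i + 1], "011b"))  # one pair -> 11 bits
    if len(values) % 2:
        chunks.append(format(values[-1], "06b"))                       # odd one out -> 6 bits
    return "".join(chunks)


print(encode_alphanumeric("HE"))                # H=17, E=14 -> 779 -> 01100001011
print(len(encode_alphanumeric("HELLO WORLD")))  # 5 pairs + 1 single = 61 bits
```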
Byte mode (mode indicator 0100) encodes arbitrary bytes, typically ISO 8859-1 or UTF-8. Each byte is directly encoded as 8 bits. This is the fallback mode for anything that doesn't fit the more efficient modes, like lowercase text, special characters, or binary data.
Kanji mode (mode indicator 1000) encodes Shift JIS double-byte characters at 13 bits each. It was designed specifically for Japanese text, a nod to the QR code's origins.
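The complete bit stream is structured as follows: a 4-bit mode indicator, a character count field (whose length varies by mode and version), the encoded data, and a terminator of up to four zero bits (0000, shortened if the capacity runs out first). After the terminator, the stream is padded with zero bits to the next byte boundary and then with alternating pad bytes (11101100 and 00010001) to fill the available data capacity.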
The character count field length depends on the version range:
| Mode | V1-9 | V10-26 | V27-40 |
|---|---|---|---|
| Numeric | 10 bits | 12 bits | 14 bits |
| Alphanumeric | 9 bits | 11 bits | 13 bits |
| Byte | 8 bits | 16 bits | 16 bits |
| Kanji | 8 bits | 10 bits | 12 bits |
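Putting the pieces together for the simplest case, here is a sketch that assembles the data codewords for a byte-mode message. It assumes a Version 1–9 code (so an 8-bit count field), UTF-8 text, and a caller who knows how many data codewords the chosen version and EC level provide (16 for Version 1-M). The pad bytes 0xEC and 0x11 come from the QR specification; the function name is illustrative.

```python
def build_byte_mode_codewords(text: str, data_codewords: int, count_bits: int = 8) -> bytes:
    """Mode indicator + count + data + terminator + padding, as data codewords."""
    payload = text.encode("utf-8")
    capacity = data_codewords * 8
    bits = "0100"                                        # byte-mode indicator
    bits += format(len(payload), f"0{count_bits}b")      # character count field
    bits += "".join(format(b, "08b") for b in payload)   # 8 bits per byte
    if len(bits) > capacity:
        raise ValueError("text does not fit in the given number of codewords")
    bits += "0" * min(4, capacity - len(bits))           # terminator, cut short if full
    bits += "0" * (-len(bits) % 8)                       # zero bits up to a byte boundary
    out = bytearray(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    pad_bytes = (0xEC, 0x11)                             # alternating pad codewords
    pad_index = 0
    while len(out) < data_codewords:
        out.append(pad_bytes[pad_index % 2])
        pad_index += 1
    return bytes(out)


codewords = build_byte_mode_codewords("hi", data_codewords=16)
print(codewords.hex(" "))   # 40 26 86 90 ec 11 ec 11 ec 11 ec 11 ec 11 ec 11
```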
Try it yourself. Type text into the box below and watch the binary encoding change in real time:
Notice how the encoding changes as you type. Pure digits (like 12345) trigger numeric mode, the most compact. Uppercase text with basic symbols triggers alphanumeric mode. The moment you type a lowercase letter, the encoder falls back to byte mode, and you can see the bit stream grow longer. This is why many QR codes encode URLs in uppercase: it saves space.
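The mode choice described here boils down to two character-set checks. A simplified sketch: it picks a single mode for the whole input, ignores kanji, and does not split the text into mixed-mode segments the way a more thorough encoder might.

```python
import re

NUMERIC_RE = re.compile(r"[0-9]*")
ALPHANUMERIC_RE = re.compile(r"[0-9A-Z $%*+\-./:]*")


def choose_mode(text: str) -> str:
    """Pick the most compact single mode that can represent the whole text."""
    if NUMERIC_RE.fullmatch(text):
        return "numeric"
    if ALPHANUMERIC_RE.fullmatch(text):
        return "alphanumeric"
    return "byte"


for sample in ("12345", "HELLO WORLD", "GUIDAVID.COM", "guidavid.com"):
    print(f"{sample!r}: {choose_mode(sample)}")
```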
Fun fact: the capacity of a Version 1-M QR code in each mode is 34 numeric, 20 alphanumeric, 14 byte, or 8 kanji characters. A typical short URL needs about Version 2 or 3.
Here's what makes QR codes genuinely remarkable, and what separates them from a naive "just draw a grid of bits" approach: they still work when they're damaged.
A QR code printed on a sticker can lose a chunk of its surface (scratched, torn, covered by a logo, splashed with coffee) and still be perfectly scannable. This resilience comes from Reed-Solomon error correction, the same family of algorithms used in CDs, DVDs, satellite communications, and deep-space probes.
QR codes offer four levels of error correction, each trading data capacity for resilience:
| Level | Recovery capacity | Typical use |
|---|---|---|
| L (Low) | ~7% of codewords | Clean environments, maximum data |
| M (Medium) | ~15% of codewords | General purpose (default) |
| Q (Quartile) | ~25% of codewords | Industrial, outdoor use |
| H (High) | ~30% of codewords | Logos embedded, harsh conditions |
The tradeoff is direct: higher error correction means more redundancy bytes, which means less room for actual data. A Version 1 code has 26 codewords in total. With EC level L, 19 of them carry data (enough for a short string), but with level H, only 9. You're literally trading capacity for durability.
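The arithmetic for Version 1 is small enough to lay out in full. The codeword counts below are the Version 1 figures from the QR specification (26 codewords in total; 19, 16, 13, or 9 of them carrying data). The byte-capacity helper assumes byte mode with its 4-bit mode indicator and 8-bit count field, which is why 19 data codewords at level L leave room for 17 characters.

```python
V1_TOTAL_CODEWORDS = 26
V1_DATA_CODEWORDS = {"L": 19, "M": 16, "Q": 13, "H": 9}


def byte_capacity(data_codewords: int) -> int:
    """Whole bytes that fit after the 4-bit mode indicator and 8-bit count field."""
    return (data_codewords * 8 - 4 - 8) // 8


for level, data in V1_DATA_CODEWORDS.items():
    ec = V1_TOTAL_CODEWORDS - data
    print(f"{level}: {data} data + {ec} EC codewords, {byte_capacity(data)} byte-mode characters")
```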
Reed-Solomon encoding treats each data byte as a coefficient of a polynomial defined over the Galois field GF(256). This is a finite field with exactly 256 elements, which maps perfectly to the 256 possible values of a byte.
What makes GF(256) special is that addition, subtraction, multiplication, and division are all defined for its elements, and every non-zero element has a multiplicative inverse. Addition and subtraction are both XOR. Multiplication is done using log and antilog tables based on a primitive element $\alpha$:
$$a \times b = \alpha^{\log_\alpha(a) + \log_\alpha(b)}$$
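The log and antilog tables can be generated by repeatedly multiplying by $\alpha$. The sketch below assumes $\alpha = 2$ and the reducing polynomial 0x11D, the values QR codes use. Since $\alpha^{255} = 1$, exponents are added modulo 255.

```python
EXP = [0] * 255   # EXP[i] = alpha**i        (antilog table)
LOG = [0] * 256   # LOG[x] = i with alpha**i == x; LOG[0] is unused

value = 1
for power in range(255):
    EXP[power] = value
    LOG[value] = power
    value <<= 1                 # multiply by alpha = 2
    if value & 0x100:           # overflowed 8 bits: reduce modulo 0x11D
        value ^= 0x11D


def gf_add(a: int, b: int) -> int:
    return a ^ b                # addition and subtraction are both XOR


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]


def gf_inv(a: int) -> int:
    if a == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return EXP[(255 - LOG[a]) % 255]


print(gf_mul(2, 128))           # 29: 256 wraps around to 0x100 ^ 0x11D
print(gf_mul(7, gf_inv(7)))     # 1
```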
Given $k$ data codewords, the encoder constructs a data polynomial:
$$f(x) = d_{k-1}x^{k-1} + d_{k-2}x^{k-2} + \cdots + d_1x + d_0$$
It then divides $f(x) \cdot x^n$ by a generator polynomial $g(x)$, which is constructed as the product of $n$ linear factors:
$$g(x) = \prod_{i=0}^{n-1}(x - \alpha^i) = (x - \alpha^0)(x - \alpha^1) \cdots (x - \alpha^{n-1})$$
where $\alpha = 2$ is a primitive element of GF(256) with the reducing polynomial $x^8 + x^4 + x^3 + x^2 + 1$ (hex value 0x11D). The remainder of this polynomial division gives us $n$ error correction codewords.
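A compact sketch of this step: build $g(x)$ from its linear factors, then compute the remainder with the usual shift-register form of polynomial long division. To stay self-contained it multiplies field elements bit by bit instead of using log tables. The sample bytes are only an illustration, not the exact data codewords a real encoder would produce for that text.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(256) elements modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def rs_generator(n: int) -> list:
    """Coefficients of g(x), highest degree first, with roots alpha^0 .. alpha^(n-1)."""
    gen = [1]
    root = 1                                  # alpha^0
    for _ in range(n):
        nxt = gen + [0]                       # gen * x
        for i, coeff in enumerate(gen):
            nxt[i + 1] ^= gf_mul(coeff, root)  # plus gen * root (minus is XOR too)
        gen = nxt
        root = gf_mul(root, 2)                # next power of alpha
    return gen


def rs_remainder(data: list, n: int) -> list:
    """The n error correction codewords: remainder of f(x) * x^n divided by g(x)."""
    gen = rs_generator(n)
    rem = [0] * n
    for byte in data:
        factor = byte ^ rem[0]
        rem = rem[1:] + [0]
        for i in range(n):
            rem[i] ^= gf_mul(gen[i + 1], factor)
    return rem


print(rs_generator(2))                        # [1, 3, 2], i.e. x^2 + 3x + 2
sample = list(b"guidavid.com")                # illustrative bytes only
print(rs_remainder(sample, 10))               # ten EC codewords
```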
The mathematical guarantee: these $n$ EC codewords can correct up to $\lfloor n/2 \rfloor$ symbol errors, or detect up to $n$ errors, or some mix of both. The decoder uses syndrome computation, the Berlekamp-Massey algorithm, and Forney's formula to locate and fix errors, essentially solving a system of equations to find which bytes were corrupted and what they should have been.
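The first decoding step, syndrome computation, fits in a few lines: evaluate the received codeword at each root of $g(x)$. All zeros means nothing was damaged; anything else feeds the error-locating machinery. The toy codeword below is just $g(x)$ itself for $n = 2$, which is trivially a valid codeword. This sketch stops at detection and does not implement Berlekamp-Massey or Forney's formula.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(256) elements modulo 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def poly_eval(coeffs: list, x: int) -> int:
    """Evaluate a polynomial (highest degree first) at x using Horner's rule."""
    result = 0
    for coeff in coeffs:
        result = gf_mul(result, x) ^ coeff
    return result


def syndromes(codeword: list, n: int) -> list:
    """Evaluate the codeword at alpha^0 .. alpha^(n-1)."""
    out = []
    root = 1
    for _ in range(n):
        out.append(poly_eval(codeword, root))
        root = gf_mul(root, 2)
    return out


clean = [1, 3, 2]                 # x^2 + 3x + 2, a valid codeword for n = 2
damaged = [1, 3 ^ 9, 2]           # corrupt the middle byte by XORing in 9
print(syndromes(clean, 2))        # [0, 0]
print(syndromes(damaged, 2))      # [9, 18]
```

In the damaged case the first syndrome is the error value (9), and the second is that value times $\alpha^1$, which points at the coefficient of $x^1$: exactly the byte that was changed.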
Try it below. Increase the damage slider and watch modules flip randomly. Switch between error correction levels to see how much damage each can tolerate:
With low error correction (L), the code becomes unreliable quickly. Switch to H and watch it hold on far longer, even at 25-30% damage. This is exactly why QR codes with logos embedded in the center typically use level H: the logo effectively destroys the modules underneath it, and the error correction compensates.
After placing data and error correction codewords into the grid, there's one more critical step that most people never think about: masking.
The problem is subtle. The encoded data might produce module patterns that confuse scanners. Imagine if the data happened to create a large region of all-black modules, or a pattern that looks like a finder or timing pattern in the wrong place. The scanner could misidentify structural elements, miscalculate grid alignment, or fail to decode entirely.
To prevent this, the QR specification defines 8 mask patterns. Each mask is a simple mathematical formula that determines, for any module position $(i, j)$, whether to flip that module (XOR it). The mask is only applied to data and EC modules. Finder patterns, timing patterns, alignment patterns, and format info are never touched.
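The encoder applies all 8 masks, computes a penalty score for each result, and selects the mask with the lowest penalty. The penalty function checks four conditions: runs of five or more same-colored modules in a row or column, 2×2 blocks of a single color, sequences that resemble the finder pattern's 1:1:3:1:1 ratio, and an overall dark-to-light balance that strays far from 50%.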
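Two of the penalty checks, long same-colored runs and 2×2 same-colored blocks, are simple enough to sketch. The weights follow the QR specification (3 points for a run of five plus 1 for each extra module, and 3 points per 2×2 block). The finder-lookalike and dark/light balance checks are left out, so this is a partial score, not the complete penalty function.

```python
from itertools import groupby


def run_penalty(line) -> int:
    """3 points for each run of 5 equal modules, plus 1 per additional module."""
    lengths = (len(list(group)) for _, group in groupby(line))
    return sum(3 + (n - 5) for n in lengths if n >= 5)


def penalty_runs(grid) -> int:
    """Run penalty summed over every row and every column."""
    columns = [list(col) for col in zip(*grid)]
    return sum(run_penalty(line) for line in list(grid) + columns)


def penalty_blocks(grid) -> int:
    """3 points for every 2x2 block of identical modules (overlaps count)."""
    size = len(grid)
    return sum(
        3
        for r in range(size - 1)
        for c in range(size - 1)
        if grid[r][c] == grid[r][c + 1] == grid[r + 1][c] == grid[r + 1][c + 1]
    )


solid = [[1] * 6 for _ in range(6)]
checker = [[(r + c) % 2 for c in range(6)] for r in range(6)]
print(penalty_runs(solid), penalty_blocks(solid))      # 48 75
print(penalty_runs(checker), penalty_blocks(checker))  # 0 0
```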
The mask that produces the "most random-looking" distribution, the one least likely to confuse a scanner, wins. Here are all 8 mask patterns:
Each mask has an elegant formula. Mask 0, $(i + j) \bmod 2 = 0$, creates a checkerboard. Mask 1, $i \bmod 2 = 0$, creates horizontal stripes. They range from simple to complex, but each produces a distinct visual pattern that interacts differently with different data.
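For reference, here are all eight conditions as code, with $i$ as the row and $j$ as the column. Only masks 0 and 1 are spelled out above; the other six are quoted from the QR specification. The `apply_mask` helper and its `is_function` grid (marking the structural modules that must not be flipped) are illustrative names.

```python
MASKS = [
    lambda i, j: (i + j) % 2 == 0,                          # 0: checkerboard
    lambda i, j: i % 2 == 0,                                # 1: horizontal stripes
    lambda i, j: j % 3 == 0,                                # 2: vertical stripes
    lambda i, j: (i + j) % 3 == 0,                          # 3: diagonal stripes
    lambda i, j: (i // 2 + j // 3) % 2 == 0,                # 4: 2x3 blocks
    lambda i, j: (i * j) % 2 + (i * j) % 3 == 0,            # 5
    lambda i, j: ((i * j) % 2 + (i * j) % 3) % 2 == 0,      # 6
    lambda i, j: ((i + j) % 2 + (i * j) % 3) % 2 == 0,      # 7
]


def apply_mask(grid, is_function, mask_id):
    """XOR the mask into every module that is not part of a structural pattern."""
    flip = MASKS[mask_id]
    return [
        [cell ^ int(flip(i, j) and not is_function[i][j]) for j, cell in enumerate(row)]
        for i, row in enumerate(grid)
    ]


blank = [[0] * 4 for _ in range(4)]
nothing_reserved = [[False] * 4 for _ in range(4)]
masked = apply_mask(blank, nothing_reserved, 0)
for row in masked:
    print("".join("#" if module else "." for module in row))
```

Because masking is an XOR, applying the same mask a second time restores the original grid, which is how a scanner undoes it.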
The mask number (0-7) is encoded in the format information near the finder patterns, so the scanner knows which mask to "undo" before reading the data.
Let's trace the complete pipeline from input text to finished QR code, using "guidavid.com" as our example. Step through each stage:
And that's it. Every black and white module in the final code is precisely determined by math. There's no randomness, no approximation. Given the same input text and EC level, the same QR code is produced every time. Try the generator below:
The QR code above is real and scannable. Point your phone camera at it. The code running on this page uses the qrcodegen reference library by Project Nayuki, which implements the full QR specification: Galois field arithmetic, Reed-Solomon encoding, module placement, all 8 mask patterns, and penalty scoring.
You point your phone at a sticker. The screen loads a menu. You order a coffee. Between the camera and the URL, your phone ran polynomial division over a finite field, decoded a zigzag, undid one of eight mask patterns, and corrected for the scratch across the bottom-left corner. You didn't notice.
The best engineering disappears.