Why does `btoa()` produce junk or throw on umlauts and emoji?

`btoa(string)` expects a byte string. `btoa("München")` does not throw, but it treats `ü` as the Latin-1 byte `0xFC`, so the result is not UTF-8 Base64. Emoji and Chinese characters do throw `InvalidCharacterError`. The tool code takes the detour `TextEncoder` → bytes → `btoa(String.fromCharCode(...bytes))` and produces what servers and APIs expect for UTF-8 text.

What is the difference between standard and URL-safe Base64?

Standard (RFC 4648 §4) uses `+` and `/`. Both are special in URLs and have to be percent-encoded again if they end up there. URL-safe (§5) replaces `+` with `-` and `/` with `_`. The variant is safe in URL paths, query strings, filenames and cookie values. Padding `=` is often dropped in the URL-safe form.

How many bytes come out of a Base64 string?

Rule of thumb: 4 Base64 characters carry up to 3 data bytes. A 100-character Base64 string encodes at most 75 bytes. More precisely: encoding n bytes yields 4 × ⌈n/3⌉ characters. Decoding yields 3 × block count minus padding characters.
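The two formulas above can be checked with a few lines of JavaScript. This is a minimal sketch: `encodedLength` and `decodedLength` are hypothetical helper names, and the decoder side assumes a well-formed, padded Base64 string.

```javascript
// Hypothetical helpers; the names are illustrative, not part of any API.

// Characters produced when encoding n bytes with padding: 4 × ⌈n/3⌉.
function encodedLength(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Bytes recovered from a padded Base64 string: 3 × block count minus padding.
function decodedLength(base64) {
  const blocks = base64.length / 4;
  const padding = base64.endsWith("==") ? 2 : base64.endsWith("=") ? 1 : 0;
  return 3 * blocks - padding;
}

console.log(encodedLength(20));              // 28
console.log(decodedLength("aGVsbG8="));      // 5
console.log(decodedLength("A".repeat(100))); // 75
```

The last call shows the upper bound from the rule of thumb: 100 characters without padding are 25 full blocks, so 75 bytes.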

Base64 explained: encoder and decoder with byte breakdown

Base64 step by step: 3-byte groups as chips, 4-character blocks beside them, padding and URL-safe variant explained. Encode and decode in the browser.

Published: May 1, 2026 | Last updated: May 1, 2026

When you spot cGFzc3dvcmQ6IHN1cGVyc2VjcmV0 in a config file or an API token starting with eyJ, you are looking at Base64. It is not encryption and not a protocol; it is a translation layer that packs arbitrary bytes into 64 safe ASCII characters so they fit through text channels like URLs, JSON strings, and email headers. The encoder below shows your input as UTF-8 bytes in 3-byte groups and pairs each group with the matching 4-character output block.

Variant: Standard (RFC 4648 §4) - uses `+` and `/`

Output (Base64): SGFsbG8gVG9vbGZsdXghIPCfmoA=

UTF-8 bytes: 20 | Output bytes: 28 | 4-char blocks: 7 | Padding: 1

The result explained

Input (bytes): Hallo Toolflux! followed by 20 F0 9F 9A 80 (a space and the four UTF-8 bytes of 🚀)

Input bytes (8 bits): H 0x48 = 01001000, a 0x61 = 01100001, l 0x6C = 01101100

3 bytes (24 bits) → 4 Base64 chars (6 bits each)

Output sextets (6 bits): S value 18 = 010010, G value 6 = 000110, F value 5 = 000101, s value 44 = 101100

Runs in your browser. No network call. No account.

What is Base64 and why does it look like that?

Base64 packs three bytes (24 bits) into four characters from a 64-symbol alphabet (A-Z, a-z, 0-9, +, /). The size grows by about 33 percent - the price you pay for a result made of harmless ASCII that travels safely through systems that otherwise only accept text.

The panel above splits your input into four categories: ASCII bytes grey, UTF-8 lead bytes orange, UTF-8 continuation bytes amber, padding = purple. Every 3-byte group on the left has a 4-character block on the right. With hello (5 bytes) you get two blocks - the second one ends in =. With München you can see how Mü becomes three bytes (4D C3 BC) and then the block TcO8.

A bit strip runs underneath the chips: three 8-bit bytes on the input side, four 6-bit sextets on the output side. Both sides carry exactly 24 bits, just sliced differently. Click any block and the strip jumps to that quantum - so you can watch 4D C3 BC (i.e. 01001101 11000011 10111100) regroup into the sextets 19 28 14 60, which the standard alphabet spells T c O 8.
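The regrouping described above comes down to a few shifts and masks. The sketch below assumes a hypothetical helper `toSextets` that takes exactly three bytes; it only illustrates the bit slicing and is not the tool's own code.

```javascript
// The standard alphabet from RFC 4648 §4: index = 6-bit value.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: regroups exactly three bytes into four 6-bit values.
function toSextets(b0, b1, b2) {
  const bits = (b0 << 16) | (b1 << 8) | b2; // all 24 bits in one integer
  return [
    (bits >> 18) & 0x3f,
    (bits >> 12) & 0x3f,
    (bits >> 6) & 0x3f,
    bits & 0x3f,
  ];
}

const sextets = toSextets(0x4d, 0xc3, 0xbc);
console.log(sextets);                                  // [ 19, 28, 14, 60 ]
console.log(sextets.map((s) => ALPHABET[s]).join("")); // TcO8
```

Feeding in 0x48, 0x61, 0x6C (H, a, l) gives 18, 6, 5, 44, which the alphabet spells SGFs.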

When do you run into Base64?

Base64 turns up wherever bytes have to travel through text channels. Developers are not the only ones who meet it - power users, sysadmins, and students hit it too, often without realising it is a reversible translation rather than a secret code.

Kubernetes secrets store data values as Base64 inside the YAML manifest (echo -n "password" | base64 returns cGFzc3dvcmQ=). stringData is the plain-text alternative.
Data URIs in HTML like data:image/png;base64,iVBORw0KG... embed images straight into the markup.
JWT tokens are three URL-safe Base64 parts: header, payload, signature.
Email attachments have travelled as Base64 through SMTP since the 90s, because the protocol only guarantees 7-bit ASCII.
Browser APIs like FileReader.readAsDataURL hand back Base64. JWK fields from crypto.subtle.exportKey("jwk", ...) use Base64url.

If you need the bytes as a hex list instead of Base64 - to compare against xxd, a hex viewer, or openssl output - the "Copy hex" action hands them over ready to paste.
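For comparison outside the tool, the same hex list can be produced from any Base64 string. This is a rough sketch with a hypothetical helper `base64ToHex`; it is not the implementation behind the "Copy hex" action, and the output format (lowercase, space-separated) is an assumption.

```javascript
// Hypothetical helper: decode standard Base64 and list the bytes as hex.
function base64ToHex(base64) {
  const byteString = atob(base64); // one character per byte, codes 0x00-0xFF
  return Array.from(byteString, (ch) =>
    ch.charCodeAt(0).toString(16).padStart(2, "0")
  ).join(" ");
}

console.log(base64ToHex("TcO8")); // 4d c3 bc
console.log(base64ToHex("YQ==")); // 61
```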

Standard or URL-safe - which variant fits?

The variant switch picks between the two alphabets defined in RFC 4648. Standard (§4) uses + and /. Both are special in URLs and have to be percent-encoded if they appear there. URL-safe (§5) replaces + with -, / with _, and usually drops the padding =, so the string is safe to drop into a URL, a filename, or a cookie value.

Scenario	Right choice	What breaks otherwise
Bytes in YAML, JSON, headers	Standard	This is the default almost everywhere
Token in a URL path or query	URL-safe	`+` and `/` need extra percent-encoding
Filename derived from a hash	URL-safe	`/` would create a fresh path segment
JWT (header, payload, signature)	URL-safe	Standard does not work in the token format

Default to Standard. Switch to URL-safe only when the Base64 string ends up in a URL or a filename.
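Converting between the two variants is a character swap plus padding handling. The sketch below uses two hypothetical helpers, `toUrlSafe` and `toStandard`, and assumes well-formed input; it does not validate the string.

```javascript
// Hypothetical helpers; names are illustrative.

// Standard → URL-safe: swap the two special characters, drop the padding.
function toUrlSafe(standard) {
  return standard.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// URL-safe → Standard: swap back and pad to a multiple of four characters.
function toStandard(urlSafe) {
  const swapped = urlSafe.replace(/-/g, "+").replace(/_/g, "/");
  const missing = (4 - (swapped.length % 4)) % 4;
  return swapped + "=".repeat(missing);
}

// The bytes 0xFB 0xFF are chosen because they hit both special characters.
const standard = btoa(String.fromCharCode(0xfb, 0xff));
console.log(standard);                        // +/8=
console.log(toUrlSafe(standard));             // -_8
console.log(toStandard(toUrlSafe(standard))); // +/8=
```

Restoring the padding before calling `atob` is a conservative choice for decoders that insist on a length divisible by four.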

What does the `=` at the end mean?

Padding = rounds the final block up to four characters. Base64 consumes three bytes at a time. If your input is a multiple of three bytes, you get no =. One byte left over yields two =. Two bytes left over yield one =.

hello (5 bytes) → one full 3-byte block plus 2 bytes left → aGVsbG8=.
hi (2 bytes) → 2 bytes left → aGk=.
Foo (3 bytes) → exact fit → Rm9v, no padding.
a (1 byte) → 1 byte left → YQ==.

Padding is not just decoration. Decoders use it to recover the exact byte count of the final block. The URL-safe variant tends to drop the padding because the same information follows from the string length: a remainder of two characters means one byte, a remainder of three means two bytes - the decoder above accepts both forms.
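The four examples above follow one small rule, shown here as a sketch. `paddingFor` is a hypothetical helper name; the strings are pure ASCII, so `btoa` can take them directly.

```javascript
// Hypothetical helper: number of "=" characters for a given byte count.
function paddingFor(byteCount) {
  return (3 - (byteCount % 3)) % 3;
}

for (const text of ["hello", "hi", "Foo", "a"]) {
  // For ASCII text, the string length equals the byte count.
  console.log(text, btoa(text), paddingFor(text.length));
}
// hello aGVsbG8= 1
// hi aGk= 1
// Foo Rm9v 0
// a YQ== 2
```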

Why is `btoa()` tricky with umlauts?

btoa(string) is a 90s browser API. It expects a byte string: characters with code values from 0x00 to 0xFF. A ü (U+00FC) is inside that range, so btoa("München") does not throw. The result is TfxuY2hlbg==, because ü is treated as the single byte 0xFC. The UTF-8 form that APIs usually expect is TcO8bmNoZW4=.

The moment an emoji or a Chinese character shows up, btoa throws InvalidCharacterError, because those characters have code values above 0xFF and cannot fit into one byte. The encoder above handles both cases by taking the detour through TextEncoder: first turn the text into UTF-8 bytes, then call String.fromCharCode(...bytes), then btoa. Decoding works the same way in reverse: atob returns a byte string, its character codes are copied into a Uint8Array, and TextDecoder decodes those UTF-8 bytes back into text. With München you can see the jump from 7 characters to 8 bytes in the panel above - the ü accounts for two of them.
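A minimal sketch of that round trip, assuming two hypothetical helpers `encodeUtf8Base64` and `decodeUtf8Base64`. It is not the tool's actual source. One deliberate difference: the byte string is built in a loop instead of with `String.fromCharCode(...bytes)`, because spreading a very large array can exceed an engine's argument limit.

```javascript
// Hypothetical helpers; names are illustrative.

function encodeUtf8Base64(text) {
  const bytes = new TextEncoder().encode(text); // text → UTF-8 bytes
  let byteString = "";
  for (const b of bytes) byteString += String.fromCharCode(b);
  return btoa(byteString); // byte string → Base64
}

function decodeUtf8Base64(base64) {
  const byteString = atob(base64); // Base64 → byte string
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes); // UTF-8 bytes → text
}

const muenchen = "M\u00fcnchen"; // "München", ü written as an escape

console.log(btoa(muenchen));                   // TfxuY2hlbg== (Latin-1 byte 0xFC)
console.log(encodeUtf8Base64(muenchen));       // TcO8bmNoZW4= (UTF-8 bytes C3 BC)
console.log(decodeUtf8Base64("TcO8bmNoZW4=")); // München
console.log(encodeUtf8Base64("Hallo Toolflux! \u{1F680}"));
// SGFsbG8gVG9vbGZsdXghIPCfmoA=
```

Calling `btoa("\u{1F680}")` directly throws, while the helper accepts it because `btoa` only ever sees values from 0x00 to 0xFF.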

Frequently Asked Questions

What is Base64 and why does it exist?

Base64 is an encoding that packs arbitrary bytes into 64 safe ASCII characters (A-Z, a-z, 0-9, +, /). It started in email, which for decades could only carry 7-bit ASCII. Today Base64 turns up in data URIs, JWT tokens, Kubernetes secrets, and anywhere binary data has to travel through a text channel.

What does the `=` at the end mean?

The = is padding. Base64 packs three bytes into four characters. If your input is not divisible by three, the last block is padded with =. One = means two data bytes in the final block, two = means one data byte. The URL-safe variant usually drops the padding.

Why does `btoa()` produce junk on umlauts?

btoa(string) expects a byte string. Umlauts like ü are treated as Latin-1 bytes, while emoji and many other characters throw. The tool code routes through TextEncoder and produces the UTF-8 Base64 form that servers and APIs expect. The URL Encoder uses UTF-8 the same way.

Is Base64 encryption?

No. Base64 is encoding, not encryption. Anyone with the string and a decoder gets the plain text back. If you need secrecy, reach for encryption (AES, RSA, libsodium).
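A one-line illustration of the point, using the config-file string quoted in the introduction. No key or secret is involved; the variable name `decoded` is only for illustration.

```javascript
// Decoding needs nothing but the string itself.
const decoded = atob("cGFzc3dvcmQ6IHN1cGVyc2VjcmV0");
console.log(decoded); // password: supersecret
```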

When do I need Base64 in a URL?

Whenever bytes have to travel through a URL without being mangled - JWT tokens, OAuth state parameters, webhook signatures, sometimes image hashes in paths. Switch to the URL-safe variant in those cases, or you end up with double encoding. More on the mechanics in the URL Encoder.
