Encoding and Decoding Base64 Strings in Java

Introduction

Encoding is the process of converting data into a new format by applying a set of rules. Decoding is the reverse process - turning the encoded data back into its original format.

Encoding is all around us and computers heavily rely on different encoding formats to deliver and transfer information.

From the latest cat image in your feed to the voice message you listened to over a messaging app - all of these were encoded on the sender-side, delivered to you, and decoded on the recipient-side.

Encoding helps data survive transport intact and arrive in a form the receiving system can handle.

What is Base64?

Base64 is a binary-to-text encoding scheme. It's represented as printable ASCII characters where each Base64 character contains 6 bits of binary information.

It's very useful for carrying binary data, such as images or audio, inside Strings. What Base64 isn't is an encryption algorithm.

While it does "obscure" the data to a degree, it's not a secure format by any means.

How Does Base64 Work?

In Base64, as the name suggests, there are 64 characters used to encode binary data. These characters are:

  • 26 uppercase letters [A-Z]
  • 26 lowercase letters [a-z]
  • 10 digits [0-9]
  • 2 special characters [+, /]

Note: There is also a 65th character (=), which has a special role as the padding character.

As usual, when it comes to computers, numbers are represented in powers of 2. So 64 characters of Base64 can be written as:
$$
64 = 2^6
$$
Each character therefore represents 6 bits. You could go higher than this and have each character represent 8 bits. That would be Base256. This is impractical, though, because there aren't 256 printable ASCII characters to build such an alphabet from.
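As a quick worked check of this arithmetic (an addition for illustration, with n standing for the number of input bytes): three 8-bit bytes carry exactly as many bits as four 6-bit characters, which is why Base64 works through the input in groups of three bytes:
$$
3 \times 8 = 24 = 4 \times 6
$$
With padding, an input of n bytes therefore produces:
$$
\text{encoded length} = 4 \left\lceil \frac{n}{3} \right\rceil
$$
For n = 6 this gives 8 characters, and for n = 5 it also gives 8, one of which is the padding character.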

Let's go ahead and manually encode a simple String to get a good idea of how the process works.

Manual Encoding

For example, let's say we have the String ABC123 that we'd like to encode in Base64. There are five steps we take during this process:

  1. Take the ASCII values of the input characters separately:

    Character        A   B   C   1   2   3
    ASCII (decimal)  65  66  67  49  50  51

  2. Convert the decimal value of each character into an 8-bit binary number:

    Character  A         B         C         1         2         3
    ASCII      65        66        67        49        50        51
    Binary     01000001  01000010  01000011  00110001  00110010  00110011

  3. Re-arrange the binary numbers into chunks of 6 bits:

    010000 010100 001001 000011 001100 010011 001000 110011

    If the bits can't be split evenly into chunks of six, the last chunk is filled up with zero bits and the padding character (=) is appended to the output.

  4. Get the decimal values of these binary chunks:

    Binary   010000  010100  001001  000011  001100  010011  001000  110011
    Decimal  16      20      9       3       12      19      8       51

  5. Finally, using the Base64 chart, convert each decimal number into a Base64 character:

    Value  Char    Value  Char    Value  Char    Value  Char
    0      A       16     Q       32     g       48     w
    1      B       17     R       33     h       49     x
    2      C       18     S       34     i       50     y
    3      D       19     T       35     j       51     z
    4      E       20     U       36     k       52     0
    5      F       21     V       37     l       53     1
    6      G       22     W       38     m       54     2
    7      H       23     X       39     n       55     3
    8      I       24     Y       40     o       56     4
    9      J       25     Z       41     p       57     5
    10     K       26     a       42     q       58     6
    11     L       27     b       43     r       59     7
    12     M       28     c       44     s       60     8
    13     N       29     d       45     t       61     9
    14     O       30     e       46     u       62     +
    15     P       31     f       47     v       63     /

In the end, we're left with the String QUJDMTIz - the Base64 representation of ABC123.

Note: 6 input characters are encoded into 8 characters in the end. This means that it takes ~33% more space to store a Base64 encoded String as opposed to the original String.
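The five manual steps can be sketched in Java as follows. This is only an illustration of the walkthrough, not how java.util.Base64 is implemented: the class name ManualBase64 and its encode() helper are made up for this example, and the sketch assumes ASCII input whose length is a multiple of 3, so that no padding is needed.

```java
public class ManualBase64 {

    // The Base64 chart from step 5, indexed by the 6-bit value (0-63).
    private static final String CHART =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Hypothetical helper: follows the manual steps literally, using Strings of bits.
    public static String encode(String input) {
        if (input.length() % 3 != 0) {
            throw new IllegalArgumentException("this sketch does not handle padding");
        }

        // Steps 1 and 2: character -> ASCII value -> 8-bit binary string.
        StringBuilder bits = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (c > 127) {
                throw new IllegalArgumentException("this sketch only handles ASCII");
            }
            String binary = Integer.toBinaryString(c);
            bits.append(String.format("%8s", binary).replace(' ', '0'));
        }

        // Steps 3 to 5: 6-bit chunk -> decimal value -> chart lookup.
        StringBuilder encoded = new StringBuilder();
        for (int i = 0; i < bits.length(); i += 6) {
            int value = Integer.parseInt(bits.substring(i, i + 6), 2);
            encoded.append(CHART.charAt(value));
        }
        return encoded.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("ABC123")); // QUJDMTIz
    }
}
```

Working on Strings of 0s and 1s is slow and only mirrors the pen-and-paper process. A real encoder would use bit shifts on the raw bytes instead.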

Why use Base64 Encoding?

Computers work with 0s and 1s, so why bother converting this into another format?

Yes, true. Binary is the language of computers. That's exactly why we're converting it. A sequence such as 0010110 can mean many things. It can be part of an image, it can be a part of an audio file or it can be a command that deletes half of your hard drive.

This data must be processed differently, depending on what it's supposed to represent. Also, many servers don't expect raw binary data. Email servers, for example, expect textual data, so binary content such as attachments is encoded as text before the application sends it.

Encoding Strings with Java 8 - Base64

Java 8 introduced a new class - java.util.Base64. It supports encoding and decoding three variants of Base64, as specified by RFC 2045 and RFC 4648:

  • Basic
  • URL and Filename safe
  • MIME

Basic String Encoding and Decoding

Using the basic encoder, we can encode a String into Base64. The basic encoder doesn't add a line separator character.

If the number of bytes in the input is not divisible by 3 (the 8-bit bytes won't bundle up evenly into 6-bit groups), padding will be added in the form of = at the end:

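import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Base64.Encoder;

Encoder encoder = Base64.getEncoder();
String originalString = "basic";
// Name the charset explicitly so the result doesn't depend on the platform default
String encodedString = encoder.encodeToString(originalString.getBytes(StandardCharsets.UTF_8));

System.out.println(encodedString);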

Base64 is a factory class from which we'll retrieve different implementations/variations of the encoder. The encodeToString() method will encode our input into its Base64 representation and pack it into a String. You can also use the encode() method to encode it into a byte array or a ByteBuffer instead.
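As a small sketch of those two encode() variants: the class name EncodeVariants and its helper methods are made up for this example, and UTF-8 is assumed as the charset of the input text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class EncodeVariants {

    // encode(byte[]) returns the Base64 text as ASCII bytes in a new array.
    public static byte[] toByteArray(String text) {
        return Base64.getEncoder().encode(text.getBytes(StandardCharsets.UTF_8));
    }

    // encode(ByteBuffer) returns a new ByteBuffer holding the encoded bytes.
    public static ByteBuffer toByteBuffer(String text) {
        ByteBuffer input = ByteBuffer.wrap(text.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encode(input);
    }

    public static void main(String[] args) {
        byte[] encodedArray = toByteArray("basic");
        System.out.println(new String(encodedArray, StandardCharsets.US_ASCII)); // YmFzaWM=

        ByteBuffer encodedBuffer = toByteBuffer("basic");
        System.out.println(StandardCharsets.US_ASCII.decode(encodedBuffer)); // YmFzaWM=
    }
}
```

The byte-oriented variants are mostly useful when the result goes straight into a stream or buffer and an intermediate String would only be thrown away.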

If required, you can also return an encoder instance that won't pad input Strings via the withoutPadding() method while instantiating it from the factory class:

Base64.getEncoder().withoutPadding();

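Note: Encoding without padding doesn't lose any information - the padding only makes the output length a multiple of 4. The decoders in java.util.Base64 accept unpadded input, though other decoders may insist on the padding being present.

Running the earlier encoding example, which uses the default padded encoder, will yield: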

YmFzaWM=

With our String encoded, let's also instantiate a Decoder and decode the produced String. Note that the decoder returns a byte array, which we then turn back into a String:

// Decoder is the nested class java.util.Base64.Decoder
Decoder decoder = Base64.getDecoder();
byte[] bytes = decoder.decode(encodedString);

System.out.println(new String(bytes));

This code will result in:

basic
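To see how padding depends on the input length, and how the JDK's basic decoder treats padded and unpadded input, here's a minimal sketch. The class name PaddingDemo and its helpers are made up for this example, and UTF-8 is assumed as the charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PaddingDemo {

    public static String padded(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static String unpadded(String text) {
        return Base64.getEncoder().withoutPadding()
                .encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 1, 2 and 3 input bytes need two, one and zero padding characters.
        System.out.println(padded("b"));   // Yg==
        System.out.println(padded("ba"));  // YmE=
        System.out.println(padded("bas")); // YmFz

        // The same encoder without padding simply drops the trailing '='.
        System.out.println(unpadded("basic")); // YmFzaWM

        // Both forms decode back to the original text.
        System.out.println(decode("YmFzaWM=")); // basic
        System.out.println(decode("YmFzaWM"));  // basic
    }
}
```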

URL Encoding and Decoding

For encoding and decoding URLs, we can use instances of Encoder and Decoder that utilize the URL and Filename safe type of Base64. They don't add a line separator character either.

We obtain these from the Base64 factory through the getUrlEncoder() and getUrlDecoder() methods:

Encoder encoder = Base64.getUrlEncoder();
String originalinput = "https://stackabuse.com/tag/java/";
String encodedUrl = encoder.encodeToString(originalinput.getBytes());

System.out.println(encodedUrl);

The same as before, the encodeToString() method will encode the input and pack it into a String:

aHR0cHM6Ly9zdGFja2FidXNlLmNvbS90YWcvamF2YS8=

The encode() method would encode the input into a byte array. Again, the decode() method accepts the Base64 String and returns a byte array, from which we rebuild the original String:

Decoder decoder = Base64.getUrlDecoder();
byte[] bytes = decoder.decode(encodedUrl);

System.out.println(new String(bytes));

Running this code yields:

https://stackabuse.com/tag/java/

Note: The basic schema uses three special characters: +, / and =. These characters may have a different meaning in URLs and certain file systems and could cause a problem if the encoded String is used there. To avoid these problems, URL and file name Strings should be encoded and decoded using the URL-safe encoder and decoder.

The + and / characters are replaced with - and _ respectively.
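The URL used above happens not to produce a + or / when encoded, so the difference between the two alphabets isn't visible in its output. Here's a small sketch that makes it visible. The class name UrlSafeAlphabet and the two sample bytes are chosen just for this illustration, so that the 6-bit groups land on the values 62 and 63.

```java
import java.util.Base64;

public class UrlSafeAlphabet {

    // 0xFB 0xFF -> 111110 111111 1111(00) -> values 62, 63, 60
    public static final byte[] SAMPLE = {(byte) 0xFB, (byte) 0xFF};

    public static String basic(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    public static String urlSafe(byte[] data) {
        return Base64.getUrlEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        System.out.println(basic(SAMPLE));   // +/8=
        System.out.println(urlSafe(SAMPLE)); // -_8=
    }
}
```

Note that the URL-safe encoder still emits the = padding by default. Where that's a problem, it can be combined with withoutPadding().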

MIME Encoding and Decoding

Multipurpose Internet Mail Extensions (MIME) labels are used to identify the type of media (HTML, audio, video).

Since a lot of media types such as images, documents, etc. are being sent as attachments via email - it is necessary to encode them in a format acceptable to all protocols.

When sending attachments, such as a .txt file, they're encoded into Base64 and attached to the email. Here's what an attachment can look like:

Content-Type: text/plain;
name="log_attachment.txt"
Content-Disposition: attachment;
filename="attachment.txt"
Content-Transfer-Encoding: base64

TUlNRS1WZXJzaW9uOiAxLjANClgtTWFpbGVyOiBNYWlsQmVlLk5FVCA4LjAuNC40MjgNClN1Ympl
Y3Q6IHRlc3Qgc3ViamVjdA0KVG86IGtldmlubUBkYXRhbW90aW9uLmNvbQ0KQ29udGVudC1UeXBl
OiBtdWx0aXBhcnQvYWx0ZXJuYXRpdmU7DQoJYm91bmRhcnk9Ii0tLS09X05leHRQYXJ0XzAwMF9B
RTZCXzcyNUUwOUFGLjg4QjdGOTM0Ig0KDQoNCi0tLS0tLT1fTmV4dFBhcnRfMDAwX0FFNkJfNzI1
RTA5QUYuODhCN0Y5MzQNCkNvbnRlbnQtVHlwZTogdGV4dC9wbGFpbjsNCgljaGFyc2V0PSJ1dGYt
OCINCkNvbnRlbnQtVHJhbnNmZXItRW5jb2Rpbmc6IHF1b3RlZC1wcmludGFibGUNCg0KdGVzdCBi
b2R5DQotLS0tLS09X05leHRQYXJ0XzAwMF9BRTZCXzcyNUUwOUFGLjg4QjdGOTM0DQpDb250ZW50
LVR5cGU6IHRleHQvaHRtbDsNCgljaGFyc2V0PSJ1dGYtOCINCkNvbnRlbnQtVHJhbnNmZXItRW5j
b2Rpbmc6IHF1b3RlZC1wcmludGFibGUNCg0KPHByZT50ZXN0IGJvZHk8L3ByZT4NCi0tLS0tLT1f
TmV4dFBhcnRfMDAwX0FFNkJfNzI1RTA5QUYuODhCN0Y5MzQtLQ0K

When encoding MIME, the output is formatted to be MIME-friendly - the lines can't have more than 76 characters and each line ends with \r\n, except for the last one, naturally. The \r is a carriage return character while the \n is the newline character.
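The line-length rule can be checked with a short sketch. The class name MimeLineLength and the mimeLines() helper are made up for this example, and the input is just an array of zero bytes, since only the length of the output matters here.

```java
import java.util.Base64;

public class MimeLineLength {

    // Encodes byteCount zero bytes with the MIME encoder and splits the result into lines.
    public static String[] mimeLines(int byteCount) {
        byte[] data = new byte[byteCount];
        String encoded = Base64.getMimeEncoder().encodeToString(data);
        return encoded.split("\r\n");
    }

    public static void main(String[] args) {
        // 100 bytes encode to 136 characters: one full line of 76 and a rest of 60.
        for (String line : mimeLines(100)) {
            System.out.println(line.length());
        }
    }
}
```

If a different line length or separator is needed, Base64.getMimeEncoder(int lineLength, byte[] lineSeparator) accepts both as arguments.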

Let's make a file called file.txt, which contains:

"I met a traveller from an antique land,
Who said - “Two vast and trunkless legs of stone
Stand in the desert.... Near them, on the sand,
Half sunk a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command,
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them, and the heart that fed;
And on the pedestal, these words appear:
My name is Ozymandias, King of Kings;
Look on my Works, ye Mighty, and despair!
Nothing beside remains. Round the decay
Of that colossal Wreck, boundless and bare
The lone and level sands stretch far away.”

Ozymandias, by Percy Bysshe Shelley

Now, let's read the bytes of the file and pack them into a byte array, after which we'll encode it:

byte[] bytes = Files.readAllBytes(Paths.get("path/to/file"));
String encodedString = Base64.getMimeEncoder().encodeToString(bytes);
System.out.println(encodedString);

This code yields:

IkkgbWV0IGEgdHJhdmVsbGVyIGZyb20gYW4gYW50aXF1ZSBsYW5kLA0KV2hvIHNhaWTigJTigJxU
d28gdmFzdCBhbmQgdHJ1bmtsZXNzIGxlZ3Mgb2Ygc3RvbmUNClN0YW5kIGluIHRoZSBkZXNlcnQu
Li4uIE5lYXIgdGhlbSwgb24gdGhlIHNhbmQsDQpIYWxmIHN1bmsgYSBzaGF0dGVyZWQgdmlzYWdl
IGxpZXMsIHdob3NlIGZyb3duLA0KQW5kIHdyaW5rbGVkIGxpcCwgYW5kIHNuZWVyIG9mIGNvbGQg
Y29tbWFuZCwNClRlbGwgdGhhdCBpdHMgc2N1bHB0b3Igd2VsbCB0aG9zZSBwYXNzaW9ucyByZWFk
DQpXaGljaCB5ZXQgc3Vydml2ZSwgc3RhbXBlZCBvbiB0aGVzZSBsaWZlbGVzcyB0aGluZ3MsDQpU
aGUgaGFuZCB0aGF0IG1vY2tlZCB0aGVtLCBhbmQgdGhlIGhlYXJ0IHRoYXQgZmVkOw0KQW5kIG9u
IHRoZSBwZWRlc3RhbCwgdGhlc2Ugd29yZHMgYXBwZWFyOg0KTXkgbmFtZSBpcyBPenltYW5kaWFz
LCBLaW5nIG9mIEtpbmdzOw0KTG9vayBvbiBteSBXb3JrcywgeWUgTWlnaHR5LCBhbmQgZGVzcGFp
ciENCk5vdGhpbmcgYmVzaWRlIHJlbWFpbnMuIFJvdW5kIHRoZSBkZWNheQ0KT2YgdGhhdCBjb2xv
c3NhbCBXcmVjaywgYm91bmRsZXNzIGFuZCBiYXJlDQpUaGUgbG9uZSBhbmQgbGV2ZWwgc2FuZHMg
c3RyZXRjaCBmYXIgYXdheS7igJ0NCg0KT3p5bWFuZGlhcywgYnkgUGVyY3kgQnlzc2hlIFNoZWxs
ZXk=

Naturally, decoding this content is as easy as:

Decoder decoder = Base64.getMimeDecoder();
byte[] decodedBytes = decoder.decode(encodedString);
System.out.println(new String(decodedBytes));

This code will yield:

"I met a traveller from an antique land,
Who said - “Two vast and trunkless legs of stone
Stand in the desert.... Near them, on the sand,
Half sunk a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command,
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them, and the heart that fed;
And on the pedestal, these words appear:
My name is Ozymandias, King of Kings;
Look on my Works, ye Mighty, and despair!
Nothing beside remains. Round the decay
Of that colossal Wreck, boundless and bare
The lone and level sands stretch far away.”

Ozymandias, by Percy Bysshe Shelley
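One reason to reach for the MIME decoder here is that it skips characters outside the Base64 alphabet, such as the \r\n line separators, while the basic decoder rejects them. A minimal sketch of that difference follows. The class name MimeDecoderTolerance, its helpers and the short wrapped sample String are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MimeDecoderTolerance {

    // "basic" in Base64, with a line break inserted in the middle.
    public static final String WRAPPED = "YmFz\r\naWM=";

    public static String decodeMime(String text) {
        return new String(Base64.getMimeDecoder().decode(text), StandardCharsets.UTF_8);
    }

    // Returns false if the basic decoder throws on the given text.
    public static boolean basicDecoderAccepts(String text) {
        try {
            Base64.getDecoder().decode(text);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeMime(WRAPPED));          // basic
        System.out.println(basicDecoderAccepts(WRAPPED)); // false
    }
}
```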

Apache Commons

Due to its usefulness and prevalence in Java, many projects have Apache Commons included in the classpath. It's a great set of libraries with many useful features often used in production - and encoding/decoding, provided by the Commons Codec module, is no exception.

Using Maven, adding it to your project is as easy as including the dependency:

<dependency>
    <groupId>commons-codec</groupId>
    <artifactId>commons-codec</artifactId>
    <version>${version}</version>
</dependency>

Or, if you're using Gradle:

implementation group: 'commons-codec', name: 'commons-codec', version: '${version}'

Similar to the official Java implementation, the Base64 class is the main API. Though, instead of being used as a factory for Encoder/Decoder instances, the class itself handles all the encoding and decoding. It also provides other useful methods, such as isBase64(), isInAlphabet() and isUrlSafe().

Apache Commons Basic String Encoding and Decoding

Let's go ahead and encode a simple String into Base64. The encodeBase64() method accepts a byte array, and returns a byte array, representing the encoded input.

// Base64 here is org.apache.commons.codec.binary.Base64, not java.util.Base64
String inputString = "Some String";
byte[] bytes = Base64.encodeBase64(inputString.getBytes());

System.out.println(new String(bytes));

Running this code will yield:

U29tZSBTdHJpbmc=

Alternatively, you can use the encodeBase64String() method, which will return a String, instead of a byte array:

String inputString = "Some String";
String encodedString = Base64.encodeBase64String(inputString.getBytes());

System.out.println(encodedString);

Running this code will yield:

U29tZSBTdHJpbmc=

In the same vein, the Base64 class is used as a decoder as well. It accepts a byte array or String and returns a byte array:

String decodedString = new String(Base64.decodeBase64(encodedString.getBytes()));
System.out.println(decodedString);
// Or
String anotherDecodedString = new String(Base64.decodeBase64(encodedString));
System.out.println(anotherDecodedString);

Running this code will yield:

Some String
Some String

Apache Commons URL Encoding and Decoding

Encoding and decoding URLs follows the same principle:

String originalInput = "https://stackabuse.com/tag/java/";
byte[] bytes = Base64.encodeBase64URLSafe(originalInput.getBytes());

System.out.println(new String(bytes));

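Running this will result in the following output. Note that, unlike the JDK's URL encoder, encodeBase64URLSafe() doesn't add padding:

aHR0cHM6Ly9zdGFja2FidXNlLmNvbS90YWcvamF2YS8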

You can also return a String instead of a byte array:

String originalInput = "https://stackabuse.com/tag/java/";
String encodedString = Base64.encodeBase64URLSafeString(originalInput.getBytes());

System.out.println(encodedString);

The encodedString's contents, again without the trailing = since the URL-safe methods don't pad, are:

aHR0cHM6Ly9zdGFja2FidXNlLmNvbS90YWcvamF2YS8

And finally, we can also decode the Base64 representation of the URL. The decodeBase64() method always returns a byte array, whether it's given a byte array or a String:

String decodedUrl = new String(Base64.decodeBase64(encodedString.getBytes()));
System.out.println(decodedUrl);
// Or
String anotherDecodedUrl = new String(Base64.decodeBase64(encodedString));
System.out.println(anotherDecodedUrl);

Running this code yields:

https://stackabuse.com/tag/java/
https://stackabuse.com/tag/java/

Apache Commons MIME Encoding and Decoding

Encoding and decoding MIME types is a bit different than in the official Java implementation. This time, we use the encodeBase64Chunked() method:

byte[] bytes = Files.readAllBytes(Paths.get("path/to/file"));
String encodedString = new String(Base64.encodeBase64Chunked(bytes));
System.out.println(encodedString);

This code results in:

IkkgbWV0IGEgdHJhdmVsbGVyIGZyb20gYW4gYW50aXF1ZSBsYW5kLA0KV2hvIHNhaWTigJTigJxU
d28gdmFzdCBhbmQgdHJ1bmtsZXNzIGxlZ3Mgb2Ygc3RvbmUNClN0YW5kIGluIHRoZSBkZXNlcnQu
Li4uIE5lYXIgdGhlbSwgb24gdGhlIHNhbmQsDQpIYWxmIHN1bmsgYSBzaGF0dGVyZWQgdmlzYWdl
IGxpZXMsIHdob3NlIGZyb3duLA0KQW5kIHdyaW5rbGVkIGxpcCwgYW5kIHNuZWVyIG9mIGNvbGQg
Y29tbWFuZCwNClRlbGwgdGhhdCBpdHMgc2N1bHB0b3Igd2VsbCB0aG9zZSBwYXNzaW9ucyByZWFk
DQpXaGljaCB5ZXQgc3Vydml2ZSwgc3RhbXBlZCBvbiB0aGVzZSBsaWZlbGVzcyB0aGluZ3MsDQpU
aGUgaGFuZCB0aGF0IG1vY2tlZCB0aGVtLCBhbmQgdGhlIGhlYXJ0IHRoYXQgZmVkOw0KQW5kIG9u
IHRoZSBwZWRlc3RhbCwgdGhlc2Ugd29yZHMgYXBwZWFyOg0KTXkgbmFtZSBpcyBPenltYW5kaWFz
LCBLaW5nIG9mIEtpbmdzOw0KTG9vayBvbiBteSBXb3JrcywgeWUgTWlnaHR5LCBhbmQgZGVzcGFp
ciENCk5vdGhpbmcgYmVzaWRlIHJlbWFpbnMuIFJvdW5kIHRoZSBkZWNheQ0KT2YgdGhhdCBjb2xv
c3NhbCBXcmVjaywgYm91bmRsZXNzIGFuZCBiYXJlDQpUaGUgbG9uZSBhbmQgbGV2ZWwgc2FuZHMg
c3RyZXRjaCBmYXIgYXdheS7igJ0NCg0KT3p5bWFuZGlhcywgYnkgUGVyY3kgQnlzc2hlIFNoZWxs
ZXk=

Decoding this is the same as before:

String decodedMime = new String(Base64.decodeBase64(encodedString.getBytes()));
System.out.println(decodedMime);
// Or
String anotherDecodedMime = new String(Base64.decodeBase64(encodedString));
System.out.println(anotherDecodedMime);

This code results in:

"I met a traveller from an antique land,
Who said - “Two vast and trunkless legs of stone
Stand in the desert.... Near them, on the sand,
Half sunk a shattered visage lies, whose frown,
And wrinkled lip, and sneer of cold command,
Tell that its sculptor well those passions read
Which yet survive, stamped on these lifeless things,
The hand that mocked them, and the heart that fed;
And on the pedestal, these words appear:
My name is Ozymandias, King of Kings;
Look on my Works, ye Mighty, and despair!
Nothing beside remains. Round the decay
Of that colossal Wreck, boundless and bare
The lone and level sands stretch far away.”

Ozymandias, by Percy Bysshe Shelley

Apache Commons Helper Methods

In true Apache Commons fashion, we're introduced to a few helper methods that make our life just a bit easier. For example, we can check whether a String or a byte array contains only valid Base64 characters:

String originalInput = "https://stackabuse.com/tag/java/";
byte[] bytes = Base64.encodeBase64URLSafe(originalInput.getBytes());
String encodedString = new String(bytes);

System.out.println(Base64.isBase64(bytes));
// Or
System.out.println(Base64.isBase64(encodedString));

This code would result in:

true
true

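We can also check whether a Base64 instance works in URL-safe mode. The isUrlSafe() method is an instance method, so we first create an instance with the URL-safe flag set:

Base64 urlSafeCodec = new Base64(true);
System.out.println(urlSafeCodec.isUrlSafe());

Since we've created the instance in URL-safe mode, this code results in:

true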

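And finally, we can check whether every byte from our array is in the Base64 alphabet. The isInAlphabet() method is called on a Base64 instance as well:

Base64 codec = new Base64();
// false: whitespace and padding characters are not tolerated
System.out.println(codec.isInAlphabet(bytes, false));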
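The java.util.Base64 class has no counterpart to isBase64(). For projects that only use the JDK, a rough equivalent is to attempt the decoding and catch the exception. The class Base64Check and its isDecodable() helper are made up for this sketch and are not part of the JDK or of Commons Codec.

```java
import java.util.Base64;

public class Base64Check {

    // Hypothetical helper: true if the JDK's basic decoder accepts the candidate.
    public static boolean isDecodable(String candidate) {
        try {
            Base64.getDecoder().decode(candidate);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown for characters outside the basic alphabet or malformed input.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isDecodable("U29tZSBTdHJpbmc=")); // true
        System.out.println(isDecodable("not base64!"));      // false
    }
}
```

This check is stricter than an alphabet test, since it also rejects whitespace and URL-safe characters. For URL-safe or MIME input, the matching decoder would have to be swapped in.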

Conclusion

In this article, we've covered what encoding and decoding are, followed by an overview of Base64. We've manually encoded a String to get a better understanding of the process, and then worked through a number of code examples.

Using the Base64 class in Java allows us to obtain different types of encoders and decoders - optimized for encoding and decoding basic Strings, URLs and MIME types.

Using the Base64 class provided by Apache Commons, we rely on the class itself to do the encoding and decoding directly.