Java: Read a File into an ArrayList

Introduction

There are many ways to go about Reading and Writing Files in Java.

We typically have some data in memory, on which we perform operations, and then persist in a file. However, if we want to change that information, we need to put the contents of the file back into memory and perform operations.

If, for an example, our file contains a long list that we want to sort, we'll have to read it into an adequate data structure, perform operations, and then persist it once again - in this case an ArrayList.

This can be achieved with several different approaches:

  • Files.readAllLines()
  • FileReader
  • Scanner
  • BufferedReader
  • ObjectInputStream
  • Java Streams API

Files.readAllLines()

Since Java 7, it's possible to load all lines of a file into an ArrayList in a very simple way:

try {
    ArrayList<String> lines = new ArrayList<>(Files.readAllLines(Paths.get(fileName)));
}
catch (IOException e) {
    // Handle a potential exception
}

We can also specify a charset to handle different formats of text, if necessary:

try {
    Charset charset = StandardCharsets.UTF_8;
    ArrayList<String> lines = new ArrayList<>(Files.readAllLines(Paths.get(fileName), charset));
}
catch (IOException e) {
    // Handle a potential exception
}

Files.readAllLines() opens and closes the necessary resources automatically.

Scanner

As nice and simple as the previous method was, it's only useful for reading the file line by line. What would happen if all the data was stored in a single line?

Scanner is an easy to use tool for parsing primitive types and Strings. Using Scanner can be as simple or as hard as the developer wants to make it.

A simple example of when we'd prefer to use Scanner would be if our file had only one line, and the data needs to be parsed into something usable.

A delimiter is a sequence of characters that Scanner uses to separate values. By default, it uses a series of spaces/tabs as a delimiter (whitespace between values), but we can declare our own delimiter and use it to parse the data.

Let's take a look at an example file:

some-2123-different-values- in - this -text-with a common-delimiter

In such a case, it's easy to notice that all the values have a common delimiter. We can simply declare that "-" surrounded by any number of whitespaces is our delimiter.

// We'll use "-" as our delimiter
ArrayList<String> arrayList = new ArrayList<>();
try (Scanner s = new Scanner(new File(fileName)).useDelimiter("\\s*-\\s*")) {
    // \\s* in regular expressions means "any number or whitespaces".
    // We could've said simply useDelimiter("-") and Scanner would have
    // included the whitespaces as part of the data it extracted.
    while (s.hasNext()) {
        arrayList.add(s.next());
    }
}
catch (FileNotFoundException e) {
    // Handle the potential exception
}

Running this piece of code would yield us an ArrayList with these items:

[some, 2, different, values, in, this, text, with a common, delimiter]

On the other hand, if we had only used the default delimiter (whitespace), the ArrayList would look like:

[some-2-different-values-, in, -, this, -text-with, a, common-delimiter]

Scanner has some useful functions for parsing data, such as nextInt(), nextDouble(), etc.

Important: Calling .nextInt() will NOT return the next int value that can be found in the file! It will return an int value only if the next items the Scanner "scans" is a valid int value, otherwise an exception will be thrown. An easy way to make sure an exception doesn't arise is to perform a corresponding "has" check - like .hasNextInt() before actually using .nextInt().

Even though we don't see that when we call functions like scanner.nextInt() or scanner.hasNextDouble(), Scanner uses regular expressions in the background.

Very Important: An extremely common mistake with using Scanner occurs when working with files that have multiple lines and using .nextLine() in conjunction with .nextInt(),nextDouble(), etc.

Let's take a look at another file:

12
some data we want to read as a string in one line
10

Oftentimes, newer developers who use Scanner would write code like:

try (Scanner scanner = new Scanner(new File("example.txt"))) {
    int a = scanner.nextInt();
    String s = scanner.nextLine();
    int b = scanner.nextInt();

    System.out.println(a + ", " + s + ", " + b);
}
catch (FileNotFoundException e) {
    // Handle a potential exception
}
//catch (InputMismatchException e) {
//    // This will occur in the code above
//}

This code seems to be logically sound - we read an integer from the file, then the following line, then the second integer. If you try to run this code, the InputMismatchException will be thrown without an obvious reason.

If you start debugging and printing what you've scanned, you'll see that int a loaded well, but that String s is empty.

Why is that? The first important thing to note is that once Scanner reads something from the file, it continues scanning the file from the first character after the data it previously scanned.

For example, if we had "12 13 14" in a file and called .nextInt() once, the scanner would afterward pretend as if there was only " 13 14" in the file. Notice that the space between "12" and "13" is still present.

The second important thing to note - the first line in our example.txt file doesn't only contain the number 12, it contains what it called a "newline character", and it is actually 12\n instead of just 12.

Our file, in reality, looks like this:

12\n
some data we want to read as a string in one line\n
10

When we first call .nextInt(), Scanner reads only the number 12, and leaves the first \n unread.

.nextLine() then reads all the characters that scanner hasn't read yet until it reaches the first \n character, which it skips over and then returns the characters it read. This is exactly what the problem is in our case - we have a leftover \n character after reading the 12.

So when we call .nextLine() we get an empty string as a result since Scanner doesn't add the \n character to the string it returns.

Now the Scanner is at the beginning of the second line in our file, and when we try to call .nextInt(), Scanner encounters something that can't be parsed to an int and throws the aforementioned InputMismatchException.

Solutions

  • Since we know what exactly is wrong in this code, we can hardcode a workaround. We'll simply "consume" the newline character between .nextInt() and .nextLine():
...
int a = scanner.nextInt();
scanner.nextLine(); // Simply consumes the bothersome \n
String s = scanner.nextLine();
...
  • Given that we know how example.txt is formatted we can read the entire file line by line and parse the necessary lines using Integer.parseInt():
...
int a = Integer.parseInt(scanner.nextLine());
String s = scanner.nextLine();
int b = Integer.parseInt(scanner.nextLine());
...

BufferedReader

BufferedReader reads text from a character-input stream, but it does so by buffering characters in order to provide efficient .read() operations. Since accessing an HDD is a very time-consuming operation, BufferedReader gathers more data than we ask for, and stores it in a buffer.

The idea is that when we call .read() (or similar operation) we are likely to read again soon from the same block of data from which we've just read, and so "surrounding" data is stored in a buffer. In case we wanted to read it, we'd read it directly from the buffer instead of from the disk, which is much more efficient.

This brings us to what BufferedReader is good for - reading large files. BufferedReader has a significantly larger buffer memory than Scanner (8192 characters by default vs 1024 characters by default, respectively).

BufferedReader is used as a wrapper for other Readers, and so constructors for BufferedReader take a Reader object as a parameter, such as a FileReader.

We're using try-with-resources so we don't have to close the reader manually:

ArrayList<String> arrayList = new ArrayList<>();

try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
    while (reader.ready()) {
        arrayList.add(reader.readLine());
    }
}
catch (IOException e) {
    // Handle a potential exception
}

It's advised to wrap a FileReader with a BufferedReader, exactly due to the performance benefits.

ObjectInputStream

ObjectInputStream should only be used alongside ObjectOutputStream. What these two classes help us accomplish is to store an object (or array of objects) into a file, and then easily read from that file.

This can only be done with classes that implement the Serializable interface. The Serializable interface has no methods or fields and serves only to identify the semantics of being serializable:

public static class MyClass implements Serializable {
    int someInt;
    String someString;

    public MyClass(int someInt, String someString) {
        this.someInt = someInt;
        this.someString = someString;
    }
}

public static void main(String[] args) throws IOException, ClassNotFoundException {
    // The file extension doesn't matter in this case, since they're only there to tell
    // the OS with what program to associate a particular file
    ObjectOutputStream objectOutputStream =
        new ObjectOutputStream(new FileOutputStream("data.olivera"));

    MyClass first = new MyClass(1, "abc");
    MyClass second = new MyClass(2, "abc");

    objectOutputStream.writeObject(first);
    objectOutputStream.writeObject(second);
    objectOutputStream.close();

    ObjectInputStream objectInputStream =
                new ObjectInputStream(new FileInputStream("data.olivera"));

    ArrayList<MyClass> arrayList = new ArrayList<>();

    try (objectInputStream) {
        while (true) {
            Object read = objectInputStream.readObject();
            if (read == null)
                break;

            // We should always cast explicitly
            MyClass myClassRead = (MyClass) read;
            arrayList.add(myClassRead);
        }
    }
    catch (EOFException e) {
        // This exception is expected
    }

    for (MyClass m : arrayList) {
        System.out.println(m.someInt + " " + m.someString);
    }
}

Java Streams API

Since Java 8, another quick and easy way to load the contents of a file into an ArrayList would be using the Java Streams API:

// Using try-with-resources so the stream closes automatically
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    ArrayList<String> arrayList = stream.collect(Collectors.toCollection(ArrayList::new));
}
catch (IOException e) {
    // Handle a potential exception
}

Though, keep in mind that this approach, just like Files.readAllLines() would only work if the data is stored in lines.

The code above doesn't do anything special, and we'd rarely use streams this way. However, since we're loading this data into an ArrayList so that we can process it in the first place - streams provide an excellent way of doing this.

We can easily sort/filter/map the data before storing it in an ArrayList:

try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    ArrayList<String> arrayList = stream.map(String::toLowerCase)
                                        .filter(line -> !line.startsWith("a"))
                                        .sorted(Comparator.comparing(String::length))
                                        .collect(Collectors.toCollection(ArrayList::new));
}
catch (IOException e) {
    // Handle a potential exception
}

Conclusion

There are several different ways in which you can read data from a file into an ArrayList. When you only need to read the lines as elements use Files.readAllLines; when you have data that can be easily parsed use Scanner; when working with large files use FileReader wrapped with BufferedReader; when dealing with an array of objects use ObjectInputStream (but make sure the data was written using ObjectOutputStream).

Author image