Introduction
There are many ways to go about Reading and Writing Files in Java.
We typically have some data in memory, on which we perform operations, and then persist in a file. However, if we want to change that information, we need to put the contents of the file back into memory and perform operations.
If, for an example, our file contains a long list that we want to sort, we'll have to read it into an adequate data structure, perform operations, and then persist it once again - in this case an ArrayList
.
This can be achieved with several different approaches:
Files.readAllLines()
FileReader
Scanner
BufferedReader
ObjectInputStream
- Java Streams API
Files.readAllLines()
Since Java 7, it's possible to load all lines of a file into an ArrayList
in a very simple way:
try {
ArrayList<String> lines = new ArrayList<>(Files.readAllLines(Paths.get(fileName)));
}
catch (IOException e) {
// Handle a potential exception
}
We can also specify a charset
to handle different formats of text, if necessary:
try {
Charset charset = StandardCharsets.UTF_8;
ArrayList<String> lines = new ArrayList<>(Files.readAllLines(Paths.get(fileName), charset));
}
catch (IOException e) {
// Handle a potential exception
}
Files.readAllLines()
opens and closes the necessary resources automatically.
Scanner
As nice and simple as the previous method was, it's only useful for reading the file line by line. What would happen if all the data was stored in a single line?
Scanner
is an easy to use tool for parsing primitive types and Strings. Using Scanner
can be as simple or as hard as the developer wants to make it.
A simple example of when we'd prefer to use Scanner
would be if our file had only one line, and the data needs to be parsed into something usable.
A delimiter is a sequence of characters that Scanner
uses to separate values. By default, it uses a series of spaces/tabs as a delimiter (whitespace between values), but we can declare our own delimiter and use it to parse the data.
Let's take a look at an example file:
some-2123-different-values- in - this -text-with a common-delimiter
In such a case, it's easy to notice that all the values have a common delimiter. We can simply declare that "-" surrounded by any number of whitespaces is our delimiter.
// We'll use "-" as our delimiter
ArrayList<String> arrayList = new ArrayList<>();
try (Scanner s = new Scanner(new File(fileName)).useDelimiter("\\s*-\\s*")) {
// \\s* in regular expressions means "any number or whitespaces".
// We could've said simply useDelimiter("-") and Scanner would have
// included the whitespaces as part of the data it extracted.
while (s.hasNext()) {
arrayList.add(s.next());
}
}
catch (FileNotFoundException e) {
// Handle the potential exception
}
Running this piece of code would yield us an ArrayList
with these items:
[some, 2, different, values, in, this, text, with a common, delimiter]
On the other hand, if we had only used the default delimiter (whitespace), the ArrayList
would look like:
[some-2-different-values-, in, -, this, -text-with, a, common-delimiter]
Scanner
has some useful functions for parsing data, such as nextInt()
, nextDouble()
, etc.
Important: Calling .nextInt()
will NOT return the next int
value that can be found in the file! It will return an int
value only if the next items the Scanner
"scans" is a valid int
value, otherwise an exception will be thrown. An easy way to make sure an exception doesn't arise is to perform a corresponding "has" check - like .hasNextInt()
before actually using .nextInt()
.
Even though we don't see that when we call functions like scanner.nextInt()
or scanner.hasNextDouble()
, Scanner
uses regular expressions in the background.
Very Important: An extremely common mistake with using Scanner
occurs when working with files that have multiple lines and using .nextLine()
in conjunction with .nextInt()
,nextDouble()
, etc.
Let's take a look at another file:
12
some data we want to read as a string in one line
10
Oftentimes, newer developers who use Scanner
would write code like:
try (Scanner scanner = new Scanner(new File("example.txt"))) {
int a = scanner.nextInt();
String s = scanner.nextLine();
int b = scanner.nextInt();
System.out.println(a + ", " + s + ", " + b);
}
catch (FileNotFoundException e) {
// Handle a potential exception
}
//catch (InputMismatchException e) {
// // This will occur in the code above
//}
Check out our hands-on, practical guide to learning Git, with best-practices, industry-accepted standards, and included cheat sheet. Stop Googling Git commands and actually learn it!
This code seems to be logically sound - we read an integer from the file, then the following line, then the second integer. If you try to run this code, the InputMismatchException
will be thrown without an obvious reason.
If you start debugging and printing what you've scanned, you'll see that int a
loaded well, but that String s
is empty.
Why is that? The first important thing to note is that once Scanner
reads something from the file, it continues scanning the file from the first character after the data it previously scanned.
For example, if we had "12 13 14" in a file and called .nextInt()
once, the scanner would afterward pretend as if there was only " 13 14" in the file. Notice that the space between "12" and "13" is still present.
The second important thing to note - the first line in our example.txt
file doesn't only contain the number 12
, it contains what it called a "newline character", and it is actually 12\n
instead of just 12
.
Our file, in reality, looks like this:
12\n
some data we want to read as a string in one line\n
10
When we first call .nextInt()
, Scanner
reads only the number 12, and leaves the first \n
unread.
.nextLine()
then reads all the characters that scanner hasn't read yet until it reaches the first \n
character, which it skips over and then returns the characters it read. This is exactly what the problem is in our case - we have a leftover \n
character after reading the 12
.
So when we call .nextLine()
we get an empty string as a result since Scanner
doesn't add the \n
character to the string it returns.
Now the Scanner
is at the beginning of the second line in our file, and when we try to call .nextInt()
, Scanner
encounters something that can't be parsed to an int
and throws the aforementioned InputMismatchException
.
Solutions
- Since we know what exactly is wrong in this code, we can hardcode a workaround. We'll simply "consume" the newline character between
.nextInt()
and.nextLine()
:
...
int a = scanner.nextInt();
scanner.nextLine(); // Simply consumes the bothersome \n
String s = scanner.nextLine();
...
- Given that we know how
example.txt
is formatted we can read the entire file line by line and parse the necessary lines usingInteger.parseInt()
:
...
int a = Integer.parseInt(scanner.nextLine());
String s = scanner.nextLine();
int b = Integer.parseInt(scanner.nextLine());
...
BufferedReader
BufferedReader
reads text from a character-input stream, but it does so by buffering characters in order to provide efficient .read()
operations. Since accessing an HDD is a very time-consuming operation, BufferedReader
gathers more data than we ask for, and stores it in a buffer.
The idea is that when we call .read()
(or similar operation) we are likely to read again soon from the same block of data from which we've just read, and so "surrounding" data is stored in a buffer. In case we wanted to read it, we'd read it directly from the buffer instead of from the disk, which is much more efficient.
This brings us to what BufferedReader
is good for - reading large files. BufferedReader
has a significantly larger buffer memory than Scanner
(8192 characters by default vs 1024 characters by default, respectively).
BufferedReader
is used as a wrapper for other Readers, and so constructors for BufferedReader
take a Reader object as a parameter, such as a FileReader
.
We're using try-with-resources so we don't have to close the reader manually:
ArrayList<String> arrayList = new ArrayList<>();
try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
while (reader.ready()) {
arrayList.add(reader.readLine());
}
}
catch (IOException e) {
// Handle a potential exception
}
It's advised to wrap a FileReader
with a BufferedReader
, exactly due to the performance benefits.
ObjectInputStream
ObjectInputStream
should only be used alongside ObjectOutputStream
. What these two classes help us accomplish is to store an object (or array of objects) into a file, and then easily read from that file.
This can only be done with classes that implement the Serializable
interface. The Serializable
interface has no methods or fields and serves only to identify the semantics of being serializable:
public static class MyClass implements Serializable {
int someInt;
String someString;
public MyClass(int someInt, String someString) {
this.someInt = someInt;
this.someString = someString;
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException {
// The file extension doesn't matter in this case, since they're only there to tell
// the OS with what program to associate a particular file
ObjectOutputStream objectOutputStream =
new ObjectOutputStream(new FileOutputStream("data.olivera"));
MyClass first = new MyClass(1, "abc");
MyClass second = new MyClass(2, "abc");
objectOutputStream.writeObject(first);
objectOutputStream.writeObject(second);
objectOutputStream.close();
ObjectInputStream objectInputStream =
new ObjectInputStream(new FileInputStream("data.olivera"));
ArrayList<MyClass> arrayList = new ArrayList<>();
try (objectInputStream) {
while (true) {
Object read = objectInputStream.readObject();
if (read == null)
break;
// We should always cast explicitly
MyClass myClassRead = (MyClass) read;
arrayList.add(myClassRead);
}
}
catch (EOFException e) {
// This exception is expected
}
for (MyClass m : arrayList) {
System.out.println(m.someInt + " " + m.someString);
}
}
Java Streams API
Since Java 8, another quick and easy way to load the contents of a file into an ArrayList
would be using the Java Streams API:
// Using try-with-resources so the stream closes automatically
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
ArrayList<String> arrayList = stream.collect(Collectors.toCollection(ArrayList::new));
}
catch (IOException e) {
// Handle a potential exception
}
Though, keep in mind that this approach, just like Files.readAllLines()
would only work if the data is stored in lines.
The code above doesn't do anything special, and we'd rarely use streams this way. However, since we're loading this data into an ArrayList
so that we can process it in the first place - streams provide an excellent way of doing this.
We can easily sort/filter/map the data before storing it in an ArrayList
:
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
ArrayList<String> arrayList = stream.map(String::toLowerCase)
.filter(line -> !line.startsWith("a"))
.sorted(Comparator.comparing(String::length))
.collect(Collectors.toCollection(ArrayList::new));
}
catch (IOException e) {
// Handle a potential exception
}
Conclusion
There are several different ways in which you can read data from a file into an ArrayList
. When you only need to read the lines as elements use Files.readAllLines
; when you have data that can be easily parsed use Scanner
; when working with large files use FileReader
wrapped with BufferedReader
; when dealing with an array of objects use ObjectInputStream
(but make sure the data was written using ObjectOutputStream
).