In computer science, a file is a resource used to record data discretely in a computer's storage device. In Java, a resource is usually an object implementing the AutoCloseable interface.
Reading files and resources has many uses:
- Statistics, Analytics, and Reports
- Machine Learning
- Dealing with large text files or logs
Sometimes these files can be extremely large, holding gigabytes or even terabytes of data, and reading them in their entirety is inefficient.
Being able to read a file line by line lets us seek out only the relevant information and stop the search once we have found what we're looking for. It also allows us to break the data up into logical pieces, for example when the file is CSV-formatted.
There are a few different options to choose from when you need to read a file line by line.
Scanner
One of the easiest ways to read a file line by line in Java is the Scanner class. By default, a Scanner breaks its input into tokens using whitespace as the delimiter pattern, but its hasNextLine() and nextLine() methods work on whole lines regardless of that delimiter:
Scanner scanner = new Scanner(new File("filename"));
while (scanner.hasNextLine()) {
    String line = scanner.nextLine();
    // process the line
}
The hasNextLine() method returns true if there is another line in the input of this scanner, but the scanner itself does not advance past any input or read any data at this point.
To read the line and move on, we should use the nextLine() method. This method advances the scanner past the current line and returns the input that was skipped, that is, the rest of the current line, excluding any line separator at the end. The read position is then set to the beginning of the next line, which will be read and returned when the method is called again.
Since this method continues to search through the input looking for a line separator, it may buffer all of the input while searching for the end of the line if no line separators are present.
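To illustrate stopping the search early, here is a minimal sketch that returns the first line containing a given substring and stops reading as soon as it is found. The class name FirstMatchingLine, the find helper, and the sample log lines are assumptions made for illustration; the Scanner is also wrapped in try-with-resources so the file is closed when we're done.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Optional;
import java.util.Scanner;

class FirstMatchingLine {

    // Hypothetical helper: returns the first line containing `needle`,
    // or an empty Optional if no line matches.
    static Optional<String> find(Path file, String needle) throws IOException {
        // try-with-resources closes the Scanner and the underlying file
        try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                if (line.contains(needle)) {
                    return Optional.of(line); // stop as soon as we have a match
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        // Sample data written to a temporary file for the demo
        Path file = Files.createTempFile("scanner-demo", ".log");
        Files.write(file, Arrays.asList("INFO start", "ERROR disk full", "INFO done"));

        System.out.println(find(file, "ERROR").orElse("no match")); // prints "ERROR disk full"

        Files.delete(file);
    }
}
```

Because the method returns from inside the loop, the remaining lines of the file are never requested from the scanner.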
BufferedReader
The BufferedReader class is an efficient way of reading characters, arrays, and lines from a character-input stream.
As the name suggests, this class uses a buffer. The default buffer size is 8192 characters, but you can set a custom size for performance reasons:
BufferedReader br = new BufferedReader(new FileReader(file), bufferSize);
The file, or rather an instance of the File class, isn't an appropriate data source for the BufferedReader, so we need to use a FileReader, which extends InputStreamReader. It is a convenience class for reading information from text files and isn't necessarily suitable for reading a raw stream of bytes:
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    String line;
    while ((line = br.readLine()) != null) {
        // process the line
    }
}
The buffered reader is initialized using the try-with-resources syntax, which is available in Java 7 and later. If you're using an older version, you should initialize the br variable before the try statement and close it in the finally block.
Here's an example of the previous code without the try-with-resources syntax:
BufferedReader br = new BufferedReader(new FileReader(file));
try {
    String line;
    while ((line = br.readLine()) != null) {
        // process the line
    }
} finally {
    br.close();
}
The code loops through the lines of the provided file and stops when readLine() returns null, which signals the end of the file.
Don't confuse null with an empty line: an empty line is returned as an empty string, so the file is still read all the way to the end.
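The difference between an empty line and the end of the input can be seen in a small sketch that counts both total and blank lines. The BlankLineCounter class and its count method are hypothetical names, and a StringReader stands in for the FileReader so that the example needs no file on disk.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class BlankLineCounter {

    // Hypothetical helper: returns {total lines, blank lines} read from the reader.
    static int[] count(BufferedReader br) throws IOException {
        int total = 0;
        int blank = 0;
        String line;
        // readLine() returns "" for an empty line and null only at the end of the input
        while ((line = br.readLine()) != null) {
            total++;
            if (line.isEmpty()) {
                blank++;
            }
        }
        return new int[] {total, blank};
    }

    public static void main(String[] args) throws IOException {
        // The second line of this input is empty, yet the loop keeps going past it
        try (BufferedReader br = new BufferedReader(new StringReader("first\n\nthird\n"))) {
            int[] counts = count(br);
            System.out.println(counts[0] + " lines, " + counts[1] + " blank"); // prints "3 lines, 1 blank"
        }
    }
}
```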
The lines Method
The BufferedReader class also has a lines() method that returns a Stream. The elements of this stream are the lines read by the BufferedReader.
You can easily convert this stream into a list if you need to:
List<String> list = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new FileReader(file))) {
    list = br.lines().collect(Collectors.toList());
}
Iterating over this list works much like iterating over a Stream, which is covered in the next section:
list.forEach(System.out::println);
Java 8 Streams
If you're already familiar with the Java 8 Streams, you can use them as a cleaner alternative to the legacy loop:
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream.forEach(System.out::println);
}
Here we're using the try-with-resources syntax once again, initializing a stream of lines with the Files.lines() static helper method. The System.out::println method reference is used for demo purposes, and you should replace it with whatever code you'll be using to process your lines of text.
In addition to a clean API, streams are very useful when you want to apply multiple operations to the data or filter something out.
Let's assume our task is to print all of the lines in a given text file that end with the "/" character. The lines should be transformed to uppercase and sorted alphabetically.
By modifying our initial "Streams API" example we'll get a very clean implementation:
try (Stream<String> stream = Files.lines(Paths.get(fileName))) {
    stream
        .filter(s -> s.endsWith("/"))   // keep only lines ending with "/"
        .map(String::toUpperCase)       // upper-case first, so the sort sees the final text
        .sorted()
        .forEach(System.out::println);
}
The filter() method returns a stream consisting of the elements of this stream that match the given predicate. In our case, we keep only the lines that end with "/".
The map() method returns a stream consisting of the results of applying the given function to the elements of this stream.
The toUpperCase() method of the String class helps us achieve the desired result and is used here as a method reference, just like the println call from our previous example.
The sorted() method returns a stream consisting of the elements of this stream, sorted according to their natural order. You can also provide a custom Comparator, in which case sorting is performed according to it.
While filter(), sorted(), and map() can be chained in a different order, keep in mind that the order can affect the result: sorting before upper-casing, for example, orders the lines by their original case. The forEach() call must always come last because it is a terminal operation. It returns void, so nothing can be chained after it.
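As an illustration of passing a custom Comparator to sorted(), the sketch below orders the matching lines by length first and alphabetically second, and collects them into a list instead of printing them. The SortedSlashLines class, the read method, and the sample paths are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SortedSlashLines {

    // Hypothetical helper: lines ending in "/", upper-cased,
    // shortest first, with ties broken alphabetically.
    static List<String> read(Path file) throws IOException {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        Comparator<String> order = byLength.thenComparing(Comparator.<String>naturalOrder());

        try (Stream<String> stream = Files.lines(file)) {
            return stream
                    .filter(s -> s.endsWith("/"))
                    .map(String::toUpperCase)
                    .sorted(order)                  // custom Comparator instead of natural order
                    .collect(Collectors.toList());  // terminal operation, like forEach()
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample data written to a temporary file for the demo
        Path file = Files.createTempFile("paths", ".txt");
        Files.write(file, Arrays.asList("a/long/path/", "notes.txt", "zz/", "bin/"));

        System.out.println(read(file)); // prints [ZZ/, BIN/, A/LONG/PATH/]

        Files.delete(file);
    }
}
```

Collecting inside the try block matters here: the stream has to be consumed before try-with-resources closes the underlying file.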
Apache Commons
If you're already using Apache Commons in your project, you might want to use the helper that reads all the lines from a file into a List<String>:
List<String> lines = FileUtils.readLines(file, "UTF-8");
for (String line : lines) {
    // process the line
}
Remember that this approach reads all lines from the file into the lines list, and only then does the for loop start executing. This can take a significant amount of time and memory, so you should think twice before using it on large text files.
Conclusion
There are multiple ways of reading a file line by line in Java, and the selection of the appropriate approach is entirely a programmer's decision. You should think of the size of the files you plan to process, performance requirements, code style and libraries that are already in the project. Make sure to test on some corner cases like huge, empty, or non-existent files, and you'll be good to go with any of the provided examples.
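As a starting point for testing those corner cases, here is a minimal sketch that counts lines and treats a non-existent file as a distinct result. The SafeLineCount class, the countLines method, and the convention of returning -1 for a missing file are assumptions made for illustration, not an established API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

class SafeLineCount {

    // Hypothetical helper: returns the number of lines, 0 for an empty file,
    // or -1 (an arbitrary convention for this sketch) if the file does not exist.
    static long countLines(Path file) throws IOException {
        // Files.lines() reads lazily, so a huge file is not loaded into memory at once
        try (Stream<String> lines = Files.lines(file)) {
            return lines.count();
        } catch (NoSuchFileException e) {
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("corner-cases", ".txt");
        System.out.println(countLines(file)); // empty file: prints 0

        Files.write(file, Arrays.asList("one", "two"));
        System.out.println(countLines(file)); // prints 2

        Files.delete(file);
        System.out.println(countLines(file)); // file no longer exists: prints -1
    }
}
```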