Reading and Writing CSVs in Java with OpenCSV

Introduction

This is the final article in a short series dedicated to Libraries for Reading and Writing CSVs in Java, and a direct continuation of the previous article, Reading and Writing CSVs in Java with Apache Commons CSV.

OpenCSV

OpenCSV is one of the simplest and easiest CSV parsers to understand, using standard Reader/Writer classes and offering a CSVReader implementation on top.

Just like Apache Commons CSV, OpenCSV is released under the Apache 2.0 license. Before downloading and deciding whether to use OpenCSV's parsers, you can browse through the source code and Javadoc, and even check out the JUnit test suite included in its git repository.

OpenCSV is also published to Maven Central (listed on MVNRepository as com.opencsv:opencsv), making dependency management straightforward.

CSVReader lets you fetch a single record at a time, or multiple records as a list or through an iterator, making it flexible in how you consume the data you read. The library also includes handy features such as reading from and writing to beans, and mapping a CSV record directly to a Java Map using the header row.
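Mapping "using the header row" is simple enough to sketch with the standard library alone. The hypothetical HeaderMapSketch below is not part of OpenCSV (the library's own equivalent is CSVReaderHeaderAware.readMap()); it only pairs each header name with the value at the same position, and the sample values are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not an OpenCSV class: illustrates header-based mapping.
final class HeaderMapSketch {
    private HeaderMapSketch() {}

    // Pairs each header name with the value at the same position,
    // keeping the column order of the file.
    static Map<String, String> toMap(String[] header, String[] record) {
        if (header.length != record.length) {
            throw new IllegalArgumentException(
                    "Expected " + header.length + " fields but got " + record.length);
        }
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            row.put(header[i], record[i]);
        }
        return row;
    }
}
```

For a header of Index,Girth (in) and a record 1,10, toMap returns {Index=1, Girth (in)=10}, so a value can be looked up by its column name instead of its index.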

OpenCSV does not have as wide a variety of pre-defined formats as Apache Commons CSV. It relies on two parsers:

  • CSVParser - the original parser defined in OpenCSV. It works for most simple parsing cases, but can fail when the data itself contains the escape character (a backslash by default).
  • RFC4180Parser - similar to the CSVFormat.RFC4180 format in Apache Commons CSV. It works on CSV files formatted according to the RFC 4180 specification. This parser treats all characters between the opening and closing quotation marks as content, except for the double quote character, which must be escaped with another double quote.
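To make the RFC 4180 quoting rule concrete, here is a minimal single-line sketch in plain Java. Rfc4180LineSketch is a hypothetical name, not OpenCSV's RFC4180Parser, and unlike a real parser it does not handle quoted fields that span several lines. To use OpenCSV's own implementation, you can build one with RFC4180ParserBuilder and pass it to CSVReaderBuilder.withCSVParser().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of RFC 4180 field splitting for one line of text.
// Inside quotes everything is content, and "" stands for one literal quote.
final class Rfc4180LineSketch {
    private Rfc4180LineSketch() {}

    static List<String> parseLine(String line, char separator) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote -> one literal quote
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // separators and backslashes are plain content
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == separator) {
                fields.add(field.toString()); // end of the current field
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field: " + line);
        }
        fields.add(field.toString());
        return fields;
    }
}
```

A backslash is ordinary content under these rules, so a value such as C:\temp comes through unchanged, which is the kind of input the original CSVParser, with its backslash escape character, can misread.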

Reading CSVs with OpenCSV

Reading CSVs into beans with OpenCSV can be faster than with Apache Commons CSV, because CsvToBean.parse() converts records into beans using multiple threads.

CSVReader also implements Java's Iterable interface, so you can choose how to balance memory and time: iterating processes one record at a time and keeps memory use low, while readAll() loads every record into memory at once.
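The memory side of that trade-off comes from reading lazily. As an illustration only, the hypothetical LazyRecordReader below implements Iterable<String[]> over a BufferedReader, so each line is read only when the loop asks for the next record. It splits naively on the separator and ignores quoting, so it is not a substitute for CSVReader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.regex.Pattern;

// Hypothetical class: a lazily evaluated Iterable over CSV-like lines.
final class LazyRecordReader implements Iterable<String[]> {
    private final BufferedReader reader;
    private final String separatorRegex;

    LazyRecordReader(Reader source, char separator) {
        this.reader = new BufferedReader(source);
        // quote the separator so characters such as '|' are matched literally
        this.separatorRegex = Pattern.quote(String.valueOf(separator));
    }

    @Override
    public Iterator<String[]> iterator() {
        return new Iterator<String[]>() {
            // only one line is held in memory, read ahead for hasNext()
            private String nextLine = readLineUnchecked();

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public String[] next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                // limit -1 keeps trailing empty fields such as "a,b,"
                String[] record = nextLine.split(separatorRegex, -1);
                nextLine = readLineUnchecked();
                return record;
            }
        };
    }

    private String readLineUnchecked() {
        try {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because every iterator shares the same underlying reader, the sketch is meant to be looped over once, much like a for-each loop over a CSVReader.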

OpenCSV has two object types for reading CSVs: CSVReader, and its subclass CSVReaderHeaderAware.

CSVReader is similar to its Apache Commons CSV counterpart, CSVParser, and can be used for both simple and complicated parsing scenarios.

To iterate through each record in a CSV file, where each record is a String array holding the comma-separated values split into individual fields:

try (CSVReader csvReader = new CSVReader(new InputStreamReader(csvFile.getInputStream()))) {
    String[] record;
    while ((record = csvReader.readNext()) != null) {
        // do something with record, e.g. record[0] is the first field
    }
}

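If your CSV is delimited by a character other than a comma, you can tell the CSVReader which delimiter to use. Older OpenCSV releases accepted it as a second constructor parameter; in current releases, the delimiter is set on a parser that you pass in through CSVReaderBuilder.

For example, if your CSV contains tab-separated values, you can initialize the CSVReader as follows:

CSVParser tabParser = new CSVParserBuilder().withSeparator('\t').build();
CSVReader csvReader = new CSVReaderBuilder(new InputStreamReader(csvFile.getInputStream())).withCSVParser(tabParser).build();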

OpenCSV also offers a more involved way of parsing CSV files: you define beans whose fields map to the columns of a CSV, and annotate those fields to bind them either by header name or by column position.

This helps because it lets you process each record as a typed object rather than as a collection of individual string fields.

If the header names of the file being processed are consistent, you can annotate the fields with the @CsvBindByName annotation and let OpenCSV take care of mapping the parsed data and copying it into the bean.

For example, with our tree dataset:

import com.opencsv.bean.CsvBindByName;

public class Trees {
    @CsvBindByName
    private int index;

    @CsvBindByName
    private int girth;

    @CsvBindByName
    private int height;

    @CsvBindByName
    private int volume;

    public int getIndex() {
        return this.index;
    }

    public void setIndex(int newIndex) {
        this.index = newIndex;
    }
    ...
}

As long as your CSV file has a header row whose column names match the field names in our class declaration, OpenCSV can read the data into the corresponding fields, with type conversions handled automatically:

List<Trees> trees = new CsvToBeanBuilder<Trees>(new FileReader("somefile.csv"))
        .withType(Trees.class).build().parse();
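Header-based binding essentially comes down to matching header names to fields and converting the text to each field's type. The sketch below shows that idea with plain reflection. HeaderBinderSketch and TreeBean are hypothetical names; the sketch supports only int and String fields and ignores annotations, so it is far simpler than what CsvToBean actually does.

```java
import java.lang.reflect.Field;

// Hypothetical bean with the same int fields as the Trees class above.
class TreeBean {
    int index;
    int girth;
    int height;
    int volume;
}

// Hypothetical sketch of binding a record to a bean by header name.
final class HeaderBinderSketch {
    private HeaderBinderSketch() {}

    // Creates a T and fills every field whose name matches a header
    // (case-insensitively), converting the text to the field's type.
    static <T> T bind(Class<T> type, String[] header, String[] values) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (int i = 0; i < header.length && i < values.length; i++) {
                Field field = findField(type, header[i].trim());
                if (field == null) {
                    continue; // no matching field: the column is ignored
                }
                field.setAccessible(true);
                String text = values[i].trim();
                if (field.getType() == int.class) {
                    field.setInt(bean, Integer.parseInt(text));
                } else if (field.getType() == String.class) {
                    field.set(bean, text);
                } else {
                    throw new IllegalArgumentException("Unsupported field type: " + field.getType());
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot bind to " + type.getName(), e);
        }
    }

    private static Field findField(Class<?> type, String name) {
        for (Field f : type.getDeclaredFields()) {
            if (f.getName().equalsIgnoreCase(name)) {
                return f;
            }
        }
        return null;
    }
}
```

In this sketch, a header such as Girth (in) matches no field and is silently skipped, leaving girth at 0; that mismatch is the situation the annotation's column attribute is meant to handle.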

Validations can be added to the getter and setter methods where needed, and mandatory fields can be specified by setting the required flag on the annotation.

If a header name differs from the name of the field, you can specify the header name in the annotation as well. This is useful in our example, since the headers in our actual dataset include each field's unit of measure, along with spaces and punctuation characters that are not allowed in Java variable names.


In this case, both the required flag and the column mapping can be specified in the annotation:

...
    @CsvBindByName(column = "Girth (in)", required = true)
    private int girth;
...

If your CSV file does not have a header, you can map by column position using the @CsvBindByPosition annotation.

Keep in mind that OpenCSV positions are 0-based:

import com.opencsv.bean.CsvBindByPosition;

public class Trees {
    @CsvBindByPosition(position = 0, required = true)
    private int index;

    @CsvBindByPosition(position = 1, required = true)
    private int girth;

    @CsvBindByPosition(position = 2)
    private int height;

    @CsvBindByPosition(position = 3)
    private int volume;
}
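Position-based binding can be pictured as reading fixed, 0-based indexes out of each record. The hypothetical PositionalTreeSketch below does this by hand and throws an exception when a required position is missing or empty, loosely mirroring required = true. Leaving absent optional values at 0 is a choice made by this sketch, not documented OpenCSV behavior.

```java
// Hypothetical hand-written equivalent of the positional Trees bean above.
final class PositionalTreeSketch {
    int index;
    int girth;
    int height;
    int volume;

    static PositionalTreeSketch fromRecord(String[] values) {
        PositionalTreeSketch tree = new PositionalTreeSketch();
        tree.index = requiredInt(values, 0);  // position 0, required
        tree.girth = requiredInt(values, 1);  // position 1, required
        tree.height = optionalInt(values, 2); // position 2, optional
        tree.volume = optionalInt(values, 3); // position 3, optional
        return tree;
    }

    private static int requiredInt(String[] values, int position) {
        if (position >= values.length || values[position].trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required value at position " + position);
        }
        return Integer.parseInt(values[position].trim());
    }

    // In this sketch, a missing or empty optional value leaves the field at 0.
    private static int optionalInt(String[] values, int position) {
        if (position >= values.length || values[position].trim().isEmpty()) {
            return 0;
        }
        return Integer.parseInt(values[position].trim());
    }
}
```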

For more complicated scenarios, you can write a class that implements the MappingStrategy interface and defines the translation or mapping schema that suits your parsing scenario.

Writing CSVs with OpenCSV

OpenCSV offers more options than Apache Commons CSV when it comes to writing data to CSV files. It lets you write either from an array of Strings or from a list of objects.

Writing from a list of objects requires a bean class to be defined and the objects to be created beforehand (OpenCSV's StatefulBeanToCsv handles that case). So to keep things simple, let's work with an array of Strings.

To generate a CSV file with data from an array of strings:

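try (CSVWriter csvWriter = new CSVWriter(new FileWriter("new.csv"), ',',
        CSVWriter.DEFAULT_QUOTE_CHARACTER, CSVWriter.DEFAULT_ESCAPE_CHARACTER, CSVWriter.DEFAULT_LINE_END)) {
    // split() takes a regular expression, so the dot must be escaped
    String[] records = "Index.Girth.Height.Volume".split("\\.");
    csvWriter.writeNext(records);
}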

OpenCSV does not assume that CSV always means comma-separated values: it lets you choose the delimiter for the file as a parameter of the CSVWriter constructor.

Similarly, rather than defining a String array element by element, you may find it useful to declare a single String and split it into values on a delimiter. This is especially useful when you need to copy a selected subset of data rows from one CSV or database file to another.
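One pitfall when splitting a String this way: String.split() takes a regular expression, not a literal delimiter. A bare "." matches every character, so "Index.Girth.Height.Volume".split(".") returns an empty array. The small helper below (LiteralSplitter is a hypothetical name) quotes the delimiter with Pattern.quote so it is matched literally.

```java
import java.util.regex.Pattern;

// Hypothetical helper: splits on a delimiter treated as literal text.
final class LiteralSplitter {
    private LiteralSplitter() {}

    // Pattern.quote disables regex meaning, so "." splits on dots only.
    // The -1 limit keeps trailing empty values, e.g. "a.b." -> [a, b, ""].
    static String[] split(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter), -1);
    }
}
```

Escaping by hand, as in split("\\."), works just as well for a fixed delimiter; Pattern.quote is handy when the delimiter comes from configuration.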

When initializing the CSVWriter, the Writer (for example, a FileWriter) is mandatory. Initializing the writer with just this one parameter produces a comma-separated file using the default settings.

There are some additional parameters for specific use cases:

  • char separator - the delimiter. If not specified, the default delimiter is a comma.
  • char quotechar - the quotation character used to wrap field values, so that a value containing the separator (for example, a comma inside an address) is not split into two fields. Double quotes are the default; single quotes are another common choice.
  • char escapechar - the character used to escape the quote character (and the escape character itself) when it appears inside a value.
  • String lineEnd - the string that terminates each line of data.
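To see how these four parameters interact, here is a simplified model of formatting one line. CsvLineWriterSketch is hypothetical and is not CSVWriter's actual code; it assumes that every field is quoted (which is what writeNext does by default) and that only the quote and escape characters inside a value need escaping.

```java
// Hypothetical model of how separator, quotechar, escapechar and lineEnd
// combine when one record is written.
final class CsvLineWriterSketch {
    private CsvLineWriterSketch() {}

    static String formatLine(String[] fields, char separator, char quotechar,
                             char escapechar, String lineEnd) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                out.append(separator); // separator between fields only
            }
            out.append(quotechar); // wrap every field in quotes
            for (char c : fields[i].toCharArray()) {
                // prefix quote and escape characters with the escape character
                if (c == quotechar || c == escapechar) {
                    out.append(escapechar);
                }
                out.append(c);
            }
            out.append(quotechar);
        }
        return out.append(lineEnd).toString();
    }
}
```

With the default settings, where the quote and escape characters are both a double quote, this model doubles any embedded double quote, matching the RFC 4180 convention.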

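You could construct the CSVWriter with all optional parameters specified. Note that the separator, quote and escape characters are chars, while the line ending is a String:

CSVWriter csvWriter = new CSVWriter(new FileWriter("new.csv"), ',', '\'', '/', "\n");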

CSVWriter also exposes the library defaults as constants (DEFAULT_SEPARATOR, DEFAULT_QUOTE_CHARACTER, DEFAULT_ESCAPE_CHARACTER and DEFAULT_LINE_END). These are final and cannot be reassigned, but you can declare your own constants the same way and reuse them across your codebase to keep your output consistent.

For example, after declaring:

static final char SEPARATOR = ',';
static final char QUOTE_CHARACTER = '\'';
static final char ESCAPE_CHARACTER = '/';
static final String LINE_END = "\n";

You could use:

CSVWriter csvWriter = new CSVWriter(new FileWriter("new.csv"), SEPARATOR, QUOTE_CHARACTER, ESCAPE_CHARACTER, LINE_END);

Or rely on OpenCSV's default values (a comma separator, a double quote as both the quote and the escape character, and "\n" as the line ending) by simply calling:

CSVWriter csvWriter = new CSVWriter(new FileWriter("new.csv"));

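For example, with ' as the quote character and / as the escape character (as in the constructor call with all parameters above), a record made up of the username JohnDoe and the address 19/2, ABC Street, Someplace keeps the commas of the address inside its quoted field, while the slash, being the escape character itself, is escaped by doubling it. The written line should look like: 'JohnDoe','19//2, ABC Street, Someplace'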

Conclusion

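OpenCSV is one of the simplest and easiest CSV libraries to understand. It builds on the standard Reader/Writer classes, offers CSVReader and CSVWriter for working with plain String arrays, and supports annotation-based bean mapping, by header name or by column position, when you want your records as typed Java objects.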

Last Updated: February 20th, 2019
© 2013-2024 Stack Abuse. All rights reserved.
