Extracting Data in Java: How to read text files

It’s 2017, and I just completed my first year of Computer Science at Stellenbosch University. Under the tutelage of Prof Geldenhuys, I learned Java and built two major projects:

  • A two-player Sudoku game called Twoduku.
  • A two-player game called Robots vs. Cats, based on the popular Plants vs. Zombies.

One thing I never needed in these projects was to extract data from text files, as all gameplay took place on a Graphical User Interface (GUI). This is unfortunate, as I have just started working on a football prediction application in Java and all the match data is in text files. To answer questions like:

  • How many goals did Manchester United score in the 2015/2016 season?
  • Which team scored the most goals in the English Premier League in the 2015/2016 season?
  • What is the average number of goals scored by each team in the 2015/2016 season?

I will need to extract the match data from the text files and use Java constructs like conditional statements, loops and basic arithmetic to compute the answers to those questions. But how do I read the data from these text files?

Example of a match data text file

Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
28/11/2015,Man City,Southampton,3,1,H
28/11/2015,Sunderland,Stoke,2,0,H
29/11/2015,Liverpool,Swansea,1,0,H
29/11/2015,Norwich,Arsenal,1,1,D
29/11/2015,Tottenham,Chelsea,0,0,D
29/11/2015,West Ham,West Brom,1,1,D
05/12/2015,Arsenal,Sunderland,3,1,H
05/12/2015,Chelsea,Bournemouth,0,1,A
05/12/2015,Man United,West Ham,0,0,D

Since this text file separates fields using commas, it is called a CSV (comma-separated values) file. The first line of the file holds the column headings and the remaining lines are the rows with the actual values. In this sense, CSV files work like spreadsheets and can be opened in Excel. Let’s discuss the columns:

  • Date: the date on which the football match was played.
  • HomeTeam: the name of the team playing at home.
  • AwayTeam: the name of the team playing away.
  • FTHG: the full-time home team goals i.e. the number of goals scored by the home team.
  • FTAG: the full-time away team goals i.e. the number of goals scored by the away team.
  • FTR: the full-time result, H if the home team won, A if the away team won and D if the match ended as a draw.
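With the columns defined, a single row can be broken into its fields by splitting on the comma. The sketch below is one possible way to total the goals for a team; the class name MatchRowParser and the method goalsScoredBy are made up for illustration, and it assumes no field contains a comma of its own.

```java
import java.util.Arrays;
import java.util.List;

public class MatchRowParser {

    // Hypothetical helper: total goals scored by a team across data rows (no header row).
    public static int goalsScoredBy(String team, List<String> rows) {
        int goals = 0;
        for (String row : rows) {
            // Column order: Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
            String[] fields = row.split(",");
            if (fields[1].equals(team)) {
                goals += Integer.parseInt(fields[3]); // FTHG: goals when playing at home
            } else if (fields[2].equals(team)) {
                goals += Integer.parseInt(fields[4]); // FTAG: goals when playing away
            }
        }
        return goals;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "29/11/2015,Norwich,Arsenal,1,1,D",
            "05/12/2015,Arsenal,Sunderland,3,1,H",
            "05/12/2015,Man United,West Ham,0,0,D");
        System.out.println(goalsScoredBy("Arsenal", rows)); // prints 4
    }
}
```

The rows are hard-coded here so the sketch runs on its own; in the real application they would come from the text file.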

Extracting data from text files using Scanner

The Java Scanner class provides a simple text scanner which can parse primitive types and strings using regular expressions. Let’s read the match data from the example text file using a Scanner object:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFileWithScanner {
    public static void main(String[] args) {
        File file = new File("EPL_2015_2016.txt");
        try {
            Scanner scanner = new Scanner(file);
            while (scanner.hasNextLine()) {
                String dataRow = scanner.nextLine();
                System.out.println(dataRow);
            }
            scanner.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

We start by creating a java.io.File object with the path to the file. The path here is relative, so Java resolves it against the directory the program is run from (the working directory); if the text file sits in that directory, the file name is all we need to provide. We then create an instance of the Scanner class using the File object and use a while loop to read lines from the Scanner until there are none left, each time printing the line to the standard output stream. We complete our program by closing the Scanner object to avoid resource leakage.

It’s easy to give Scanner a path to a file that does not exist by accident, so the code is wrapped in a try-catch block that catches the FileNotFoundException and prints the associated stack trace.
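One caveat with the program above is that scanner.close() is skipped if an exception is thrown while reading. A possible alternative, sketched here, is Java's try-with-resources statement, which closes the Scanner automatically. The class name ReadFileWithResources and the helper readLines are invented for this sketch.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadFileWithResources {

    // Hypothetical helper: collects every line of the file into a list.
    public static List<String> readLines(File file) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        // The Scanner is closed automatically when this block exits,
        // whether it finishes normally or with an exception.
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        try {
            for (String dataRow : readLines(new File("EPL_2015_2016.txt"))) {
                System.out.println(dataRow);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}
```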

Now that we have our file reading code, let’s view the output when the program is executed:

Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
28/11/2015,Man City,Southampton,3,1,H
28/11/2015,Sunderland,Stoke,2,0,H
29/11/2015,Liverpool,Swansea,1,0,H
29/11/2015,Norwich,Arsenal,1,1,D
29/11/2015,Tottenham,Chelsea,0,0,D
29/11/2015,West Ham,West Brom,1,1,D
05/12/2015,Arsenal,Sunderland,3,1,H
05/12/2015,Chelsea,Bournemouth,0,1,A
05/12/2015,Man United,West Ham,0,0,D

Because we used the nextLine() method, the Scanner returns one line (row) of the match data text file at a time. You can think of the Scanner as having a file pointer which points to the line it is currently processing. As you extract lines from the file using nextLine(), the pointer moves down the file.

Image of the Scanner file pointer moving down the file line-by-line as the nextLine() method is called

It’s important to use hasNextLine() to check that there is still a line to read, i.e. that we have not reached the end of the file. Otherwise, nextLine() throws the following exception: java.util.NoSuchElementException: No line found.
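To see the failure for yourself, the sketch below deliberately calls nextLine() once more after the input is exhausted. It scans a String rather than a file so that it runs on its own; the class name NextLineWithoutCheck and the method readPastEnd are made up for this illustration.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

public class NextLineWithoutCheck {

    // Hypothetical helper: reads every line, then one more, and reports what happened.
    public static String readPastEnd(String input) {
        try (Scanner scanner = new Scanner(input)) {
            while (scanner.hasNextLine()) {
                scanner.nextLine();
            }
            scanner.nextLine(); // no hasNextLine() check: nothing is left to read
            return "no exception";
        } catch (NoSuchElementException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd("28/11/2015,Man City,Southampton,3,1,H"));
    }
}
```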

A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods. This means we are not limited to extracting data from the file as lines; we can also extract it as:

  • boolean by using nextBoolean()
  • float by using nextFloat()
  • double by using nextDouble()
  • int by using nextInt()
  • long by using nextLong()

Each of these has a matching 'has' method, e.g. hasNextBoolean() for boolean values, which checks whether the next token can be read as that type. So when the tokens are values of a type such as int, boolean or float rather than lines, the file pointer points to the next token of that type.

Image of the Scanner file pointer moving token-by-token according to boolean types using nextBoolean()

In this case, the file is parsed token-by-token according to a data type, not line-by-line. This can be extremely useful when loading a text file made up of numbers for example.
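As a small illustration of token-by-token reading, the sketch below adds up whitespace-separated integers using hasNextInt() and nextInt(). The input is a String so the example is self-contained, and the class name SumIntTokens is invented here; passing a File to the Scanner constructor instead works the same way.

```java
import java.util.Scanner;

public class SumIntTokens {

    // Hypothetical helper: sums the leading run of int tokens in the input.
    public static int sum(String input) {
        int total = 0;
        try (Scanner scanner = new Scanner(input)) {
            // hasNextInt() is false at end of input or when the next token is not an int.
            while (scanner.hasNextInt()) {
                total += scanner.nextInt();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Spaces and line breaks both count as whitespace delimiters.
        System.out.println(sum("3 1\n2 0\n1 0")); // prints 7
    }
}
```

Note that the loop stops at the first token that is not an int, so mixed data needs a different check. The delimiter itself can be changed with Scanner's useDelimiter() method.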

Extracting data from text files using In.java

If you are not keen to directly interact with the Scanner class yourself, then you can use the abstraction developed by the authors of Introduction to Programming in Java: An Interdisciplinary Approach called In. You simply download the In.java file and add it to your project directory. The text file of match data can be extracted as follows using the In class:

public class ReadFileWithIn {
    public static void main(String[] args) {
        In in = new In("EPL_2015_2016.txt");
        while (in.hasNextLine()) {
            String dataRow = in.readLine();
            System.out.println(dataRow);
        }
    }
}

Extracting data from text files using BufferedReader

The BufferedReader class can be used to read text from a character-input stream, buffering characters to provide for the efficient reading of characters, arrays, and lines.

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.FileReader;

public class ReadFileWithBufferedReader {
    public static void main(String[] args) {
        File file = new File("EPL_2015_2016.txt");
        try {
            BufferedReader br = new BufferedReader(new FileReader(file));

            String dataRow;
            while (true) {
                dataRow = br.readLine();
                if (dataRow == null) {
                    break;
                }
                
                System.out.println(dataRow);
            }
            br.close();
        } catch(FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

readLine() was used here to read the file line-by-line, as was done with the Scanner class, but the file could also have been read character-by-character using the read() method.

The code above can be made neater by reading the next dataRow line in the boolean condition of the while loop. The following code fragment implements this improvement:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.FileReader;

public class ReadFileWithBufferedReader {
    public static void main(String[] args) {
        File file = new File("EPL_2015_2016.txt");
        try {
            BufferedReader br = new BufferedReader(new FileReader(file));

            String dataRow;
            while ((dataRow = br.readLine()) != null) {
                System.out.println(dataRow);
            }
            br.close();
        } catch(FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Much better!

Extracting data from text files using FileReader

If you are looking for a convenient way to read characters from files, consider the FileReader class.

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class ReadFileWithFileReader {

    public static void main(String[] args) {
        // try-with-resources closes the FileReader even if reading fails
        try (FileReader fileReader = new FileReader("EPL_2015_2016.txt")) {
            int c;
            // read() returns the next character as an int, or -1 at the end of the file
            while ((c = fileReader.read()) != -1) {
                System.out.print((char) c);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Extracting data from text files into a String

It can be very useful to read all the text from a file into a single String, especially if the application needs to analyse the text as a whole, for example:

  • AI prompts
  • Sentiment analysis
  • Zipf's Law

This section explores two options:

  • Java NIO (New Input/Output) package
  • DataInputStream

Java NIO: Files

The Files class in the NIO package consists exclusively of static methods that operate on files, directories, or other types of files. The readAllBytes() method can be used to read all the data in the match data text file into a String:

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFileIntoString {
    public static void main(String[] args) {
        try {
            // Decode the bytes explicitly as UTF-8 instead of relying on the platform default charset
            String fileData = new String(
                Files.readAllBytes(Paths.get("EPL_2015_2016.txt")), StandardCharsets.UTF_8);
            System.out.println(fileData);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The Paths class in the NIO package consists exclusively of static methods that return a Path object by converting a path string or URI object.
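Once the whole text is in one String, whole-text analyses like the word counts behind Zipf's Law become short loops. The sketch below is one rough way to count word frequencies; the class name WordFrequencies is made up, and splitting on non-word characters is a simplifying assumption about what counts as a word.

```java
import java.util.Map;
import java.util.TreeMap;

public class WordFrequencies {

    // Hypothetical helper: maps each lower-cased word to the number of times it occurs.
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps the words sorted
        for (String word : text.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) { // split() can yield an empty first element
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("the cat and the hat")); // prints {and=1, cat=1, hat=1, the=2}
    }
}
```

In practice, the String passed to count() would be the fileData read from the text file.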

DataInputStream

A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way.

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class ReadFileWithDataInputStream {
    public static void main(String[] args) {
        File file = new File("EPL_2015_2016.txt");
        try (DataInputStream reader = new DataInputStream(new FileInputStream(file))) {
            // Size the buffer from the file length; available() only returns an estimate
            byte[] bytes = new byte[(int) file.length()];
            // readFully() keeps reading until the buffer is full; a single read() may stop early
            reader.readFully(bytes);
            String fileData = new String(bytes, StandardCharsets.UTF_8);
            System.out.println(fileData);
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Extracting data from text files using FileChannel

The FileChannel class offers a channel for reading, writing, mapping, and manipulating a file.

import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;

public class ReadFileWithFileChannel {
    public static void main(String[] args) {
        try (RandomAccessFile reader = new RandomAccessFile("EPL_2015_2016.txt", "r");
             FileChannel channel = reader.getChannel()) {

            ByteBuffer buff = ByteBuffer.allocate(1024);
            ByteArrayOutputStream out = new ByteArrayOutputStream();

            // Read the file in 1024-byte chunks until the channel reports end of file (-1)
            while (channel.read(buff) != -1) {
                buff.flip();
                // Copy only the bytes that were actually read into the buffer
                out.write(buff.array(), 0, buff.limit());
                buff.clear();
            }

            String fileData = new String(out.toByteArray(), StandardCharsets.UTF_8);
            System.out.println(fileData);

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

If you are reading a large file, FileChannel can be faster than standard IO.

Extracting data from text files into a List

The Files class in the NIO package has a readAllLines() method that can be used to read the lines from a text file and store them in a list with each line having its own list entry.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ReadFileIntoList {
    public static void main(String[] args) {
        try {
            List<String> fileLines = Files.readAllLines(Paths.get("EPL_2015_2016.txt"), 
                                       StandardCharsets.UTF_8);
            for (String dataRow: fileLines) {
                System.out.println(dataRow);
            }                   
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
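Having every line in a List makes it easy to skip the header row and aggregate the rest. The sketch below totals the goals per team from such a list; the class name GoalsPerTeam is invented for illustration and the list is hard-coded so the example runs on its own, where in practice it would be the result of Files.readAllLines().

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GoalsPerTeam {

    // Hypothetical helper: total goals per team, in order of first appearance.
    public static Map<String, Integer> totals(List<String> fileLines) {
        Map<String, Integer> goals = new LinkedHashMap<>();
        // Start at index 1 to skip the header row.
        for (int i = 1; i < fileLines.size(); i++) {
            String[] fields = fileLines.get(i).split(",");
            goals.merge(fields[1], Integer.parseInt(fields[3]), Integer::sum); // home team, FTHG
            goals.merge(fields[2], Integer.parseInt(fields[4]), Integer::sum); // away team, FTAG
        }
        return goals;
    }

    public static void main(String[] args) {
        List<String> fileLines = Arrays.asList(
            "Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR",
            "29/11/2015,Norwich,Arsenal,1,1,D",
            "05/12/2015,Arsenal,Sunderland,3,1,H");
        System.out.println(totals(fileLines)); // prints {Norwich=1, Arsenal=4, Sunderland=1}
    }
}
```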

Which option should I pick?

This tutorial on reading data from text files in Java has explored different options:

  • Extracting the data from the text file line-by-line.
  • Extracting the data from the text file token-by-token.
  • Extracting the data from the text file character-by-character.
  • Extracting all the data from the text file into a single String object.
  • Extracting all the data from the text file into a List object.

After going through all the options, you may be asking: which one should I use in my project? I would suggest considering the following questions:

  • Which form would I like the data in? Lines, tokens, characters, a single String or a List?
  • Does performance matter? If you are dealing with a large file and need your program to run as fast as possible, consider the FileChannel option.
  • What will make pre-processing my data easier? Sometimes you need to check the data for errors or split it into different objects, and how you read the data from the file (e.g. as lines or as characters) can make this easier. If you need to check every character in your file during validation, a character-by-character option would be best.

Consider the listed questions and select an option that aligns with your answer. Also, remember that most of the classes mentioned in the options support different methods so be sure to view the respective Oracle documentation. For example, BufferedReader can be used to read the data line-by-line using readLine() or character-by-character using read().
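As a rough illustration of the character-by-character option with BufferedReader, the sketch below counts the commas in its input using read(). It reads from a StringReader so that it runs on its own, and the class name CountCommas is made up; wrapping a FileReader instead would apply the same loop to a file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CountCommas {

    // Hypothetical helper: counts the commas in everything the reader supplies.
    public static int countCommas(Reader source) throws IOException {
        int commas = 0;
        try (BufferedReader br = new BufferedReader(source)) {
            int c;
            // read() returns the next character as an int, or -1 at the end of the input
            while ((c = br.read()) != -1) {
                if ((char) c == ',') {
                    commas++;
                }
            }
        }
        return commas;
    }

    public static void main(String[] args) throws IOException {
        String row = "28/11/2015,Man City,Southampton,3,1,H";
        System.out.println(countCommas(new StringReader(row))); // prints 5
    }
}
```

A check like this could form part of a validation step, since every row of the match data should contain exactly five commas.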
