It’s 2017, and I just completed my first year of Computer Science at Stellenbosch University. Under the tutelage of Prof Geldenhuys, I learned Java and built two major projects:
- A two-player Sudoku game called Twoduku.
- A two-player game called Robots vs. Cats, based on the popular Plants vs. Zombies.
One thing I never needed in these projects was to extract data from text files, as all gameplay took place on a Graphical User Interface (GUI). This is unfortunate, as I have just started working on a football prediction application in Java and all the match data is in text files. To answer questions like:
- How many goals did Manchester United score in the 2015/2016 season?
- Which team scored the most goals in the English Premier League in the 2015/2016 season?
- What are the average goals scored by each team in the 2015/2016 season?
I will need to extract the match data from the text files and use Java constructs like conditional statements, loops and basic arithmetic to compute the answers to those questions. But how do I read the data from these text files?
Example of a match data text file
Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
28/11/2015,Man City,Southampton,3,1,H
28/11/2015,Sunderland,Stoke,2,0,H
29/11/2015,Liverpool,Swansea,1,0,H
29/11/2015,Norwich,Arsenal,1,1,D
29/11/2015,Tottenham,Chelsea,0,0,D
29/11/2015,West Ham,West Brom,1,1,D
05/12/2015,Arsenal,Sunderland,3,1,H
05/12/2015,Chelsea,Bournemouth,0,1,A
05/12/2015,Man United,West Ham,0,0,D
Since this text file separates fields using commas, it is called a CSV (comma-separated values) file. The first line of the file holds the column headings and the remaining lines are the rows with the actual values. In this sense CSV files work like spreadsheets and can be opened in Excel. Let’s discuss the columns:
- Date: the date on which the football match was played.
- HomeTeam: the name of the team playing at home.
- AwayTeam: the name of the team playing away.
- FTHG: the full-time home team goals i.e. the number of goals scored by the home team.
- FTAG: the full-time away team goals i.e. the number of goals scored by the away team.
- FTR: the full-time result, H if the home team won, A if the away team won and D if the match ended as a draw.
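Once a row is available as a String, the columns described above can be pulled apart with String.split(). The following is a minimal sketch, assuming no field ever contains a comma or quotes (true for the sample data, not for CSV in general); the class name ParseMatchRow and the method parseRow are made up for illustration:

```java
public class ParseMatchRow {

    // Splits one comma-separated row into its six fields:
    // Date, HomeTeam, AwayTeam, FTHG, FTAG, FTR.
    // Assumption: fields never contain commas or quotes.
    public static String[] parseRow(String dataRow) {
        return dataRow.split(",");
    }

    public static void main(String[] args) {
        String[] fields = parseRow("28/11/2015,Man City,Southampton,3,1,H");
        String homeTeam = fields[1];
        // The goal columns arrive as text and must be converted to numbers.
        int homeGoals = Integer.parseInt(fields[3]);
        int awayGoals = Integer.parseInt(fields[4]);
        System.out.println(homeTeam + " scored " + homeGoals
                + " and conceded " + awayGoals);
    }
}
```

Converting FTHG and FTAG to int up front is what makes the arithmetic needed for the questions above possible.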
Extracting data from text files using Scanner
The Java Scanner class provides a simple text scanner which can parse primitive types and strings using regular expressions. Let’s read the match data from the example text file using a Scanner object:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadFileWithScanner {
    public static void main(String[] args) {
        File file = new File("EPL_2015_2016.txt");
        try {
            Scanner scanner = new Scanner(file);
            // Keep reading while there is another line in the file.
            while (scanner.hasNextLine()) {
                String dataRow = scanner.nextLine();
                System.out.println(dataRow);
            }
            scanner.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

We start by creating an instance of a Java IO File object with the path to the file. A relative path like this one is resolved against the directory the program is run from; in this case that is the directory containing the Java source file, so the file name is all we need to provide. We then create an instance of the Scanner class using the File object and read lines from the Scanner in a while loop until there are none left, each time printing the line to the standard output stream. We complete our program by closing the Scanner object to avoid resource leakage.
It’s possible to pass Scanner a file that does not exist by accident, so the code is wrapped in a try-catch that catches the FileNotFoundException and prints the associated stack trace.
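The closing step can also be left to Java’s try-with-resources statement, which closes the Scanner automatically when the block exits, even if an exception is thrown part-way through. This is a sketch rather than a replacement for the program above: the class name, the readLines helper and the temporary file standing in for EPL_2015_2016.txt are assumptions made so the example runs on its own.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ReadFileWithTryWithResources {

    // Reads every line of the file. The Scanner is closed automatically
    // when the try block exits, so no explicit close() call is needed.
    public static List<String> readLines(File file) throws FileNotFoundException {
        List<String> lines = new ArrayList<>();
        try (Scanner scanner = new Scanner(file)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary two-line file stands in for the match data file.
        File file = File.createTempFile("matches", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(),
                "Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR\n28/11/2015,Man City,Southampton,3,1,H\n"
                        .getBytes());
        for (String dataRow : readLines(file)) {
            System.out.println(dataRow);
        }
    }
}
```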
Now that we have our file reading code, let’s view the output when the program is executed:
Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR
28/11/2015,Man City,Southampton,3,1,H
28/11/2015,Sunderland,Stoke,2,0,H
29/11/2015,Liverpool,Swansea,1,0,H
29/11/2015,Norwich,Arsenal,1,1,D
29/11/2015,Tottenham,Chelsea,0,0,D
29/11/2015,West Ham,West Brom,1,1,D
05/12/2015,Arsenal,Sunderland,3,1,H
05/12/2015,Chelsea,Bournemouth,0,1,A
05/12/2015,Man United,West Ham,0,0,D
Since we used the nextLine() method, the Scanner returns one line/row from the match data text file at a time. You can think of the Scanner as having a file pointer which points to the line it is currently processing. As you extract lines from the file using nextLine(), the pointer moves down the file.
It’s important to check that there is a line to read, i.e. that we are not at the end of the file, using hasNextLine(); otherwise a java.util.NoSuchElementException: No line found will be thrown.
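To see that exception in isolation, the sketch below reads from a String instead of a file and deliberately skips the hasNextLine() check. The class name NoLineFoundDemo and the method secondReadFails are hypothetical names used only for this illustration:

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

public class NoLineFoundDemo {

    // Calls nextLine() twice without checking hasNextLine() and reports
    // whether the Scanner ran out of input along the way.
    public static boolean secondReadFails(String text) {
        try (Scanner scanner = new Scanner(text)) {
            scanner.nextLine(); // reads the first line
            scanner.nextLine(); // throws if there is no second line
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A single row: the second nextLine() has nothing left to read.
        System.out.println(secondReadFails("28/11/2015,Man City,Southampton,3,1,H"));
    }
}
```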
A Scanner breaks its input into tokens using a delimiter pattern, which by default matches whitespace. The resulting tokens may then be converted into values of different types using the various next methods. This means we are not limited to extracting data from the file as lines; we can also extract data as:
- boolean by using nextBoolean()
- float by using nextFloat()
- double by using nextDouble()
- int by using nextInt()
- long by using nextLong()
The respective 'has' method, e.g. hasNextBoolean() when extracting boolean types, can be used to check whether the next token can be read as that type. So in this case, where the tokens are not lines but values of a type, e.g. int, boolean or float, the file pointer points to data of that respective type.
In this case, the file is parsed token-by-token according to a data type, not line-by-line. This can be extremely useful when loading a text file made up of numbers, for example.
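As a small illustration of token-by-token parsing, the sketch below sums whitespace-separated integers. To keep it self-contained it scans a String rather than a file, and the class name SumTokens and method sumInts are made up for the example:

```java
import java.util.Scanner;

public class SumTokens {

    // Adds up every whitespace-separated integer at the start of the text.
    // The loop stops at the first token that is not an int.
    public static int sumInts(String text) {
        int total = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextInt()) {
                total += scanner.nextInt();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Spaces and newlines both count as whitespace delimiters.
        System.out.println(sumInts("3 1 2\n0 1 0")); // prints 7
    }
}
```

The same loop works on a file by constructing the Scanner from a File object instead of a String.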
Extracting data from text files using In.java
If you are not keen to interact with the Scanner class directly, you can use In, an abstraction developed by the authors of Introduction to Programming in Java: An Interdisciplinary Approach. You simply download the In.java file and add it to your project directory. The match data text file can then be read as follows using the In class:
public class ReadFileWithIn {
    public static void main(String[] args) {
        In in = new In("EPL_2015_2016.txt");
        while (in.hasNextLine()) {
            String dataRow = in.readLine();
            System.out.println(dataRow);
        }
    }
}

Extracting data from text files using BufferedReader
The BufferedReader class can be used to read text from a character-input stream, buffering characters to provide for the efficient reading of characters, arrays, and lines.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.FileReader;

public class ReadFileWithBufferedReader {
    public static void main(String[] args) {
        File file = new File("EPL_2015_2016.txt");
        try {
            BufferedReader br = new BufferedReader(new FileReader(file));
            String dataRow;
            while (true) {
                dataRow = br.readLine();
                // readLine() returns null once the end of the file is reached.
                if (dataRow == null) {
                    break;
                }
                System.out.println(dataRow);
            }
            br.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

readLine() was used here to read the file line-by-line, as was done with the Scanner class, but the file could have been read character-by-character by using the read() method.
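A character-by-character read with BufferedReader might look like the sketch below, which counts the commas in a row. It wraps a StringReader instead of a FileReader so that it runs without the data file; the class name and the countChar helper are assumptions for illustration only:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class CountCharsWithBufferedReader {

    // Counts how often target occurs by reading one character at a time.
    public static int countChar(BufferedReader br, char target) throws IOException {
        int count = 0;
        int c;
        // read() returns the character as an int, or -1 at the end of the stream.
        while ((c = br.read()) != -1) {
            if ((char) c == target) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        String row = "28/11/2015,Man City,Southampton,3,1,H";
        try (BufferedReader br = new BufferedReader(new StringReader(row))) {
            System.out.println(countChar(br, ',')); // prints 5
        }
    }
}
```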
The code above can be made neater by reading the next dataRow line in the boolean condition of the while loop. The following code fragment implements this improvement:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.FileReader;

public class ReadFileWithBufferedReader {
    public static void main(String[] args) {
        File file = new File("EPL_2015_2016.txt");
        try {
            BufferedReader br = new BufferedReader(new FileReader(file));
            String dataRow;
            // Read the next line and test for end-of-file in one step.
            while ((dataRow = br.readLine()) != null) {
                System.out.println(dataRow);
            }
            br.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Much better!
Extracting data from text files using FileReader
If you are looking for a convenient way to read characters from files, consider the FileReader class.
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class ReadFileWithFileReader {
    public static void main(String[] args) {
        FileReader fileReader;
        try {
            fileReader = new FileReader("EPL_2015_2016.txt");
            int c;
            char ch;
            // read() returns -1 once the end of the file is reached.
            while ((c = fileReader.read()) != -1) {
                ch = (char) c;
                System.out.print(ch);
            }
            // Close the reader to avoid resource leakage.
            fileReader.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Extracting data from text files into a String
It can be very useful to read all the text from a file into a single String, especially if the application requires the analysis of the text as a whole, for example:
- AI prompts
- Sentiment analysis
- Zipf's Law
This section explores two options:
- DataInputStream
- Java NIO (New Input/Output) package
Java NIO: Files
The Files class in the NIO package consists exclusively of static methods that operate on files, directories, or other types of files. The readAllBytes() method reads all the data in the match data text file into a byte array, which can then be converted into a String:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadFileIntoString {
    public static void main(String[] args) {
        try {
            // Convert the bytes to text using the platform's default charset.
            String fileData = new String(
                    Files.readAllBytes(Paths.get("EPL_2015_2016.txt")));
            System.out.println(fileData);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The Paths class in the NIO package consists exclusively of static methods that return a Path object by converting a path string or URI object.
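Converting bytes to a String without naming a charset relies on the platform default, which can differ between machines. The sketch below passes the charset explicitly, assuming the file is UTF-8 encoded; the class name, the readUtf8 helper and the temporary file are all stand-ins chosen so the example runs without the real data file:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadFileIntoStringUtf8 {

    // Reads the whole file and decodes it as UTF-8 (an assumption about the file).
    public static String readUtf8(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A temporary one-row file stands in for the match data file.
        Path path = Files.createTempFile("matches", ".txt");
        Files.write(path,
                "29/11/2015,Norwich,Arsenal,1,1,D\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readUtf8(path));
        Files.delete(path);
    }
}
```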
DataInputStream
A data input stream lets an application read primitive Java data types from an underlying input stream in a machine-independent way.
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

public class ReadFileWithDataInputStream {
    public static void main(String[] args) {
        try {
            DataInputStream reader = new DataInputStream(new FileInputStream("EPL_2015_2016.txt"));
            // available() is only an estimate of the bytes that can be read
            // without blocking; for a small local file it usually matches the file size.
            int nBytesToRead = reader.available();
            if (nBytesToRead > 0) {
                byte[] bytes = new byte[nBytesToRead];
                // readFully() keeps reading until the array is full,
                // whereas read() may return after filling only part of it.
                reader.readFully(bytes);
                String fileData = new String(bytes);
                System.out.println(fileData);
            }
            reader.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Extracting data from text files using FileChannel
The FileChannel class offers a channel for reading, writing, mapping, and manipulating a file.
import java.io.ByteArrayOutputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class ReadFileWithFileChannel {
    public static void main(String[] args) {
        try {
            RandomAccessFile reader = new RandomAccessFile("EPL_2015_2016.txt", "r");
            FileChannel channel = reader.getChannel();
            ByteBuffer buff = ByteBuffer.allocate(1024);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Read the file in 1024-byte chunks until the channel reports
            // end-of-file (-1), so files larger than the buffer are read in full.
            while (channel.read(buff) != -1) {
                buff.flip();
                // Copy only the bytes that were actually read in this pass.
                out.write(buff.array(), 0, buff.limit());
                buff.clear();
            }
            String fileData = new String(out.toByteArray());
            System.out.println(fileData);
            channel.close();
            reader.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

If you are reading a large file, FileChannel can be faster than standard IO.
Extracting data from text files into a List
The Files class in the NIO package has a readAllLines() method that can be used to read the lines from a text file and store them in a list, with each line having its own list entry.
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class ReadFileIntoList {
    public static void main(String[] args) {
        try {
            List<String> fileLines = Files.readAllLines(Paths.get("EPL_2015_2016.txt"),
                    StandardCharsets.UTF_8);
            for (String dataRow : fileLines) {
                System.out.println(dataRow);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
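With the rows in a List, a question like "how many goals did a team score?" comes down to a loop, a conditional and some addition. The sketch below is one possible approach and not part of the prediction application: the class name GoalsForTeam and the method goalsFor are hypothetical, the list is hard-coded with a few rows from the sample file in place of a readAllLines() call, and fields are assumed never to contain commas.

```java
import java.util.Arrays;
import java.util.List;

public class GoalsForTeam {

    // Sums the goals a team scored at home (FTHG) and away (FTAG).
    public static int goalsFor(List<String> fileLines, String team) {
        int goals = 0;
        // Start at index 1 to skip the header row.
        for (int i = 1; i < fileLines.size(); i++) {
            String[] fields = fileLines.get(i).split(",");
            if (fields[1].equals(team)) {
                goals += Integer.parseInt(fields[3]); // team played at home
            } else if (fields[2].equals(team)) {
                goals += Integer.parseInt(fields[4]); // team played away
            }
        }
        return goals;
    }

    public static void main(String[] args) {
        // Assumption: a hard-coded list stands in for Files.readAllLines().
        List<String> fileLines = Arrays.asList(
                "Date,HomeTeam,AwayTeam,FTHG,FTAG,FTR",
                "29/11/2015,Norwich,Arsenal,1,1,D",
                "05/12/2015,Arsenal,Sunderland,3,1,H",
                "05/12/2015,Chelsea,Bournemouth,0,1,A");
        System.out.println(goalsFor(fileLines, "Arsenal")); // prints 4
    }
}
```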

Which option should I pick?
This tutorial on reading data from text files in Java has explored different options:
- Extracting the data from the text file line-by-line.
- Extracting the data from the text file token-by-token.
- Extracting the data from the text file character-by-character.
- Extracting all the data from the text file into a single String object.
- Extracting all the data from the text file into a List object.
After going through all the options, you may be asking which one to use in your project. I would say consider the following questions:
- Which form would I like the data in? Lines, characters, a single String or a List?
- Does performance matter? If you are dealing with a large file and need your program to run as fast as possible, consider the FileChannel option.
- What will make pre-processing my data easier? Sometimes you need to check the data for errors or split it into different objects, and how you read the data from the file, e.g. as lines or characters, can make this process easier. If you need to check every character in your file during a validation process, then a character-by-character option would be best.
Consider the listed questions and select an option that aligns with your answers. Also, remember that most of the classes mentioned in the options support different methods, so be sure to view the respective Oracle documentation. For example, BufferedReader can be used to read the data line-by-line using readLine() or character-by-character using read().