C has the beautiful function isprint(), which checks for any printable character including space. This function can be used to determine if a character is printable, and the result can be negated to check if it is non-printable. But how do we complete this task in Java?
What are non-printable characters?
Non-printable characters or formatting marks are parts of a character set that do not represent a written symbol or part of the text within a document or code, but rather exist for signalling and control in a character encoding. They are used to tell word processors and certain applications, like Web browsers, how a document is supposed to look but are not displayed when the document is printed, hence the name non-printable. Examples of non-printable characters include:
- TAB: horizontal tab. ASCII value 9. It is used to align text horizontally to the next tab stop.
- LF: NL line feed, new line. ASCII value 10. It tells the printer to advance the paper.
- CR: carriage return. ASCII value 13. A control character that is used to reset a device's position to the beginning of a line of text. It is the result of pressing the Enter or Return key.
As mentioned earlier non-printable characters do not only include formatting/style marks but also include:
- control characters,
- and other invisible symbols that we can find in text but aren’t meant to show.
Control characters are also used in data streams. For example, STX (start of text, ASCII value 2) and ETX (end of text, ASCII value 3) mark the beginning and end of the text in a transmission, and the NUL character (ASCII value 0) is commonly used as a terminator, for instance at the end of a C string. Examples of other control characters include:
- EOT: end of transmission. ASCII value 4.
- ETB: end of transmission block. ASCII value 23.
For a full list of non-printable characters check out: Reference: Non Printable Characters List
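To make the values above concrete, here is a minimal sketch that prints the numeric code of a few of the control characters mentioned. The class name ControlCharCodes and the helper describe() are made up for illustration; printing the code rather than the character itself is the point, since the character would be invisible.

```java
class ControlCharCodes {

    // Formats a character as "NAME = code", because the character itself
    // would not be visible when printed.
    static String describe(String name, char ch) {
        return name + " = " + (int) ch;
    }

    public static void main(String[] args) {
        System.out.println(describe("TAB", '\t'));      // horizontal tab
        System.out.println(describe("LF", '\n'));       // line feed
        System.out.println(describe("CR", '\r'));       // carriage return
        System.out.println(describe("STX", (char) 2));  // start of text
        System.out.println(describe("ETX", (char) 3));  // end of text
    }
}
```

Casting a char to int is all it takes to see the value behind an invisible character.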
Why can they be a nuisance?
Non-printable characters can cause problems with text handling, showing and saving. For example, non-printable characters in clinical trial data create potential problems in producing quality deliverables like:
- Incorrect statistics or counts in the deliverables.
- Appearance of strange symbols in reports.
So it’s important to develop methods to detect them so they can be changed or removed from text. Let’s consider some options for this problem.
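As a rough illustration of how counts can go wrong, the sketch below compares two values that look identical on screen but differ by one invisible STX character. The class name HiddenCharacterEffect, the helper countDistinct() and the sample value "placebo" are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HiddenCharacterEffect {

    // Counts distinct values the way a simple frequency report might.
    static int countDistinct(String... values) {
        Set<String> distinct = new HashSet<>(Arrays.asList(values));
        return distinct.size();
    }

    public static void main(String[] args) {
        String clean = "placebo";
        // Same visible text, followed by an invisible STX control character.
        String dirty = "placebo" + (char) 2;

        // The report sees two categories where a reader sees one.
        System.out.println(countDistinct(clean, dirty));               // prints 2
        System.out.println(clean.length() + " vs " + dirty.length());  // prints 7 vs 8
    }
}
```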
Hardcode: The painful way...
Problems that require checking if a value is in a fixed list usually have an approach that involves hardcoding the values. Unfortunately, such a solution tends to be tedious to write and leaves serious room for error. What if you forget a printable character? Alas, let’s develop our own version of C’s isprint():
Code
public class NonPrintableHardcode {

    // every printable ASCII character, from space (32) to tilde (126)
    private static final char[] PRINTABLE = {
        ' ', '!', '"', '#', '$', '%', '&', '\'', '(', ')', '*',
        '+', ',', '-', '.', '/', '0', '1', '2', '3', '4', '5',
        '6', '7', '8', '9', ':', ';', '<', '=', '>', '?', '@',
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K',
        'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V',
        'W', 'X', 'Y', 'Z', '[', '\\', ']', '^', '_', '`', 'a',
        'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l',
        'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w',
        'x', 'y', 'z', '{', '|', '}', '~'
    };

    public static void main(String[] args) {
        char a = 'a';
        // start of text control character
        char nonPrintableSTX = (char) 2;
        System.out.println("Is '" + a + "' printable? " + isPrint(a));
        System.out.println("Is '" + nonPrintableSTX + "' printable? "
                + isPrint(nonPrintableSTX));
    }

    // returns true if ch appears in the PRINTABLE array
    public static boolean isPrint(char ch) {
        for (int i = 0; i < PRINTABLE.length; i++) {
            if (ch == PRINTABLE[i]) {
                return true;
            }
        }
        return false;
    }
}

We enter all the necessary printable characters in the PRINTABLE array. In the post What on earth is ASCII? we learnt about ASCII values and the ASCII Table. We can use the ASCII Table to get the list of printable characters to include in the PRINTABLE array.
Output
Is 'a' printable? true
Is '' printable? false
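Typing the list by hand is exactly where mistakes creep in. One way to reduce that risk, sketched here under the assumption that "printable" means ASCII values 32 (space) to 126 (~) as in the array above, is to fill a lookup table from the range instead of listing each character. The names PrintableTable and IS_PRINTABLE are illustrative.

```java
class PrintableTable {

    // One slot per ASCII code; true marks a printable character.
    private static final boolean[] IS_PRINTABLE = new boolean[128];

    static {
        // Assumption: printable means codes 32 (' ') through 126 ('~').
        for (int code = 32; code <= 126; code++) {
            IS_PRINTABLE[code] = true;
        }
    }

    static boolean isPrint(char ch) {
        // Anything outside the 128-entry table is treated as non-printable.
        return ch < IS_PRINTABLE.length && IS_PRINTABLE[ch];
    }

    public static void main(String[] args) {
        System.out.println(isPrint('a'));       // prints true
        System.out.println(isPrint((char) 2));  // prints false
    }
}
```

You keep the low-level control of the hardcoded approach, since individual entries can still be switched on or off, without the risk of a forgotten character.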
Apache Commons Lang
The great folks from Apache developed the Apache Commons Lang package which contains a useful class called StringUtils. The StringUtils class contains a static method called isAsciiPrintable() which takes a CharSequence (such as a String) as a parameter and checks if it contains only ASCII characters that are printable. This in a sense is the equivalent of C’s isprint() but for strings. Let’s see how the isAsciiPrintable() method can be used to detect whether text contains a non-printable character:
Code
import org.apache.commons.lang3.StringUtils;

public class NonPrintableApache {

    public static void main(String[] args) {
        String s1 = "This is printable text";
        System.out.println("Is '" + s1 + "' printable? " +
                StringUtils.isAsciiPrintable(s1));
        // start of text control character
        char nonPrintableSTX = (char) 2;
        String s2 = "The last character is non-printable " +
                nonPrintableSTX;
        System.out.println("Is '" + s2 + "' printable? " +
                StringUtils.isAsciiPrintable(s2));
    }
}

Output
Is 'This is printable text' printable? true
Is 'The last character is non-printable ' printable? false
This is the expected output as s1 contains the string "This is printable text", which only contains printable characters, while s2 contains an STX control character, which is non-printable.
Note: the method returns:
- false if the input string is null.
- true if the input string is empty.
Setting up the Apache Commons Lang package: Using Maven
StringUtils is defined in the Apache Commons Lang package, so you will have to add it to your Maven project by adding the following dependency to the project's pom.xml file.
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.12.0</version>
</dependency>
Setting up the Apache Commons Lang package: Using the JAR file
You can download the JAR file of the package from the Download Apache Commons Lang page and then add it to your build path in Eclipse or IntelliJ.
Setting up the Apache Commons Lang package: Adding the package folder to your project directory
Suppose you are a wild one like me. In that case, you can download the src ZIP archive from the Download Apache Commons Lang page. Open the java directory in the main directory and simply move the entire org directory into your project directory. Unfortunately, you cannot just yank out the StringUtils.java source file and place it into your project directory due to the dependencies it uses in the package.
Build your own isAsciiPrintable() using ASCII values
In the post What on earth is ASCII? we did not only learn about ASCII values and the ASCII Table, but also about how the ASCII Table can be used to determine the characteristics of characters. The ASCII Table can be used to determine which ASCII values are non-printable:
- 0 - 31 contains formatting and control characters like STX, EOT, LF, CR etc.
- 127 is DEL, the delete control character.
In What on earth is ASCII? we also learnt how to obtain the ASCII values of characters using Java. So we have enough to develop our own function by using the listed ranges in an if statement:
public static boolean isAsciiPrintable(String text) {
    if (text == null) {
        return false;
    } else if (text.equals("")) {
        return true;
    }
    for (int i = 0; i < text.length(); i++) {
        int asciiValue = (int) text.charAt(i);
        // 0 - 31 are control characters and 127 is DEL
        if (asciiValue < 32 || asciiValue == 127) {
            return false;
        }
    }
    return true;
}

So any ASCII value below 32 or equal to 127 will be flagged and trigger a false return.
Note: just as with StringUtils.isAsciiPrintable(), the method returns:
- false if the input string is null.
- true if the input string is empty.
Code
public class NonPrintableASCIIRange {

    public static void main(String[] args) {
        String s1 = "This is printable text";
        System.out.println("Is '" + s1 + "' printable? " +
                isAsciiPrintable(s1));
        // start of text control character
        char nonPrintableSTX = (char) 2;
        String s2 = "The last character is non-printable " +
                nonPrintableSTX;
        System.out.println("Is '" + s2 + "' printable? " +
                isAsciiPrintable(s2));
    }

    public static boolean isAsciiPrintable(String text) {
        if (text == null) {
            return false;
        } else if (text.equals("")) {
            return true;
        }
        for (int i = 0; i < text.length(); i++) {
            int asciiValue = (int) text.charAt(i);
            // 0 - 31 are control characters and 127 is DEL
            if (asciiValue < 32 || asciiValue == 127) {
                return false;
            }
        }
        return true;
    }
}

Output
The output of the code will be as follows:
Is 'This is printable text' printable? true
Is 'The last character is non-printable ' printable? false
Be careful
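If your input text contains characters from extended ASCII, e.g. É (Latin capital letter E with acute), or other Unicode characters, note that the range check above only rejects values below 32 and the value 127, so every character above 127 passes as printable. StringUtils.isAsciiPrintable(), by contrast, returns false for anything outside 32 to 126. Decide how such characters should be treated and include them in your range checks.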
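As a sketch of one way to handle this, the variant below accepts only the values 32 to 126, so É and any other non-ASCII character counts as non-printable. The class and method names are illustrative; if you need to accept specific non-ASCII characters, you would extend the condition instead.

```java
class StrictAsciiPrintable {

    static boolean isStrictAsciiPrintable(String text) {
        if (text == null) {
            return false;
        }
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            // Accept only 32 (space) through 126 ('~').
            if (ch < 32 || ch > 126) {
                return false;
            }
        }
        // Also reached for the empty string, which has nothing to reject.
        return true;
    }

    public static void main(String[] args) {
        // The escape below is E with acute, written this way to keep the
        // source file pure ASCII.
        System.out.println(isStrictAsciiPrintable("\u00C9cole"));  // prints false
        System.out.println(isStrictAsciiPrintable("Ecole"));       // prints true
    }
}
```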
Build your own isAsciiPrintable() using the Character class
The Java Character class contains a lot of useful methods to check characters like:
- isDigit(): determines if the specified character is a digit.
- isLetter(): determines if the specified character is a letter.
- isLetterOrDigit(): determines if the specified character is a letter or digit.
On top of useful methods, the class contains useful static fields. The following fields of interest relate to Unicode specification categories:
- CONTROL: General category "Cc" in the Unicode specification.
- FORMAT: General category "Cf" in the Unicode specification.
- UNASSIGNED: General category "Cn" in the Unicode specification. An unassigned character is a codepoint not assigned to an abstract character.
Together these fields can categorise characters in text as non-printable. So if a character does not belong to the category CONTROL, FORMAT or UNASSIGNED we treat it as printable. But how do we compute the category of a character? We can use the Character class's static getType() method to obtain a value indicating a character's general category. Let’s put it together now and build a new isAsciiPrintable() method:
Code
public class NonPrintableCharacterClass {

    public static void main(String[] args) {
        String s1 = "This is printable text";
        System.out.println("Is '" + s1 + "' printable? "
                + isAsciiPrintable(s1));
        // start of text control character
        char nonPrintableSTX = (char) 2;
        String s2 = "The last character is non-printable " +
                nonPrintableSTX;
        System.out.println("Is '" + s2 + "' printable? "
                + isAsciiPrintable(s2));
    }

    public static boolean isAsciiPrintable(String text) {
        if (text == null) {
            return false;
        } else if (text.equals("")) {
            return true;
        }
        for (int i = 0; i < text.length(); i++) {
            // look up the Unicode general category once per character
            int type = Character.getType(text.charAt(i));
            if (type == Character.CONTROL
                    || type == Character.FORMAT
                    || type == Character.UNASSIGNED) {
                // non-printable character
                return false;
            }
        }
        return true;
    }
}

Output
The output of the code will be as follows:
Is 'This is printable text' printable? true
Is 'The last character is non-printable ' printable? false
Note: technically we should call this method isUnicodePrintable() as the fields relate to Unicode but the desired effect is the same. The same goes for the upcoming approach that uses a regular expression.
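Since the check is really about Unicode, here is a hedged sketch of a variant that walks the text by code point rather than by char, so a character stored as a surrogate pair (an emoji, say) is classified once as a whole. It swaps the charAt() loop for String.codePoints() and Character.getType(int); the class name UnicodePrintable and the helper isPrintableCodePoint() are illustrative.

```java
class UnicodePrintable {

    static boolean isUnicodePrintable(String text) {
        if (text == null) {
            return false;
        }
        // codePoints() yields whole code points; an empty string produces an
        // empty stream, for which allMatch() returns true.
        return text.codePoints().allMatch(UnicodePrintable::isPrintableCodePoint);
    }

    static boolean isPrintableCodePoint(int codePoint) {
        int type = Character.getType(codePoint);
        return type != Character.CONTROL
                && type != Character.FORMAT
                && type != Character.UNASSIGNED;
    }

    public static void main(String[] args) {
        String emoji = "\uD83D\uDE00";       // U+1F600 as a surrogate pair
        String withBom = "\uFEFF" + "text";  // byte order mark, category Cf
        System.out.println(isUnicodePrintable(emoji));    // prints true
        System.out.println(isUnicodePrintable(withBom));  // prints false
    }
}
```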
Regular expression
When it comes to text analysis, parsing or processing, the programmer’s go-to tool is the regular expression. Regular expressions are very useful and can be used to not only detect non-printable characters but also replace them! Let’s build our own isAsciiPrintable() using a regular expression:
Code
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonPrintableRegularExpression {

    public static void main(String[] args) {
        String s1 = "This is printable text";
        char nonPrintableSTX = (char) 2;
        String s2 = "The last character is non-printable "
                + nonPrintableSTX;
        System.out.println("Is '" + s1 + "' printable? " +
                isAsciiPrintable(s1));
        System.out.println("Is '" + s2 + "' printable? " +
                isAsciiPrintable(s2));
    }

    public static boolean isAsciiPrintable(String text) {
        if (text == null) {
            return false;
        } else if (text.equals("")) {
            return true;
        }
        String regex = "[\\p{C}]";
        Pattern pattern = Pattern.compile(regex);
        Matcher m = pattern.matcher(text);
        // m.find() returns true as soon as the String contains at
        // least one non-printable character, so we negate it: the
        // result is true only when every character is printable
        return !m.find();
    }
}

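The regular expression to detect Unicode non-printable characters is given by \p{C}. The Matcher object is built using this pattern so that when the find() method is executed, it returns true as soon as it finds a non-printable character in the String. Note that \p{C} covers every Unicode "Other" category, so besides control, format and unassigned characters it also matches private-use and surrogate code points.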
We want our function to match the behaviour of Apache’s isAsciiPrintable(), so we need to negate the result from the find() method so that it returns true when the text only contains printable characters, not when it contains a non-printable character.
Output
The output of the code will be as follows:
Is 'This is printable text' printable? true
Is 'The last character is non-printable ' printable? false
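The same pattern can also remove what it finds. Below is a minimal sketch of that idea using Matcher.replaceAll(); the class name NonPrintableCleaner and the method stripNonPrintable() are illustrative, and returning null for null input is an assumption about what a caller would want.

```java
import java.util.regex.Pattern;

class NonPrintableCleaner {

    // Compiled once and reused; \p{C} matches the Unicode "Other" categories.
    private static final Pattern NON_PRINTABLE = Pattern.compile("\\p{C}");

    static String stripNonPrintable(String text) {
        if (text == null) {
            return null;  // assumption: pass null through unchanged
        }
        return NON_PRINTABLE.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String dirty = "The last character is non-printable " + (char) 2;
        String clean = stripNonPrintable(dirty);
        System.out.println(dirty.length() - clean.length());  // prints 1
    }
}
```

Keep in mind that tab, line feed and carriage return are control characters too, so this pattern strips them along with everything else. If you want to keep line breaks, you would need to narrow the pattern.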
Conclusion
We have considered different ways to solve the problem of detecting a non-printable character in text. From the painful hardcoding approach to the gold standard isAsciiPrintable() from the Apache Commons Lang package, you are spoilt for choice. We even explored how to build your own isAsciiPrintable(). Now select an approach and get coding! Select your option by considering:
- Do I want to narrow the list of printable characters? i.e. do I want low-level control of the detection process?
- What will be easier to maintain?
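To help with that choice, here is a small sketch that runs two of the home-built checks on inputs where they disagree. The class and method names are hypothetical; the two methods restate the ASCII range check and the Unicode category check from earlier in compact form.

```java
class ApproachComparison {

    // Range check: only values 0 - 31 and 127 are rejected.
    static boolean byAsciiRange(String text) {
        for (int i = 0; i < text.length(); i++) {
            char ch = text.charAt(i);
            if (ch < 32 || ch == 127) {
                return false;
            }
        }
        return true;
    }

    // Category check: CONTROL, FORMAT and UNASSIGNED are rejected.
    static boolean byCategory(String text) {
        for (int i = 0; i < text.length(); i++) {
            int type = Character.getType(text.charAt(i));
            if (type == Character.CONTROL || type == Character.FORMAT
                    || type == Character.UNASSIGNED) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String accented = "\u00C9";       // Latin capital letter E with acute
        String byteOrderMark = "\uFEFF";  // invisible format character (Cf)
        System.out.println(byAsciiRange(accented) + " " + byCategory(accented));
        // prints true true
        System.out.println(byAsciiRange(byteOrderMark) + " " + byCategory(byteOrderMark));
        // prints true false
    }
}
```

The range check lets the invisible byte order mark through because it only knows about ASCII, while the category check catches it. Which behaviour is right depends on the text you expect to receive.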