Java – Regular Expressions – 1

Regular Expressions in Java

We are going to learn about regular expressions in Java under the following topics:

  1. Introduction to Regular expressions
  2. Useful Classes in Regular Expressions
  3. Syntax of Regular Expressions
  4. Matcher Class Methods
  5. Demo of Regular Expressions
  6. Watch the Video

 

1. Introduction to Regular Expressions

A regular expression, also known as a regex, is a special sequence of characters that represents a pattern in text. You can use regular expressions to perform operations such as searching a text for a particular pattern or replacing occurrences of a certain string.

For example, you may want to search for phone numbers inside several pages of text. Regular expressions come to the rescue in cases like this.

In Java, three classes are used to work with regular expressions. These classes are available in the java.util.regex package. They are the Pattern, Matcher and PatternSyntaxException classes. Let’s now discuss each of them.

 

2. Useful Classes in Regular Expressions

There are three such classes, and we will consider each of them in this section. They are:

  • Pattern Class
  • Matcher Class
  • PatternSyntaxException Class

 

Pattern Class

A Pattern object represents a compiled regular expression. However, the Pattern class has no public constructors. As such, you cannot create a Pattern object using the new keyword.

How then can you create a Pattern object?

You can do this by calling its static compile() method. This method accepts a regular expression as a parameter and returns a Pattern object.

 

Matcher Class

The next class in the package is the Matcher class. The Matcher class helps you match a string against a pattern. Just like the Pattern class, the Matcher class has no public constructors. Instead, you create a Matcher object by calling the matcher() method of a Pattern object and passing it the string you want to match.
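To make the two factory methods concrete, here is a minimal sketch of creating a Pattern and a Matcher. The class name PatternMatcherSketch and the helper isAllDigits are made up for illustration; only Pattern.compile(), Pattern.matcher() and Matcher.matches() come from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternMatcherSketch {

    // Hypothetical helper: reports whether the whole input consists of digits.
    static boolean isAllDigits(String input) {
        // Pattern has no public constructor, so compile() is the entry point.
        Pattern pattern = Pattern.compile("\\d+");
        // Matcher has no public constructor either; matcher() creates one.
        Matcher matcher = pattern.matcher(input);
        // matches() succeeds only if the entire input fits the pattern.
        return matcher.matches();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("2024")); // true
        System.out.println(isAllDigits("20a4")); // false
    }
}
```

A compiled Pattern can be reused to create as many Matcher objects as you need, one per input string.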

 

PatternSyntaxException

A PatternSyntaxException is an unchecked exception that is thrown when there is a syntax error in a regular expression pattern.
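As a rough illustration, the sketch below catches the exception to check whether a string is a valid regular expression. The class SyntaxCheckSketch and the helper isValidRegex are hypothetical names; getDescription() and getIndex() are methods of PatternSyntaxException.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxCheckSketch {

    // Hypothetical helper: returns true if the regex compiles without error.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // The exception carries a description and the index of the error.
            System.out.println("Invalid regex: " + e.getDescription()
                    + " (index " + e.getIndex() + ")");
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidRegex("\\d{3}")); // true
        System.out.println(isValidRegex("(abc"));   // false, the group is never closed
    }
}
```

Because the exception is unchecked, the compiler does not force you to catch it. Catching it is mostly useful when the regular expression comes from user input.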

 

 

3. Syntax of Regular Expressions

We are going to separate the syntax into categories, because there are quite a number of different expressions to use.

(a) Matching Characters

Expression Matches
. Any character except a line terminator.
\w Word character. Same as [a-zA-Z_0-9].
\W Non-word character.
\s Whitespace character. Same as [ \t\n\x0B\f\r].
\S Non-whitespace character.
\d Any digit. Same as [0-9].
\D Non-digit.
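The expressions in the table above can be tried out with a small counting helper. This is only a sketch: the class CharacterClassSketch, the helper countMatches and the sample text are assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassSketch {

    // Hypothetical helper: counts how many times regex matches inside text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Room 42, Floor 7";
        System.out.println(countMatches("\\d", text)); // 3  (4, 2, 7)
        System.out.println(countMatches("\\s", text)); // 3  (the three spaces)
        System.out.println(countMatches("\\w", text)); // 12 (letters and digits)
        System.out.println(countMatches("\\W", text)); // 4  (three spaces and the comma)
        System.out.println(countMatches(".", text));   // 16 (every character)
    }
}
```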

 

(b) Matching Boundaries

The following expressions match positions (boundaries) in the input rather than characters.

Subexpression Matches
^ Beginning of a string
$ End of a string
\b Word boundary
\B Non-word boundary
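A short sketch of how the boundary expressions behave is shown here. The class BoundarySketch, the helper countMatches and the sample text "cat concat cat" are chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundarySketch {

    // Hypothetical helper: counts how many times regex matches inside text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "cat concat cat";
        System.out.println(countMatches("cat", text));       // 3, includes the one inside "concat"
        System.out.println(countMatches("\\bcat\\b", text)); // 2, whole words only
        System.out.println(countMatches("\\Bcat", text));    // 1, only the one inside "concat"
        System.out.println(countMatches("^cat", text));      // 1, only at the beginning
        System.out.println(countMatches("cat$", text));      // 1, only at the end
    }
}
```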

 

(c) Groups

Character classes use square brackets, and groups use parentheses. The list is provided below.

Expression Matches
[…] Any single character in brackets.
[^…] Any single character not in brackets.
a|b Matches either a or b.
(re) Groups the regular expression enclosed in parentheses.
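The sketch below tries a character class, an alternation and a capturing group. The class GroupSketch and its three helpers are hypothetical. Matcher.group(int) is the standard method for reading what a parenthesized group captured.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSketch {

    // [aeiou] matches any single character listed in the brackets.
    static boolean startsWithVowel(String word) {
        return Pattern.compile("^[aeiou]").matcher(word).find();
    }

    // a|b matches either alternative.
    static boolean isCatOrDog(String word) {
        return Pattern.matches("cat|dog", word);
    }

    // Parentheses create numbered groups; group(1) is the first one.
    // Returns null when the text contains no "yyyy-mm" sequence.
    static String extractYear(String text) {
        Matcher matcher = Pattern.compile("(\\d{4})-(\\d{2})").matcher(text);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(startsWithVowel("apple"));         // true
        System.out.println(isCatOrDog("dog"));                // true
        System.out.println(extractYear("released 2024-05"));  // 2024
    }
}
```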

 

(d) Quantifiers

Now we consider quantifiers. You use them to specify how many times the preceding expression may occur. Find the list below.

Expression Matches
* 0 or more occurrences of the preceding expression.
+ 1 or more occurrences of the preceding expression.
? 0 or 1 occurrence of the preceding expression.
{n} Exactly n occurrences of the preceding expression.
{n,} n or more occurrences of the preceding expression.
{n,m} At least n and at most m occurrences of the preceding expression.
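Each quantifier in the list can be checked against a few inputs. The following is a sketch only. The class QuantifierSketch and the helper fullMatch are made-up names, while Pattern.matches() is the standard static method that tests a whole input against a regular expression.

```java
import java.util.regex.Pattern;

class QuantifierSketch {

    // Hypothetical helper: true if the entire input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("a*", ""));            // true,  zero occurrences allowed
        System.out.println(fullMatch("a+", ""));            // false, at least one required
        System.out.println(fullMatch("colou?r", "color"));  // true,  the u is optional
        System.out.println(fullMatch("colou?r", "colour")); // true
        System.out.println(fullMatch("\\d{3}", "123"));     // true,  exactly three digits
        System.out.println(fullMatch("\\d{3}", "1234"));    // false
        System.out.println(fullMatch("\\d{2,}", "12345"));  // true,  two or more digits
        System.out.println(fullMatch("\\d{2,4}", "12345")); // false, more than four digits
    }
}
```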

 

Furthermore, note that metacharacters must be escaped when you want to match them literally. Metacharacters include the following:

. [ { ( ) \ ^ $ | ? * +

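You escape a metacharacter by preceding it with a backslash (\). In a Java string literal the backslash itself must be doubled, so the regular expression \. is written as "\\." in Java source code.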
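Here is a small sketch of escaping in practice. The class EscapeSketch and its helpers are hypothetical. Pattern.quote() is a standard method that escapes a whole string at once, which can be handy when the literal text is not known in advance.

```java
import java.util.regex.Pattern;

class EscapeSketch {

    // Escaped dot: matches only the text "3.14".
    static boolean isExactlyPi(String input) {
        return Pattern.matches("3\\.14", input);
    }

    // Unescaped dot: the dot stands for any character, so "3x14" also matches.
    static boolean looksLikePi(String input) {
        return Pattern.matches("3.14", input);
    }

    // Pattern.quote() turns an arbitrary string into a literal pattern.
    static boolean containsLiteral(String text, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(isExactlyPi("3.14"));                     // true
        System.out.println(isExactlyPi("3x14"));                     // false
        System.out.println(looksLikePi("3x14"));                     // true
        System.out.println(containsLiteral("price (USD)", "(USD)")); // true
    }
}
```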

 

 

4. Matcher Class Methods

You can use the methods provided by the Matcher class to perform certain operations. We will discuss three categories of methods. They are:

  • Index Methods
  • Verifier Methods
  • Replacement Methods

 

The index methods

You use the index methods to find out where a match was found. Each of them returns the index position of the match as an integer.

SN Method name and description
1 start()

This method returns the start index of the previous match.

2 start(int group)

Gives the start index of the subsequence captured by the given group during the previous operation.

3 end()

Gives the offset after the last character matched.

4 end(int group)

Gives the offset after the last character of the subsequence captured by the given group during the previous operation.
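The four index methods can be seen together in one sketch. The class IndexMethodsSketch, the helper locateRange and the sample text are assumptions for illustration. Note that the end offsets are exclusive, so they point one past the last matched character.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IndexMethodsSketch {

    // Hypothetical helper: finds the first "number-number" range and returns
    // {start, end, start(1), end(1), start(2), end(2)}, or null if none is found.
    static int[] locateRange(String text) {
        Matcher matcher = Pattern.compile("(\\d+)-(\\d+)").matcher(text);
        if (!matcher.find()) {
            return null; // the index methods may only be called after a successful match
        }
        return new int[] {
            matcher.start(), matcher.end(),   // the whole match
            matcher.start(1), matcher.end(1), // first group
            matcher.start(2), matcher.end(2)  // second group
        };
    }

    public static void main(String[] args) {
        // "10-25" occupies indexes 6 to 10, so end() is 11.
        System.out.println(Arrays.toString(locateRange("range 10-25 ok")));
        // prints [6, 11, 6, 8, 9, 11]
    }
}
```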

 

Verifier Methods

The verifier methods check an input string to verify whether a pattern is found. Each of them returns true if the pattern is found. Otherwise, it returns false. Find the list below.

SN Method name and description
1 lookingAt()

Tries to match the input sequence, starting at the beginning of the region, against the pattern. Unlike matches(), it does not require the entire region to be matched.

2 find()

Tries to find the next subsequence of the input sequence that matches the pattern. Returns true if it finds a match. Otherwise, it returns false.

3 find(int start)

Resets this matcher and then tries to find the next subsequence of the input sequence that matches the pattern, starting at the given index. Returns true if there is a match. Otherwise, it returns false.

4 matches()

Tries to match the entire region against the pattern. Returns true if it finds a match. Otherwise, it returns false.
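The difference between the four verifier methods is easiest to see side by side. In this sketch the class VerifierSketch and its helpers are hypothetical names. Each helper creates a fresh Matcher so that the calls do not affect each other.

```java
import java.util.regex.Pattern;

class VerifierSketch {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the entire input must fit the pattern.
    static boolean wholeMatch(String input) {
        return DIGITS.matcher(input).matches();
    }

    // lookingAt(): the input must start with a match.
    static boolean prefixMatch(String input) {
        return DIGITS.matcher(input).lookingAt();
    }

    // find(): a match anywhere in the input is enough.
    static boolean anyMatch(String input) {
        return DIGITS.matcher(input).find();
    }

    // find(int): like find(), but the search begins at the given index.
    static boolean anyMatchFrom(String input, int start) {
        return DIGITS.matcher(input).find(start);
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("123abc"));      // false
        System.out.println(prefixMatch("123abc"));     // true
        System.out.println(prefixMatch("abc123"));     // false
        System.out.println(anyMatch("abc123"));        // true
        System.out.println(anyMatchFrom("12 ab", 2));  // false, no digits from index 2 on
    }
}
```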

 

Replacement Methods

These methods are used to replace matching text in the input string. Find the list below.

SN Method and brief description
1 Matcher appendReplacement(StringBuffer sb, String replacement)

Implements a non-terminal append-and-replace step. Returns this Matcher.

2 StringBuffer appendTail(StringBuffer sb)

Implements a terminal append-and-replace step. Returns the StringBuffer.

3 String replaceAll(String replacement)

Replaces every subsequence of the input sequence that matches the pattern with the given replacement string. Then returns the new string.

4 String replaceFirst(String replacement)

Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string. Then returns the new string.

5 static String quoteReplacement(String s)

Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class, so that backslashes and dollar signs in s have no special meaning.
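The replacement methods listed above can be sketched as follows. The class ReplacementSketch and its helpers are made up for illustration. The last helper shows the usual find(), appendReplacement(), appendTail() loop, and uses quoteReplacement() because a dollar sign would otherwise be read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplacementSketch {

    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // replaceAll(): every match is replaced.
    static String maskAll(String text) {
        return NUMBER.matcher(text).replaceAll("#");
    }

    // replaceFirst(): only the first match is replaced.
    static String maskFirst(String text) {
        return NUMBER.matcher(text).replaceFirst("#");
    }

    // appendReplacement()/appendTail(): build the result match by match.
    static String prefixDollar(String text) {
        Matcher matcher = NUMBER.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // Without quoteReplacement(), "$5" would mean "group number 5".
            String replacement = Matcher.quoteReplacement("$" + matcher.group());
            matcher.appendReplacement(result, replacement);
        }
        matcher.appendTail(result); // copy whatever follows the last match
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskAll("a1b22c333"));          // a#b#c#
        System.out.println(maskFirst("a1b22c333"));        // a#b22c333
        System.out.println(prefixDollar("cost 5 and 10")); // cost $5 and $10
    }
}
```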

 

 

5. Demo of Regular Expressions

Let’s now apply what we’ve learnt so far.

We will write a regular expression that finds every sequence of three consecutive digits in an input string and counts the matches.

 

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {

	public static void main(String[] args) {
		// \d{3} matches exactly three consecutive digits
		Pattern pt = Pattern.compile("\\d{3}");
		
		String text = "23 becomes 230 if you add 0 " +
		"but its not same as 544 be it becomed 3";
		
		Matcher mt = pt.matcher(text);
		
		// Each successful call to find() moves on to the next match
		int count = 0;
		while (mt.find()) {
			count++;
		}
		System.out.println(count); // prints 2 (230 and 544)
	}
}
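
Note that the pattern \d{3} would also match three digits inside a longer number such as 1234. If only standalone three-digit numbers are wanted, one possible variation is to add word boundaries and collect the matched text with group(). The class ThreeDigitSketch, the helper findThreeDigitNumbers and the sample text are assumptions for illustration, not part of the demo above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeDigitSketch {

    // Hypothetical helper: returns every standalone three-digit number in text.
    static List<String> findThreeDigitNumbers(String text) {
        // \b on both sides rejects digits that are part of a longer word or number.
        Matcher matcher = Pattern.compile("\\b\\d{3}\\b").matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group()); // group() returns the text of the current match
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "23 becomes 230, 1234 stays, 544 too";
        System.out.println(findThreeDigitNumbers(text)); // [230, 544]
    }
}
```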

6. Watch the Video