Simple Explanation And Implementation Of Knuth-Morris-Pratt (KMP) Algorithm For String Search

Introduction

We have seen brute force approach to string search here. Brute force approach to string search has time complexity of O(n*m). Donald Knuth and Vaughan Pratt, and James H. Morris conceived the algorithm in 1970. KMP algorithm is the first algorithm to have linear time complexity.

Problem Statement

We will assume text (where we have to search) is an array of n characters. Let’s denote the text by T[1..n]. String to be searched is called pattern and is represented by an array P[1..m] where m ≤ n.

We need to find indexes in given text T where the given pattern P occurs. We need to improve from the brute force algorithm having time complexity of O(m*n).

Refer here for detailed problem statement.

Explanation of KMP Algorithm

Here’s our text and pattern:

Screen Shot 2019-03-22 at 4.24.43 PM
Figure 1: Text
Screen Shot 2019-03-22 at 4.25.00 PM.png
Figure 2: Pattern

We will try matching from first character of pattern with first character of the text. Both are “a” and it matches. Moving on, we match second character which is “b” and it matches. But the third character also matches which is “a”. Fourth character is “b” in text but “c” in pattern and they do not match. Following picture illustrate this.

Screen Shot 2019-03-22 at 4.31.09 PM
Figure 3: Matching the characters

The matched substring is “aba” in pattern “abac” from index 0 to 2 in text as well as pattern.

If we observe closely, the matched substring has “a” in prefix which is also suffix in the same substring. Following figure illustrates this.

Untitled document
Figure 4: Matched substring “aba” with “a” as prefix and suffix

Now, using the fact that we have already transversed the string, we will not start matching pattern from index 0 with text as index 1. Instead, we will match text at index 3 and pattern at index 1. Because, we know that character at index 2 in text, i.e. “a”, is already present in pattern at index 0. This is the crux of KMP. Instead of transversing the pattern for every character in text, we transverse the pattern once and find the information about repetition.

We again continue matching until we find mismatch at index 4 of text with index 2 of pattern. Following picture illustrates this.

untitled-document-1.jpg
Figure 5: Matching again

The matched substring is ab. We don’t have any prefix and suffix that matches. In this case, we will start matching again with text at index 3 and pattern at index 0.

Untitled document (2)
Figure 6

Continue transversing.

Implementation

We first transverse the pattern and find information about repetition.

Untitled document (2)
Figure 7: Suffix Array in KMP algorithm

We will transverse the pattern character array and try to build suffix array. Suffix array will have length equal to pattern character array. First position of suffix will be always 0 as there is no index before 0th index of pattern character array. Or in other words, there is no character “a”, before the first “a” in pattern. See figure 7 for reference.

Our suffix array will look like this after each iteration:

Screen Shot 2019-03-22 at 6.17.25 PM

So, only “a” at index 2 is already present at index 0, so suffix array has 1 at second index.

Question: What would be the suffix array for pattern "abab"?

Answer: 0 0 1 2

Code

Time Complexity

The time complexity of suffix generation is O(m) where m is the length of pattern. Time complexity of finding pattern is O(n) where n is the length of the pattern. Total time complexity is O(m+n).

Time Comparison

Screen Shot 2019-03-22 at 6.31.25 PM

When text = abcabac” and pattern = “abac”.

You should also read about Rabin Karp Algorithm here.


Reference:

1

2

3

4

Advertisements

One thought on “Simple Explanation And Implementation Of Knuth-Morris-Pratt (KMP) Algorithm For String Search

Add yours

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

Up ↑

%d bloggers like this: