A beginner guide to Brute Force Algorithm for substring search

Introduction

CONTROL + F or COMMAND + F

How often do you use above keyboard shortcut? In fact, for most of us, searching a string or substring in a pile of strings/document is involuntarily action with the above key combination. This post will deal with the subject of the substring search. We will quickly define the problem statement and move to our algorithm explanation. We will finally move to the implementation.

Problem Statement

We will assume text (where we have to search) is an array of  n characters. Let’s denote the text by T[1..n]. String to be searched is called pattern and is represented by an array P[1..m] where m ≤ n.

We need to find indexes in given text T where the given pattern P occurs.

Following picture will explain it better:

Screen Shot 2019-02-09 at 4.57.51 PM
Figure 1: Here, the blue coloured characters in the text matched with the pattern. The program should return 3, i.e. the index of in text.

 

Brute Force Approach

Introduction

The Brute Force algorithm compares the pattern to the text  taking one character at a time.

For example, let’s suppose we need to find  “abba” in “abcabaabcabac”.

We will start with first “a” in “abcabaabcabac”.

Pattern has “a”; Text has “a”. Now try to match second character of pattern to second character of the text.

Pattern has “b”; Text has “b”. Very well, now let us try to match third character of pattern with third character of the text.

Pattern has “b”; Text has “c”. Ah, failed.

Let’s try to see the same thing with graphics.

brute.png

brute (1).png

brute (2).png

Now, We will start with second character “b” in “abcabaabcabac”.

Pattern has “a”; Text has “b”. This is a no go at the very first character of the.

brute (3).png

Now, We will start with third character “c” in “abcabaabcabac”.

Pattern has “a”; Text has “c”. This is a no go at the very first character of the pattern.

brute (4).png

Similarly, we will go on matching until our characters in text are exhausted or we find a match.

Implementation

The output for this is as follow:

========String matcher Brute Force Approach===========
Pattern found at 3
Pattern found at 6
Time taken for matching 243434

Time Complexity

In worst case, for any text of length n and pattern m, we are going to do m comparisons for each character of text, that will mean O(n*m)

We will be discussing a faster Rabin-Karp Algorithm for substring search in next post.


References and useful link:

Corman

Rabin Karp

KMP

Rabin Fingerprint

PPT for Rabin Karp

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

Blog at WordPress.com.

Up ↑

%d bloggers like this: