# Introduction

CONTROL + F or COMMAND + F

How often do you use above keyboard shortcut? In fact, for most of us, searching a string or substring in a pile of strings/document is involuntarily action with the above key combination. This post will deal with the subject of the substring search. We will quickly define the problem statement and move to our algorithm explanation. We will finally move to the implementation.

# Problem Statement

We will assume text (where we have to search) is an array of  n characters. Let’s denote the text by T[1..n]. String to be searched is called pattern and is represented by an array P[1..m] where m ≤ n.

We need to find indexes in given text T where the given pattern P occurs.

Following picture will explain it better: Figure 1: Here, the blue coloured characters in the text matched with the pattern. The program should return 3, i.e. the index of a in text.

# Brute Force Approach

## Introduction

The Brute Force algorithm compares the pattern to the text  taking one character at a time.

For example, let’s suppose we need to find  “abba” in “abcabaabcabac”.

Pattern has “a”; Text has “a”. Now try to match second character of pattern to second character of the text.

Pattern has “b”; Text has “b”. Very well, now let us try to match third character of pattern with third character of the text.

Pattern has “b”; Text has “c”. Ah, failed.

Let’s try to see the same thing with graphics.   Pattern has “a”; Text has “b”. This is a no go at the very first character of the. Pattern has “a”; Text has “c”. This is a no go at the very first character of the pattern. Similarly, we will go on matching until our characters in text are exhausted or we find a match.

## Implementation

 import java.util.ArrayList; import java.util.List; public class StringMatchBruteForce { public static void main (String args []){ System.out.println("========String matcher Brute Force Approach==========="); String text = "abcabbabbaabac"; String pattern = "abba"; long initialTime = System.nanoTime(); List matchedIndexes = bruteForceStringMatcher(text, pattern); long finalTime = System.nanoTime(); for(Integer matchedIndex : matchedIndexes){ System.out.println("Pattern found at "+matchedIndex); } if(matchedIndexes.size() == 0) System.out.println("Pattern not found"); System.out.println("Time taken for matching "+(finalTime – initialTime)); } public static List bruteForceStringMatcher(String text, String pattern){ char[] textArray = text.toCharArray(); char[] patternArray = pattern.toCharArray(); int textLength = textArray.length; int patternLength = patternArray.length; List matchedIndexes = new ArrayList<>(); int textIndex = 0; for(textIndex = 0; textIndex < textLength; textIndex++){ int textIndexLocal = textIndex; Boolean match = true; int matchedIndex = textIndex; for(int patternIndex = 0; patternIndex < patternLength; patternIndex++){ if(textArray[textIndexLocal] != patternArray[patternIndex]) { match = false; break; } textIndexLocal = textIndexLocal + 1; } if(match) matchedIndexes.add(matchedIndex); } return matchedIndexes; } }

The output for this is as follow:

```========String matcher Brute Force Approach===========
Pattern found at 3
Pattern found at 6
Time taken for matching 243434```

# Time Complexity

In worst case, for any text of length n and pattern m, we are going to do m comparisons for each character of text, that will mean O(n*m)

We will be discussing a faster Rabin-Karp Algorithm for substring search in next post.

Corman

Rabin Karp

KMP

Rabin Fingerprint

PPT for Rabin Karp