読者です 読者をやめる 読者になる 読者になる

Boyer-Moore String Search Algorithm with C++

C++ アルゴリズム

The Boyer-Moore string search algorithm is one of the efficient string searching algorithm. This is based on a naive string searching algorithm which slides a pattern forward by 1 position and verifies whether a substring of text matches the pattern. A characteristic of this algorithm is its behavior when verifying. This verifies whether a match exists at particular position 'backward'.

Let us consider an example. Let a pattern 'P' be "ANPANMAN". First, it checks the 8th position of the text to see if it is the "N". If it finds the "N", it moves to the 7th position and checks as well, and so on until it checks the first position of the text for a "A". How about the failure case? For example, let 8th position of the text be "X". If it doesn't find the searching character at the current position, it slides the pattern forward. However, we know the pattern doesn't include the character 'X', we don't need to verify the text at the next seven position following it.

Following picture(?) is a vidualization of this algorithm.

-------X-------
ANPANMAN-------
-ANPANMAN------
--ANPANMAN-----
---ANPANMAN----
----ANPANMAN---
-----ANPANMAN--
------ANPANMAN-
-------ANPANMAN

This example explains why Boyer-Moore is the efficient string searching algorithm. In this example, we set the 8th position of the text as 'X'. But how about setting it as "M"? The pattern 'P' includes the character "M", so we have to be careful when skipping. We only can skip to the 6th position.

-------M-------
ANPANMAN-------
-ANPANMAN------
--ANPANMAN-----

In the Boyer-Moore algorithm, we preliminarily prepare a function lambda to see the last occurrence of 'a' in the pattern 'P'. With this function lambda, this algorithm can work effectively.

I implemented the Boyer-Moore algorithm with C++. At first, it sets up the last occurrence function lambda, tries to find the match backward, and skips with the function lambda.

#include <iostream>
#include <string>
#include <map>

// ref http://infoshako.sk.tsukuba.ac.jp/~yamamoto/Courses/files/page2_13.pdf
class BoyerMoore {
public:
  std::string text;
  std::string pattern;
  int n;
  int m;
  std::map<char, int> lambda;
  BoyerMoore(std::string text_, std::string pattern_) : 
	text(text_), pattern(pattern_), n(text_.size()), m(pattern_.size()){
	compute_lambda();
  };
  void compute_lambda() {
	for(int j = 1; j <= m; j++) {
	  lambda[pattern.at(j - 1)] = j;
	}
  };
  int get_lambda(const char& c) {
	if (lambda.find(c) != lambda.end()) {
	  return lambda[c];
	} else {
	  return 0;
	}
  };
  void match() {
	int s = 0;
	while(s <= n - m) {
	  int j = m;
	  while(j > 0 && pattern.at(j - 1) == text.at(s + j - 1)) {
		j--;
	  }
	  if(j == 0) {
		std::cout << "Pattern occurs at shift " << s << std::endl;
		s++;
	  } else {
		s += std::max(1, j - get_lambda(text.at(s + j - 1)));
	  }
	}
  };
};

int main(int argc, char *argv[]) {
  BoyerMoore bm("hogefugapiyo", "piyo");
  bm.match();
}
/Users/syou6162/cpp% ./a.out
Pattern occurs at shift 8