Lifestyle

Understanding the Horspool Algorithm in Simple Words

Parveen Verma
Published By
Parveen Verma
Kanishk Mehra
Reviewed By
Kanishk Mehra
Shubham Sharma
Edited By
Shubham Sharma
Understanding the Horspool Algorithm in Simple Words

The Horspool algorithm is a way to find a small piece of text (called a pattern) in a larger chunk of text. It’s like a treasure hunt where you skip unnecessary steps to reach your goal faster. It’s simpler than its parent method, the Boyer-Moore algorithm, and works best when the alphabet is small, or the pattern isn’t too long.

How Does It Work?

The algorithm uses a shift table to decide how far to skip if there’s no match. Instead of checking every single letter, it focuses on comparing the pattern’s characters from right to left. If there’s a mismatch, it jumps ahead, saving time.

Step 1: Preprocessing (Creating the Skip Table)

This table will tell how many steps to skip if the letters don’t match.

  1. Write down all the letters in the alphabet that might appear in your text.
  2. For each letter, calculate how far it is from the last position of that letter in the pattern. If the letter isn’t in the pattern, give it the full pattern length as the skip value.

For example, if the pattern is ABCD

  • A → 3, B → 2, C → 1, D → 4, and any other letters → 4.

Here’s how the skip table looks for the word “cat”

LetterSkip Steps
C2
A1
T3
Others3

Step 2: Searching (Finding the Pattern)

  1. Line up the pattern with the text.
  2. Start comparing letters from the last character of the pattern.
  3. If there’s a mismatch, use the skip table to move the pattern forward by the correct number of steps.
  4. Repeat until you either find the pattern or run out of text.

Let’s find "cat" in the text: “I have a cat”

  1. Start with "I h." No match for "t." Skip 3 steps.
  2. Next chunk, "ave." No match again. Skip 3 steps.
  3. Now, "cat." This matches!

Boyer-Moore and Horspool algorithm, which is better?

The Boyer-Moore algorithm and the Horspool algorithm are both used for searching a smaller pattern in a larger text. The main difference lies in their complexity and flexibility.

The Boyer-Moore algorithm uses two smart techniques, the bad character rule, and the good suffix rule, making it highly efficient for searching patterns in text. By combining these two rules, it can skip large sections of the text, which speeds up the search process significantly. On the other hand, the Horspool algorithm simplifies Boyer-Moore by using only the bad character rule. It focuses on the last character of the pattern for mismatches and decides how far to jump using a shift table. While Horspool is easier to understand and implement, it is less efficient for very long patterns compared to Boyer-Moore.

In short, Boyer-Moore is faster for complex cases, while Horspool is simpler and works well for moderate patterns.