KMP Algorithm for Pattern Seaching
Pattern: 模式
Given a text txt[0...N-1] and a pattern pat[0...M-1], write a function search(char pat[], char txt[]) that prints all
occurrences of pat[] in txt[].You may assume that N>M.
Occurrence: 出现
Assume: 假定

Examples:
Input: txt[] = "THIS IS A TEST TEXT", pat[] = "TEST"
Output: Pattern found at index 10
Input: txt[] = "AABAACAADAABAABA"
pat[] = "AABA"
Output: Pattern found at index 0, Pattern found at index 9, Pattern found at index 12

Pattern searching is an important problem in computer science. When we do search for a string in a notepad/word file or
browser or database, pattern-searching algorithms are used to show the search results.

We have discussed the Naive pattern-searching algorithm in the previous post. The worst case complexity of the Naive
algorithm is O(m(n-m+1)). The time complexity of the KMP algorithm is O(n+m) in the worst case.
Discuss: 讨论
Naive: 朴素,原始
Post: 帖子

KMP Pattern Searching:
The Naive pattern-searching algorithm doesn't work well in cases where we see many matching characters followed by a
mismatching character.
Match: 比得上…;满足…。
Mismatch:错配。

Examples:
1) txt[] = "AAAAAAAAAB", pat[] = "AAAB"
2) txt[] = "ABABABABABCABABABABC", pat[] = "ABABC"(not a worst case, but a bad case for Naive)

The KMP matching algorithm uses degenerating property (pattern having the same sub-patterns appearing more than once in
the pattern) of the pattern and improves the worst-case complexity to O(n+m).
Degenerating: 退化。
Generating: 引起。
Property: 特性。

The basic idea behind KMP's algorithm is: whenever we detect a mismatch (after some matches), we already know some of
the characters in the text of the next window. We take advantage of this information to avoid matching the characters
that we know will anyway match.
Whenever: 每当。
Detect: 识别。
Advantage: 有利的。

Matching Overview

txt = "AAAAABAABA"
pat = "AAAA"
We compare first window of txt with pat

txt = "AAAAABAAABA"
pat = "AAAA" [Initial position]
We find a match. This is same as Naive String Matching.

In the next step, we compare next window of txt with pat.

txt = "AAAAABAAABA"
pat = "AAAA" [Pattern shifted one position]
// Shift: 移动。

This is where KMP does optimization over Naive. In this second window, we only compare fourth A of pattern with fourth
character of current window of text to decide whether current window matches or not. Since we know first three
characters will anyway match, we skipped matching first three characters.
Optimization: 优化。
Current: 当前的。

Need of Preprocessing?
An important question arises from the above explanation, how to know how many characters to be skipped. To know this, we
pre-process pattern and prepare an integer array lps[] that tells us the count of characters to be skipped.
Arise: 产生。
Explanation: 解释。

Preprocessing Overview:
• KMP algorithm preprocesses pat[] and constructs an auxiliary lps[] of size m (same as the size of the pattern) which
is used to skip characters while matching.
• Name lps indicates the longest proper prefix which is also a suffix. A proper prefix is a prefix with a whole string
not allowed.
Auxiliary: 辅助的。