strStr()
is a function that takes two strings as input, haystack
and needle
, and returns the index of the first occurrence of needle
in haystack
, or -1
if needle
is not present in haystack
.
For example, if haystack = "hello world"
and needle = "world"
, then the function should return 6
, because "world"
first appears at index 6 in "hello world"
. If needle = "goodbye"
, then the function should return -1
, because "goodbye"
does not appear in "hello world"
.
This problem is often used in programming interviews because it requires the use of string manipulation techniques and searching algorithms. There are many ways to solve this problem, but some common approaches include using a sliding window or using built-in string functions in the programming language being used.
Implement strStr() Solutions
Solution 1: Brute Force
One straightforward solution to the strStr()
problem is to use a brute force approach, which involves checking each possible substring of haystack
that has the same length as needle
. If a substring is found that matches needle
, then its starting index in haystack
is returned. If no matching substring is found, then -1
is returned. Here’s some pseudocode to illustrate this approach:
function strStr(haystack, needle):
n = length(haystack)
m = length(needle)
for i = 0 to n - m:
if haystack[i:i+m] == needle:
return i
return -1
The time complexity of this solution is O((n – m + 1) * m), where n is the length of haystack
and m is the length of needle
. This is because we are checking each substring of length m
in haystack
, and there are n - m + 1
such substrings. Comparing two strings of length m
takes O(m) time. Therefore, the overall time complexity of this solution is O(nm).
Solution 2: Knuth-Morris-Pratt (KMP) Algorithm
The KMP algorithm is a more advanced solution to the strStr()
problem that uses a pre-processing step to build a pattern-matching table based on the needle
string. This table allows us to skip certain characters in the haystack
string during the matching process, which improves performance. Here’s some pseudocode to illustrate this approach:
function strStr(haystack, needle):
n = length(haystack)
m = length(needle)
if m == 0:
return 0
prefix = compute_prefix(needle)
j = 0
for i = 0 to n - 1:
while j > 0 and haystack[i] != needle[j]:
j = prefix[j-1]
if haystack[i] == needle[j]:
j += 1
if j == m:
return i - m + 1
return -1
function compute_prefix(needle):
m = length(needle)
prefix = [0] * m
j = 0
for i = 1 to m - 1:
while j > 0 and needle[i] != needle[j]:
j = prefix[j-1]
if needle[i] == needle[j]:
j += 1
prefix[i] = j
return prefix
The time complexity of this solution is O(n + m), where n is the length of haystack
and m is the length of needle
. The pre-processing step takes O(m) time to build the prefix
table, and the main loop of the algorithm takes O(n) time to scan through the haystack
string. The inner while loop in the main loop executes at most m times, so it does not contribute significantly to the overall time complexity.
Solution 3: Rabin-Karp Algorithm
The Rabin-Karp algorithm is another advanced solution to the strStr()
problem that uses a rolling hash function to compute the hash values of substrings of haystack
and needle
. This allows us to compare substrings more efficiently, because we can first compare their hash values before comparing the actual strings. Here’s some pseudocode to illustrate this approach:
function strStr(haystack, needle):
n = length(haystack)
m = length(needle)
if m == 0:
return 0
h = pow(26, m-1) % modulus
needle_hash = compute_hash(needle, h)
haystack_hash = compute_hash(haystack[:m], h)
for i = 0 to n - m:
if needle_hash == haystack_hash and needle == haystack[i:i+m]:
return i
if i < n - m:
haystack_hash = recalculate_hash(haystack[i], haystack[i+m], haystack_hash, h)
return -1
function compute_hash(string, h):
hash_value = 0
for c in string:
hash_value = (hash_value * 26 + ord(c) - ord('a')) % modulus
return hash_value
function recalculate_hash(old_char, new_char, old_hash, h):
new_hash = (old_hash - (ord(old_char) - ord('a')) * h) % modulus
new_hash = (new_hash * 26 + ord(new_char) - ord('a')) % modulus
return new_hash
The time complexity of this solution is O(n + m), where n is the length of haystack
and m is the length of needle
. The algorithm computes the hash values of needle
and the first substring of length m
in haystack
in O(m) time, and then scans through haystack
one character at a time, updating the hash value of each substring in O(1) time using the rolling hash function. The comparison of substrings takes O(m) time, but is only performed when their hash values match.
The Rabin-Karp algorithm has the same time complexity as the KMP algorithm, but maybe faster in practice for certain inputs because it can avoid some comparisons by using hash values. However, it requires more memory than the KMP algorithm to store the hash values and modulus value.
Which approach is better and why?
It depends on the specific use case and input data. The brute force solution has the worst time complexity and may be too slow for large inputs. The KMP algorithm has a better worst-case time complexity than the brute force solution, but requires extra memory to store the prefix
table.
The Rabin-Karp algorithm has the same time complexity as the KMP algorithm, but may be faster in practice for certain inputs. Therefore, it’s important to consider the specific constraints and requirements of the problem when choosing a solution.
The KMP algorithm is a good solution to the strStr()
problem because it has a better worst-case time complexity than the brute force solution, and it is relatively easy to understand and implement.
Implement strStr() in Python using KMP Algorithm:
def strStr(haystack: str, needle: str) -> int:
n = len(haystack)
m = len(needle)
if m == 0:
return 0
prefix = compute_prefix(needle)
j = 0
for i in range(n):
while j > 0 and haystack[i] != needle[j]:
j = prefix[j-1]
if haystack[i] == needle[j]:
j += 1
if j == m:
return i - m + 1
return -1
def compute_prefix(needle: str) -> List[int]:
m = len(needle)
prefix = [0] * m
j = 0
for i in range(1, m):
while j > 0 and needle[i] != needle[j]:
j = prefix[j-1]
if needle[i] == needle[j]:
j += 1
prefix[i] = j
return prefix
Implement strStr() in Ruby using KMP Algorithm
def str_str(haystack, needle)
n = haystack.length
m = needle.length
return 0 if m == 0
prefix = compute_prefix(needle)
j = 0
n.times do |i|
while j > 0 && haystack[i] != needle[j]
j = prefix[j-1]
end
if haystack[i] == needle[j]
j += 1
end
if j == m
return i - m + 1
end
end
return -1
end
def compute_prefix(needle)
m = needle.length
prefix = Array.new(m, 0)
j = 0
(1...m).each do |i|
while j > 0 && needle[i] != needle[j]
j = prefix[j-1]
end
if needle[i] == needle[j]
j += 1
end
prefix[i] = j
end
return prefix
end
Implement strStr() in JavaScript using KMP Algorithm
function strStr(haystack, needle) {
const n = haystack.length;
const m = needle.length;
if (m == 0) {
return 0;
}
const prefix = computePrefix(needle);
let j = 0;
for (let i = 0; i < n; i++) {
while (j > 0 && haystack[i] != needle[j]) {
j = prefix[j-1];
}
if (haystack[i] == needle[j]) {
j++;
}
if (j == m) {
return i - m + 1;
}
}
return -1;
}
function computePrefix(needle) {
const m = needle.length;
const prefix = Array(m).fill(0);
let j = 0;
for (let i = 1; i < m; i++) {
while (j > 0 && needle[i] != needle[j]) {
j = prefix[j-1];
}
if (needle[i] == needle[j]) {
j++;
}
prefix[i] = j;
}
return prefix;
}
Implement strStr() in Java using KMP Algorithm
public int strStr(String haystack, String needle) {
int n = haystack.length();
int m = needle.length();
if (m == 0) {
return 0;
}
int[] prefix = computePrefix(needle);
int j = 0;
for (int i = 0; i < n; i++) {
while (j > 0 && haystack.charAt(i) != needle.charAt(j)) {
j = prefix[j-1];
}
if (haystack.charAt(i) == needle.charAt(j)) {
j++;
}
if (j == m) {
return i - m + 1;
}
}
return -1;
}
public int[] computePrefix(String needle) {
int m = needle.length();
int[] prefix = new int[m];
int j = 0;
for (int i = 1; i < m; i++) {
while (j > 0 && needle.charAt(i) != needle.charAt(j)) {
j = prefix[j-1];
}
if (needle.charAt(i) == needle.charAt(j)) {
j++;
}
prefix[i] = j;
}
return prefix;
}
In each of these implementations, we first compute the prefix table using the compute_prefix()
function, which takes O(m) time. Then, we scan through the haystack
string and try to match needle
using the prefix table, which takes O(n) time in the worst case. Therefore, the overall time complexity of the algorithm is O(m + n).
Leave a Reply