包含 k 个不同字符的最长子字符串

我正在努力获取具有 k 个不同字符的最长子字符串。

例：

输入： abcbdbdbbdcdabd.

For k = 2, o/p is ‘bdbdbbd’
For k = 3, o/p is ‘bcbdbdbbdcd’
For k = 5, o/p is ‘abcbdbdbbdcdabd’

这是我的代码取自这个网站：

// Define the character range
public static final int CHAR_RANGE = 128;
// Function to find the longest substring of a given string containing
// `k` distinct characters using a sliding window
public static String findLongestSubstring(String str, int k)
{
// base case
if (str == null || str.length() == 0) {
return str;
}
// stores the longest substring boundaries
int end = 0, begin = 0;
// set to store distinct characters in a window
Set<Character> window = new HashSet<>();
// Count array `freq` stores the frequency of characters present in the
// current window. We can also use a map instead of a count array.
int[] freq = new int[CHAR_RANGE];
// `[low…high]` maintains the sliding window boundaries
for (int low = 0, high = 0; high < str.length(); high++)
{
window.add(str.charAt(high));
freq[str.charAt(high)]++;
// if the window size is more than `k`, remove characters from the left
while (window.size() > k)
{
// If the leftmost character's frequency becomes 0 after
// removing it in the window, remove it from the set as well
if (--freq[str.charAt(low)] == 0) {
window.remove(str.charAt(low));
}
low++;        // reduce window size
}
// update the maximum window size if necessary
if (window.size() ==k && end - begin < high - low)
{
end = high;
begin = low;
}
}
// return the longest substring found at `str[begin…end]`
return str.substring(begin, end + 1);
}
public static void main(String[] args)
{
String str = "abcbdbdbbdcdabd";
int k = 2;
System.out.print(findLongestSubstring(str, k));
}

现在我的问题是，在上面的代码中，我们有一个while循环，while (window.size() > k)，我看到我们可以只用像if(window.size() > k)这样的if条件替换。我不确定我在这里从 while 更改为 if 是否正确。根据代码，我们将在 for 循环中向窗口添加一个 chaarcter，并立即检查窗口是否超过 k，因此一个简单的 if 条件应该有效。有什么理由在这里使用 while 循环吗？

简短回答：将while更改为if不会影响答案的正确性或代码的渐近复杂性。

这种while和if都起作用的模式通常适用于这些滑动窗口问题，您试图找到满足某些条件的最长子字符串/子数组。将其保留为while使代码更容易推理或证明正确：你会得到很好的循环不变量，比如"在循环的每次迭代结束时，'low'是最小值，这样str[low,...,high]最多包含k个不同的字符"。你还得到了一个很好的属性，window.size最多只在k+1，最多k在每个循环的末尾

。将while更改为if似乎是一个"聪明"的优化：它仍然有效(对于这个确切的问题)，并且基于天真的"执行的操作"计数，它所做的总工作量较少。另一方面，代码变得更加脆弱，不那么通用，并且更难推理：既然window.size可以和你的字母表一样大，那么你的循环不变量是什么？现在，证明正确性的最简单方法是通过矛盾切换到参数，表明如果有一个更长的有效子字符串，你的滑动窗口不能在不找到它的情况下越过它。此外，如果从字符串交换为整数数组，空间复杂性可能会变得更糟。

如果你真的对代码进行计时，你可能会发现，尽管执行的工作较少，但性能实际上会随着if而变差。在现代处理器上，缓存局部性和可预测性远比小的常量因素重要：例如，在一个紧密的while循环中重复1000次hashset.remove()，可能不直观地比只做900hashset.remove()快得多，分散在更长的时间段内，并在两者之间混合其他操作。与往常一样，对两个版本进行基准测试，但请注意数据结构操作的间距会导致意外的性能峰值。

当然，重要的是要知道为什么使用if会像while一样提供最长的子字符串，并且能够证明这一点;但是，除非真正必要，否则我建议避免这些聪明的优化技巧，这些技巧以算法的简单性和明显正确性为代价提供假设的性能提升。

相关内容

最新更新

热门标签：