动态规划 - 使用递归和 DP 的最长公共子字符串



我正在尝试使用递归和DP找到两个字符串的最长公共子字符串。请注意,我不是指最长的连续子序列。所以,如果两个字符串是

String s1 = "abcdf"; String s2 = "bzcdf" 
Longest Common Substring == "cdf" (not "bcdf").
Basically they have to be continuous elements

我正在尝试使用递归和回溯来做到这一点。但是,问题是,如果我使用如下所示的递归,则 +1 是在帧中预先添加的,该帧在调用堆栈中位于较高位置,并且不知道要出现的字符是否确实是连续元素或否。因此,按照上面的例子,"bcdf"将是答案。

public class ThisIsLongestCommonSubsequence_NotSubstring {
public static void main(String[] args) {
    String s1 = "abcdgh";
    String s2 = "abefgh";
    System.out.println(fun(s1, s1.length()-1, s2, s2.length()-1));
}
static int fun(String s1, int i, String s2, int j)
{
    if(i == -1 || j == -1)
        return 0;
    int ret = 0;
    if(s1.charAt(i) == s2.charAt(j))
        ret = fun(s1, i-1, s2, j-1) + 1;
    else
        ret = max(fun(s1, i-1, s2, j), fun(s1, i, s2, j-1));
    return ret;
}
static int max(int a, int b)
{
    return a>b?a:b;
}
}

至于现在,下面的代码是我想出的。请注意,每次发现不匹配时,我都会将计数重置为 0。并使用称为 int count 的变量跟踪匹配字符的数量,并使用称为 int maxcount 的变量记录程序中任何点的最高字符数。我的代码在下面。

public class LongestContinuousSubstringGlobalvariable {
static int maxcount = 0;
public static void main(String[] args) {
    String s1 = "abcdghijl";
    String s2 = "abefghijk";
    fun(s1, s2, s1.length()-1, s2.length()-1, 0);
    System.out.println("maxcount == "+maxcount);
}
static void fun(String s1, String s2, int i, int j, int count)
{
    if(i == -1 || j==-1)
        return;
    if(s1.charAt(i) == s2.charAt(j))
    {
        if(count+1 >  maxcount)
            maxcount = count+1;
        fun(s1, s2, i-1, j-1, count+1); 
    }
    else
    {
        fun(s1, s2, i-1, j, 0);
        fun(s1, s2, i, j-1, 0);
    }
}
}

这工作正常。但是,我不喜欢我的代码的几件事

  1. 使用全局变量(静态 int maxcount)跨帧比较
  2. 我不认为这是真正的动态编程或回溯,因为较低的帧不会将其输出返回到较高的帧,然后决定如何处理它。

请给我你关于如何在不使用全局变量和回溯的情况下实现这一目标的意见。

PS :我知道解决问题的其他方法,例如保留矩阵,并执行类似

M[i][j] = M[i-1][j-1]+1 if(str[i] == str[j])

目标不是解决问题,而是找到一个优雅的递归/回溯解决方案。

它可能可以在Prolog中完成。以下是我可以在这篇文章的帮助下放下的代码: Foreach 在 Prolog 中不起作用,http://obvcode.blogspot.in/2008/11/working-with-strings-in-prolog.html 和 如何在列表列表中找到最长的列表?

myrun(S1, S2):-
    writeln("-------- codes of first string ---------"),
    string_codes(S1, C1list),
    writeln(C1list),
    writeln("-------- codes of second string ---------"),
    string_codes(S2, C2list),
    writeln(C2list),
    writeln("--------- substrings of first --------"),
    findall(X, sublist(X, C1list), L),   
    writeln(L),
    writeln("--------- substrings of second --------"),
    findall(X, sublist(X, C2list), M),
    writeln(M),
    writeln("------ codes of common substrings -------"),
    intersection(L,M, Outl),
    writeln(Outl), 
    writeln("--------- common strings in one line -------"),
    maplist(string_codes, Sl, Outl), 
    writeln(Sl),
    writeln("------ common strings one by one -------"),
    maplist(writeln, Sl),
    writeln("------ find longest -------"),
    longest(Outl, LongestL),
    writeln(LongestL),
    string_codes(LongestS, LongestL),
    writeln(LongestS).
sublist(S, L) :-
  append(_, L2, L),
  append(S, _, L2).
longest([L], L) :-
   !.
longest([H|T], H) :- 
   length(H, N),
   longest(T, X),
   length(X, M),
   N > M,
   !.
longest([H|T], X) :-
   longest(T, X),
   !.

它运行显示所有步骤:它将字符串转换为代码,然后从两者中创建所有可能的子字符串,然后找到常见的子字符串并列出它们:

?- myrun("abcdf", "bzcdf").
-------- codes of first string ---------
[97,98,99,100,102]
-------- codes of second string ---------
[98,122,99,100,102]
--------- substrings of first --------
[[],[97],[97,98],[97,98,99],[97,98,99,100],[97,98,99,100,102],[],[98],[98,99],[98,99,100],[98,99,100,102],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
--------- substrings of second --------
[[],[98],[98,122],[98,122,99],[98,122,99,100],[98,122,99,100,102],[],[122],[122,99],[122,99,100],[122,99,100,102],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
------ codes of common substrings -------
[[],[],[98],[],[99],[99,100],[99,100,102],[],[100],[100,102],[],[102],[]]
--------- common strings in one line -------
[,,b,,c,cd,cdf,,d,df,,f,]
------ common strings one by one -------

b
c
cd
cdf
d
df
f
------ find longest -------
[99,100,102]
cdf
true.

忽略末尾的"真实"。

如果删除解释部分,程序会短得多:

myrun(S1, S2):-
    string_codes(S1, C1list),
    string_codes(S2, C2list),
    findall(X, sublist(X, C1list), L),    
    findall(X, sublist(X, C2list), M),
    intersection(L,M, Outl),
    longest(Outl, LongestL),
    string_codes(LongestS, LongestL),
    writeln(LongestS).
sublist(S, L) :-
  append(_, L2, L),
  append(S, _, L2).
longest([L], L) :-
   !.
longest([H|T], H) :- 
   length(H, N),
   longest(T, X),
   length(X, M),
   N > M,
   !.
longest([H|T], X) :-
   longest(T, X),
   !.

?- myrun("abcdf", "bzcdf").
cdf
true.

最新更新