# longest common substring problem

*Longest common substring problem is one of famous problem in computer science.Experienced c/c++ programmer can face this type of problem. Given two or more strings ,find the longest common substring, for example the longest common substring of the strings “ABABC”, “BABCA” is string “ABC” of length 3 ( source: Wikipedia).
*

*Let us Take another example: string_1 = “Hello” and string_2=”Fellow”, The longest common substring is “ello” of length 4.*

*Recursive approach:*

*Recursive approach:*

*select first character from one string and select first character from second string.**if match continue with second character from both string ,increment the count.**if not matched select next character from one string and do match in second string and store the match count**select first character from string one and do match from next character in second string and store the match count.**return max of above two steps.*

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
#include<stdio.h> #include<stdbool.h> int LCS(char *p, char *q) { if (*p == '\0' || *q == '\0') return 0; else if (*p == *q) { return 1 + LCS(p + 1, q + 1); } else { int a = LCS(p + 1, q); int b = LCS(p, q + 1); if (a > b) return a; else return b; } } int main() { char *str_one = "fellow"; char *str_two = "hello"; printf("Length of longest common substring is:"); int res = LCS(str_one, str_two); printf("%d \n", res); return 0; } |

*complexity: close to 2^n
*

*As we can see the solution is correct, but look at the complexity. practically it is not possible to accept this code because of exponential complexity of above algorithm. in this solution same problem is solved repeatedly. Look at below is an calling tree for lcs([B, A, C, B],[A, B, C, A, B]). we will find that we are solving some of sub-problems again and again.*

*If we memorize the solution of sub-problem we can reduced the complexity. Let us modified the above code by memoization using 2d array. The memoization algorithm would compute the LCS in time O(m*n) where m and n are length of two strings. if the two strings are equal the time complexity would be O(n²).*

*Recursive approach with memoization ( dynamic programming)*

*Recursive approach with memoization ( dynamic programming)*

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 |
#include<stdio.h> #include<stdbool.h> #define row 100 #define col 100 static int m[row][col]; int LCS(char *p, char *q, int i, int j) { if (*p == '\0' || *q == '\0') return 0; if (m[i][j] == -1) { if (*p == *q) { m[i][j] = 1 + LCS(p + 1, q + 1, i + 1, j + 1); } else { int a = LCS(p + 1, q, i + 1, j); int b = LCS(p, q + 1, i, j + 1); if (a > b) { m[i][j] = a; } else { m[i][j] = b; } } } return m[i][j]; } void initm() { int i = 0; int j = 0; for (i = 0; i < row; i++) { for (j = 0; j < col; j++) m[i][j] = -1; } } int main() { initm(); char *str_one = "fellow"; char *str_two = "hello"; printf("Length of longest common substring is:"); int res = LCS(str_one, str_two, 0, 0); printf("%d \n", res); return 0; } |

This is not the algorithm for the substring, but the subsequence.