0

Given two strings:

str1 = "abcdefacbccbagfacbacer" str2 = "abc" 

I've to find the longest substring in str1 that is formed by the subset of characters of str2, in this case it would be - 7 (acbccba). What would be the approach to solve this in least complexity. First I thought of DP. But, I guess DP is really not required for this, as we have to search for substring, and not subsequence. Then I though of suffix tree. But that would require extra pre-processing time.

What would be the best way to do this? In fact, is this problem even suitable for a suffix tree, or DP?

5
  • Do you mean substring containing any permutation of (some subset of) character(s) from str2? Commented Dec 3, 2013 at 7:25
  • @ElliottFrisch Yes, any characters of str2. Commented Dec 3, 2013 at 7:26
  • @Elliott Frisch - Despite that yes, that looks like a no to me. The example longest result is "acbccba", which is clearly not a permutation of a subset of characters from "abc" - it has all those characters repeated. It looks to me like the "abc" is just a set of characters that can occur. Commented Dec 3, 2013 at 7:36
  • @Steve314 Actually it has not to contain all the characters from abc. For example, if there was a substring - ababababab, then that would have been an answer. Sorry for the misleading example. Commented Dec 3, 2013 at 7:39
  • @user3011937 - understood - "can occur" doesn't mean "must occur". What I think you misunderstood is that a permutation is the same items in (potentially) a different order (no repeats). "acbccba" is not a permutation of "abc" because you can't get "acbccba" simply by re-ordering the items in "abc". "cba" is a permutation of "abc", and "aabbccc" is a permutation of "acbccba". Commented Dec 3, 2013 at 7:52

4 Answers 4

5

The easiest approach by far:

  1. Build a hashset of the second string.
  2. Loop over the first string and for each character, check if it is in the hashset. Keep track of the longest substring.

Running time: O(n+m) where n is the length of str1 and m is the length of str2.


(Non-tested) code:

Set<Character> set = new HashSet<>(); for (int i = 0; i < str2.length(); i++) { set.add(str2.charAt(i)); } int longest = 0; int current = 0; int longestEnd = -1; for (int i = 0; i < str1.length(); i++) { if (set.contains(str1.charAt(i)) { current++; if (current > longest) { longest = current; longestEnd = i + 1; } } else { current = 0; } } String result = ""; if (longest > 0) { result = str1.substr(longestEnd - longest, longestEnd); } 
Sign up to request clarification or add additional context in comments.

1 Comment

and you can stop @ n-(longest found so far).
0

Just an idea: wrap second string in [] and use Pattern's match method:

Pattern p = Pattern.compile("(["+str2+"])"); Matcher m = p.matcher(str1); m.find(); 

and then m.group(1) shall find it.

3 Comments

I'm trying to avoid regexes here, as efficiency is a concern. This would anyways be O(|str1|) algo, plus extra overhead of regex pattern processing. Better to think is as general algorithm question.
How can you be sure that m.group(1) will be the longest?
Please add Pattern.quote to that, remove the outer () and use group(0) instead, but you won't find the longest one, only the first one - you need to find all of them and keep track of the longest one.
0

There is actually only one way i can think of:

  1. Go on the chars of the string str1.
  2. Foreach char in str1 check if its in str2
  3. increase a counter (i) eachtime the current char in str1 was found in str2
  4. Once the char in str1 not part of str2 save the counter (i) value if its begger than the maxfound counter in maxfound which represents the longest found sequence and reset the (i) counter.

Comments

0

Tested code in Perl.

 use strict; use warnings; my $str1 = "abcdefacbccbagfacbacer"; my $str2 = "abc"; my @str1 = split ("", $str1); my @str2 = split ("", $str2); my @res = (); my $index = undef; my $max = 0; my @max_char = (); for(my $i = 0; $i < @str1; $i++){ if ($str1[$i] =~ /[@str2]/){ push (@res , $str1[$i]); next; }else{ if(@res > $max){ $max = @res; @max_char = @res; $index = $i; } @res = (); } } if(@res > $max){ @max_char = @res; $index = $i; } $index = $index - $#max_char - 1; print "Longest substring = @max_char. Starting from index $index"; 

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.