19
\$\begingroup\$

Input:

Two strings without newlines or whitespaces.

Output:

Both input strings on separate lines, with spaces where necessary for one of the two strings. And a third line with the characters A, R, M and , representing added, removed, modified, and unchanged.

We add spaces to either the top or bottom input string (if we have to). The goal of this challenge is to output with the least amount of changes (ARM) possible, also known as the Levenshtein distance.

Example:

Let's say the input strings are ABCDEF and AFBECD, then the output would be this:

A B CDEF AFBECD A A RR 

Here are some other possible invalid outputs as example (and there are a lot more):

ABCDEF AFBECD MMMMM A BCDEF AFBECD A MMMR AB CDEF AFBECD MAMMMR ABC DEF AFBECD MMAMMR ABC DEF AFBECD MMAA RR ABCDEF AFB ECD MMR MA AB CDEF // This doesn't make much sense, AFBECD // but it's to show leading spaces are also allowed AM A RR 

None of these have only four changes however, so only A B CDEF\nAFBECD \n A A RR is a valid output for this challenge.

Challenge rules:

  • You can assume the input strings won't contain any new-lines or spaces.
  • The two input strings can be of different lengths.
  • One of the two input strings should remain as is, except for optional leading/trailing spaces.
  • If your languages doesn't support anything besides ASCII, you can assume the input will only contain printable ASCII characters.
  • The input and output format are flexible. You can have three separate Strings, a String array, a single String with new-lines, 2D character array, etc.
  • You are allowed to use something else instead of ARM, but state what you've used (i.e. 123, or abc., etc.)
  • If more than one valid output is possible with the same amount of changes (ARM), you can choose whether to output one of the possible outputs or all of them.
  • Leading and trailing spaces are optional:

    A B CDEF AFBECD A A RR 

    or

    "A B CDEF\nAFBECD\n A A RR" ^ Note there are no spaces here 

General rules:

  • This is , so shortest answer in bytes wins.
    Don't let code-golf languages discourage you from posting answers with non-codegolfing languages. Try to come up with an as short as possible answer for 'any' programming language.
  • Standard rules apply for your answer, so you are allowed to use STDIN/STDOUT, functions/method with the proper parameters, full programs. Your call.
  • Default Loopholes are forbidden.
  • If possible, please add a link with a test for your code.
  • Also, please add an explanation if necessary.

Test cases:

In: "ABCDEF" & "AFBECD" Output (4 changes): A B CDEF AFBECD A A RR In: "This_is_an_example_text" & "This_is_a_test_as_example" Possible output (13 changes): This_is_an _example_text This_is_a_test_as_example MAAAAAAA RRRRR In: "AaAaABBbBBcCcCc" & "abcABCabcABC" Possible output (10 changes): AaAaABBbBBcCcCc abcABCab cABC R MM MMMR MM R In: "intf(){longr=java.util.concurrent.ThreadLocalRandom.current().nextLong(10000000000L);returnr>0?r%2:2;}" & "intf(){intr=(int)(Math.random()*10);returnr>0?r%2:2;}" Possible output (60 changes): intf(){longr=java.util.concurrent.ThreadLocalRandom.current().nextLong(10000000000L);returnr>0?r%2:2;} intf(){i ntr=( i n t)(M ath.r andom ()* 10 );returnr>0?r%2:2;} MR M MRRRRRR RRRR RRRRRR MMMRR MMMMRRR RRRRRRRR MRRRRRRRRR RRRRRRRRRR In: "ABCDEF" & "XABCDF" Output (2 changes): ABCDEF XABCD F A R In: "abC" & "ABC" Output (2 changes): abC ABC MM 
\$\endgroup\$
11
  • \$\begingroup\$ Related \$\endgroup\$ Commented Oct 5, 2017 at 12:19
  • \$\begingroup\$ If there are multiple arrangements that are the same distance, is it OK to only output one of them? \$\endgroup\$ Commented Oct 5, 2017 at 12:28
  • \$\begingroup\$ @AdmBorkBork Yes, just one of the possible outputs is indeed the intended output (although outputting all available options is also fine). I'll clarify this in the challenge rules. \$\endgroup\$ Commented Oct 5, 2017 at 12:30
  • \$\begingroup\$ @Arnauld I've removed the rule about leading spaces, so leading and trailing spaces are both optional and valid on the unmodified line. (Which means the last test case in your answer is completely valid.) \$\endgroup\$ Commented Oct 5, 2017 at 14:32
  • 1
    \$\begingroup\$ @Ferrybig Ah ok, thanks for the explanation. But as for this challenge, just supporting printable ASCII is actually already enough. If you want to support more, be my guest. But as long as it works for the test cases given I'm ok with undefined behavior for graphene clusters consisting of more than 1 character like that. :) \$\endgroup\$ Commented Oct 6, 2017 at 7:09

6 Answers 6

8
\$\begingroup\$

JavaScript (ES6), 413 ... 343 342 bytes

Saved 5 bytes by tweaking the loop indices, as suggested by @KevinCruijssen

Takes input as 2 strings in currying syntax. Returns an array of 3 strings.

b=>a=>{m=[];x=a.length;y=b.length;for(i=j=0,c=d='';i<=y;c+=R='R')m[i]=[[c,i++]];for(;j++<x;)m[i=0][j]=[d+=A='A',j];for(;i<y;i++)for(j=0;j<x;)[C]=m[[X,S]=m[i][j],[Y,I]=m[i+1][j],[Z,D]=m[i][++j],Q=[Z+R,D+1],i+1][j]=b[i]==a[j-1]?[X+' ',S]:S<I?D<S?Q:[X+'M',S+1]:D<I?Q:[Y+A,I+1];return[(g=s=>C.replace(/./g,c=>c==s?' ':b[i++],i=0))(A),g(R,b=a),C]} 

Test cases

let f = b=>a=>{m=[];x=a.length;y=b.length;for(i=j=0,c=d='';i<=y;c+=R='R')m[i]=[[c,i++]];for(;j++<x;)m[i=0][j]=[d+=A='A',j];for(;i<y;i++)for(j=0;j<x;)[C]=m[[X,S]=m[i][j],[Y,I]=m[i+1][j],[Z,D]=m[i][++j],Q=[Z+R,D+1],i+1][j]=b[i]==a[j-1]?[X+' ',S]:S<I?D<S?Q:[X+'M',S+1]:D<I?Q:[Y+A,I+1];return[(g=s=>C.replace(/./g,c=>c==s?' ':b[i++],i=0))(A),g(R,b=a),C]} console.log(f('ABCDEF')('AFBECD').join`\n`); console.log(f('This_is_an _example_text')('This_is_a_test_as_example').join`\n`); console.log(f('AaAaABBbBBcCcCc')('abcABCabcABC').join`\n`); console.log(f("intf(){longr=java.util.concurrent.ThreadLocalRandom.current().nextLong(10000000000L);returnr>0?r%2:2;}")("intf(){intr=(int)(Math.random()*10);returnr>0?r%2:2;}").join`\n`); console.log(f('ABCDEF')('XABCDF').join`\n`);

Less golfed

b => a => { m = []; x = a.length; y = b.length; // initialize the leftmost column and the topmost row for(i = j = 0, c = d = ''; i <= y; c += R = 'R') m[i] = [[c, i++]]; for(; j++ < x;) m[i = 0][j] = [d += A = 'A', j]; // walk through the Levenshtein matrix for(; i < y; i++) for(j = 0; j < x;) [C] = m[ // C = current string, once updated [X, S] = m[i][j], // substitution [Y, I] = m[i + 1][j], // insertion [Z, D] = m[i][++j], // deletion Q = [Z + R, D + 1], // precomputed update for deletion i + 1 ][j] = b[i] == a[j - 1] ? [X + ' ', S] // unchanged character : S < I ? D < S ? Q : [X + 'M', S + 1] // deletion or substitution : D < I ? Q : [Y + A, I + 1]; // deletion or insertion return [ // g = helper function to format the output strings by inserting spaces (g = s => C.replace(/./g, c => c == s ? ' ' : b[i++], i = 0))(A), g(R, b = a), // final modification string, picked from the last visited cell C ] } 

Example

Below is the initial matrix for b = "foo" and a = "ok":

// 'o' 'k' [ [ [ '', 0 ], [ 'A', 1 ], [ 'AA', 2 ] ], [ [ 'R', 1 ], (undefined), (undefined) ], // 'f' [ [ 'RR', 2 ], (undefined), (undefined) ], // 'o' [ [ 'RRR', 3 ], (undefined), (undefined) ] ] // 'o' 

and here is the final matrix after all iterations:

// 'o' 'k' [ [ [ '', 0 ], [ 'A', 1 ], [ 'AA', 2 ] ], [ [ 'R', 1 ], [ 'M', 1 ], [ 'MA', 2 ] ], // 'f' [ [ 'RR', 2 ], [ 'R ', 1 ], [ 'R A', 2 ] ], // 'o' [ [ 'RRR', 3 ], [ 'RR ', 2 ], [ 'R M', 2 ] ] ] // 'o' 

The final modification string along with the Levenshtein distance are stored in the bottom-right cell.

\$\endgroup\$
2
  • \$\begingroup\$ Same change I suggested to save 1 byte regarding -1/+1 of j and x still applies to your latest edit: b=>a=>{m=[];x=a.length;y=b.length+1;for(i=y;i--;)m[i]=[[i,'R'.repeat(i)]];for(j=x+1;j--;)m[i=0][j]=[j,'A'.repeat(j)];for(;++i<y;)for(j=-1;++j<x;)R=m[S=(X=m[i-1][j])[0],I=(Y=m[i][j])[0],D=(Z=m[i-1][j+1])[0],Q=[D+1,Z[1]+'R'],i][j+1]=b[i-1]==a[j]?[S,X[1]+' ']:S<I?D<S?Q:[S+1,X[1]+'M']:D<I?Q:[I+1,Y[1]+'A'];return[(g=s=>R[1].replace(/./g,c=>c==s?' ':b[i++],i=0))('A'),g('R',b=a),R[1]]} :) \$\endgroup\$ Commented Oct 5, 2017 at 15:21
  • 1
    \$\begingroup\$ @KevinCruijssen I saved 5 bytes by taking your idea a step further. Thanks! \$\endgroup\$ Commented Oct 5, 2017 at 15:43
5
\$\begingroup\$

Haskell, 192 181 174 161 158 150 147 143 1581 bytes

e@(a:r)&f@(b:s)=snd$maximum[([1|' '<-s!!2],s)|s<-z(z(:))[a:" R",' ':b:"A",a:b:last("M":[" "|a==b])][r&f,e&s,r&s]] x&y=[x,y,"RA"!!(0^length x)<$x++y] z=zipWith 

Try it online! Example usage: "ABCDEF" & "AFBECD". Returns a list of three strings. This is an extension of my recursive solution to the ordinary Levenshtein distance question.

Explanation:

To compute the minimal modifications to get from "xyz" to "yw", we focus on the first character of both strings. There are three possibilities:

  • Remove: Drop x from the first string and recursively compute the modifications to get from "yz" to "yw". This yields the three lines ["yz","yw"," M"]. Add x to the first one, a space to the second one and R to the third one. We get
    xyz yw R M
  • Add: Drop y from the second string and compute "xyz" & "w", which returns the result ["xyz","w","MRR"]. We need to add a space on the first line, y to the second and A to the third line:
     xyz yw AMRR
  • Modified/Unchanged: We can combine those two cases because both require to drop the first character of both strings and compute the minimal modifications between the remaining strings: "yz" & "w". To the result ["yz","w","MR"], we add x on he first and y on the second line. Only for the last line we need to differentiate whether the initial characters are the same. If they are the same, a space is added to the third line, otherwise (as in this case because x \= y) an M is added:
    xyz yw MMR

From those three candidates, we need to find the one with the fewest modifications. This is equivalent to having the most spaces on the third line. Therefore we convert each candidate s (a list of three strings) to a tuple ([1|' '<-s!!2],s), where s appears as second component and the first component is a list with as many elements as there are spaces in the third line of s (s!!2 because of 0-indexing). As list element 1 is used, but the actual element is irrelevant as long as it is the same for all candidates.

Altogether, this yields the list of tuples

[([1],["xyz"," yw","R M"]),([],[" xyz","yw","AMRR"]),([],["xyz","yw","MMR"])]
The build-in maximum selects the largest element of this list, where tuples are compared lexicographically, that is component-wise from left to right. As [1] is larger than [], the first tuple is selected, and snd returns the second of component, that is the list of lines, of the tuple.


1 +15 bytes to fix a bug where A-changes at the end of a string would be displayed as R-changes

\$\endgroup\$
1
  • \$\begingroup\$ lol this makes the userscript think that this is 1 byte \$\endgroup\$ Commented Oct 10, 2017 at 15:12
4
\$\begingroup\$

Python 2, 548 536 484 5001 488 447 3812 373 371 357 350 bytes

s,t=input() n,m=len(s),len(t) r=range L=[[(j,'RA'[i<1]*j)for j in r(i,m-~i)]for i in r(n+1)] for i in r(n): for j in r(m):M,R=L[i][j:j+2];A=L[i+1][j];v,V=min(A,R,M);x=('AR'[v in R],'M_'[s[i]==t[j]])[v in M];_=M;X=eval(x)[1]+x;L[i+1][j+1]=v+(x<'_'),X for i in r(len(X)):s=s[:i]+' '*('B'>X[i])+s[i:];t=t[:i]+' '*('R'==X[i])+t[i:] print s+'\n'+t+'\n'+X 

Try it online!

Uses 'ARM_' instead of 'ARM '

Works by building up the Levenshtein matrix, and then traversing backwards to find the operations. Now builds the operation-string at the same time as the Levenshtein matrix, as in Arnauld's JS answer

1: More bytes, as it did not work if the first string was a single character.

2: Switched to building the combinations in the Levenshtein matrix.

\$\endgroup\$
4
  • \$\begingroup\$ +1 for taking less than 60 seconds for 6 character words like my initial (failed) attempt lol \$\endgroup\$ Commented Oct 5, 2017 at 14:00
  • \$\begingroup\$ Nice answer! +1 from me. Since I never program in Python I can't really help you with the golfing, except for one thing: m+i+1 can be m-~i. \$\endgroup\$ Commented Oct 5, 2017 at 14:13
  • \$\begingroup\$ You can use a tab instead of double spaces on line 7. \$\endgroup\$ Commented Oct 5, 2017 at 14:16
  • \$\begingroup\$ You can get to 463 bytes by reducing the wile loop to one line: while i+j<n+m:v,A=(L[i]+[m,m])[j:j+2];R,M=(L[i+1]+[m,m])[j:j+2];d=min(A,R,M);q=M==d or(R==d)*2;r+=' '*(d==v==M)or'AMR'[q];i+=q>0;j+=q<2 \$\endgroup\$ Commented Oct 5, 2017 at 18:42
1
\$\begingroup\$

Python 2, 310 bytes

from difflib import* a,b=input() U,j=[],' ' for g,s,t,x,y in SequenceMatcher(0,a,b).get_opcodes(): 	g,Y,T=g[0],y-x,t-s 	z,A,B=Y-T,a[s:t],b[x:y] 	if'q'<g:U+=[A+z*j,B+j*-z,'M'*min(T,Y)+'A'*z+'R'*-z] 	if'e'==g:U+=[A,B,j*Y] 	if'i'==g:U+=[j*Y,B,'A'*Y] 	if'e'>g:U+=[A,j*T,'R'*T] for e in[0,1,2]:print''.join(U[e::3]) 

Try it online!

Using difflib.SequenceMatcher that computes an alignment between two strings

\$\endgroup\$
4
  • \$\begingroup\$ This seems to give some incorrect results for some of the other test cases. More particular: "This_is_an_example_text","This_is_a_test_as_example" \$\endgroup\$ Commented Oct 6, 2017 at 16:10
  • \$\begingroup\$ @KevinCruijssen thanx ,I just fixed it ^_^ \$\endgroup\$ Commented Oct 6, 2017 at 16:25
  • \$\begingroup\$ Nice, gj! But umm.. the third test case (and fourth as well) is also incorrect unfortunately. One of the two lines should be unmodified (except for leading/trailing spaces). Both lines currently contain spaces in the middle. \$\endgroup\$ Commented Oct 6, 2017 at 17:55
  • \$\begingroup\$ @KevinCruijssen thanx again, I’m fixing it \$\endgroup\$ Commented Oct 6, 2017 at 17:58
1
\$\begingroup\$

Mathematica, 250 256 259 384 bytes

~0.00035 seconds for the java code case.

(i=o=p="";a=Array;r=StringLength;If[Length@#>0,g=#&@@#;h=#[[2]];u=r@g;v=r@h;i=i<>g;o=o<>h;w=Abs[v-u];k=a[" "&,w];If[u<v,i=i<>k,o=o<>k];p=p<>a["M"&,u~Min~v];p=p<>a[If[u>v,"R","A"]&,w],i=i<>#;o=o<>#;p=p<>a[" "&,r@#]]&/@SequenceAlignment[#,#2];{i,o,p})& 

Usage: f["ABCDE", "ABCDF"]

Try it online!

The code is based on a fact that SequenceAlignment works by default on

With the default setting SimilarityRules->Automatic, each match between two elements contributes 1 to the total similarity score, while each mismatch, insertion, or deletion contributes -1.

Namely, the scoring is calculated by M, A and R, accordingly.

Example:

example

\$\endgroup\$
3
  • 2
    \$\begingroup\$ Hmm, I've never programmed in Mathemetica, but isn't it possible to change i="";o="";p=""; to i="";o=i;p=i; to reduce 2 bytes? \$\endgroup\$ Commented Oct 6, 2017 at 6:50
  • 2
    \$\begingroup\$ What about i=o=p=""? \$\endgroup\$ Commented Oct 6, 2017 at 12:26
  • \$\begingroup\$ @DavidC Yes I had realized that and changed it already. thank you anyway \$\endgroup\$ Commented Oct 6, 2017 at 19:31
1
\$\begingroup\$

D, 351 345 bytes

-6 bytes thanx to KevinCruijssen

string f(string a,string b){import std.algorithm;string x,y,z;size_t i,j,k;foreach(c;levenshteinDistanceAndPath(a,b)[1]){final switch(c)with(EditOp){case none:x~=a[i++];y~=b[j++];z~=" ";break;case substitute:x~=a[i++];y~=b[j++];z~="M";break;case insert:x~=" ";y~=b[j++];z~="A";break;case remove:x~=a[i++];y~=" ";z~="R";}}return x~"\n"~y~"\n"~z;} 

Try it online!

\$\endgroup\$
1
  • \$\begingroup\$ You can golf six bytes by removing the last break;. +1 though, first time I'm seeing the programming language D. \$\endgroup\$ Commented Oct 10, 2017 at 13:00

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.