Find the list that best matches reference list

Question

I need to find how well several different lists match a reference list. I'm looking for a percentage or some kind of similarity score.

For example,

a = {"A278", "G279", "S280", "G281", "I282", "I283", "I284", "S285", "D286", "T287", "P288", "V289", "H290", "D291", "C292"}
b = {"S280", "G281", "I282", "I284"}
c = {"C275", "S276", "T277", "A278", "G279"}

How can I determine that b is a better match against a than c? a is the reference list.

Order matters.

After looking through the documentation, the only way I can think of doing this is to iterate through b and c and test if each element is MemberQ of a, tallying up the total and comparing the totals at the end. Is there a better approach?
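The tallying approach described above could be sketched as follows. This is a minimal illustration rather than code from the question, and the helper name tally is hypothetical. Because MemberQ only tests membership, the count ignores position, so it does not capture the "order matters" requirement.

```mathematica
(* Reference list and candidates, copied from the question *)
a = {"A278", "G279", "S280", "G281", "I282", "I283", "I284", "S285",
     "D286", "T287", "P288", "V289", "H290", "D291", "C292"};
b = {"S280", "G281", "I282", "I284"};
c = {"C275", "S276", "T277", "A278", "G279"};

(* Hypothetical helper: count the candidate elements that occur anywhere in the reference *)
tally[ref_List, cand_List] := Count[cand, x_ /; MemberQ[ref, x]]

tally[a, #] & /@ {b, c}
(* {4, 2} *)
```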

You might consider looking through the whole bunch of *Distance[]/*Dissimilarity[] functions available. SequenceAlignment[] might also be of use.
– J. M.'s missing motivation, Commented Oct 8, 2018 at 4:24
Testing as described in the last paragraph of the question does not take account of order. If order actually does not matter, consider Complement.
– bbgodfrey, Commented Oct 8, 2018 at 4:33
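
As one hedged illustration of the *Distance[] suggestion in the first comment, EditDistance accepts lists as well as strings and is sensitive to order. The sketch below ranks candidates by their edit distance to the reference, treating a smaller distance as a better match. The helper name closestByEdit is hypothetical and not from the thread.

```mathematica
(* Reference list and candidates, copied from the question *)
a = {"A278", "G279", "S280", "G281", "I282", "I283", "I284", "S285",
     "D286", "T287", "P288", "V289", "H290", "D291", "C292"};
b = {"S280", "G281", "I282", "I284"};
c = {"C275", "S276", "T277", "A278", "G279"};

(* Hypothetical helper: pick the candidate with the smallest edit distance to the reference *)
closestByEdit[ref_List, cands_List] := First[MinimalBy[cands, EditDistance[ref, #] &]]

closestByEdit[a, {b, c}]
```

For the question's data this selects b, since b can be obtained from a by deletions alone. Edit distance also penalizes differences in length, so it may not be the right score when the candidates vary widely in size.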

user1066 · Accepted Answer · 2018-10-08 09:51:04Z

Maybe:

LongestCommonSequence[a, b]

{"S280", "G281", "I282", "I284"}

LongestCommonSequence[a, c]

{"A278", "G279"}

Length@LongestCommonSequence[a, #] & /@ {b, c}

{4, 2}
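
Since the question asks for a percentage, the length of the common sequence could be normalized by the length of the reference. This is only a sketch under that assumption; the helper name lcsPercent is hypothetical, and dividing by the candidate's length instead would be an equally defensible choice.

```mathematica
(* Reference list and candidates, copied from the question *)
a = {"A278", "G279", "S280", "G281", "I282", "I283", "I284", "S285",
     "D286", "T287", "P288", "V289", "H290", "D291", "C292"};
b = {"S280", "G281", "I282", "I284"};
c = {"C275", "S276", "T277", "A278", "G279"};

(* Hypothetical helper: share of the reference covered by the longest common sequence, in percent *)
lcsPercent[ref_List, cand_List] :=
  100. Length[LongestCommonSequence[ref, cand]]/Length[ref]

lcsPercent[a, #] & /@ {b, c}
(* {26.6667, 13.3333} *)
```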

kglr · Accepted Answer · 2018-10-08 05:40:31Z


MaximalBy[Length[a⋂#]&]@{b,c}

{{"S280", "G281", "I282", "I284"}}

MinimalBy[Length@Complement[a,#]&]@{b,c}

{{"S280", "G281", "I282", "I284"}}

answered Oct 8, 2018 at 5:18, edited Oct 8, 2018 at 5:40

+1 for conciseness, this is better than my answer
– brienna, Commented Oct 8, 2018 at 5:21
@briennakh, yours is probably faster.
– kglr, Commented Oct 8, 2018 at 5:22

brienna · Accepted Answer · 2018-10-08 04:39Z (last edited by MarcoB, 2019-02-20 18:19:41Z)

Suppose I have the reference list a and a list otherLists containing all the other lists I want to compare against a:

otherLists[[Ordering[Length[#] & /@ (Complement[a, #] & /@ otherLists), 1]]]

This returns the list that leaves the fewest elements of a unmatched, that is, the best match for a, wrapped in a one-element list.

you can use just Length instead of Length[#] &.
– kglr, Commented Oct 8, 2018 at 5:28
Please don't edit my answer, @MarcoB. Write a comment.
– brienna, Commented Feb 20, 2019 at 18:18
I rolled back my changes. @kglr's comment is suggesting the same change. Is there a reason you prefer to retain your version?
– MarcoB, Commented Feb 20, 2019 at 18:21
My answer works. If you want to optimize my answer, you can add your suggestion as a comment or upvote kglr's comment. Thank you. @MarcoB
– brienna, Commented Feb 20, 2019 at 18:23
@briennakh It certainly works, but the usage of Length[#]& where Length would suffice is unnecessary. This site is collaboratively edited, so I made a change that, in my opinion, improved this answer for future readers. Anyway, I will leave it as it was; perhaps you might consider making a change yourself.
– MarcoB, Commented Feb 20, 2019 at 18:33

NonDairyNeutrino · Accepted Answer · 2018-10-08 05:18:47Z

Going off your percent similarity idea, maybe something like

listsim[ref_, test_] := {#, 100. (1 - Length@Complement[ref, #]/Length@ref)} & /@ test

listsim[a, {a, b, c}]

{{{A278, G279, S280, G281, I282, I283, I284, S285, D286, T287, P288, V289, H290, D291, C292}, 100.},
 {{S280, G281, I282, I284}, 26.6667},
 {{C275, S276, T277, A278, G279}, 13.3333}}

which ended up being in a similar vein to your answer.

I like this!! Thanks!
– brienna, Commented Oct 8, 2018 at 5:18

Syed · Accepted Answer · 2024-04-06 13:50:24Z

Using SequenceAlignment:

a = {"A278", "G279", "S280", "G281", "I282", "I283", "I284", "S285", "D286", "T287", "P288", "V289", "H290", "D291", "C292"};
b = {"S280", "G281", "I282", "I284"};
c = {"C275", "S276", "T277", "A278", "G279"};

SequenceAlignment[a, b] // Select[VectorQ] // MaximalBy[Length]

{{"S280", "G281", "I282"}}

SequenceAlignment[a, c] // Select[VectorQ] // MaximalBy[Length]

{{"A278", "G279"}}

E. Chan-López · Accepted Answer · 2024-04-06 20:15:37Z

a = {"A278", "G279", "S280", "G281", "I282", "I283", "I284", "S285", "D286", "T287", "P288", "V289", "H290", "D291", "C292"};
b = {"S280", "G281", "I282", "I284"};
c = {"C275", "S276", "T277", "A278", "G279"};

Using DeleteCases:

DeleteCases[a, Except[Alternatives @@ b]]

{"S280", "G281", "I282", "I284"}

DeleteCases[a, Except[Alternatives @@ c]]

{"A278", "G279"}

Length@DeleteCases[a, Except[Alternatives @@ #]] & /@ {b, c}

{4, 2}

eldo · Accepted Answer · 2024-04-06 14:41:52Z

a = {"A278", "G279", "S280", "G281", "I282", "I283", "I284", "S285", "D286", "T287", "P288", "V289", "H290", "D291", "C292"};
b = {"S280", "G281", "I282", "I284"};
c = {"C275", "S276", "T277", "A278", "G279"};

Using SymmetricDifference (new in 13.1) and TakeSmallestBy (new in 10.1)

Extract[{b, c}, TakeSmallestBy[ Map[SymmetricDifference[a, #] &, {b, c}] -> {"Index"}, Length, 1]]

{{"S280", "G281", "I282", "I284"}}
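
On versions without SymmetricDifference, the same ranking could be approximated with two Complement calls. The sketch below assumes only two lists are compared at a time; the helper name symDiff is hypothetical and not part of the answer above.

```mathematica
(* Reference list and candidates, copied from the question *)
a = {"A278", "G279", "S280", "G281", "I282", "I283", "I284", "S285",
     "D286", "T287", "P288", "V289", "H290", "D291", "C292"};
b = {"S280", "G281", "I282", "I284"};
c = {"C275", "S276", "T277", "A278", "G279"};

(* Hypothetical helper: elements that appear in exactly one of the two lists *)
symDiff[x_List, y_List] := Union[Complement[x, y], Complement[y, x]]

(* Number of mismatched elements for each candidate *)
Length[symDiff[a, #]] & /@ {b, c}
(* {11, 16} *)

(* Candidate with the fewest mismatches *)
First[MinimalBy[{b, c}, Length[symDiff[a, #]] &]]
(* {"S280", "G281", "I282", "I284"} *)
```

Unlike the Complement[a, #] scores used elsewhere in this thread, this score also counts elements of a candidate that are missing from a, which is why c scores 16 rather than 13.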
