  • How do you define a match? You say anywhere in file two, but doesn't it need to at least be in a quoted string? Can protein_3.p1 appear outside quotes in file2 and, if it does, is that a match? Commented Nov 24, 2020 at 19:26
  • Yes, an ideal solution would include protein_3.p1 appearing outside of quotes as a positive match. Commented Nov 24, 2020 at 19:36
  • And would thisisnotprotein_3.p1 be considered a match? Or do we need to find it as a "word", so surrounded by spaces or other non-word characters? Commented Nov 24, 2020 at 20:02
  • Either every row of file2 contains exactly one "protein.*" identifier in a well-defined position near the end of the row (fixed, or locatable by regexp), which contradicts your "at any position", or your description does not determine the output. You should probably tighten the description slightly; the matching would then also be faster. Commented Nov 24, 2020 at 20:04
  • I mean, file2 cannot contain a row like: chromosome_1 programID transcript_id "protein_1.p1"; parent "protein_2"; right? Because that would give two matches "for any position". So each row has a unique match in file1/field1. Is that match always in a specific field of file2, for example the 4th or the last field? Commented Nov 24, 2020 at 20:26
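
To make the distinction raised in the comments concrete, here is a minimal Python sketch contrasting a plain substring match with a "word" match. The function names `substring_match` and `word_match` are hypothetical, and treating letters, digits and underscores as the word characters is an assumption; the asker's intended definition may differ.

```python
import re

def substring_match(identifier, line):
    """True if identifier occurs anywhere in line, even inside a longer token."""
    return identifier in line

def word_match(identifier, line):
    """True only if identifier is not glued to a word character on either side.

    Assumption: "word character" means [A-Za-z0-9_] (the regex \\w class).
    re.escape keeps the '.' in the identifier literal.
    """
    pattern = r"(?<!\w)" + re.escape(identifier) + r"(?!\w)"
    return re.search(pattern, line) is not None

quoted = 'chromosome_1 programID transcript_id "protein_3.p1";'
unquoted = "chromosome_1 programID protein_3.p1 other"
embedded = "chromosome_1 programID thisisnotprotein_3.p1 other"

for line in (quoted, unquoted, embedded):
    print(substring_match("protein_3.p1", line), word_match("protein_3.p1", line))
```

Under these assumptions the quoted and unquoted occurrences count as matches either way, while thisisnotprotein_3.p1 matches only as a substring.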