Given a set of arbitrary files, what's the best way to identify the text strings shared between them (either in all files or a subset of them) from the Linux command line?

This would be useful for quickly identifying candidate strings for YARA rules covering clusters of similar malicious files (for instance, malicious executables).

1 Answer


Here's one approach, for malicious files in a directory named malware:

find malware/ -type f -print0 | xargs -0 -n1 sh -c 'strings "$1" | sort -u' sh | sort | uniq -c | sort -n

The output will look something like the following, where the first number on each line is the number of files containing the string:

    ...
      1 Sleep
    ...
      2 JFIF
      2 SetBkColor
    ...
      5 !This program cannot be run in DOS mode.
      5 t@PW
      5 @tVH
    ...
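To answer the "in all files" part of the question, the counts can be filtered so that only strings present in every file survive. A minimal self-contained sketch of that filter, using throwaway sample files and grep's printable-run matching as a stand-in for strings(1) so it runs without binutils (the file names and SHARED_STRING content are made up for illustration):

```shell
# Throwaway sample directory standing in for malware/
dir=$(mktemp -d)
printf 'SHARED_STRING\nonly_in_a\n' > "$dir/a.bin"
printf 'SHARED_STRING\nonly_in_b\n' > "$dir/b.bin"

# Count the files, so only strings present in every file pass the filter
n=$(find "$dir" -type f | wc -l)

# grep -ao '[ -~]\{4,\}' emulates strings(1): runs of 4+ printable characters
shared=$(find "$dir" -type f | xargs -n1 -I{} sh -c "grep -ao '[ -~]\{4,\}' {} | sort -u" \
  | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }')
echo "$shared"
rm -rf "$dir"
```

Replacing the awk filter with `$1 == n` by a plain `sort -n` recovers the original count-sorted listing.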

One useful variation when the input files are Windows executables is to use strings -el instead of strings, which extracts UTF-16 little-endian strings (also known as wide character strings).
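The effect of wide-character encoding is easy to demonstrate. The sketch below builds a file whose only string is UTF-16LE-encoded; where strings -el is unavailable, iconv plus grep is used here as a rough stand-in (it assumes the whole file is UTF-16LE, unlike strings -el, which scans for embedded runs):

```shell
# Sample file containing only a UTF-16LE string: "Hello" with interleaved NULs
f=$(mktemp)
printf 'H\0e\0l\0l\0o\0' > "$f"

# Plain single-byte extraction finds nothing: no run of 4+ printable bytes
plain=$(grep -ao '[ -~]\{4,\}' "$f" | wc -l)

# Decoding as UTF-16LE first recovers the wide string
wide=$(iconv -f UTF-16LE -t UTF-8 "$f" | grep -ao '[ -~]\{4,\}')
echo "$wide"
rm -f "$f"
```

This is why running plain strings over a Windows executable often misses the most interesting artifacts (registry paths, URLs, mutex names), which are frequently stored wide.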

To tie strings back to their containing files, use strings -f malware/* | grep <string>; the -f flag prefixes each string with the name of the file it came from.
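A self-contained illustration of that lookup, with sample files and grep -H standing in for real malware and strings -f (the EvilMutex123 marker and sample file names are invented for the demo; grep's -H flag prefixes matches with their file name, just as strings -f does):

```shell
# Sample directory standing in for malware/
dir=$(mktemp -d)
printf 'EvilMutex123\nalpha_only\n' > "$dir/sample1"
printf 'EvilMutex123\nbeta_only\n'  > "$dir/sample2"

# grep -aoH prints file:match for each printable run, like strings -f
hits=$(grep -aoH '[ -~]\{4,\}' "$dir"/* | grep 'EvilMutex123')
echo "$hits"
rm -rf "$dir"
```

Both files show up against the shared marker, which is exactly the information needed when deciding which samples a candidate YARA string actually covers.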
