
I would call this a curiosity more than a problem or anything else:

When using diff on two specific PDF files that are similar but not the same, I get some garbage dumped on the next command line.

For comparison, if I use two completely different files, I get something like this:

$ diff file1.pdf file2.pdf
Binary files file1.pdf and file2.pdf differ

However, when used on the files in question:

$ diff version1.pdf version2.pdf | head
13,15c13,15
< <xmp:CreateDate>2017-07-17T09:04:45-07:00</xmp:CreateDate>
< <xmp:MetadataDate>2017-07-17T09:06:06-07:00</xmp:MetadataDate>
< <xmp:ModifyDate>2017-07-17T09:06:06-07:00</xmp:ModifyDate>
---
> <xmp:CreateDate>2017-07-17T09:05:14-07:00</xmp:CreateDate>
> <xmp:MetadataDate>2017-07-17T09:06:04-07:00</xmp:MetadataDate>
> <xmp:ModifyDate>2017-07-17T09:06:04-07:00</xmp:ModifyDate>

[long output of differences removed]

> 0018068809 00000 n
> 0018069141 00000 n
> 0018073003 00000 n
%%EOF189f22/Root 1 0 R/Info 321 0 R/ID[<1D82D53AF59549CBACDAEB25B1EED3D4><4527C00DC1904519AEFEA4304FBA532E>]>>
\ No newline at end of file
$ 1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;1;112;112;1;0x1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c1;2c

The files are obviously different, but why does anything end up on the next command line? Should this be considered a bug in diff?

I couldn't find anything relevant except this issue (I thought it was relevant because (a) it's also related to PDF files and (b) the garbage output looks similar).

If you're curious about the actual files in question, they came from https://www.cosori.com/downloads/ (file 1 and file 2).

UPDATE: As suggested in the comments by @cas and @thecarpy, this is related to "Why using cat on binary files messed up the terminal and how?": if I run $ cat file1.pdf, I get a result very similar to what diff produces.

However, if I understand correctly, there seems to be a difference in what the OP of that question considers a "messed-up terminal". I also tried $ cat img.png, and the prompt appeared in a strange place, no doubt because some content in the binary file matched control codes. In my case, though, characters are entered on the command line after the diff (or cat) process has terminated. This also seems to contradict @x-tian's accepted answer to that question (that you need to pipe for commands to be interpreted).

To make the question more specific and more explicit: why is this happening exactly (i.e., what causes this)? If there's a pattern in the PDF file of the form

******** X 1;2c1;2c1;2c1;2c1......

where X is a control sequence that causes the process to exit back to the command line, what is X?

  • It's because the start of the PDF files fools diff into thinking that they're text files that can be diffed, and at least one of the output lines contains control characters that do weird things to your terminal, i.e. the same issue as cat-ing an image or other binary file to the terminal. Commented Mar 7, 2018 at 12:28
  • e.g. see "Why using cat on binary files messed up the terminal and how?" Commented Mar 7, 2018 at 12:30
  • @cas: thanks, I hadn't seen that question before. I updated my question accordingly. Commented Mar 7, 2018 at 15:14
  • Escape codes can cause the terminal to actually enter data into the command line as if typed. Long ago that used to be a reasonably common prank/hack method: get people to display a file that would reprogram their terminal or keyboard, or push keystrokes into their input buffer. BTW, you're misunderstanding what he said about piping into bash. Commented Mar 7, 2018 at 16:30
  • The example 1;2c, etc., is the terminal's response to the primary device attributes control (likely escape Z). Commented Mar 8, 2018 at 0:50
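
The two explanations in the comments can be checked against the actual files with a short script. What follows is only a sketch under stated assumptions, not a description of how diff or any particular terminal is implemented. It assumes that diff's text-versus-binary decision can be approximated by looking for a NUL byte near the start of the file (the 4096-byte probe size is a guess), and that the request behind the 1;2c replies is one of the 7-bit forms ESC Z, ESC [ c or ESC [ 0 c. The names looks_like_text and find_da_requests are made up for this illustration.

```python
import re

# Candidate "Device Attributes" requests in their 7-bit forms (assumption:
# these are the sequences the terminal answers with something ending in 1;2c).
#   ESC Z      older identify request
#   ESC [ c    primary Device Attributes
#   ESC [ 0 c  the same request with an explicit 0 parameter
DA_REQUEST = re.compile(rb"\x1bZ|\x1b\[0?c")


def looks_like_text(data, probe=4096):
    """Approximate a NUL-byte binary check on the first `probe` bytes.

    Assumption: a file whose first block has no NUL byte is treated as text.
    The probe size is a guess, not a value taken from diff itself.
    """
    return b"\x00" not in data[:probe]


def find_da_requests(data):
    """Return (offset, sequence) pairs for every candidate request in data."""
    return [(m.start(), m.group()) for m in DA_REQUEST.finditer(data)]
```

Reading a file with open(path, "rb").read() and passing the bytes to both functions shows whether its start would pass for text and at which byte offsets a terminal might be asked to identify itself. Each such request makes the terminal send its reply as if it had been typed; once diff or cat has exited, the shell is what reads that input, which would explain text showing up at the prompt.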
