Timeline for What makes grep consider a file to be binary?

Current License: CC BY-SA 4.0

25 events

when	what		by	license	comment
May 6 at 11:46	history	edited	Ciro Santilli OurBigBook.com	CC BY-SA 4.0	added 6 characters in body
Sep 27, 2024 at 10:03	comment	added	Thomas Guyot-Sionnest		@jrw if a writer seeks past the end of the file before writing, or if the file is truncated while a writer is writing to it, the region up to the writer's next write position (starting from the old end of the file, from the beginning, or from the end of whatever the truncating process wrote) will be filled with null bytes (all zero bits). Full blocks will usually not even be allocated on disk unless written to; that's called a sparse file, and its apparent size can be much bigger than the space it actually uses on disk. On most systems it's also possible to explicitly "poke holes" in the middle of files, i.e. deallocate blocks, turning them into nulls.
Dec 3, 2023 at 20:59	history	edited	Cristian Ciupitu	CC BY-SA 4.0	new style formatting; real headers; links to source code
Mar 20, 2021 at 8:43	history	edited	Ciro Santilli OurBigBook.com	CC BY-SA 4.0	added 27 characters in body
Mar 20, 2021 at 8:27	history	edited	Ciro Santilli OurBigBook.com	CC BY-SA 4.0	deleted 89 characters in body
Mar 4, 2021 at 6:33	comment	added	Ciro Santilli OurBigBook.com		@Quasímodo cirosantilli.com/…
Mar 3, 2021 at 20:20	comment	added	Quasímodo		Dammit, the only answer that thoroughly and precisely addresses the questions sits down here with 10% of the votes of the most voted one.
Apr 15, 2018 at 7:34	history	suggested	user273376	CC BY-SA 3.0	Corrected links.
Apr 15, 2018 at 3:20	review	Suggested edits	completed Apr 15, 2018 at 7:34
Nov 20, 2017 at 10:03	history	edited	Ciro Santilli OurBigBook.com	CC BY-SA 3.0	added 31 characters in body
Apr 13, 2017 at 12:36	history	edited	CommunityBot		replaced http://unix.stackexchange.com/ with https://unix.stackexchange.com/
Jun 12, 2016 at 14:12	comment	added	jrw32982		@CiroSantilli巴拿馬文件六四事件法轮功 sparse file
Jun 12, 2016 at 6:59	comment	added	Ciro Santilli OurBigBook.com		@jrw32982 thanks for input! What does a "hole" in the file mean?
Jun 12, 2016 at 4:02	comment	added	jrw32982		@CiroSantilli巴拿馬文件六四事件法轮功 The grep 2.16 source looks substantially different than the grep 2.24 source. There is no `encoding_error_output`. The checks are for a NUL in the first buffer or if there are "holes" in the file indicating a NUL character somewhere. If `-z` is specified, then it checks instead for `\x80` (`\200`).
Jun 9, 2016 at 18:13	comment	added	Ciro Santilli OurBigBook.com		@jrw32982 2.24, the same version whose source I opened. Ubuntu 16.04.
Jun 8, 2016 at 23:33	comment	added	jrw32982		@CiroSantilli巴拿馬文件六四事件法轮功 what version of GNU grep did you test against?
Jun 8, 2016 at 19:20	comment	added	Ciro Santilli OurBigBook.com		@jrw32982 interesting. Maybe open up 2.16 and see if the `encoding_error_output` is there. Maybe it was added since.
Jun 8, 2016 at 18:15	comment	added	jrw32982		@StéphaneChazelas I was not able to reproduce the UTF locale part of this with GNU grep 2.16. `printf 'a\x80' | LC_ALL=en_US.UTF-8 grep a` did not warn, whereas changing `80` to `00` did warn.
Apr 13, 2016 at 13:09	comment	added	Stéphane Chazelas		I didn't look into great detail either, but did very recently
Apr 13, 2016 at 13:05	comment	added	Ciro Santilli OurBigBook.com		@StéphaneChazelas "Note that the check for valid UTF-8 only happens in UTF-8 locales": do you mean about the `export LC_CTYPE='en_US.UTF-8'` as in my example, or something else? Buf read: amazing example, added to answer. You have obviously read the source more than me, reminds me of those hacker koans "The student was enlightened" :-)
Apr 13, 2016 at 13:00	history	edited	Ciro Santilli OurBigBook.com	CC BY-SA 3.0	added 934 characters in body
Apr 13, 2016 at 12:18	comment	added	Stéphane Chazelas		Note that the check for valid UTF-8 only happens in UTF-8 locales. Also note that the check is only done on the first buffer read from the file, which for a regular file seems to be 32768 bytes on my system, but for a pipe or socket can be as small as one byte. Compare `(printf '\n\0y') | grep y` with `(printf '\n'; sleep 1; printf '\0y') | grep y` for instance.
Apr 13, 2016 at 12:10	history	edited	Ciro Santilli OurBigBook.com	CC BY-SA 3.0	added 40 characters in body
Apr 13, 2016 at 2:02	comment	added	user394		Impressive explication!
Apr 12, 2016 at 20:50	history	answered	Ciro Santilli OurBigBook.com	CC BY-SA 3.0