Timeline for What makes grep consider a file to be binary?
Current License: CC BY-SA 4.0
25 events
| when | what | action | by | license | comment |
|---|---|---|---|---|---|
| May 6 at 11:46 | history | edited | Ciro Santilli OurBigBook.com | CC BY-SA 4.0 | added 6 characters in body |
| Sep 27, 2024 at 10:03 | comment | added | Thomas Guyot-Sionnest | @jrw if a writer seeks past the end of the file before writing, or if the file is truncated while a writer is writing to the file, the part between the beginning (or end of written sections from the truncating process) will be filled with null bytes (all zero bits). Full blocks will usually not even be allocated on disk unless written to; that is called a sparse file, and it can have a size much bigger than the space it really occupies on disk. On most systems it's also possible to explicitly "poke holes" in the middle of files, i.e. deallocate blocks, turning them into nulls. | |
| Dec 3, 2023 at 20:59 | history | edited | Cristian Ciupitu | CC BY-SA 4.0 | new style formatting; real headers; links to source code |
| Mar 20, 2021 at 8:43 | history | edited | Ciro Santilli OurBigBook.com | CC BY-SA 4.0 | added 27 characters in body |
| Mar 20, 2021 at 8:27 | history | edited | Ciro Santilli OurBigBook.com | CC BY-SA 4.0 | deleted 89 characters in body |
| Mar 4, 2021 at 6:33 | comment | added | Ciro Santilli OurBigBook.com | @Quasímodo cirosantilli.com/… | |
| Mar 3, 2021 at 20:20 | comment | added | Quasímodo | Dammit, the only answer that thoroughly and precisely addresses the questions sits down here with 10% of the votes of the most voted one. | |
| S Apr 15, 2018 at 7:34 | history | suggested | user273376 | CC BY-SA 3.0 | Corrected links. |
| Apr 15, 2018 at 3:20 | review | Suggested edits | |||
| S Apr 15, 2018 at 7:34 | |||||
| Nov 20, 2017 at 10:03 | history | edited | Ciro Santilli OurBigBook.com | CC BY-SA 3.0 | added 31 characters in body |
| Apr 13, 2017 at 12:36 | history | edited | CommunityBot | | replaced http://unix.stackexchange.com/ with https://unix.stackexchange.com/ |
| Jun 12, 2016 at 14:12 | comment | added | jrw32982 | @CiroSantilli巴拿馬文件六四事件法轮功 sparse file | |
| Jun 12, 2016 at 6:59 | comment | added | Ciro Santilli OurBigBook.com | @jrw32982 thanks for input! What does a "hole" in the file mean? | |
| Jun 12, 2016 at 4:02 | comment | added | jrw32982 | @CiroSantilli巴拿馬文件六四事件法轮功 The grep 2.16 source looks substantially different than the grep 2.24 source. There is no encoding_error_output. The checks are for a NUL in the first buffer or if there are "holes" in the file indicating a NUL character somewhere. If -z is specified, then it checks instead for \x80 (\200). | |
| Jun 9, 2016 at 18:13 | comment | added | Ciro Santilli OurBigBook.com | @jrw32982 2.24, the same version I opened the source for. Ubuntu 16.04. | |
| Jun 8, 2016 at 23:33 | comment | added | jrw32982 | @CiroSantilli巴拿馬文件六四事件法轮功 what version of GNU grep did you test against? | |
| Jun 8, 2016 at 19:20 | comment | added | Ciro Santilli OurBigBook.com | @jrw32982 interesting. Maybe open up 2.16 and see if the encoding_error_output is there. Maybe it was added since. | |
| Jun 8, 2016 at 18:15 | comment | added | jrw32982 | @StéphaneChazelas I was not able to reproduce the UTF locale part of this with GNU grep 2.16. `printf 'a\x80' \| LC_ALL=en_US.UTF-8 grep a` did not warn, whereas changing 80 to 00 did warn. | |
| Apr 13, 2016 at 13:09 | comment | added | Stéphane Chazelas | I didn't look in great detail either, but I did look very recently | |
| Apr 13, 2016 at 13:05 | comment | added | Ciro Santilli OurBigBook.com | @StéphaneChazelas "Note that the check for valid UTF-8 only happens in UTF-8 locales": do you mean about the `export LC_CTYPE='en_US.UTF-8'` as in my example, or something else? Buffer read: amazing example, added to answer. You have obviously read the source more than me, reminds me of those hacker koans "The student was enlightened" :-) | |
| Apr 13, 2016 at 13:00 | history | edited | Ciro Santilli OurBigBook.com | CC BY-SA 3.0 | added 934 characters in body |
| Apr 13, 2016 at 12:18 | comment | added | Stéphane Chazelas | Note that the check for valid UTF-8 only happens in UTF-8 locales. Also note that the check is only done on the first buffer read from the file, which for a regular file seems to be 32768 bytes on my system, but for a pipe or socket can be as small as one byte. Compare `(printf '\n\0y') \| grep y` with `(printf '\n'; sleep 1; printf '\0y') \| grep y` for instance. | |
| Apr 13, 2016 at 12:10 | history | edited | Ciro Santilli OurBigBook.com | CC BY-SA 3.0 | added 40 characters in body |
| Apr 13, 2016 at 2:02 | comment | added | user394 | Impressive explication! | |
| Apr 12, 2016 at 20:50 | history | answered | Ciro Santilli OurBigBook.com | CC BY-SA 3.0 | |