  • "I think from the project maintainers point of view this should be solvable: lsfd could honor the LC_CTYPE / LANG environment variables, assume the input is from that locale and translate it to UTF-8": that would only work for things that deal with text. Filenames are not text. Users can interpret them as text, and different users can encode that text in different charsets. With lsfd -Jp "$pid" | jq -j '.lsfd[]|.name' for instance, I want to get a list of raw file paths, whether or not they're meant to represent text in one or several different charsets. Commented Oct 3, 2023 at 11:46
  • "lsfd could honor the LC_CTYPE / LANG environment variables, assume the input is from that locale and translate it to UTF-8". With the way lsfd behaves now, you can get that outcome with lsfd | iconv -t utf-8 Commented Oct 3, 2023 at 11:50
  • @StéphaneChazelas no, that's not really true at all. You've not considered the issue of a character set having character codes overlapping ASKII values for JSON special characters. I'm talking about pre-processing the input, iconv post-processes the output. That can have a different result. Commented Oct 3, 2023 at 11:53
  • @StéphaneChazelas And I disagree with your characterisation of filenames as "not text". They are generated by humans at keyboards and are by and large there for humans to read. Otherwise we'd just have numerical abstract identifiers on them all. The fact that nobody is preventing you from writing invalid byte sequences into the file name isn't proof that they are not text, just the lack of a safeguard. The whole point of what I'm saying here is that in light of that lack of a safeguard in the kernel, lsof should have its own. Commented Oct 3, 2023 at 11:57
  • In the case of lsfd (not lsof), that's the same: the text that lsfd outputs in its JSON (object key names) is ASCII (not ASKII) only, so it is invariant across locales on a system (if we ignore the bogus ms-kanji still found on some BSDs). The only bytes >= 0x80 it outputs come from input (process names, file names...) which don't have to be text, so those are the same ones that iconv -t utf-8 will recode and that lsfd would recode if it was doing the recoding internally (typically with iconv()). Commented Oct 3, 2023 at 12:44
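
To make the pre-processing vs post-processing disagreement concrete, here is a rough Python sketch. It is not lsfd's code: escape_bytes_for_json is a hypothetical byte-wise escaper standing in for a tool that escapes `"` and `\` without knowing the charset. The cp932 name "表" and the Latin-1 name "é" are made-up sample inputs. In an ASCII-safe charset both orders should agree, while in a charset where 0x5C can be a trail byte (the ms-kanji case mentioned above) they can diverge.

```python
import json


def escape_bytes_for_json(raw: bytes) -> bytes:
    """Hypothetical byte-wise escaper (an assumption, not lsfd's real code).

    Wraps raw bytes in double quotes and backslash-escapes the ASCII bytes
    for '"' (0x22) and '\\' (0x5C); every other byte passes through as-is.
    """
    out = bytearray(b'"')
    for b in raw:
        if b in (0x22, 0x5C):
            out.append(0x5C)
        out.append(b)
    out += b'"'
    return bytes(out)


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


def post_process(name: str, charset: str) -> str:
    """Escape the raw bytes first, recode afterwards (the `| iconv` order)."""
    return escape_bytes_for_json(name.encode(charset)).decode(charset)


def pre_process(name: str) -> str:
    """Decode first, then serialise the decoded text as JSON."""
    return json.dumps(name, ensure_ascii=False)


# ASCII-safe charset: no multi-byte sequence contains 0x22 or 0x5C,
# so both orders give the same JSON string.
latin1_post = post_process("é", "latin-1")
latin1_pre = pre_process("é")

# cp932 (a Shift-JIS variant): the second byte of "表" is 0x5C,
# so the byte-wise escaper inserts a backslash in the middle of the name.
sjis_raw = "表".encode("cp932")
sjis_post = post_process("表", "cp932")
sjis_pre = pre_process("表")

print(latin1_post == latin1_pre)
print(sjis_raw.hex(), is_valid_json(sjis_pre), is_valid_json(sjis_post))
```

So whether `lsfd | iconv -t utf-8` matches internal recoding depends on whether the locale's charset keeps the bytes for `"` and `\` out of multi-byte sequences.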