The du/diff command with -a also lists directories and subdirectories. I want only the files inside directories and subdirectories, not the directories themselves.
I know about the --exclude option, but I don't know how to use it to do that. Thanks.
If I understand you correctly, you only want to see the sizes of all files in the directory tree, not the total size of the contents of any directories themselves. Unfortunately, the --exclude option of du doesn't appear to support using something like / to indicate directories, e.g. du --exclude='*/' will still output the sizes of directories.
Instead of using any options of du itself to filter out directories, you can use a command like find to get a list of files only (e.g. using its -type f option), and then pass this list to du. The find command outputs each filename on its own line, and we can pipe this list of filenames to du with the aid of xargs. By default, xargs splits its input into arguments on whitespace (spaces, tabs and newlines) and treats quotes and backslashes specially, so a filename containing any of these would not be passed to du intact. To avoid this, we tell find to terminate each filename with a NUL character using -print0, and tell xargs to expect such input with -0:
find . -type f -print0 | xargs -0 du -b
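As a small convenience, the pipeline above can be wrapped in a shell function. This is only a sketch: the name list_file_sizes is made up for illustration, and it assumes GNU du (for -b) and an xargs that supports -r, which stops du from being run at all (and reporting on the current directory) when find produces no files.

```shell
# list_file_sizes DIR: print "<size in bytes><TAB><path>" for every regular
# file under DIR, sorted by path. Directories themselves are never listed.
# (Hypothetical helper; assumes GNU du and an xargs with -0 and -r.)
list_file_sizes() (
    # The function body is a subshell, so this cd does not affect the caller.
    cd "$1" || exit 1
    find . -type f -print0 | xargs -0 -r du -b | LC_ALL=C sort -k2
)
```

Calling list_file_sizes some_directory prints one line per file, with paths relative to that directory.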
I would like to find difference in bytes in files. [...] diff prints some files without + or -. Why? They may differ in some other attribute except size?
To do this, you need to compare the sizes of each pair of files directly. The diff command does not do this. Rather, diff compares the contents of two files line by line, e.g. if file a.txt contains the following...
a
b
c
... and file b.txt contains the following...
a
b
d
then diff a.txt b.txt outputs this:
3c3
< c
---
> d
This tells you that the difference between the two files is on line 3 (3c3 means that line 3 of a.txt was changed into line 3 of b.txt): the line c was removed (<) and the line d was added (>).
Using diff with the -u option causes it to format the output in the "unified" format, which shows a few lines of unchanged context around each change. This is the format used by the patch command, and it is similar in style to the patch files produced by other tools, such as Git. That is, diff -u a.txt b.txt gets you this instead:
--- a.txt 2023-08-22 00:38:07.477617454 +0100
+++ b.txt 2023-08-22 00:38:12.533616240 +0100
@@ -1,3 +1,3 @@
 a
 b
-c
+d
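This should help you understand why you are seeing + and - in the output of the command you have run. Specifically, cd $dira && du -ab | sort -k2 outputs the sizes of the contents of $dira, sorted by item name, and thus diff -u <(...) <(...) takes two such outputs and shows you the differences between those outputs. Lines preceded by - appear only in the listing for $dira, and lines preceded by + appear only in the listing for $dirb. Because each line contains both a size and a name, a file that exists in only one directory shows up as a single - or + line, while a file that exists in both directories with different sizes shows up as a - line (its size in $dira) and a + line (its size in $dirb). Lines with neither prefix are context lines: they are identical in both listings, i.e. the item has the same name and the same size in both directories, so they do not differ in any other attribute.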
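Lines that diff -u prints without a + or - are unchanged context. To hide them, one option is to filter the diff output, as in the following sketch. The function name changed_sizes is hypothetical, GNU du and mktemp are assumed, and temporary files stand in for <(...) only so that the function also runs in plain sh. It lists files only, using find, rather than everything that du -a reports.

```shell
# changed_sizes DIR_A DIR_B: print only the listing lines that differ.
# (Hypothetical helper; assumes GNU du, mktemp, and xargs with -0 and -r.)
changed_sizes() (
    list_a="$(mktemp)" || exit 1
    list_b="$(mktemp)" || exit 1
    # One "<size><TAB><path>" line per regular file, sorted by path.
    (cd "$1" && find . -type f -print0 | xargs -0 -r du -b | LC_ALL=C sort -k2) > "$list_a"
    (cd "$2" && find . -type f -print0 | xargs -0 -r du -b | LC_ALL=C sort -k2) > "$list_b"
    # Keep lines that start with - or + followed by a digit (the size).
    # This drops the ---/+++ headers, the @@ markers and the context lines.
    # grep exits non-zero when nothing differs; that is not an error here.
    diff -u "$list_a" "$list_b" | grep -E '^[-+][0-9]' || true
    rm -f "$list_a" "$list_b"
)
```

A file whose size changed still appears twice in this output, once with - and its size in the first directory and once with + and its size in the second.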
The diff command does not do anything more intelligent, such as directly showing you the difference in file sizes between specific pairs of files in $dira and $dirb. For that, you need to somehow specify which pairs of files you'd like to compare the sizes of.
For example, if you want to compare the sizes of $dira/news_a and $dirb/news_b, then you should do so directly. If you want to only compare the sizes of pairs of files in $dir_a and $dir_b whose relative paths are exactly the same, e.g. $dir_a/news_a and $dir_b/news_a, then this can be done programmatically, as in the following Bash script, which takes the two directories as its arguments:
#!/bin/bash
# Usage: pass the two directories to compare as the first and second argument.
dir_a="$1"
dir_b="$2"

# List the regular files in each directory, as paths relative to that directory.
# Each cd runs inside the command substitution's subshell, so the script's own
# working directory never changes and relative arguments keep working.
dir_a_filenames="$(cd "$dir_a" && find . -type f)"
dir_b_filenames="$(cd "$dir_b" && find . -type f)"

# Combine filename lists
all_filenames="$(sort -u <(echo "$dir_a_filenames") <(echo "$dir_b_filenames"))"

# For each filename in $all_filenames, compare the size of that file in $dir_a with the same file in $dir_b
IFS=$'\n'
set -f  # do not glob-expand the unquoted $all_filenames below
for file in $all_filenames; do
    file_a="$dir_a/$file"
    file_b="$dir_b/$file"
    file_a_size="$(if [ -f "$file_a" ]; then stat --format='%s' "$file_a"; else echo 0; fi)"
    file_b_size="$(if [ -f "$file_b" ]; then stat --format='%s' "$file_b"; else echo 0; fi)"
    size_diff=$((file_b_size - file_a_size))
    printf '%s\tA size = %s\tB size = %s\tSize difference = %s\n' "$file" "$file_a_size" "$file_b_size" "$size_diff"
done
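The IFS shell variable defines which characters the shell uses to split the result of an unquoted expansion into separate words, such as the list of items that a for loop iterates over. Here, we set it to just the newline character, $'\n', for a similar reason as we used NUL delimiters with xargs earlier: filenames containing spaces or tabs must not be split apart. Note that a filename containing a newline would still be split, so this approach does not handle such names.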
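The effect of IFS on word splitting can be seen in isolation with a small sketch. The helper count_fields and the two example filenames are made up for illustration; a literal newline is used instead of $'\n' only so that the snippet also runs in plain sh.

```shell
# A variable holding a single newline character.
newline='
'

# count_fields STRING [IFS_VALUE]: print how many words STRING splits into,
# using IFS_VALUE if given, or the default IFS (space, tab, newline) if not.
# (Hypothetical helper, for demonstration only.)
count_fields() (
    [ "$#" -ge 2 ] && IFS=$2
    set -f          # disable globbing so * and ? in names stay literal
    set -- $1       # unquoted on purpose: split $1 using the current IFS
    echo "$#"
)

# Two hypothetical filenames, one per line; the first contains a space.
names='./my file.txt
./other.txt'

count_fields "$names"               # prints 3: the space splits the first name
count_fields "$names" "$newline"    # prints 2: one word per line
```

With the default IFS the first filename is broken in two, which is exactly what the script avoids by setting IFS to a newline before its for loop.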
We use stat instead of du to get the file sizes, since it is a bit quicker, and we treat the sizes of non-existent files as being zero for the purposes of reporting their size and calculating the size differences; the test [ -f filename ] checks whether filename exists and is a regular file.
The Bash syntax $((...)) is used to perform integer calculations, e.g. $((2+3)) expands to 5; here we are just subtracting one file size from the other.
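The size lookup and the subtraction can also be tried on their own for a single pair of files. The following is a sketch under the same assumption as the script (GNU stat, for --format); the function names size_or_zero and size_difference are hypothetical.

```shell
# size_or_zero FILE: print the size of FILE in bytes, or 0 if FILE does not
# exist or is not a regular file (e.g. it is a directory).
size_or_zero() {
    if [ -f "$1" ]; then
        stat --format='%s' "$1"
    else
        echo 0
    fi
}

# size_difference FILE_A FILE_B: print size of FILE_B minus size of FILE_A.
size_difference() {
    echo "$(( $(size_or_zero "$2") - $(size_or_zero "$1") ))"
}
```

A negative result means the second file is smaller than the first. Since a missing file counts as 0 bytes, comparing an existing file against a missing one reports the full size of the existing file, with the sign depending on the argument order.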