7
\$\begingroup\$

The code works, but I could use help testing it more comprehensively and double-checking that everything I'm doing makes sense, and that all the options can coexist with each other. I'm running into the problem that I've been staring at my own code for too long and could really just use outside impressions in general.

#!/usr/bin/env bash ########################################################################### # Hardlinks all files from one directory to another (like copying but # # both files end up pointing to the same data), optionally recursing, # # and optionally resolving conflicts interactively. # ########################################################################### # Usage: # # lndir [options] <source_dir> <destination_dir> # # # # Options: # # -r, --recursive Process directories recursively # # -d, --dryrun Dry run mode (no changes are made) # # -i, --interactive Resolve conflicts interactively # # -s, --sync Two-way synchronization mode (non-recursive) # # -f, --force Force overwrite of conflicting files # # -n, --nobak Do not create backup (.bak) when overwriting # # -h, --help Display this help message # ########################################################################### _lndir() { # Helper functions for messages and help _show_help() { local s; [ -t 1 ] && s="$(tput smul 2>/dev/null || echo '')" local r; [ -t 1 ] && r="$(tput rmul 2>/dev/null || echo '')" echo "NAME" echo " lndir - Hardlink files from one directory to another, with optional sync mode" echo "SYNOPSIS" echo " lndir [${s}options${r}] <${s}source_dir${r}> <${s}destination_dir${r}>" echo "OPTIONS" echo " -r, --recursive Process directories recursively" echo " -d, --dryrun Dry run mode (no changes are made)" echo " -i, --interactive Prompt on conflict" echo " -s, --sync Two-way synchronization mode (non-recursive)" echo " -f, --force Force overwrite of conflicting files" echo " -n, --nobak, Do not create backup (.bak) when overwriting" echo " -h, --help Display this help message" } _error() { echo "[ERR] lndir: $*" >&2; } _warn() { echo "[WAR] lndir: $*" >&2; } _info() { echo "[INF] lndir: $*"; } # Option defaults local recursive=false local dryrun=false local interactive=false local sync=false local force=false local nobak=false local from_dir local to_dir # Parse arguments while [ $# -gt 0 ]; do case "$1" in -r|--recursive) recursive=true shift ;; -d|--dryrun|--dry-run) dryrun=true shift ;; -i|--interactive) interactive=true shift ;; -s|--sync) sync=true shift ;; -f|--force) force=true shift ;; -n|--nobak|--no-backup) nobak=true shift ;; -h|--help) _show_help return 0 ;; # End of options: Parse remainder as positional --) shift while [ $# -gt 0 ]; do if [ -z "$from_dir" ]; then from_dir="$1" else to_dir="$1" fi shift done break ;; -*) _error "Unknown option '$1'." return 1 ;; *) if [ -z "$from_dir" ]; then from_dir="$1" else to_dir="$1" fi shift ;; esac done # $from_dir is defined first, so if it isn't defined we know neither is $to_dir if [ -z "$from_dir" ]; then _error "Source and destination directories not provided." _show_help return 1 elif [ -z "$to_dir" ]; then _error "Destination directory not provided." _show_help return 1 fi # Backup a file by moving it to file.bak (adding extra .bak if needed) _backup_file() { local file="$1" local backup="$file.bak" while [ -e "$backup" ]; do backup="$backup.bak" done if mv "$file" "$backup"; then _info "Backup created: $file -> $backup" else _error "Failed to create backup for $file" fi } # Interactive conflict resolution # Prompts the user using the format: # "Choose an action for <basename>: 1 (keep source), 2 (keep destination), s (skip), c (cancel), d (diff), v (VSCode diff): " _handle_conflict() { local src="$1" local dst="$2" local basename local choice while true; do basename="$(basename "$src")" _info "Choose an action for $basename: 1 (keep source), 2 (keep destination), s (skip), c (cancel), d (diff), v (VSCode diff): " read -r choice case "$choice" in 1) _update_link "$src" "$dst" break ;; 2) _update_link "$dst" "$src" break ;; s) _info "Skipping conflict for $basename" break ;; c) _warn "Operation cancelled by user." return 1 ;; d) if ! diff "$src" "$dst"; then _warn "\`diff\` command failed or differences found." fi ;; v) if ! code --diff "$src" "$dst"; then _warn "\`code --diff\` command failed or differences found." fi ;; *) _warn "Invalid option. Please select again." ;; esac done } # Common helper to unconditionally update (overwrite) a link # Backs up the target unless nobak is set _update_link() { local src="$1" local dst="$2" if [ "$dryrun" = true ]; then _info "Would update link: $src -> $dst" else if [ "$nobak" = false ]; then _backup_file "$dst" fi if ln -f "$src" "$dst"; then _info "Updated link: $dst" else _error "Failed to update link: $dst" fi fi } # Common helper to resolve a conflict using force or interactive options _try_update_link() { local src="$1" local dst="$2" if [ "$force" = true ]; then if [ "$dryrun" = true ]; then _info "Would force update: $src -> $dst" else if [ "$nobak" = false ]; then _backup_file "$dst" fi if ln -f "$src" "$dst"; then _info "Force updated: $dst" else _error "Failed to force update: $dst" fi fi elif [ "$interactive" = true ]; then if [ "$dryrun" = true ]; then _info "Would prompt for conflict resolution for: $src -> $dst" else _handle_conflict "$src" "$dst" fi else _warn "File exists, cannot hardlink: $dst" fi } # One-way linking mode: hardlink files from source to destination _link_files() { # Unquoted "#" asserts start of string local parent_dir="${1/#"$from_dir"/"$to_dir"}" if [ ! -d "$parent_dir" ]; then if [ "$dryrun" = true ]; then _info "Would create directory: $parent_dir" else mkdir -p "$parent_dir" fi fi find "$1" -mindepth 1 -maxdepth 1 ! -name .DS_Store -print0 | while IFS= read -r -d $'\0' file; do local filename; filename="$(basename "$file")" local to_file="$parent_dir/$filename" if [ -d "$file" ]; then if [ "$recursive" = true ]; then _link_files "$file" fi else if [ -f "$to_file" ]; then local existing_link; existing_link="$(find "$to_file" -samefile "$file" 2>/dev/null)" if [ "$existing_link" ]; then _info "Hardlink already exists: $to_file" else _try_update_link "$file" "$to_file" fi else if [ "$dryrun" = true ]; then _info "Would link: $file -> $to_file" else ln -v "$file" "$to_file" fi fi fi done } # Sync mode: two-way synchronization of files in the top level of both directories # For files that exist in both directories, if they are not hardlinked: # - If the files are identical, update the destination to be a hardlink to the source. # - If they differ, then: # - With -f/--force, automatically overwrite (using source as authoritative), # backing up first unless -n/--nobak is given. # - With -i/--interactive, prompt for conflict resolution using the format above. # - Otherwise, print a conflict warning to stderr and skip. # Files that exist in only one directory are skipped. _sync_mode() { local -a from_files local -a to_files local -a all_files mapfile -t from_files < <(find -L "$from_dir" -maxdepth 1 -type f ! -name '.DS_Store' ! -name '*.bak' -exec basename {} \; | sort) mapfile -t to_files < <(find -L "$to_dir" -maxdepth 1 -type f ! -name '.DS_Store' ! -name '*.bak' -exec basename {} \; | sort) mapfile -t all_files < <(printf '%s\n' "${from_files[@]}" "${to_files[@]}" | sort -u) local file local src_file local dst_file for file in "${all_files[@]}"; do src_file="$from_dir/$file" dst_file="$to_dir/$file" if [ -e "$src_file" ] && [ -e "$dst_file" ]; then # Check whether they point to the same file; in other words whether their inodes are the same local existing_link; existing_link="$(find "$to_file" -samefile "$file" 2>/dev/null)" if [ "$existing_link" ]; then _info "Already synced: $file" else if cmp -s "$src_file" "$dst_file"; then _update_link "$src_file" "$dst_file" else _try_update_link "$src_file" "$dst_file" fi fi else _warn "Skipping, exists only in one directory: $file" fi done } # Clean up functions to avoid polluting global namespace local __old_trap; __old_trap="$(trap -p RETURN)" trap 'unset -f _lndir _show_help _error _warn _info _backup_file _handle_conflict _update_link _try_update_link _link_files _sync_mode; [ "$__old_trap" ] && eval "$__old_trap" || trap - RETURN' RETURN # Main execution: if sync mode is enabled, run sync mode; otherwise, one-way linking if [ "$sync" = true ]; then _sync_mode else _link_files "$from_dir" if [ "$dryrun" = false ]; then # If true, found files in $to_dir which aren't in $from_dir local found_extra=false find "$to_dir" -type f -print0 | while IFS= read -r -d $'\0' dest_file; do src_file="${dest_file/#$to_dir/$from_dir}" if [ ! -e "$src_file" ]; then if [ "$found_extra" = false ]; then _info "Extra files in destination not in source:" found_extra=true # Ensures header prints only once fi _info "$dest_file" fi done find "$to_dir" -type d -empty -print0 | while IFS= read -r -d $'\0' dir; do _warn "Empty directory: $dir" done else _info "Dry run enabled, no changes were made." fi fi } _lndir "$@" 
\$\endgroup\$
2
  • 2
    \$\begingroup\$ This functionality (and much more) is provided by GNU cp with -l/--link, apart from the --dryrun option. \$\endgroup\$ Commented Feb 7 at 8:02
  • \$\begingroup\$ I figured it had to exist somewhere, written by someone much smarter than me! My thought was worst case I learn a thing or two. 🙃 \$\endgroup\$ Commented Feb 7 at 13:47

3 Answers 3

5
\$\begingroup\$

Overview

It is great that the code has documentation in the form of comments at the top of the file, and that you provided usage information via the --help user option. The usage help is pretty standard for Unix scripts, which is a big plus.

The other comments are helpful as well.

It is great that you provide a --dryrun option.

Portability

When I run the command as:

lndir d1 d2 

I creates a directory with this bad name (with the double quotes in the name):

"d2" 

I already have these 2 directories:

d1 d2 

I expected the hard links to show up in the d2 directory, but they did not. Perhaps there is a portability issue. Maybe my version of bash or Linux variant is different from yours. Or, perhaps the shell I'm running in presents a different environment from yours.

bash --version GNU bash, version 4.2.46(2)-release (x86_64-redhat-linux-gnu) 

Regardless, when I change this line:

local parent_dir="${1/#"$from_dir"/"$to_dir"}" 

to:

local parent_dir="${1/#$from_dir/$to_dir}" 

then it behaves as expected: the links show up in d2.

Error reporting

The code does a good job of reporting an error if I don't supply both source_dir and destination_dir on the command line.

However, if the source_dir that I specify does not exist, I get an error from find like this:

find: 'foo': No such file or directory 

That's not too bad, but maybe it would be better to detect if the directory exists before running find. Also, the script continues to create the destination_dir if it also does not exist. It would probably be better to exit the script immediately if the source_dir does not exist.

Layout

The code layout is very good for the most part, with consistent indentation and usage of functions.

However, there are a handful of very long lines which are hard to read and understand and will be hard to maintain. Anything you can do to break those up would be beneficial.

Testing

Yes, testing is hard work. Just keep throwing new cases and combinations at it.

Tools

The ShellCheck tool can be used to analyze your code for potential bugs and other style issues. When I run it, it is pretty clean, with only 2 minor complaints. Good job there.

\$\endgroup\$
4
  • 1
    \$\begingroup\$ I looked at the shellcheck errors here, and I'm pretty convinced they are both false-positives. \$\endgroup\$ Commented Feb 7 at 8:17
  • \$\begingroup\$ @TobySpeight: I was surprised at how clean the report was to begin with. \$\endgroup\$ Commented Feb 7 at 11:28
  • \$\begingroup\$ Thank you! What is your bash --version? I think I'm going to remove those quotes regardless because while the quoting works fine for me I don't think it would ever actually be necessary... I ended up doing that because it's easier for me to interpret but I think it's unusual. \$\endgroup\$ Commented Feb 7 at 13:43
  • \$\begingroup\$ @Sinjai: You're welcome. I updated my answer with my bash version. \$\endgroup\$ Commented Feb 7 at 13:57
4
\$\begingroup\$

There's a possible bug that could cause infinite runaway in recursive operation.

We use test -d to determine whether a file is a directory, without also checking whether it's a symbolic link. That means that if we create a link that points to .., for example, the program will attempt to create an infinite set of nested directories.


There's some scope for simplification with our options. Since they are only ever set to the strings true or false, we can execute them as commands. For example, instead of this test:

 if [ "$nobak" = false ] 

we can simply write:

 if ! $nobak 

(There's no need to quote the expansions, as we're in complete control of the content of these variables.)

\$\endgroup\$
1
  • \$\begingroup\$ Good calls, thank you! I hated those = bool checks! \$\endgroup\$ Commented Feb 7 at 13:45
2
\$\begingroup\$

nesting

We have a just a single toplevel function, _lndir(). On the one hand, burying all its private helpers inside that function is kind of a nice touch. But on the other hand, I'm afraid I don't see the point. It's not like caller shall source lndir, right? We're running in a child bash which does some work and exits. So it's unclear why we're trying hard to keep the toplevel namespace so clean.

Consider putting more of the code at top level. It would save you one or two TAB indents.

nit: Since it's part of the Public API you export, it seems OK to strip leading _ underscore from one function and name it lndir.

spelling nit: Consider adopting the abbreviation "[WRN]" for warnings.

DRY

Defining a bunch of locals is nice. However, in the case of these two variables it inhibited DRYing up the code, so you might want to revisit that decision.

 if [ -z "$from_dir" ]; then from_dir="$1" else to_dir="$1" fi shift done break ... *) if [ -z "$from_dir" ]; then from_dir="$1" else to_dir="$1" fi shift 

Also that break seems redundant, given that both inner and outer loops terminate when all args are consumed.

another helper

 _error "Source and destination directories not provided." 

The associated checking code is very nice, but perhaps it could be buried in a helper function. It's further motivation for having a pair of {from,to}_dir globals.

long filenames

Many filesystems impose length limits on filenames. I envision a midnight cron job gone wrong and ignored for days, so backup="$backup.bak" produces files like "README.bak.bak.bak".

Consider instead using names like "README.bak2", ".bak3", etc.

two-way

As a user, it's unclear to me why sync mode is "a different thing". To sync directories A and B, couldn't I just $ lndir A B && lndir B A ? Couldn't the implementation in the code be simply "do it twice"?

manifest

Creating hardlinks rather than symlinks is not a choice I would make, but I imagine you have your reasons. Now next month the user will have to remember "what goes with what". I would start to worry if I saw a README with a link count of 3 or more, and maybe be faced with a find over a giant filesystem to sort out the details.

Consider recording the results of your operations in a "MANIFEST.txt" file, for the benefit of some poor system operator who comes along months later and wants to understand the mess. Or perhaps name it ".manifest.txt".

If someone wanted to unlink those READMEs, the OP code makes that a bit more challenging than it needs to be. It would cost a scan over the filesystem for high link counts, then further analysis of candidates to establish which pairs, triples, and N-tuples of files are "the same", leading to N-1 unlinks and creation of symlinks. Since knowledge of which was "the original" pathname would be lost forever, we'd need a compromise like favoring the alphabetically first pathname.

Storing a bookkeeping dot file in a destination directory could reveal where its contents originated from. It could also simplify subsequent lndir invocations, since the tool could learn details from the filesystem rather than from command line args.

dry run

The current --dryrun support is great. (Though frequent use of make -n would likely make me confuse it with --nobak.)

Imagine that --dryrun was the first requirement you implemented. It could lead to an interestingly different structure of the project.

First phase would be recursing through directories and examining link counts to assemble a TODO list of operations to perform. And --dryrun displays that for the user, then exits.

Second phase performs the operations on that list. This offers the advantage of making the hard part, the dangerous mutation part, very simple. It's just one file at a time, with no state for scanning or recursing. That makes it a perfect fit for automated unit tests that exercise a very simple helper.

\$\endgroup\$
5
  • \$\begingroup\$ I believe syncing originally ended up as a different concept because I had a use case where I needed to fix old links but not create new ones - so syncing file1 and file1 only if it already existed in both directories. But it wouldn't try linking dir1/file2 to dir2/file2 if it didn't already exist. That's what differentiates a "sync" from two copies. Potentially syncing could imply fixing broken links only (identical files, unlinked) but it doesn't currently. \$\endgroup\$ Commented Feb 9 at 13:33
  • \$\begingroup\$ Yes, the scope/namespace stuff only matters if you source the script instead of executing it. Defining it like this also lets you share it as a function rather than a whole script if you want to, though mostly I put everything inside a function so I can always return and never exit, to reuse code anywhere. The trap is unnecessary, but is it harmful? \$\endgroup\$ Commented Feb 9 at 13:35
  • \$\begingroup\$ from_dir and to_dir are the same variables set in two different places, not different variables with the same names repeated. The script sets positional arguments in two different places, I don't see a way to avoid repeating myself there, though I'd be happy to. The repetition removal I see is removing the "-- indicates end of options" feature. \$\endgroup\$ Commented Feb 9 at 13:39
  • \$\begingroup\$ Also, I believe you're correct about the redundant break. I hadn't considered that it'll be ready to break out naturally at that point. \$\endgroup\$ Commented Feb 9 at 13:46
  • \$\begingroup\$ Wouldn't that necessitate iterating through the files twice? \$\endgroup\$ Commented Feb 12 at 20:24

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.