
Okay, I think this is possible, but I can't quite figure it out. This is the situation.

A folder contains the log files of all the processes on my robot. The structure looks sort of like this:

$ ls -lrt
total 8
drwxrwxr-x 2 per per 4096 nov 3 12:46 launch01
-rw-rw-r-- 1 per per 0 nov 3 12:47 camera112.log
-rw-rw-r-- 1 per per 0 nov 3 12:47 motors121.log
-rw-rw-r-- 1 per per 0 nov 3 12:47 lidar111.log
drwxrwxr-x 2 per per 4096 nov 3 12:49 launch02
-rw-rw-r-- 1 per per 0 nov 3 12:49 motors122.log
-rw-rw-r-- 1 per per 0 nov 3 12:49 lidar211.log
-rw-rw-r-- 1 per per 0 nov 3 12:49 camera113.log

The files camera112.log, motors121.log and lidar111.log are associated with the logs in folder launch01. I would like to write a script that gets all the files that belong to a specific launch and tars them into one tarball. Since timestamps can differ slightly between files, and the numbers in the file names are only loosely related, I think the best way to gather all the relevant files is to take every file that appears below launch01 (inclusive) in the listing, up to the next directory (exclusive). The number of files can vary, as can the timestamps and names. What is consistent is the pattern: a folder, then a bunch of files, then the next folder, then its files, and so on. Ultimately, I would like to get the latest set of logs easily.

Unsure of the approach here. Any ideas how to go about this?

Clarifications:

  • Number of files can vary.
  • The exact timestamp is not reliable (as above, the folder launch01 has a different timestamp than camera112.log), but the relative order works fine. For instance, if I could tar all files from launch01 (inclusive) to launch02 (exclusive) in the list produced by ls -lrt, that would work great.
  • Welcome to the site. Please elaborate what you mean by "timestamps can change between slightly by files". Do you mean the timestamps are not a reliable means of associating the files that belong together? The sort order of the ls -lrt command uses the timestamps, so if you can't rely on them ... Commented Nov 3, 2021 at 13:00
  • "all files which are below launch01" presumably you mean "all files that are newer than launch01"? Above and below have only visual meaning. Commented Nov 3, 2021 at 14:32
  • Adding to @Theophrastus' comment, maybe there's another way of linking the files to the folders that doesn't rely on something as unreliable as the dates, if you can think of such an option. Commented Nov 3, 2021 at 15:17
  • @roaima - "all files which are below launch01" - I took that to be referring to the visual of the output of ls -lrt... so, below launch01/ and above launch02/ ("up to the next directory in the list") Commented Nov 3, 2021 at 15:25
  • @Greenonline oh yes, I completely missed that possibility; I was looking at the set of files shown in the question Commented Nov 3, 2021 at 15:31

1 Answer

Splitting the task into chunks, using your input of

drwxrwxr-x 2 per per 4096 nov 3 12:46 launch01
-rw-rw-r-- 1 per per 0 nov 3 12:47 camera112.log
-rw-rw-r-- 1 per per 0 nov 3 12:47 motors121.log
-rw-rw-r-- 1 per per 0 nov 3 12:47 lidar111.log
drwxrwxr-x 2 per per 4096 nov 3 12:49 launch02
-rw-rw-r-- 1 per per 0 nov 3 12:49 motors122.log
-rw-rw-r-- 1 per per 0 nov 3 12:49 lidar211.log
-rw-rw-r-- 1 per per 0 nov 3 12:49 camera113.log

Create the "ordered" list of the filenames only

Use either one of these:

ls -lrt | tr -s ' ' | cut -d' ' -f9
ls -lrt | awk '{print $9}'

gives (plus a leading blank line, since the "total 8" header has no ninth field):

launch01
camera112.log
motors121.log
lidar111.log
launch02
motors122.log
lidar211.log
camera113.log
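As a side note (not part of the original answer), awk's $NF means "the last field", so a guard on the field count drops the "total" header in the same pass; like the $9 version, this still breaks on file names containing spaces:

```shell
# Variant: print the last field of each line; NF > 2 skips the "total" header
ls -lrt | awk 'NF > 2 {print $NF}'
```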

Farm the list off into sections

Modifying this answer to Split one file into multiple files based on delimiter, create a file called awk_pattern containing the following:

BEGIN { fn = "part1.txt"; n = 1 }
{
    if (substr($0, 1, 6) == "launch") {
        close(fn)
        n++
        fn = "part" n ".txt"
    }
    print > fn
}

and then running

ls -lrt | awk '{print $9}' | awk -f awk_pattern 

gives the required output:

part1.txt

(a single blank line, from the "total 8" header)

and then

part2.txt

launch01
camera112.log
motors121.log
lidar111.log

part3.txt

launch02
motors122.log
lidar211.log
camera113.log

The first file (part1.txt) should be discarded, as it contains nothing useful:

rm part1.txt

tar the contents of each part

From 6.3 Reading Names from a File

tar -c -v -z -T part2.txt -f part2.tgz 
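As a sanity check (not part of the original answer), tar's -t flag lists an archive's contents without extracting it. A self-contained sketch, using a throwaway file in a scratch directory since part2.tgz may not exist yet:

```shell
# Demo fixture in a scratch directory: one log file and one name list
cd "$(mktemp -d)"
touch camera112.log
printf 'camera112.log\n' > part2.txt

# Create the archive from the name list, then list it back with -t
tar -c -z -T part2.txt -f part2.tgz
tar -t -z -f part2.tgz        # prints: camera112.log
```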

Looping through the part files

for part_file in part*.txt
do
    tar_file=${part_file%.*}    # or: tar_file=$(basename "${part_file}" .txt)
    tar -c -v -z -T "${part_file}" -f "${tar_file}.tgz"
done

This should give:

part1.tgz
part2.tgz
part3.tgz

Again, part1.tgz should be discarded:

rm part1.tgz 

Putting it all together

#!/bin/bash
ls -lrt | awk '{print $9}' | awk -f awk_pattern
for part_file in part*.txt
do
    tar_file=${part_file%.*}
    tar -c -v -z -T "${part_file}" -f "${tar_file}.tgz"
done
rm part1.txt
rm part1.tgz

As just one script (incorporating the awk pattern)

#!/bin/bash
ls -lrt | awk '{print $9}' | awk '
BEGIN { fn = "part1.txt"; n = 1 }
{
    if (substr($0, 1, 6) == "launch") {
        close(fn)
        n++
        fn = "part" n ".txt"
    }
    print > fn
}'
for part_file in part*.txt
do
    tar_file=${part_file%.*}
    tar -c -v -z -T "${part_file}" -f "${tar_file}.tgz"
done
rm part1.txt
rm part1.tgz

This should (hopefully) work, although I have only tested the first two steps, i.e. up to the tar part, as I don't have the log files to tar up.
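Since the part files are numbered in time order, the most recent launch's logs end up in the highest-numbered one, which covers the "get the latest set of logs" goal. A sketch, assuming sort supports -V (GNU coreutils and recent BSDs); the fixture files and the name latest.tgz are invented for the demo:

```shell
# Demo fixture in a scratch directory: two part files standing in for the awk output
cd "$(mktemp -d)"
touch camera112.log camera113.log
printf 'camera112.log\n' > part2.txt
printf 'camera113.log\n' > part3.txt

# Version sort so part10.txt sorts after part9.txt, then take the last one
latest=$(printf '%s\n' part*.txt | sort -V | tail -n 1)
tar -c -z -T "$latest" -f latest.tgz
```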


Possible improvements:

  1. Post-processing: Remove the part*.txt files (rm part*.txt)

  2. Post-processing: Remove the log files once tar'd up (rm *.log)

  3. Post-processing: Remove the directories once tar'd up (rm -R -- */)

    See this answer to How do I remove all sub-directories from within a directory?.

  4. Prevent awk from producing the useless part1.txt file

  5. Save the tar files elsewhere (... -f ${tar_path}/${tar_file}.tgz)

  6. Don't use intermediary part*.txt files.
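Improvements 4 and 6 can be tackled together by collecting the names in a bash array and tarring each group directly, so no part*.txt files (and no empty part1) are ever created. A sketch, untested on the real robot logs; tar_launch_groups is an invented helper name:

```shell
#!/bin/bash
# Walk the ls -lrt listing once; start a new group at each launch* entry
# and tar the previous group straight away. Here part1.tgz corresponds to
# the first launch directory (anything older than it would land in part0.tgz).
tar_launch_groups() {
    local n=0 name
    local -a files=()
    while IFS= read -r name; do
        [ -z "$name" ] && continue        # skip the "total ..." header line
        if [[ $name == launch* ]]; then
            if [ ${#files[@]} -gt 0 ]; then
                tar -c -z -f "part$n.tgz" "${files[@]}"
            fi
            n=$((n + 1))
            files=()
        fi
        files+=("$name")
    done < <(ls -lrt | awk '{print $9}')
    if [ ${#files[@]} -gt 0 ]; then       # flush the final group
        tar -c -z -f "part$n.tgz" "${files[@]}"
    fi
}
```

Running tar_launch_groups inside the log directory should then produce part1.tgz, part2.tgz, ... directly, with no intermediate text files to clean up.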

  • Tested on OS X. Commented Nov 4, 2021 at 11:16
