XFS: rm does not terminate, xfs_repair not possible

Question

I have a 500 GByte disk with a single XFS file system on it (EDIT: the OS is on another disk). On this disk I have backup data in the form of multiple hard-linked copies of the original data. After each new backup I delete the directory containing the data of the oldest backup. The corresponding rm process sometimes does not terminate (and consumes a lot of CPU). Killing it with -9 does not help; only rebooting the system does.

I tried running xfs_repair on that volume. However, it seems I do not have enough RAM for that (the machine has 4 GByte of RAM and only supports 32-bit).

The location of the machine makes it very hard for me to physically access the hard disk.

How can I repair my file system and/or make rm terminate?

EDIT: I ran xfs_repair -v -t 1 /dev/disk/xxx with xfs_repair version 3.1.7. EDIT: Output:

Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
        - agno = 0
failed to create prefetch thread: Resource temporarily unavailable
        - agno = 1
failed to create prefetch thread: Resource temporarily unavailable
        - agno = 2
failed to create prefetch thread: Resource temporarily unavailable
        - agno = 3
failed to create prefetch thread: Resource temporarily unavailable
        - agno = 4
failed to create prefetch thread: Resource temporarily unavailable
        - agno = 5
failed to create prefetch thread: Resource temporarily unavailable
        - agno = 6
failed to create prefetch thread: Resource temporarily unavailable
        - agno = 7
fatal error -- calloc failed in dir_hash_init

What you are doing is not backup; it is revision control. I would recommend using a revision control tool in the future. — ctrl-alt-delor
– ctrl-alt-delor, Commented Dec 15, 2013 at 12:45
What was the output of xfs_repair -v -t 1? Have you read this FAQ? Is your OS on the same filesystem, or is it isolated? — bsd
– bsd, Commented Dec 15, 2013 at 14:00
richard, it is a backup. Whenever I delete a file by accident, I can recover it. In addition to the backup, I have access to previous versions of files and, more importantly, to complete snapshots of my filesystem (what good is a previous version of a config file if the corresponding version of the tool is not available?). I am happy with this solution and do not intend to switch my setup to something else. — C-Otto
– C-Otto, Commented Dec 15, 2013 at 14:37
bdowning: I do not have that output available, sorry. Reproducing it takes a long while. I have read the FAQ. The OS is isolated from the disk. — C-Otto
– C-Otto, Commented Dec 15, 2013 at 14:39
How much swap space do you have allocated? My server has a single 10 TB XFS filesystem. I have a 60 GB SSD for my OS, with about 4-6 GB used by the OS and all of the rest configured as swap, specifically so I can run xfs_check, xfs_repair and xfs_fsr. — bsd
– bsd, Commented Dec 15, 2013 at 18:40

frostschutz · Accepted Answer · 2013-12-15 14:02:15Z

Did you try running strace on the rm process to see what it's doing? When deleting lots and lots of files, XFS can just be painfully slow. I once foolishly used ccache on XFS, and it was way faster to move all other files off, reformat, and move the files back than to attempt to rm -r the millions of ccache files. It would still have terminated eventually had I let it run its course.
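A quick complement to strace, as a minimal sketch that assumes Linux's /proc filesystem: sample the process state and its user versus system CPU time from /proc/<pid>/stat. If rm is burning CPU almost entirely as system time while strace prints nothing, it is probably stuck inside a single system call in the kernel (for example one unlink on the XFS volume) rather than looping in user space, which would also fit kill -9 having no effect. The helper names below are made up for illustration.

```python
import os
import sys
import time


def parse_proc_stat(text):
    """Parse the contents of /proc/<pid>/stat (hypothetical helper).

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split on the *last* ')' instead of on spaces.
    Returns (pid, comm, state, utime_ticks, stime_ticks).
    """
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = tail.split()
    # fields[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(pid_str), comm, fields[0], int(fields[11]), int(fields[12])


def sample(pid, interval=5.0):
    """Print the user vs. system CPU time pid used during `interval` seconds."""
    ticks = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        _, comm, _, u0, s0 = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        _, _, state, u1, s1 = parse_proc_stat(f.read())
    print(f"{comm} state={state} "
          f"user={(u1 - u0) / ticks:.2f}s sys={(s1 - s0) / ticks:.2f}s "
          f"over {interval}s")
    # wchan names the kernel function a sleeping task is blocked in,
    # if the kernel exposes it.
    try:
        with open(f"/proc/{pid}/wchan") as f:
            print("wchan:", f.read().strip() or "-")
    except OSError as exc:
        print("wchan unavailable:", exc)


if __name__ == "__main__":
    sample(int(sys.argv[1]))
```

Pass the PID of the stuck rm as the only argument. As root, cat /proc/<pid>/stack may additionally show the kernel call chain, if the kernel was built with stack tracing support.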

As for xfs_repair, I never noticed it using a lot of memory, but all my machines do have plenty of memory, so...

You could add swap (if that helps). Alternatively, you could export the block device (using NBD through OpenVPN or an SSH tunnel) to a machine that has more RAM available, although I am not sure whether that would be faster or slower than transferring an image of the entire filesystem (possibly using xfsdump). It depends on how much data xfs_repair has to read and write during the process.
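If you stay on the 32-bit machine, it may also be worth retrying with xfs_repair's own memory-related options before moving the device elsewhere. The sketch below only builds command lines; it assumes an xfs_repair that supports -n (no-modify mode), -P (disable prefetching of inode and directory blocks, which is what starts the prefetch threads that fail in the output above) and -m maxmem (approximate memory cap in megabytes). Check man xfs_repair for your version (3.1.7 here), since options differ between releases. Running with -n -vv -m 1 is a commonly suggested way to make xfs_repair report how much memory it estimates it needs. The function name and the 1500 MB cap are illustrative assumptions.

```python
import subprocess
import sys


def xfs_repair_argv(device, dry_run=True, no_prefetch=True, max_mem_mb=None,
                    verbose=1):
    """Build an xfs_repair command line (hypothetical helper).

    -n  inspect only, do not modify the filesystem
    -P  disable prefetching of inode and directory blocks (fewer threads)
    -m  approximate upper bound on memory use, in megabytes
    -v  verbosity; repeated for more detail
    """
    argv = ["xfs_repair"]
    if dry_run:
        argv.append("-n")
    if no_prefetch:
        argv.append("-P")
    if max_mem_mb is not None:
        argv += ["-m", str(max_mem_mb)]
    if verbose:
        argv.append("-" + "v" * verbose)
    argv.append(device)
    return argv


if __name__ == "__main__":
    device = sys.argv[1]  # the unmounted XFS device
    # Estimate only: with -m 1 xfs_repair is expected to stop early and
    # report how much memory it would need instead of checking.
    subprocess.run(xfs_repair_argv(device, max_mem_mb=1, verbose=2))
    # Print (do not run) a low-memory repair command to review by hand.
    # The 1500 MB cap is an illustrative guess for a 32-bit process.
    print(" ".join(xfs_repair_argv(device, dry_run=False, max_mem_mb=1500)))
```

The repair itself is deliberately only printed, since a real xfs_repair run modifies the filesystem and should be started by hand once the estimate looks feasible.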

I will run strace the next time. However, as the problem only appears once in a while on the same kind of data, I don't think this is a normal (slow) delete operation. Adding swap might help, I'll try. — C-Otto
– C-Otto, Commented Dec 15, 2013 at 14:41
I guess swap won't help on this system, as it is a 32-bit system. — C-Otto
– C-Otto, Commented Dec 15, 2013 at 14:59
"strace does not show anything" If rm is doing absolutely nothing at all, then you would appear to have a problem that is unrelated to the fact that you are running XFS. Does rm even get invoked? — user
– user, Commented Dec 16, 2013 at 20:08
Michael, I ran strace on the PID of the rm process (which, as I already mentioned, consumes quite a lot of CPU). — C-Otto
– C-Otto, Commented Dec 18, 2013 at 8:18

Zelda · Accepted Answer · 2013-12-15 11:21:23Z

From the 256 MB SGI Indigo 2 days, I remember that XFS checks could be a memory hog. We had to use XFS for the 2 GB+ files we had. When problems occurred, we backed up the data to an external (SCSI) drive, reformatted the data filesystem that was giving the problems, and restored the data.

Of course, this only works if you have the drive capacity (500 GB USB 3.0 drives cost less than €50) and your system is not on the same partition (that is not 100% clear from your question).
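One caveat with the asker's layout: the backups are hard-linked copies, so the copy to the external drive has to preserve hard links, or every snapshot would be stored in full and the data might no longer fit. Standard tools handle this (for example cp -a, rsync -aH, or xfsdump/xfsrestore). The Python sketch below only illustrates the idea: it remembers each source file's (device, inode) pair and, when the same pair turns up again, creates a hard link in the destination instead of copying the data a second time. The function name is hypothetical, and ownership, extended attributes, ACLs, directory timestamps and special files are not handled.

```python
import os
import shutil
import stat


def copy_tree_preserving_hardlinks(src, dst):
    """Copy src to dst, recreating hard links instead of duplicating data.

    Sketch only: regular files, directories and symlinks are handled;
    ownership, xattrs, ACLs and device/FIFO nodes are not.
    """
    seen = {}  # (st_dev, st_ino) of a source file -> first destination path
    for root, dirs, files in os.walk(src):  # does not follow symlinks
        out_dir = os.path.normpath(os.path.join(dst, os.path.relpath(root, src)))
        os.makedirs(out_dir, exist_ok=True)
        for name in dirs + files:
            s = os.path.join(root, name)
            d = os.path.join(out_dir, name)
            st = os.lstat(s)
            if stat.S_ISLNK(st.st_mode):
                os.symlink(os.readlink(s), d)  # copy the link, not its target
            elif stat.S_ISREG(st.st_mode):
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    os.link(seen[key], d)  # another name for a file already copied
                else:
                    shutil.copy2(s, d)  # data plus mode and timestamps
                    if st.st_nlink > 1:
                        seen[key] = d
            # Real directories are created when os.walk visits them;
            # other file types are skipped in this sketch.
```

The dictionary only keeps files with more than one link, so memory use grows with the number of hard-linked inodes rather than with the total number of files.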
