
My Riak cluster has 10 nodes, all running version 2.9.8. The node named [email protected] has used about 95% of its disk space, while every other node uses less than 50%.

I tried to find LevelDB compaction errors as this post describes:

    find . -name "LOG" -exec grep -l 'Compaction error' {} \;
    ./308285501624487334308589769401090949458673270784/LOG
    ./336830455478606531929755488790080852186328203264/LOG
    ./365375409332725729550921208179070754913983135744/LOG
    ./793549717144513693868406999013919295828807122944/LOG
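To script the same check, here is a minimal Python sketch that mirrors the `find`/`grep` command above: it walks a LevelDB data directory and returns the names of partition directories whose `LOG` file contains `Compaction error`. The function name is hypothetical, and the `/data/riak/leveldb` root used below is taken from the log paths in this question; it may differ on other installs.

```python
from pathlib import Path


def partitions_with_compaction_errors(leveldb_root):
    """Return sorted partition directory names whose LOG file mentions 'Compaction error'.

    Hypothetical helper, roughly equivalent to running
        find . -name "LOG" -exec grep -l 'Compaction error' {} \\;
    from inside the LevelDB data directory.
    """
    flagged = []
    # rglob("LOG") matches files named exactly LOG (not LOG.old), like find -name "LOG"
    for log_path in Path(leveldb_root).rglob("LOG"):
        if not log_path.is_file():
            continue
        # errors="replace" so a stray non-UTF-8 byte does not abort the scan
        with open(log_path, encoding="utf-8", errors="replace") as fh:
            if any("Compaction error" in line for line in fh):
                # The parent directory name is the partition index
                flagged.append(log_path.parent.name)
    return sorted(flagged)
```

Calling `partitions_with_compaction_errors("/data/riak/leveldb")` on the affected node would be expected to return the same four partition names as the `find` output.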

The error messages in the partition logs look like this:

    2024/05/25-16:30:51.332435 7f04c47f8700 Finalize level: 5, grooming 1
    2024/05/25-16:30:51.332506 7f04c47f8700 Finalize level: 6, grooming 0
    2024/05/25-16:30:51.332570 7f04c3ff7700 Compacting 1@6 + 0@7 files
    2024/05/25-16:30:51.333295 7f04c3ff7700 compacted to: files[ 3 0 3 765 482 109 126 ]
    2024/05/25-16:30:51.333312 7f04c3ff7700 Compaction error: IO error: /data/riak/leveldb/308285501624487334308589769401090949458673270784/sst_7/307388.sst: No such file or directory
    2024/05/25-16:30:51.333319 7f04c3ff7700 Waiting after background compaction error: IO error: /data/riak/leveldb/308285501624487334308589769401090949458673270784/sst_7/307388.sst: No such file or directory
    2024/05/25-16:30:52.334919 7f04c3ff7700 Finalize level: 5, grooming 1
    2024/05/25-16:30:52.335003 7f04c3ff7700 Finalize level: 6, grooming 0
    2024/05/25-16:30:52.335061 7f04c37f6700 Compacting 1@6 + 0@7 files
    2024/05/25-16:30:52.335507 7f04c37f6700 compacted to: files[ 3 0 3 765 482 109 126 ]
    2024/05/25-16:30:52.335522 7f04c37f6700 Compaction error: IO error: /data/riak/leveldb/308285501624487334308589769401090949458673270784/sst_7/307389.sst: No such file or directory
    2024/05/25-16:30:52.335528 7f04c37f6700 Waiting after background compaction error: IO error: /data/riak/leveldb/308285501624487334308589769401090949458673270784/sst_7/307389.sst: No such file or directory
    2024/05/25-16:30:53.337142 7f04c37f6700 Finalize level: 5, grooming 1
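Each `Compaction error` line names the `.sst` file that could not be opened. A small sketch, assuming the message format shown above stays the same: extract the distinct missing paths with a regular expression, then check which of them are still absent on disk. The helper names and the regex are illustrative, not part of Riak or eleveldb.

```python
import re
from pathlib import Path

# Matches e.g. "Compaction error: IO error: /data/.../sst_7/307388.sst: No such file or directory".
# Case-sensitive, so the "Waiting after background compaction error" lines are not counted twice.
MISSING_SST = re.compile(r"Compaction error: IO error: (\S+\.sst): No such file or directory")


def missing_sst_files(log_text):
    """Return the distinct .sst paths reported as missing, in first-seen order."""
    seen = []
    for match in MISSING_SST.finditer(log_text):
        path = match.group(1)
        if path not in seen:
            seen.append(path)
    return seen


def still_missing(paths):
    """Keep only the paths that do not exist on disk right now."""
    return [p for p in paths if not Path(p).exists()]
```

In the excerpt above, consecutive retries one second apart already report different file numbers (307388, then 307389), so the extracted list can grow with each retry.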

Every partition uses about 30 GB, except the partitions with compaction errors. Their sizes are:

    1.3T    ../308285501624487334308589769401090949458673270784
    67G     ../336830455478606531929755488790080852186328203264
    159G    ../365375409332725729550921208179070754913983135744
    577G    ../793549717144513693868406999013919295828807122944
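To spot oversized partitions without reading `du` output by eye, the sketch below sums the file sizes in each partition directory and flags any partition larger than a multiple of the median size (about 30 GB in this cluster). The function names and the 2x threshold are arbitrary assumptions; it also sums apparent file sizes rather than allocated blocks, so its numbers can differ slightly from `du`.

```python
import os
import statistics
from pathlib import Path


def dir_size_bytes(path):
    """Sum apparent sizes of regular files under path (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            fp = os.path.join(dirpath, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except OSError:
                # A file can vanish mid-walk (e.g. removed by compaction); skip it.
                continue
    return total


def oversized_partitions(leveldb_root, factor=2.0):
    """Return {partition: bytes} for partitions larger than factor x the median partition size."""
    sizes = {p.name: dir_size_bytes(p) for p in Path(leveldb_root).iterdir() if p.is_dir()}
    if not sizes:
        return {}
    median = statistics.median(sizes.values())
    return {name: size for name, size in sizes.items() if size > factor * median}
```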

Are these compaction errors what keeps the disk usage growing? If I repair these partitions (vnodes), will the space be released? If not, what can I do?

1 Answer


The compaction errors report that an .sst file is missing (in LevelDB, each .sst file holds many key/value entries). A missing file is a plausible reason why compaction cannot complete for that data.

Have you tried the "repairing corrupt LevelDB" instructions? If not, I would recommend trying those first.
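As far as I know, the LevelDB repair procedure in the Riak documentation calls `eleveldb:repair/2` on each damaged partition directory from an Erlang shell while the node is stopped. Assuming that procedure and the `/data/riak/leveldb` path seen in the logs, this hypothetical helper builds one Erlang expression per affected partition for pasting into that shell. Please check the exact steps against the documentation for your Riak version before running anything.

```python
def eleveldb_repair_calls(leveldb_root, partitions):
    """Build one eleveldb:repair/2 call per partition, in Erlang syntax.

    Hypothetical helper; assumes each partition's LevelDB lives at
    <leveldb_root>/<partition>, as in the log paths in the question.
    """
    root = leveldb_root.rstrip("/")
    calls = []
    for p in partitions:
        # Partition directories are named by their numeric index; reject anything else
        if not p.isdigit():
            raise ValueError(f"not a partition index: {p!r}")
        calls.append(f'eleveldb:repair("{root}/{p}", []).')
    return calls
```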

If you have already tried those repairs and they didn't help, I would next try repairing each partition listed as corrupt, following the "Repairing a partition" section.
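The "Repairing a partition" procedure is run from `riak attach` and, as far as I know, calls `riak_kv_vnode:repair/1` with the partition's index (the long integer used as the partition's directory name). Assuming that, this hypothetical helper formats one call per partition; check the Riak docs for your version before running it.

```python
def vnode_repair_calls(partitions):
    """Build riak_kv_vnode:repair/1 calls (Erlang syntax) for pasting into `riak attach`.

    Hypothetical helper. int() rejects non-numeric names; Erlang integers are
    arbitrary precision, so the 48-digit partition indexes are passed as-is.
    """
    return [f"riak_kv_vnode:repair({int(p)})." for p in partitions]
```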

If that doesn't work, I would suggest stopping the affected node, deleting all the data in each subfolder of its leveldb folder, then starting the node again and running a repair of all its partitions.

Finally, if that also fails, stop the problem node and force-remove it from the cluster. Once that is complete, wipe the node and do a full reinstall. After the reinstall, you can add the node back to the cluster.
