We are operating Cassandra in an environment where the host machine occasionally loses power without a proper shutdown. Losing data is acceptable to us, but in rare situations Cassandra is not able to start up after an unclean shutdown. Startup then fails due to a commit log corruption:

ERROR [main] 2024-05-02 12:39:13,834 JVMStabilityInspector.java:196 - Exiting due to error while processing commit log during initialization.
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Mutation checksum failure at 2378 in Next section at 2016 in CommitLog-7-1714580480939.log
    at org.apache.cassandra.db.commitlog.CommitLogReader.readSection(CommitLogReader.java:387)
    at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:244)
    at org.apache.cassandra.db.commitlog.CommitLogReader.readCommitLogSegment(CommitLogReader.java:147)
    at org.apache.cassandra.db.commitlog.CommitLogReplayer.replayFiles(CommitLogReplayer.java:191)
    at org.apache.cassandra.db.commitlog.CommitLog.recoverFiles(CommitLog.java:223)
    at org.apache.cassandra.db.commitlog.CommitLog.recoverSegmentsOnDisk(CommitLog.java:204)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:353)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:744)
    at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:878)

We found out about the option "cassandra.commitlog.ignorereplayerrors", which seems to affect the error handling when the Commit log is processed. Some questions about this setting:

  1. Is it safe to enable this setting, if losing data written to Cassandra is acceptable?
  2. Are all Commit log corruptions ignored when enabling this setting or can it still happen that Cassandra is not able to start? Note that our key requirement is that Cassandra is able to start up after a power outage without manual intervention.
  3. Should "nodetool repair" be executed after Commit log corruptions have occurred?

Update to question, Oct 10, 2024:

We have enabled the "ignorereplayerrors" option and after a while we still ran into a situation where Cassandra was not able to start due to a Commit Log error.

This time the following error occurred at startup:

ERROR [main] 2024-10-21 09:18:28,276 CommitLogReplayer.java:494 - Ignoring commit log replay error
org.apache.cassandra.db.commitlog.CommitLogReadHandler$CommitLogReadException: Encountered bad header at position 606236 of commit log /opt/cassandra/data/commitlog/CommitLog-7-1729338769682.log, with invalid CRC. The end of segment marker should be zero.

We did some research and found the option "cassandra.commitlog.allow_ignore_sync_crc", which seems to affect the behavior in this case. After enabling this option, Cassandra was able to start.

Is it safe to enable this option, if data loss is acceptable?

Furthermore, we are wondering whether there are other kinds of corruption that would still prevent Cassandra from starting. We are therefore thinking about adding a startup check to our application that detects when Cassandra is stuck, scans system.log for the corrupted Commit Log files, and deletes those files automatically. Is this a reasonable strategy? Note that we cannot ask the users of our product to fix the database themselves by manually deleting corrupted Commit Log files.
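The startup check described above could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function names `find_corrupt_segments` and `quarantine_segments` and the quarantine directory are hypothetical, the segment name pattern is inferred from the two file names in the log excerpts above, and the sketch moves segments aside instead of deleting them so they can still be inspected later. It is meant to run only while Cassandra is stopped.

```python
import re
import shutil
from pathlib import Path

# Assumption: segment names look like the ones in the logs above,
# e.g. CommitLog-7-1714580480939.log
SEGMENT_RE = re.compile(r"CommitLog-\d+-\d+\.log")


def find_corrupt_segments(log_text):
    """Return segment file names mentioned on CommitLogReadException lines.

    Names are returned in order of first appearance, without duplicates.
    Lines that do not mention CommitLogReadException are ignored.
    """
    found = []
    for line in log_text.splitlines():
        if "CommitLogReadException" not in line:
            continue
        for name in SEGMENT_RE.findall(line):
            if name not in found:
                found.append(name)
    return found


def quarantine_segments(names, commitlog_dir, quarantine_dir):
    """Move the named segments out of commitlog_dir (hypothetical helper).

    Segments that no longer exist are skipped, because system.log may still
    contain errors from earlier restarts. Returns the new paths.
    """
    commitlog_dir = Path(commitlog_dir)
    quarantine_dir = Path(quarantine_dir)
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        src = commitlog_dir / name
        if src.is_file():
            dst = quarantine_dir / name
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved
```

A caller would read system.log, pass its text to `find_corrupt_segments`, quarantine the result, and then restart Cassandra. Whether this covers every failure mode is not something the sketch can guarantee.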

  • You can manually delete the contents of the commitlog directory from the command line, since you're not worried about data loss. You will definitely need to run nodetool repair. Commented Oct 13, 2024 at 2:44

1 Answer

Is it safe to enable this setting, if losing data written to Cassandra is acceptable?

Yes.

Are all Commit log corruptions ignored when enabling this setting or can it still happen that Cassandra is not able to start? Note that our key requirement is that Cassandra is able to start up after a power outage without manual intervention.

I can't say for sure that every kind of commit log corruption is covered, but it will let startup continue past the error shown in this case.

Should "nodetool repair" be executed after Commit log corruptions have occurred?

Yes, you should definitely run a repair after restart (if this error has occurred).

To activate it, look for your jvm.options file and add -Dcassandra.commitlog.ignorereplayerrors=true as one of the startup parameters.
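If the flag has to be applied by an installer rather than by hand, a small helper along these lines could add it idempotently. This is a sketch under assumptions: `ensure_jvm_flags` is a hypothetical name, the path to jvm.options depends on the installation, and the second flag in the usage note is the one mentioned in the question's update.

```python
from pathlib import Path


def ensure_jvm_flags(options_path, flags):
    """Append each flag to the options file unless an active line already has it.

    Commented-out lines (starting with '#') do not count as active.
    Returns the list of flags that were actually added.
    """
    path = Path(options_path)
    lines = path.read_text().splitlines() if path.exists() else []
    active = {ln.strip() for ln in lines if not ln.strip().startswith("#")}
    added = [flag for flag in flags if flag not in active]
    if added:
        lines.extend(added)
        path.write_text("\n".join(lines) + "\n")
    return added
```

For example, `ensure_jvm_flags(path, ["-Dcassandra.commitlog.ignorereplayerrors=true", "-Dcassandra.commitlog.allow_ignore_sync_crc=true"])`, where `path` points at the jvm.options file of the installation. Running it a second time leaves the file unchanged.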

  • Thanks a lot for your answer, Aaron! Regarding the repair: Do you think we need to do a full repair or is an incremental repair sufficient to recover inconsistencies due to the improper shutdown? Commented Oct 8, 2024 at 16:55
  • I would run a full repair, just to be on the safe side. Commented Oct 8, 2024 at 18:05
  • @jakob edit made. Commented Oct 8, 2024 at 18:05
  • Understood! Thanks, Aaron! Commented Oct 11, 2024 at 18:24
  • We have enabled the ignorereplayerrors option and after a while we still ran into a situation where Cassandra was not able to start due to a Commit Log error. I have written an update to my question, regarding the new issue. Any feedback is highly appreciated! Thanks! Commented Oct 22, 2024 at 20:19
