These errors happen on Cassandra write timeouts. If Dovecot can't be sure that the write succeeded, it logs this error and keeps the object in the object storage. These objects should eventually be deleted, once Cassandra is having fewer problems.
For this, find the uncertain-delete(1) script in the obox package.
NOTE
The error described on this page should generally not happen on a default setup. This page is relevant only if the no-cleanup-uncertain parameter (see Dictmap Parameters) is explicitly given.
TIP
Monitor the fs_dictmap_dict_write_uncertain event, especially whether its cleanup field contains failed. Also inspect the logs for messages like "file write state is uncertain for object ID".
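As a rough illustration of the log inspection suggested above, the following Python sketch collects object IDs from log lines that contain the quoted message. Only the phrase "file write state is uncertain for object ID" comes from this page; the assumption that the object ID is the next whitespace-delimited token, the function name, and the sample line in the usage note are all hypothetical, so adjust the pattern to your actual log format.

```python
import re

# The phrase is quoted from this page. That the object ID follows it as the
# next whitespace-delimited token is an ASSUMPTION about the log layout.
UNCERTAIN_RE = re.compile(r"file write state is uncertain for object ID\s+(\S+)")


def find_uncertain_object_ids(lines):
    """Return object IDs from log lines that report an uncertain write."""
    ids = []
    for line in lines:
        match = UNCERTAIN_RE.search(line)
        if match:
            # Drop punctuation that may trail the ID in the log line.
            ids.append(match.group(1).rstrip(".,:;"))
    return ids
```

For example, feeding it the hypothetical line `"Error: file write state is uncertain for object ID abc123: timeout"` would yield the list `["abc123"]`, while lines without the phrase are ignored.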
A detailed explanation:
When Cassandra can't achieve the requested consistency (each-quorum in this case) at write time, it returns a failure. However, the write may still have partially succeeded, and Cassandra may eventually repair/replicate it enough times that the write did actually succeed. In these cases Dovecot can't be sure whether the write will eventually succeed or not, and it logs the "success is uncertain, not deleting object ID" error. If the failure was about an email object, Dovecot replies to the IMAP/LMTP client that the mail couldn't be saved, so the client will typically re-deliver the mail at a later time. There are two possible outcomes after this:
After an uncertain write, Dovecot immediately attempts to delete the uncertain data. The deletion may of course also fail with uncertainty, but unless it failed completely, it will still eventually take effect.
The purpose of the uncertain-delete(1) script is to find these leaked storage objects and delete them to avoid wasting disk space - there is no user visible impact. The script should therefore be run a long time (e.g. 1 day) after the "success is uncertain, not deleting object ID" errors, to give Cassandra time to finish its repairs/replication and to determine whether the write actually succeeded. If uncertain-delete(1) is run too early, Cassandra could still repair the write, which would then point to a storage object that no longer exists, resulting in "Object exists in dict, but not in storage" errors (see Obox Troubleshooting: Object exists in dict, but not in storage).
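The waiting period can be made explicit with a small helper. The sketch below is hypothetical and not part of the obox package: it takes (timestamp, object ID) pairs that you have already extracted from the logs and returns only those that are at least one day old, the example interval mentioned above. It deletes nothing and does not judge whether a write succeeded; it only indicates which errors are old enough to be looked at with uncertain-delete(1).

```python
from datetime import datetime, timedelta


def errors_past_waiting_period(errors, now, min_age=timedelta(days=1)):
    """Return the (timestamp, object_id) pairs logged at least min_age before now.

    The 1-day default mirrors the example interval in the text; it is not a
    value taken from Dovecot or Cassandra configuration.
    """
    return [(ts, oid) for ts, oid in errors if now - ts >= min_age]
```

A pair logged two hours ago would be filtered out, while one logged yesterday at the same time of day would be kept.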
It's not safe to run a script that just deletes uncertain writes a long time after they happen. Otherwise mails could get lost: