For example:
Query 'SELECT i,n FROM dovecot.user_mailbox_objects WHERE u = '1@2' AND g = 0xc92f64f79f0d1ed01e6d5b314f04886c AND b = 0' failed: Operation timed out - received only 0 responses.
This typically means that there are too many tombstones. Dovecot is supposed to prevent this by dividing mails into buckets that hold a maximum of 10k messages. However, that doesn't always work perfectly.
There are two common possibilities for the "Operation timed out":
There are so many tombstones that Cassandra is too slow to go through them all and give a response. This triggers a timeout on the Cassandra side. You may get a warning about it in the Cassandra logs, e.g.:
ReadCommand.java:569 - Read 6 live rows and 50000 tombstone cells for query ...
There are so many tombstones that Cassandra reaches the maximum number of tombstones it is allowed to scan, and the query is aborted. Cassandra logs this, e.g.:
MessageDeliveryTask.java:76 - Scanned over 100001 tombstones during query ...; query aborted
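To tell the two cases apart when going through Cassandra logs, a small script can help. The following is a minimal sketch: the function name `classify_tombstone_log` is hypothetical, and the patterns are modelled only on the two example log lines above, so the exact wording may differ between Cassandra versions.

```python
import re

# Patterns derived from the two example log lines above (assumption:
# other Cassandra versions may phrase these messages differently).
WARN_RE = re.compile(r"Read (\d+) live rows and (\d+) tombstone cells for query")
ABORT_RE = re.compile(r"Scanned over (\d+) tombstones during query .*query aborted")


def classify_tombstone_log(line):
    """Return (kind, tombstone_count) for a tombstone-related log line.

    kind is "aborted" when the query was aborted after scanning too many
    tombstones, "warning" when Cassandra only warned about the tombstone
    count, and the function returns None for unrelated lines.
    """
    match = ABORT_RE.search(line)
    if match:
        return ("aborted", int(match.group(1)))
    match = WARN_RE.search(line)
    if match:
        return ("warning", int(match.group(2)))
    return None


if __name__ == "__main__":
    samples = [
        "ReadCommand.java:569 - Read 6 live rows and 50000 tombstone cells for query ...",
        "MessageDeliveryTask.java:76 - Scanned over 100001 tombstones during query ...; query aborted",
    ]
    for sample in samples:
        print(classify_tombstone_log(sample))
```

An "aborted" result points at the tombstone threshold being reached, while a "warning" result points at the read timeout case.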
If the query doesn't always fail, you can run TRACING ON; in cqlsh before running the SELECT to see what it reports.
If Cassandra reaches tombstone_failure_threshold (in cassandra.yaml, default: 100000), query processing is stopped and an error is logged by Cassandra. This is still visible to Dovecot as "Operation timed out".
Potential solutions:
Increase tombstone_failure_threshold.
If Cassandra reaches read timeout in processing, the query fails with "Operation timed out".
Potential solutions:
Set page_size in dovecot-dict-cql.conf.ext to enable Cassandra paging. This should tell Cassandra servers to send the output in smaller chunks and prevent the timeout. Of course, going through the pages can still take a long time.
Increase the request_timeout setting in dovecot-dict-cql.conf.ext. However, this can't be set higher than read_request_timeout (or read_request_timeout_in_ms) in the Cassandra server's cassandra.yaml.
To remove the tombstones immediately:
Set gc_grace_seconds in the affected Cassandra table to a small enough value. It needs to be lower than the age of the tombstones, i.e. the time elapsed since they were created.
Run a compaction (nodetool compact).
Set gc_grace_seconds back to the original value.
It may be useful to shrink gc_grace_seconds permanently. Its purpose is to prevent zombie rows from coming back to life when Cassandra nodes are out of service for a while. However, most installations already see these zombie rows anyway and have ways to handle them.
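The tombstone removal procedure above (lower gc_grace_seconds, compact, restore) can be written out as concrete commands. The following is a minimal sketch that only builds the command strings and does not execute anything. The helper name `tombstone_purge_commands` is hypothetical, the table name is taken from the example query at the top, the temporary value of 3600 is an arbitrary illustration, and 864000 is assumed as the original value because it is Cassandra's default gc_grace_seconds; check the table's actual setting before restoring it.

```python
def tombstone_purge_commands(keyspace, table, temporary_gc_grace,
                             original_gc_grace=864000):
    """Build the commands for the three removal steps, in order.

    The commands are returned as strings so they can be reviewed before
    being run by hand. original_gc_grace defaults to Cassandra's default
    of 864000 seconds (10 days), which is an assumption about the table.
    """
    temporary_gc_grace = int(temporary_gc_grace)
    original_gc_grace = int(original_gc_grace)
    target = f"{keyspace}.{table}"
    return [
        # Step 1: make existing tombstones eligible for removal.
        f'cqlsh -e "ALTER TABLE {target} WITH gc_grace_seconds = {temporary_gc_grace};"',
        # Step 2: compact the table so eligible tombstones are dropped.
        f"nodetool compact {keyspace} {table}",
        # Step 3: restore the original grace period.
        f'cqlsh -e "ALTER TABLE {target} WITH gc_grace_seconds = {original_gc_grace};"',
    ]


if __name__ == "__main__":
    for command in tombstone_purge_commands("dovecot", "user_mailbox_objects", 3600):
        print(command)
```

Printing the commands rather than running them keeps the sketch safe to try; connection options for cqlsh and nodetool are left out and depend on the installation.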