Package io.mats3
Class MatsInitiator.MatsMessageSendException

java.lang.Object
    java.lang.Throwable
        java.lang.Exception
            io.mats3.MatsInitiator.MatsMessageSendException

All Implemented Interfaces:
    Serializable
Enclosing interface:
    MatsInitiator
Will be thrown by the MatsInitiator.initiate(InitiateLambda)-method if Mats fails to send the messages:
 After the MatsInitiator.InitiateLambda has been run and any external resource (typically a DB) has been
 committed, some situation occurs that makes it impossible to send out the messages. (Some developers might
 recognize this as the "VERY BAD!-initiation" situation.)
 
 This is a rare but unfortunate situation, which is hard to guard completely against, in particular with the
 "Best Effort 1-Phase Commit" paradigm that the current Mats implementations run on. What it means is that if
 you e.g. in the initiate-lambda did some "job allocation" logic on a table in a database, and based on that
 allocation sent out e.g. 5 messages, the job allocation will now have happened, but the messages have
 not actually been sent. The result is that the database shows those jobs as processed
 (semantically "started processing"), while in reality the downstream endpoints never started working on them,
 since the messages were never sent out.
 
 This situation can to a degree be alleviated if you catch this exception and then use a compensating
 transaction to de-allocate the jobs in the database again. However, since bad things often happen in
 clusters, you might not be able to perform the de-allocation either (due to the database becoming inaccessible
 at the same instant - e.g. the reason the messages could not be sent was that the network cable was unplugged,
 or that this node lost power at that instant). A way to at least detect when this happens is to employ a
 state machine in the job allocation logic: First pick jobs for this node by setting the state column of
 job-entries whose state is "UNPROCESSED" to some status like "ALLOCATED" (along with a column recording which
 node allocated them (i.e. the "hostname" of this node) and a column for the timestamp of when they were
 allocated). In the initiator, you pick the jobs that were allocated to this node, set the status to "SENT"
 and send the outgoing messages. Finally, in the terminator endpoint (which you specify in the initiation),
 you set the status to "DONE". Then you make a health check: Assuming that under normal conditions such jobs
 should always be processed in seconds, you make a health check that scans the table for rows which have been
 in the "ALLOCATED" or "SENT" status for e.g. 15 minutes: Such rows are very suspicious, and should be checked
 up by humans. Sitting in "ALLOCATED" status would imply that the node that allocated the job went down (and
 has not (yet) come back up) before it managed to initiate the messages, while sitting in "SENT" would imply
 that the message flow was started, but did not get through the processing: Either that message flow sits in a
 downstream Dead Letter Queue due to some error, or you ended up in the situation explained here: The database
 commit went through, but the messages were not sent.
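 
 As an illustration, here is a minimal sketch of such handling, assuming a hypothetical JobRepository for the
 job table (its methods allocateJobs, markSent and deallocateJobs, as well as the endpoint ids and JobDto, are
 made up for this example):
 
     // Minimal sketch: allocate jobs, send one message per job, compensate if the send fails.
     List<Long> jobIds = jobRepository.allocateJobs(thisNodename); // UNPROCESSED -> ALLOCATED
     try {
         matsInitiator.initiate(init -> {
             jobRepository.markSent(jobIds); // ALLOCATED -> SENT, within the same transaction
             for (long jobId : jobIds) {
                 init.traceId("StartJob[" + jobId + "]" + Math.random())
                         .from("JobService.startJobs")
                         .to("JobWorker.process")
                         .send(new JobDto(jobId));
             }
         });
     }
     catch (MatsMessageSendException e) {
         // The database was committed, but the messages were NOT sent ("VERY BAD!").
         // Attempt a compensating transaction - note that this may fail too, which is
         // what the health check described above is the backstop for.
         jobRepository.deallocateJobs(jobIds); // back to UNPROCESSED
     }
     catch (MatsBackendException e) {
         // Nothing was committed yet - the entire initiation can simply be retried.
     }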
 
 Please note that in a somewhat stable operations environment, this should happen extremely seldom: For it to
 occur, this node must crash, the network must be lost, or the message broker must go down in exactly the
 sliver of time between the commit of the database and the commit of the message broker. Given that a check
 for broker liveness is performed right before the database commit, that time span is very tight. But to make
 the most robust systems that can monitor themselves, you should consider employing state machine handling as
 outlined above. You might never see that health check trip, but now you can at least sleep without thinking
 about that 1 billion dollar order that was never processed.
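 
 Such a health check could boil down to a single scan of the job table; a sketch, where the jobs table and its
 columns (status, allocated_by, allocated_at) are an assumed schema:
 
     // Find jobs stuck in ALLOCATED or SENT for more than 15 minutes - these need human attention.
     String sql = "SELECT id, status, allocated_by, allocated_at FROM jobs"
             + " WHERE status IN ('ALLOCATED', 'SENT') AND allocated_at < ?";
     try (PreparedStatement stmt = connection.prepareStatement(sql)) {
         stmt.setTimestamp(1, Timestamp.from(Instant.now().minus(15, ChronoUnit.MINUTES)));
         try (ResultSet rs = stmt.executeQuery()) {
             while (rs.next()) {
                 log.error("Suspicious job [" + rs.getLong("id") + "] stuck in status ["
                         + rs.getString("status") + "], allocated by [" + rs.getString("allocated_by")
                         + "] at [" + rs.getTimestamp("allocated_at") + "]");
             }
         }
     }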
 
 PS: Best effort 1PC: Two transactions are opened: one for the message broker, and one for the database. The
 business logic, and possibly database reads and changes, are performed. The database is committed first, as
 it has many more failure scenarios than the message system, e.g. data or code problems giving integrity
 constraint violations, and spurious stuff like MS SQL's "deadlock victim" errors. Then the message queue is
 committed, as about the only reason for the message broker to not handle a commit is that you have
 infrastructure problems, like connectivity issues, or that the broker has crashed.
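 
 In conceptual pseudo-Java, the ordering is roughly as follows (a simplified sketch of the principle, not the
 actual Mats transaction manager internals):
 
     Session jmsSession = jmsConnection.createSession(true, Session.SESSION_TRANSACTED); // broker tx
     dbConnection.setAutoCommit(false);                                                  // database tx
 
     runInitiateLambda(dbConnection, jmsSession); // business logic, DB changes, staging of messages
 
     dbConnection.commit(); // commit DB first: most failure scenarios live here
     jmsSession.commit();   // commit broker last: if THIS fails, you get a MatsMessageSendException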
 
 Notice that it has been decided to not let this exception extend the MatsInitiator.MatsBackendException, even
 though it is definitely a backend problem. The reason is that in all situations where MatsBackendException is
 raised, the other resources have not yet been committed, as opposed to the situations where this
 MatsMessageSendException is raised. Luckily, in this day and age, we have multi-exception catch blocks
 if you want to handle both the same way.
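 
 For example, handling both with a single multi-catch (the endpoint ids and DTO are made up):
 
     try {
         matsInitiator.initiate(init -> init
                 .traceId("PlaceOrder[" + orderId + "]" + Math.random())
                 .from("OrderService.placeOrder")
                 .to("OrderProcessor.process")
                 .send(orderDto));
     }
     catch (MatsBackendException | MatsMessageSendException e) {
         // Handle both the same way, e.g. log, and tell the user to try again later.
         log.error("Could not initiate the Mats flow!", e);
     }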
Constructor Summary
Constructors:
    MatsMessageSendException(String message)
    MatsMessageSendException(String message, Throwable cause)

Method Summary
Methods inherited from class java.lang.Throwable:
    addSuppressed, fillInStackTrace, getCause, getLocalizedMessage, getMessage, getStackTrace,
    getSuppressed, initCause, printStackTrace, printStackTrace, printStackTrace, setStackTrace, toString
Constructor Details

MatsMessageSendException
    public MatsMessageSendException(String message)

MatsMessageSendException
    public MatsMessageSendException(String message, Throwable cause)