- Every synchronous methods in the LaunchQueue has now it's async counterpart.
- Replaced calling LaunchQueue sync methods with their respective async counterparts in deployment actors so they don't block any more.
- Fixed relevant deployment tests
- Added a supervision strategy to DeploymentActor that restart child actors even on ActorInitializationException
Details
- Reviewers
jeschkies unterstein timcharper jdef meichstedt jenkins - Commits
- rMARATHONae563c83d524: Added async methods to LaunchQueueDelegate interface and used them in…
rMARATHON541db8de8944: Added async methods to LaunchQueueDelegate interface and used them in…
rMARATHONa5460bbedf09: Added async methods to LaunchQueueDelegate interface and used them in…
rMARATHON8dddeedbf963: Added async methods to LaunchQueueDelegate interface and used them in…
rMARATHON97c751e68822: Added async methods to LaunchQueueDelegate interface and used them in…
rMARATHON23df6002da8b: Async LaunchQueue methods are now used in deployments
rMARATHON8e7e5309d219: Added async methods to LaunchQueueDelegate interface and used them in…
rMARATHON5bb5e39ca471: Added async methods to LaunchQueueDelegate interface and used them in…
rMARATHONbb618473da0d: Added async methods to LaunchQueueDelegate interface and used them in… - JIRA Issues
- JIRA MARATHON-7358 Deployments performance degradation in 1.4
JIRA MARATHON-7472 Deployment never ends due to a failure in TaskStartActor initialisation
unit
Diff Detail
- Repository
- rMARATHON marathon
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueue.scala | ||
---|---|---|
51 | Not that I know of. I could create one, but since this part (along with deployments) is probably part of Action/v3/what ever, this ticket would never be touched. | |
src/main/scala/mesosphere/marathon/core/task/tracker/impl/InstanceTrackerDelegate.scala | ||
46 | No, it conflicts with implicit ec: ExecutionContext used below. | |
src/test/scala/mesosphere/marathon/core/deployment/impl/TaskStartActorTest.scala | ||
35 | No, that's a different promise - DeploymentActor becomes this one and completes it when the deployment is done. It should be also Promise[Done] but that has nothing to do with current diff. I can make it a separate one. |
src/main/scala/mesosphere/marathon/core/deployment/impl/DeploymentActor.scala | ||
---|---|---|
5 | eh? | |
64 | maybe check for NonFatal instead of catching everything here? | |
src/main/scala/mesosphere/marathon/core/task/tracker/impl/InstanceTrackerDelegate.scala | ||
46 | I thought we were supposed to standardize on the marathon core async global execution context in order for our MDC/actions context propagation to work properly. this looks like a red flag. what am I missing? | |
src/test/scala/mesosphere/marathon/core/deployment/impl/TaskStartActorTest.scala | ||
255 | NonFatal? |
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
- Fourth round of feedback
src/main/scala/mesosphere/marathon/core/task/tracker/impl/InstanceTrackerDelegate.scala | ||
---|---|---|
46 | Everbody is using our custom global execution context (linter rejects the scala's one) but if you import it here at the top of the file you'll get: [error] /Users/adukhovniy/mesosphere/marathon/src/main/scala/mesosphere/marathon/core/task/tracker/impl/InstanceTrackerDelegate.scala:33: ambiguous implicit values: [error] both value ec of type scala.concurrent.ExecutionContext [error] and value global in object ExecutionContexts of type => scala.concurrent.ExecutionContext ... |
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
- Adding debug test logging to figure out AppDeployIntegrationTest failures and some minor improvements
src/main/scala/mesosphere/marathon/core/deployment/impl/TaskReplaceActor.scala | ||
---|---|---|
135–145 | On a second thought - I realized yesterday why I didn't make the method completely async in the first place: method changes instancesStarted (actor's state) and doing it from an async block will create a race condition. I could make it Atomic of course but prefer not to. Reverted to first implementation. |
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "48c0dfed54da6f430f7bf0b8603b79c171832f59"
\\ ٩( ᐛ )و //
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "6ffd649de7558f44756cbd481c706c1dae1e8f73"
\\ ٩( ᐛ )و //
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "9f71f447c17bc3ec8d30c632a41d4d8d24e40e23"
\\ ٩( ᐛ )و //
src/main/scala/mesosphere/marathon/core/deployment/impl/TaskReplaceActor.scala | ||
---|---|---|
133 | thanks for this |
I don't know if that's the extra logging (on DEBUG level) but AppDeployIntegrationTest doesn't fail anymore :(
src/main/scala/mesosphere/marathon/core/deployment/impl/TaskReplaceActor.scala | ||
---|---|---|
135–145 | Hm, Why is it return a Future[Done]? It will alway finished immediately without waiting for the addAsync. Is this what we want? |
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "5007fd2138789f182ef08ef100479a026db22ebe"
\\ ٩( ᐛ )و //
src/main/scala/mesosphere/marathon/core/deployment/impl/TaskReplaceActor.scala | ||
---|---|---|
135–145 | You're right. Should be if else. Done |
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "8cc026e0597181c28596b4bad9eeac0a14a04c42"
\\ ٩( ᐛ )و //
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
It seems like at least a few tests AppDeployIntegrationTest are flaky. I'll keep iterating because I don't want to introduce new flaky tests. I'll set up a test loop here: Marathon/job/marathon-sandbox/job/marathon-loop-ad-async-launch-queue/ and will gather statistics tomorrow.
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "d318228b94e56e12b6d29295d5b0fe06ff2cf091"
\\ ٩( ᐛ )و //
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "46123f8ea8cd73c0af885cd84d812e08e553fe89"
\\ ٩( ᐛ )و //
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "7384a0a6f54a84f0aa17db10277f6ecdd3e5d8d0"
\\ ٩( ᐛ )و //
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-594-g197d5f5.tgz", "sha1"" "ce64d62191fe40872e314c699b4333681ee29642"
\\ ٩( ᐛ )و //
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://downloads.mesosphere.io/marathon/snapshots/marathon-1.5.0-SNAPSHOT-618-g48bfffd.tgz", "sha1": "ad5182db4e3ff2d1e3084fc9dde27e2eef9817c2"
Is async required? Does currentInstances.count take long?