The complete offer gets rejected, if one matcher runs into a timeout.
Remove custom ctor, that translates from Seq to Map.
LaunchQueue.notifyInstanceUpdate should not send queue info in return, since this is very expensive and not needed
Log debug messages on debug.
It does not make sense to start an offer match, if there is minimal time left.
Details
- Reviewers
• jasongilanfarr meichstedt - Commits
- rMARATHON7c5eaf4a0c44: Handle OfferMatching timeouts as well as some performance improvements
rMARATHON8d821960fbed: Handle OfferMatching timeouts as well as some performance improvements
rMARATHONb72d3f03c392: Handle OfferMatching timeouts as well as some performance improvements
Launch Simulator and start a simple app with 50K tasks
Launch Simulator and start 5000 apps with one task
Diff Detail
- Repository
- rMARATHON marathon
- Lint
Automatic diff as part of commit; lint not applicable. - Unit
Automatic diff as part of commit; unit tests not applicable.
Build has FAILED
Link to build: https://jenkins.mesosphere.com/service/jenkins/job/public-test-marathon-phabricator/620/
See console output for more information: https://jenkins.mesosphere.com/service/jenkins/job/public-test-marathon-phabricator/620/console
Build has FAILED
Link to build: https://jenkins.mesosphere.com/service/jenkins/job/public-test-marathon-phabricator/627/
See console output for more information: https://jenkins.mesosphere.com/service/jenkins/job/public-test-marathon-phabricator/627/console
Mostly questions.
I'd like to see the test plan include full scale testing with caps, e.g.:
1 app max # of instances
N apps with 1, 10, 100, 1000 instances per app
Largest group able to be deployed with 1, 10, 100, 1000 instances per app
src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueue.scala | ||
---|---|---|
80 | Out of context for this review, but shouldn't this come on event bus or a Source that we give to the queue so it can be informed? Seems like we could do better here basically. | |
src/main/scala/mesosphere/marathon/core/matcher/base/util/ActorOfferMatcher.scala | ||
26 | Should the ask also be using the deadline? | |
43 | Seems a little short to me considering we still have to send the match to mesos right? | |
src/main/scala/mesosphere/marathon/core/task/update/impl/steps/NotifyLaunchQueueStepImpl.scala | ||
21–22 | Why not Done instead of a Future? |
src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueue.scala | ||
---|---|---|
80 | Hm, risky. This function is called from within the taskStatusUpdateSteps which are processed sequentially: ContinueOnErrorStep(notifyHealthCheckManagerStepImpl), ContinueOnErrorStep(notifyRateLimiterStepImpl), ContinueOnErrorStep(notifyLaunchQueueStepImpl), ContinueOnErrorStep(taskStatusEmitterPublishImpl), ContinueOnErrorStep(postToEventStreamStepImpl), ContinueOnErrorStep(scaleAppUpdateStepImpl) If the notifyLaunchQueueStepImpl returns immediately without waiting until the TaskLauncherActor has updated its internal state, there might be a race at least with the scaleAppUpdateStepImpl. This will ask the launchQueue for it's state and react on that. The message delivery from two different senders is not guaranteed to be in order, so if the TaskLauncherActor receives the ask from the SchedulerActor before the update message, it will respond with incorrect values. That whole construct is not very nice and we can do better; for now I think the future is the way to go ;) | |
src/main/scala/mesosphere/marathon/core/launchqueue/impl/LaunchQueueDelegate.scala | ||
34–35 | See above, I'm afraid we have to ask :| |
Thanks!
src/main/scala/mesosphere/marathon/core/launcher/InstanceOpFactory.scala | ||
---|---|---|
34 | This is another candidate, but this can be cleaned up ins a subsequent patch |
Build is green https://jenkins.mesosphere.com/service/jenkins/job/public-test-marathon-phabricator/672/ for more details.
src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueue.scala | ||
---|---|---|
80 | That's probably a perfect example for a Stream. | |
src/main/scala/mesosphere/marathon/core/matcher/base/util/ActorOfferMatcher.scala | ||
26 | Yes it does | |
43 | This is an optimization. If the match will take longer, a timeout is raised and interpreted as empty match. This does not take time to send to mesos into account. |
This is another candidate, but this can be cleaned up ins a subsequent patch