
Rework DeploymentManager logic
Closed · All Users

Authored by zen-dog on Nov 18 2016, 5:53 PM.

Details

Reviewers
aquamatthias
meichstedt
Commits
rMARATHON95969d4ec96d: DeploymentManagers handles leader abdication correctly
rMARATHONa8bac97b8bf0: Rework DeploymentManager logic
rMARATHON3edb7b76bb6c: Remove SchedulerDriver from the PerformDeployment message
rMARATHON1a4d91cddd63: Rework DeploymentManager logic
rMARATHON07a96bff38a4: Fixed small typo in the DeploymentManagerTest
rMARATHON77480be2f189: Implemented part of the review change requests
rMARATHON0757f9ae7cfc: Rework DeploymentManager logic
rMARATHON6ac141df7dff: Remove SchedulerDriver from the PerformDeployment message
rMARATHONc571775db076: Fix MarathonSchedulerActor "Cancellation timeout" test
rMARATHON1b3b8ee4dd79: Fix MarathonSchedulerActor "Cancellation timeout" test
rMARATHONfcf377b9f45f: Rework DeploymentManager logic
rMARATHON0512601d1f99: DeploymentManagers handles leader abdication correctly
rMARATHON25c3b470fae0: Fix MarathonSchedulerActor "Cancellation timeout" test
rMARATHON4ff8a81fb223: Rework DeploymentManager logic
rMARATHON902525004ad1: Fix MarathonSchedulerActor "Cancellation timeout" test
rMARATHONe21dfeab7024: Rebased on master and implemented review change requests
rMARATHON11769f986b47: Fixed small typo in the DeploymentManagerTest
rMARATHON5ef326a717de: Rebased on master and implemented review change requests
rMARATHON29a981004e3c: Rework DeploymentManager logic
rMARATHONe0aa5d87671c: Fix MarathonSchedulerActor "Cancellation timeout" test
rMARATHONe7b3f8652e08: Fix MarathonSchedulerActor "Cancellation timeout" test
rMARATHON50c5b624725c: Implemented part of the review change requests
rMARATHONba74e901da0b: Rework DeploymentManager logic
JIRA Issues
JIRA MARATHON_EE-1042 Move deployment and logic out of MarathonSchedulerActor -> DeploymentManager
Summary

Rework DeploymentManager logic

  • Deployments are started with the new StartDeployment message
  • Removed MarathonSchedulerActor "awaitCancelation" logic - forced deployments are now completely handled by the DeploymentManager
  • DeploymentManager holds its own deployment repository reference and performs all repository operations
  • DeploymentManager also handles failing a deployment that arrives via a StartDeployment message while the affected run specs are locked (force = false)
  • Fixed unit tests to reflect the changes and added new tests
  • Correct handling of leader abdication/reelection: DeploymentManager now goes into a suspended receive mode where it ignores all messages except "RecoverDeployments", which signals leader election (see the sketch after this list)
  • Added documentation on deployment manager message flow
  • Removed deployment repository reference from MarathonSchedulerActor
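For illustration, the suspended receive mode described above could look roughly like the following Akka sketch. Only the RecoverDeployments and StartDeployment message names come from this revision; the placeholder types and helper bodies are invented for the example and are not the actual implementation.

import akka.actor.{ Actor, ActorRef }

// Placeholder types; the real DeploymentPlan and message classes live in the Marathon codebase.
trait DeploymentPlan
case class RecoverDeployments(deployments: Seq[DeploymentPlan])
case class StartDeployment(plan: DeploymentPlan, origSender: ActorRef)

class DeploymentManagerSketch extends Actor {
  // While suspended (e.g. after losing leadership), ignore everything except the recovery signal.
  def suspended: Receive = {
    case RecoverDeployments(deployments) =>
      deployments.foreach(schedule) // placeholder for re-scheduling the stored deployments
      context.become(active)
    case _ => // silently drop everything else until we are leader again
  }

  def active: Receive = {
    case StartDeployment(plan, _) =>
      schedule(plan) // the real logic also handles conflicts, locks and persistence
  }

  override def receive: Receive = suspended

  private def schedule(plan: DeploymentPlan): Unit = () // stub
}
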
Test Plan

unit tests, integration tests

Diff Detail

Repository
rMARATHON marathon
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
zen-dog added inline comments.Nov 28 2016, 6:18 PM
src/main/scala/mesosphere/marathon/MarathonSchedulerActor.scala
264–265

I left it there because this way the event publishing happens strictly after the lock is removed. Otherwise there is no guarantee of that.

zen-dog updated this revision to Diff 815.Nov 28 2016, 8:44 PM

Implemented review change requests

  • Minor improvement in the MarathonSchedulerActor logic
zen-dog marked 2 inline comments as done.Nov 28 2016, 8:46 PM
zen-dog added inline comments.
src/main/scala/mesosphere/marathon/MarathonSchedulerActor.scala
246–247

It is.

263–264

I simplified the calling logic and added a comment to it.

aquamatthias requested changes to this revision.Nov 29 2016, 2:24 PM
aquamatthias added inline comments.
src/main/scala/mesosphere/marathon/MarathonSchedulerActor.scala
263–264

Oversimplified. Now this is possible:

  • Deployment A starts --> lock is acquired by A
  • Deployment B starts with force=true --> lock is acquired by B
  • Deployment A fails --> this will remove the lock acquired by B. Deployment B is now in progress, and no lock is held for any of the apps of A and B.
This revision now requires changes to proceed.Nov 29 2016, 2:24 PM
zen-dog updated this revision to Diff 883.Nov 30 2016, 12:09 PM
zen-dog marked 2 inline comments as done.

Fixed incorrect lock handling in the MarathonSchedulerActor

zen-dog marked 2 inline comments as done.Nov 30 2016, 12:15 PM
zen-dog added inline comments.
src/main/scala/mesosphere/marathon/MarathonSchedulerActor.scala
263–264

Right. Thanks for the catch. I've made lockedRunSpecs a Map[PathId, Int] to store the number of locks held. This makes things better, but the solution is still not very robust. Since DeploymentManager has all the information about running deployments, I would suggest removing the locks from MarathonSchedulerActor completely and asking DeploymentManager instead. This should be fairly simple to implement, but I would wait with it until our ITs are green again.
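For illustration only, a reference-counted lockedRunSpecs along those lines could be sketched as below; PathId is a stand-in for Marathon's own type and the helper names are invented, not taken from the diff.

// Sketch of reference-counted run spec locks (Map[PathId, Int]).
object LockCountingSketch {
  type PathId = String // stand-in for Marathon's PathId

  var lockedRunSpecs: Map[PathId, Int] = Map.empty

  def addLocks(ids: Seq[PathId]): Unit =
    ids.foreach(id => lockedRunSpecs += id -> (lockedRunSpecs.getOrElse(id, 0) + 1))

  // Releasing one deployment's locks frees a run spec only when no other holder remains,
  // so a failing deployment A can no longer drop the lock a forced deployment B still holds.
  def removeLocks(ids: Seq[PathId]): Unit =
    ids.foreach { id =>
      val remaining = lockedRunSpecs.getOrElse(id, 0) - 1
      if (remaining <= 0) lockedRunSpecs -= id
      else lockedRunSpecs += id -> remaining
    }

  def isLocked(id: PathId): Boolean = lockedRunSpecs.getOrElse(id, 0) > 0
}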

aquamatthias requested changes to this revision.Dec 1 2016, 11:23 AM

Thanks. Mostly minor comments.

src/main/scala/mesosphere/marathon/MarathonSchedulerActor.scala
52

According to the code, this comment is misleading:

  • a lock is acquired if deployment is started
  • a lock is acquired if a kill operation is executed
  • a lock is acquired if a scale operation is executed

This basically means:

  • a kill/scale operation should not be performed, while a deployment is in progress
  • a deployment should not be started, if a scale/kill operation is in progress

Asking the deployment manager would not be sufficient.

254

Please add a sentence to the documentation of force. If a deployment is forced:

  • the new deployment will be started
  • the old deployment will be cancelled and release all claimed locks
  • only in this case, one RunSpec can have 2 locks!
src/main/scala/mesosphere/marathon/upgrade/DeploymentManager.scala
150

The DeploymentManager is started with Marathon even if we are not elected as leader.
That is the reason why this is an Option.
You cannot safely call get here.

193–198

Just for clarity: please add: if !isScheduledDeployment(plan.id)

323

You not only mark this deployment, but also start a deployment actor.
Suggestion: startDeployment??

src/test/scala/mesosphere/marathon/MarathonSchedulerActorTest.scala
509–510

yes.

This revision now requires changes to proceed.Dec 1 2016, 11:23 AM
zen-dog updated this revision to Diff 1037 (edited).Dec 4 2016, 8:13 PM
zen-dog marked 5 inline comments as done.
  • Rebased on master and implemented review change requests
  • Handled one case where DeploymentManager should send MarathonSchedulerActor a notification about a canceled deployment
  • Removed SchedulerDriver from the DeploymentManager since it wasn't really needed anywhere
  • Removed "Cancellation timeout" MarathonSchedulerActor test since it wasn't needed anymore
  • Fixed some comments
zen-dog updated this revision to Diff 1048.Dec 5 2016, 12:03 PM

Retest this please

zen-dog updated this revision to Diff 1058.Dec 5 2016, 2:56 PM

Yet another rebase

aquamatthias accepted this revision.Dec 5 2016, 3:32 PM

@zen-dog Thanks. A step in the right direction. This definitely needs further cleanup.

This revision is now accepted and ready to land.Dec 5 2016, 3:32 PM
zen-dog updated this revision to Diff 1062.Dec 5 2016, 3:42 PM

Fixed scapegoat warnings

zen-dog updated this revision to Diff 1074.Dec 5 2016, 7:49 PM
  • Fixed small typo in the DeploymentManagerTest
  • Implemented part of the review change requests
  • Rebased on master and implemented review change requests
meichstedt requested changes to this revision.Dec 6 2016, 2:11 PM
meichstedt added inline comments.
src/main/scala/mesosphere/marathon/MarathonSchedulerActor.scala
86

The name is not clear imo. What the message does is basically a GET, although I understand that it should only be used when trying to continue deployments that have been suspended because of a loss of leadership, which should be reflected in the name of this message. What about RetrieveDeployments or RetrieveSuspendedDeployments?

The associated response should be named accordingly, DeploymentsRecovered is not good name either (imo).

88

See above: not a good name. DeploymentsRecovered implies that deployments have been recovered, which is not the case here. Rather DeploymentsToRecover or SuspendedDeployments or something like that.

91

Not introduced with this diff, but why do we send from deadLetters?

223

Not high prio for this PR, but couldn't we extract all the lock handling into something that only handles locks, thus moving all of these helpers out of this actor?

src/main/scala/mesosphere/marathon/upgrade/DeploymentManager.scala
213

If storing the plan fails, it is still marked as scheduled. It should only be marked as scheduled if storing succeeded, otherwise the actor's state and the repo are out of sync.

I also wonder why you wait for the plan to be stored and then send a message to self. Wouldn't it be clearer to (roughly as in the sketch below this comment)

  1. store the new plan and map into that future with self ! LaunchDeploymentActor(plan, origSender)
  2. do the internal state mutation on receiving that message, after the plan has been persisted:
case LaunchDeploymentActor(plan, origSender) => {
  markScheduled(plan)
  origSender ! DeploymentStarted(plan)
  startDeployment(plan, origSender)
}

?
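A minimal sketch of step 1, assuming a deploymentRepository-style store function and the LaunchDeploymentActor message from this comment; the names are illustrative, not verbatim from the diff.

import akka.actor.{ Actor, ActorRef }
import akka.pattern.pipe
import scala.concurrent.Future

// Placeholder types standing in for the real Marathon ones.
trait DeploymentPlan
case class LaunchDeploymentActor(plan: DeploymentPlan, origSender: ActorRef)

trait StoreThenLaunchSketch extends Actor {
  import context.dispatcher

  // Stand-in for deploymentRepository.store(plan).
  def store(plan: DeploymentPlan): Future[Unit]

  // Step 1: persist first, then tell ourselves to mutate internal state and launch the actor;
  // step 2 is the LaunchDeploymentActor case shown in the comment above.
  def scheduleDeployment(plan: DeploymentPlan, origSender: ActorRef): Unit =
    store(plan).map(_ => LaunchDeploymentActor(plan, origSender)).pipeTo(self)
}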

233

It's easier to understand that way now, but it's easy to mess up with permutations in the future. The current state is exhaustive wrt StartDeployment; if one of these cases is slightly changed, there is a chance that the PF no longer catches certain messages. How about catching the message in one case and calling distinct methods from there? Note that all 3 cases compute the conflicting deployments to check whether the case matches, and in 2 of them the conflicting deployments are computed again in the function body. When matching once, you can store the result of this computation.

Also, case matches with more than 2 lines hurt readability imo.
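One possible shape of that suggestion, sketched with invented method names: conflicts are computed exactly once and each situation is delegated to its own method.

import akka.actor.{ Actor, ActorRef }

// Placeholder message; the force flag is folded into it for brevity.
trait DeploymentPlan
case class StartDeployment(plan: DeploymentPlan, origSender: ActorRef, force: Boolean)

trait SingleCaseDispatchSketch extends Actor {
  def conflictingDeployments(plan: DeploymentPlan): Seq[DeploymentPlan]
  def startNonConflicting(plan: DeploymentPlan, origSender: ActorRef): Unit
  def cancelConflictsAndStart(plan: DeploymentPlan, conflicts: Seq[DeploymentPlan], origSender: ActorRef): Unit
  def rejectLocked(plan: DeploymentPlan, conflicts: Seq[DeploymentPlan], origSender: ActorRef): Unit

  // One case for StartDeployment; the computed conflicts are reused in every branch.
  def receive: Receive = {
    case StartDeployment(plan, origSender, force) =>
      val conflicts = conflictingDeployments(plan)
      if (conflicts.isEmpty) startNonConflicting(plan, origSender)
      else if (force) cancelConflictsAndStart(plan, conflicts, origSender)
      else rejectLocked(plan, conflicts, origSender)
  }
}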

253

Same as above: the internal state is changed before persisting the change. Please persist first and then change state.

This revision now requires changes to proceed.Dec 6 2016, 2:11 PM
zen-dog added inline comments.Dec 6 2016, 5:04 PM
src/main/scala/mesosphere/marathon/upgrade/DeploymentManager.scala
213

Consider the following scenario with two deployments A and B (same pathId) where B is forced and ZK storing is slow:

  • StartDeployment A is coming in
  • A is stored (wait for store to succeed)
  • StartDeployment B is coming in and is forced
  • B is stored too since it's not in the internal state yet and therefore there are no conflicts
  • LaunchDeploymentActor A comes in and proceeds
  • LaunchDeploymentActor B comes in and... proceeds too!

The whole point of saving deployments in the internal state first was to avoid such conflicts.
The case where storing fails is not handled (and wasn't handled in the previous version) since in that case the master fails over and the new master picks it up from there with a fresh state. This feels like a longer discussion, so maybe we should sit down together and talk about it.

meichstedt added inline comments.Dec 7 2016, 1:27 PM
src/main/scala/mesosphere/marathon/upgrade/DeploymentManager.scala
213

I don't see why starting DeploymentActor B couldn't mean stopping DeploymentActor A when DeploymentPlan B overrules A. But I see that this probably complicates things on another level; I'm open for discussion.

It might be that failing futures from persisting the plans weren't handled before, but that's no excuse :P
And no – Marathon does not fail over just because of a failed write. It fails over when the ZK connection is lost, not when a read or write operation fails. This should definitely be handled here, no matter in which order the steps are performed.

219

Could we stop doing that then? If there is one place in the code where we send with deadLetters as sender, we basically need to check for noSender everywhere, as it's really hard to track original senders throughout the codebase.

zen-dog updated this revision to Diff 1146.Dec 8 2016, 2:29 AM
  • Implemented basic recovery from failing repository operations
zen-dog updated this revision to Diff 1147.Dec 8 2016, 11:01 AM

Rebased and removed an unused import

zen-dog updated this revision to Diff 1149.Dec 8 2016, 11:34 AM
  • Fixed scapegoat async/await warnings
zen-dog updated this revision to Diff 1150.Dec 8 2016, 12:25 PM

Fixed scaladoc error

meichstedt requested changes to this revision.Dec 8 2016, 4:06 PM

Thanks, just 3 super mini comments :)

src/main/scala/mesosphere/marathon/upgrade/DeploymentManager.scala
249

NonFatal(e)

266

NonFatal(e)

306

NonFatal(e)
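For context, NonFatal here is the standard scala.util.control.NonFatal extractor; a minimal, illustrative recovery using it (method name and fallback behaviour are made up, not the diff's actual code):

import scala.concurrent.{ ExecutionContext, Future }
import scala.util.control.NonFatal

object NonFatalRecoverySketch {
  // Catch only non-fatal errors from a repository operation instead of all Throwables,
  // so fatal errors such as OutOfMemoryError are not swallowed.
  def storeWithRecovery[T](op: => Future[T], fallback: T)(implicit ec: ExecutionContext): Future[T] =
    op.recover {
      case NonFatal(e) =>
        // the real code would log e, revert internal state and/or notify the sender here
        fallback
    }
}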

This revision now requires changes to proceed.Dec 8 2016, 4:06 PM
zen-dog updated this revision to Diff 1158.Dec 8 2016, 4:38 PM
zen-dog marked 3 inline comments as done.

Implemented feedback: catching NonFatal instead of Throwable

This comment was removed by zen-dog.
meichstedt accepted this revision.Dec 8 2016, 4:42 PM

Thanks!

This revision is now accepted and ready to land.Dec 8 2016, 4:42 PM
This revision was automatically updated to reflect the committed changes.