Details

Reviewers

meichstedt
timcharper
kensipe
jenkins

Commits

rMARATHONfe73c3f6528f: Do a better job at maintaining task failure rate limiting values per RunSpec

JIRA Issues

JIRA MARATHON-7696 Backoff settings are not respected

Summary

The --minimum_viable_task_execution_duration command-line argument was introduced
earlier, with a default of 60s. It doesn't fully respect maxLaunchDelay and looks
like a hack. For instance, if maxLaunchDelay is more than 60s for some RunSpec, the
corresponding rate limiting can be reset/removed while the deployment of that
RunSpec is still being delayed. This revision removes the argument and instead
expires a delay once its own maxLaunchDelay has elapsed.

In addition, on conditions such as Running and Created, existing delays are
advanced so that delays apply to failures and the time taken to provision
containers is not subtracted from them.
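
To make the idea of advancing concrete, here is a minimal sketch. It assumes a simplified Delay whose times are plain offsets from an arbitrary origin; AdvanceSketch, advanced and expiresAt are illustrative names that do not appear in the revision, and the 15s provisioning time and 60s maxLaunchDelay are made-up values.

```scala
import scala.concurrent.duration._

object AdvanceSketch {
  // Simplified stand-in for the revision's Delay; times are offsets from an arbitrary origin.
  final case class Delay(
      referenceTimestamp: FiniteDuration,
      currentDelay: FiniteDuration,
      maxLaunchDelay: FiniteDuration) {
    // Earliest time at which the next launch is allowed.
    def deadline: FiniteDuration = referenceTimestamp + currentDelay
    // Time after which the periodic clean-up may drop this delay (hypothetical helper).
    def expiresAt: FiniteDuration = referenceTimestamp + maxLaunchDelay
    // Advancing keeps both durations and only moves the reference point to `now`.
    def advanced(now: FiniteDuration): Delay = copy(referenceTimestamp = now)
  }

  // A failure at t = 0s records a 10s delay with an assumed 60s maxLaunchDelay.
  val afterFailure: Delay = Delay(Duration.Zero, 10.seconds, 60.seconds)

  // The replacement task reports Running at t = 25s (10s delay + 15s provisioning).
  val afterRunning: Delay = afterFailure.advanced(25.seconds)
}
```

In this sketch the deadline moves from 10s to 35s and the expiry point from 60s to 85s, so the provisioning time no longer eats into either window.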

In the future we might implement removal of rate-limiting delays when
a corresponding RunSpec becomes healthy.
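
The difference between the old and new clean-up rules can be sketched in isolation. The two predicates below mirror the filter conditions shown in the RateLimiter.scala diff of this revision; everything else (RetentionRules, keptBefore, keptAfter, durations as offsets from an arbitrary origin, the 5 minute maxLaunchDelay) is an assumption made for illustration.

```scala
import scala.concurrent.duration._

object RetentionRules {
  // Simplified stand-in for the revision's Delay; times are offsets from an arbitrary origin.
  final case class Delay(
      referenceTimestamp: FiniteDuration,
      currentDelay: FiniteDuration,
      maxLaunchDelay: FiniteDuration) {
    def deadline: FiniteDuration = referenceTimestamp + currentDelay
  }

  // Before: a delay is kept until a fixed threshold (60s by default) has passed since its deadline.
  def keptBefore(now: FiniteDuration, d: Delay, threshold: FiniteDuration = 60.seconds): Boolean =
    now - threshold < d.deadline

  // After: a delay is kept until its own maxLaunchDelay has passed since its reference time.
  def keptAfter(now: FiniteDuration, d: Delay): Boolean =
    now <= d.referenceTimestamp + d.maxLaunchDelay

  // A 10s backoff with an assumed 5 minute maxLaunchDelay, recorded at time zero.
  val delay: Delay = Delay(Duration.Zero, 10.seconds, 5.minutes)

  // 71s later the old rule drops the delay, while the new rule still keeps it.
  val now: FiniteDuration = 71.seconds
}
```

With these values, keptBefore(now, delay) is false and keptAfter(now, delay) is true: the old rule forgets the backoff state long before the RunSpec's own maxLaunchDelay has passed.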

Test Plan

sbt test

Diff Detail

Repository

rMARATHON marathon

Lint

Automatic diff as part of commit; lint not applicable.

Unit

Automatic diff as part of commit; unit tests not applicable.

ichernetsky created this revision. · Aug 21 2017, 8:40 PM

Herald added a subscriber: marathon-dev. · Aug 21 2017, 8:40 PM

jenkins requested changes to this revision. · Aug 21 2017, 8:46 PM

This revision now requires changes to proceed. · Aug 21 2017, 8:46 PM

✗ Build of 4090 failed jenkins-public-marathon-phabricator-836.

Error message:

Stage Compile and Test failed.

(๑′°︿°๑)

Harbormaster failed to build B3351: Diff 4090! · Aug 21 2017, 9:12 PM

✗ Build of 4090 failed jenkins-public-marathon-phabricator-837.

Error message:

Stage Compile and Test failed.

(๑′°︿°๑)

I like it... restarting tests

✗ Build of 4090 failed jenkins-public-marathon-phabricator-839.

Error message:

Stage Compile and Test failed.

(๑′°︿°๑)

Harbormaster failed to build B3351: Diff 4090! · Aug 21 2017, 11:25 PM

✗ Build of 4090 failed jenkins-public-marathon-phabricator-841.

Error message:

Stage Compile and Test failed.

(๑′°︿°๑)

✗ Build of 4090 failed jenkins-public-marathon-phabricator-842.

Error message:

Stage Compile and Test failed.

(๑′°︿°๑)

✗ Build of 4090 failed jenkins-public-marathon-phabricator-845.

Error message:

Stage Compile and Test failed.

(๑′°︿°๑)

Rebase

Harbormaster completed building B3354: Diff 4093. · Aug 22 2017, 5:48 AM

jenkins requested changes to this revision. · Aug 22 2017, 5:49 AM

This revision now requires changes to proceed. · Aug 22 2017, 5:49 AM

✗ Build of 4093 failed jenkins-public-marathon-phabricator-846.

Error message:

Stage Compile and Test failed.

(๑′°︿°๑)

Harbormaster failed to build B3354: Diff 4093! · Aug 22 2017, 6:14 AM

Remove minimum_viable_task_execution_duration option from MarathonTest

jenkins requested changes to this revision. · Aug 24 2017, 1:37 AM

This revision now requires changes to proceed. · Aug 24 2017, 1:37 AM

jenkins accepted this revision. · Aug 24 2017, 1:52 AM

This revision is now accepted and ready to land. · Aug 24 2017, 1:52 AM

✔ Build of 4104 completed jenkins-public-marathon-phabricator-854.

You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:

＼\ ٩( ᐛ )و /／

Harbormaster completed building B3362: Diff 4104. · Aug 24 2017, 1:52 AM

kensipe accepted this revision. · Aug 24 2017, 1:53 AM

Seems okay. I think in fixing the test the semantics have been affected detrimentally.

src/test/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiterTest.scala
35	I'm finding myself wishing this test were clarified. What, exactly, is it testing (as in "does X thing after calling resetDelaysOfViableTasks")? Further, the `stillWaiting` variable name seems misleading.

Presume `clock` is 00:00:00. `viable` has a backoff strategy of 10 seconds, with a max launch delay of 60 seconds. We invoke a delay. This presumably puts the deadline at 00:00:10. `stillWaiting` has a backoff strategy of 20 seconds, with a max launch delay of 70 seconds. We invoke a delay. This presumably puts the deadline at 00:00:20.

We then advance 61 seconds. The clock is now 00:01:01. `viable`'s deadline is completely reset, so the current time is returned, because it has been in a delayed state for more than 60 seconds. This is sensible. `stillWaiting`'s deadline is unchanged from when delay was last called, which puts it 41 seconds in the past. I might be misreading this, but that does not seem "stillWaiting".

Is this a realistic simulation of how the component will actually be used? Would it be better to simulate stepping through time, calling delay multiple times, perhaps 10 seconds at a time, and then asserting that a delay call followed by resetDelaysOfViableTasks ceases to have an effect once maxLaunchDelay is reached? I'm unsure if this would be a better way to express the test, but as it stands, the test is quite confusing.
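
The timeline walked through in this comment can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: Limiter is a hypothetical in-memory stand-in keyed by a plain string id, with times as offsets from 00:00:00. Only the method names resetDelaysOfViableTasks and getDeadline, and the retention condition, are taken from the diff.

```scala
import scala.concurrent.duration._

object ReviewerScenario {
  // Simplified stand-in for the revision's Delay; times are offsets from 00:00:00.
  final case class Delay(
      referenceTimestamp: FiniteDuration,
      currentDelay: FiniteDuration,
      maxLaunchDelay: FiniteDuration) {
    def deadline: FiniteDuration = referenceTimestamp + currentDelay
  }

  // Hypothetical in-memory limiter with a manually advanced clock.
  final class Limiter {
    var now: FiniteDuration = Duration.Zero
    private var delays = Map.empty[String, Delay]

    def addDelay(id: String, backoff: FiniteDuration, maxLaunchDelay: FiniteDuration): Unit =
      delays += id -> Delay(now, backoff min maxLaunchDelay, maxLaunchDelay)

    // Same retention condition as in the diff: keep while now <= reference + maxLaunchDelay.
    def resetDelaysOfViableTasks(): Unit =
      delays = delays.filter { case (_, d) => now <= d.referenceTimestamp + d.maxLaunchDelay }

    // Falls back to the current time when no delay is recorded.
    def getDeadline(id: String): FiniteDuration =
      delays.get(id).map(_.deadline).getOrElse(now)
  }

  val limiter = new Limiter
  limiter.addDelay("viable", 10.seconds, 60.seconds)       // deadline 00:00:10
  limiter.addDelay("stillWaiting", 20.seconds, 70.seconds) // deadline 00:00:20
  limiter.now += 61.seconds                                // clock is now 00:01:01
  limiter.resetDelaysOfViableTasks()
}
```

After the clean-up, getDeadline("viable") returns the current time (61 seconds) because its delay was dropped, while getDeadline("stillWaiting") still returns 20 seconds, a deadline 41 seconds in the past.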

This revision now requires changes to proceed. · Aug 25 2017, 12:35 AM

"Seems okay" didn't come across right. This change seems okay :) I'm not entirely sure why we are removing the parameter that Matthias introduced, nor why he introduced it in the first place. Also, the test is in a really weird state.

src/test/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiterTest.scala
35	Thanks for commenting on this. I agree that the variable names are confusing. The purpose of this test is to ensure that existing delays get deleted as time passes.

Make resetDelaysOfViableTasks test case less confusing

jenkins requested changes to this revision. · Aug 29 2017, 7:52 PM

This revision now requires changes to proceed. · Aug 29 2017, 7:52 PM

I think Matthias introduced that parameter just to make sure that delays get GCed. But the implementation of clean-up and resetting delays was buggy.

jenkins accepted this revision. · Aug 29 2017, 8:06 PM

Harbormaster completed building B3400: Diff 4150. · Aug 29 2017, 8:06 PM

✔ Build of 4150 completed jenkins-public-marathon-phabricator-901.

You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:

＼\ ٩( ᐛ )و /／

timcharper accepted this revision. · Aug 30 2017, 12:07 AM

This revision is now accepted and ready to land. · Aug 30 2017, 12:07 AM

Closed by commit rMARATHONfe73c3f6528f: Do a better job at maintaining task failure rate limiting values per RunSpec (authored by ichernetsky). · Aug 30 2017, 12:09 AM

This revision was automatically updated to reflect the committed changes.

Diff	ID	Base	Description	Created	Lint	Unit
Base			Base
Diff 1	4090	859a6fb		Aug 21 2017, 8:40 PM	★	★
Diff 2	4093	9678ca0	Rebase	Aug 22 2017, 5:48 AM	★	★
Diff 3	4104	2bd046f	- Remove minimum_viable_task_execution_duration option from MarathonTest	Aug 24 2017, 1:31 AM	★	★
Diff 4	4150	2bd046f	- Make resetDelaysOfViableTasks test case less confusing	Aug 29 2017, 7:50 PM	★	★
Diff 5	4153	391b379	rMARATHONfe73c3f6528f6039057f1d3980112843f54649ed	Aug 30 2017, 12:09 AM	★	★

Commit	Tree	Parents	Author	Summary	Date
4ba2248357b0	ff6f9f6dbaf1	3f8964e09a3e	Ivan Chernetsky	Make resetDelaysOfViableTasks test case less confusing	Aug 29 2017, 7:50 PM
3f8964e09a3e	cc1c0e11dc3b	887383474d06	Ivan Chernetsky	Remove minimum_viable_task_execution_duration option from MarathonTest	Aug 24 2017, 1:30 AM
887383474d06	45070b057e11	2bd046f41b2a	Ivan Chernetsky	Do a better job at maintaining task failure rate limiting values per RunSpec (Show More…)	Aug 21 2017, 8:19 PM

Diff 4153

docs/docs/command-line-flags.md

159	159	* `--mesos_heartbeat_interval` (Optional. Default: 15 seconds):
160	160	(milliseconds) in the absence of receiving a message from the mesos master during a time window of this duration,
161	161	attempt to coerce mesos into communicating with marathon.
162	162	* `--mesos_heartbeat_failure_threshold` (Optional. Default: 5):
163	163	after missing this number of expected communications from the mesos master, infer that marathon has become
164	164	disconnected from the master.
165	165	* `--mesos_bridge_name` (Optional. Default: mesos-bridge):
166	166	The name of the Mesos CNI network used by MESOS-type containers configured to use bridged networking
167		* <span class="label label-default">v1.5.0</span>`--minimum_viable_task_execution_duration` (Optional. Default: 60 seconds):
168		Delay (in ms) after which a task is considered viable. If the task starts up correctly, but fails during this timeout, the application is backed off.
169	167	* <span class="label label-default">v1.5.0</span>`--backup_location` (Optional. Default: None):
170	168	Create a backup before a migration is applied to the persistent store.
171	169	This backup can be used to restore the state at that time.
172	170	Currently two providers are allowed:
173	171	- File provider: file:///path/to/file
174	172	- S3 provider (experimental): s3://bucket-name/key-in-bucket?access_key=xxx&secret_key=xxx&region=eu-central-1
175	173	Please note: access_key and secret_key are optional.
176	174	If not provided, the [AWS default credentials provider chain](http://docs.aws.amazon.com/sdk-for-java/v1/developer-guide/credentials.html) is used to look up aws credentials.

src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueue.scala

86	86	def asyncPurge(specId: PathId): Future[Done]
87	87
88	88	/** Add delay to the given RunnableSpec because of a failed instance */
89	89	def addDelay(spec: RunSpec): Unit
90	90
91	91	/** Reset the backoff delay for the given RunnableSpec. */
92	92	def resetDelay(spec: RunSpec): Unit
93	93
	94	/** Advance the reference time point of the delay for the given RunSpec */
	95	def advanceDelay(spec: RunSpec): Unit
	96
94	97	/** Notify queue about InstanceUpdate */
95	98	def notifyOfInstanceUpdate(update: InstanceChange): Future[Done]
96	99	}

src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueueConfig.scala

1	1	package mesosphere.marathon
2	2	package core.launchqueue
3	3
4	4	import org.rogach.scallop.ScallopConf
5	5
6		import scala.concurrent.duration._
7
8	6	trait LaunchQueueConfig extends ScallopConf {
9	7
10		lazy val minimumViableTaskExecutionDurationMillis = opt[Long](
11		"minimum_viable_task_execution_duration",
12		descr = "Delay (in ms) after which a task is considered viable.",
13		default = Some(60000))
14
15	8	lazy val launchQueueRequestTimeout = opt[Int](
16	9	"launch_queue_request_timeout",
17	10	descr = "INTERNAL TUNING PARAMETER: Timeout (in ms) for requests to the launch queue actor.",
18	11	hidden = true,
19	12	default = Some(3000))
20	13
21	14	lazy val taskOpNotificationTimeout = opt[Int](
22	15	"task_operation_notification_timeout",
23	16	descr = "INTERNAL TUNING PARAMETER: Timeout (in ms) for matched task operations to be accepted or rejected.",
24	17	hidden = true,
25	18	default = Some(30000))
26	19
27		lazy val minimumViableTaskExecutionDuration: FiniteDuration = minimumViableTaskExecutionDurationMillis().millis
28	20	}

src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueueModule.scala

38	38	maybeOfferReviver,
39	39	taskTracker,
40	40	rateLimiterActor,
41	41	offerMatchStatisticsActor)(runSpec, count)
42	42	val props = LaunchQueueActor.props(config, offerMatchStatisticsActor, runSpecActorProps)
43	43	leadershipModule.startWhenLeader(props, "launchQueue")
44	44	}
45	45
46		val rateLimiter: RateLimiter = new RateLimiter(config, clock)
	46	val rateLimiter: RateLimiter = new RateLimiter(clock)
47	47	private[this] val rateLimiterActor: ActorRef = {
48	48	val props = RateLimiterActor.props(
49	49	rateLimiter, launchQueueActorRef)
50	50	leadershipModule.startWhenLeader(props, "rateLimiter")
51	51	}
52	52	val launchQueue: LaunchQueue = new LaunchQueueDelegate(config, launchQueueActorRef, rateLimiterActor)
53	53	}

src/main/scala/mesosphere/marathon/core/launchqueue/impl/LaunchQueueDelegate.scala

93	93	case NonFatal(e) => throw new RuntimeException(s"in $method", e)
94	94	}
95	95	answerFuture.mapTo[R]
96	96	}
97	97
98	98	override def addDelay(spec: RunSpec): Unit = rateLimiterRef ! RateLimiterActor.AddDelay(spec)
99	99
100	100	override def resetDelay(spec: RunSpec): Unit = rateLimiterRef ! RateLimiterActor.ResetDelay(spec)
	101
	102	override def advanceDelay(spec: RunSpec): Unit = rateLimiterRef ! RateLimiterActor.AdvanceDelay(spec)
101	103	}
102	104
103	105	private[impl] object LaunchQueueDelegate {
104	106	sealed trait Request
105	107	case object List extends Request
106	108	case object ListWithStatistics extends Request
107	109	case class Count(runSpecId: PathId) extends Request
108	110	case class Purge(runSpecId: PathId) extends Request
109	111	case object ConfirmPurge extends Request
110	112	case class Add(spec: RunSpec, count: Int) extends Request
111	113	}

src/main/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiter.scala

1	1	package mesosphere.marathon
2	2	package core.launchqueue.impl
3	3
4	4	import java.time.Clock
5	5	import java.util.concurrent.TimeUnit
6	6
7		import mesosphere.marathon.core.launchqueue.LaunchQueueConfig
8	7	import mesosphere.marathon.state.{ RunSpec, PathId, Timestamp }
	8	import mesosphere.util.DurationToHumanReadable
9	9	import org.slf4j.LoggerFactory
10	10
11	11	import scala.concurrent.duration._
12	12
13	13	/**
14	14	* Manages the task launch delays for every run spec and config version.
15	15	*
16	16	* We do not keep the delays for every version because that would include scaling changes or manual restarts.
17	17	*/
18		private[launchqueue] class RateLimiter(config: LaunchQueueConfig, clock: Clock) {
	18	private[launchqueue] class RateLimiter(clock: Clock) {
19	19	import RateLimiter._
20	20
21	21	/** The task launch delays per run spec and their last config change. */
22	22	private[this] var taskLaunchDelays = Map[(PathId, Timestamp), Delay]()
23	23
24		/**
25		* Reset delay for tasks that have reached the viability
26		* threshold. The deadline indicates when the task has been
27		* launched for the last time.
28		*/
	24	/** Reset delays for tasks that have reached the maximum launch delay threshold. */
29	25	def resetDelaysOfViableTasks(): Unit = {
	26	val now = clock.now()
30	27	taskLaunchDelays = taskLaunchDelays.filter {
31	28	case (_, delay) =>
32		clock.now() - config.minimumViableTaskExecutionDuration < delay.deadline
	29	now <= delay.referenceTimestamp + delay.maxLaunchDelay
33	30	}
34	31	}
35	32
36	33	def getDeadline(spec: RunSpec): Timestamp =
37	34	taskLaunchDelays.get(spec.id -> spec.versionInfo.lastConfigChangeVersion).map(_.deadline) getOrElse clock.now()
38	35
39	36	def addDelay(spec: RunSpec): Timestamp = {
40	37	setNewDelay(spec, "Increasing delay") {
41		case Some(delay) => Some(delay.increased(clock, spec))
42		case None => Some(Delay(clock, spec))
	38	case Some(delay) => delay.increased(clock, spec)
	39	case None => Delay(clock, spec)
43	40	}
44	41	}
45	42
46		private[this] def setNewDelay(spec: RunSpec, message: String)(
47		calcDelay: Option[Delay] => Option[Delay]): Timestamp = {
	43	private[this] def setNewDelay(spec: RunSpec, message: String)(calcDelay: Option[Delay] => Delay): Timestamp = {
48	44	val maybeDelay: Option[Delay] = taskLaunchDelays.get(spec.id -> spec.versionInfo.lastConfigChangeVersion)
49		calcDelay(maybeDelay) match {
50		case Some(newDelay) =>
51		import mesosphere.util.DurationToHumanReadable
	45	val newDelay = calcDelay(maybeDelay)
	46
52	47	val now: Timestamp = clock.now()
53		val priorTimeLeft = (now until maybeDelay.map(_.deadline).getOrElse(now)).toHumanReadable
54	48	val timeLeft = (now until newDelay.deadline).toHumanReadable
55	49
56		if (newDelay.deadline <= now) {
57		resetDelay(spec)
58		} else {
59		log.info(s"$message. Task launch delay for [${spec.id}] changed from [$priorTimeLeft] to [$timeLeft].")
60		taskLaunchDelays += ((spec.id, spec.versionInfo.lastConfigChangeVersion) -> newDelay)
61		}
	50	log.info(
	51	s"$message. Task launch delay for [${spec.id} - ${spec.versionInfo.lastConfigChangeVersion}] is set to $timeLeft")
	52	taskLaunchDelays += ((spec.id -> spec.versionInfo.lastConfigChangeVersion) -> newDelay)
62	53	newDelay.deadline
63
64		case None =>
65		resetDelay(spec)
66		clock.now()
67	54	}
68		}
69	55
70	56	def resetDelay(runSpec: RunSpec): Unit = {
71		if (taskLaunchDelays contains (runSpec.id -> runSpec.versionInfo.lastConfigChangeVersion)) {
	57	val key = runSpec.id -> runSpec.versionInfo.lastConfigChangeVersion
	58	taskLaunchDelays.get(key).foreach { _ =>
72	59	log.info(s"Task launch delay for [${runSpec.id} - ${runSpec.versionInfo.lastConfigChangeVersion}}] reset to zero")
73		taskLaunchDelays -= (runSpec.id -> runSpec.versionInfo.lastConfigChangeVersion)
	60	taskLaunchDelays -= key
74	61	}
75	62	}
	63
	64	def advanceDelay(runSpec: RunSpec): Unit = {
	65	val key = runSpec.id -> runSpec.versionInfo.lastConfigChangeVersion
	66	taskLaunchDelays.get(key).foreach { delay =>
	67	log.info(s"Task launch delay for [${runSpec.id} - ${runSpec.versionInfo.lastConfigChangeVersion}}] got advanced")
	68	taskLaunchDelays += key -> Delay(clock, delay.currentDelay, delay.maxLaunchDelay)
76	69	}
	70	}
	71	}
77	72
78	73	private object RateLimiter {
79	74	private val log = LoggerFactory.getLogger(getClass.getName)
80	75
81	76	private object Delay {
82		def apply(clock: Clock, runSpec: RunSpec): Delay = Delay(clock.now() + runSpec.backoffStrategy.backoff, runSpec.backoffStrategy.backoff)
83		def apply(clock: Clock, delay: FiniteDuration): Delay = Delay(clock.now() + delay, delay)
	77	def apply(clock: Clock, runSpec: RunSpec): Delay = {
	78	val delay = runSpec.backoffStrategy.backoff min runSpec.backoffStrategy.maxLaunchDelay
	79	Delay(clock.now(), delay, runSpec.backoffStrategy.maxLaunchDelay)
84	80	}
	81	def apply(clock: Clock, currentDelay: FiniteDuration, maxLaunchDelay: FiniteDuration): Delay =
	82	Delay(clock.now(), currentDelay, maxLaunchDelay)
	83	}
85	84
86	85	private case class Delay(
87		deadline: Timestamp,
88		delay: FiniteDuration) {
	86	referenceTimestamp: Timestamp,
	87	currentDelay: FiniteDuration,
	88	maxLaunchDelay: FiniteDuration) {
89	89
	90	def deadline: Timestamp = referenceTimestamp + currentDelay
	91
90	92	def increased(clock: Clock, runSpec: RunSpec): Delay = {
91	93	val newDelay: FiniteDuration =
92	94	runSpec.backoffStrategy.maxLaunchDelay min FiniteDuration(
93		(delay.toNanos * runSpec.backoffStrategy.factor).toLong, TimeUnit.NANOSECONDS)
94		Delay(clock, newDelay)
	95	(currentDelay.toNanos * runSpec.backoffStrategy.factor).toLong, TimeUnit.NANOSECONDS)
	96	Delay(clock, newDelay, runSpec.backoffStrategy.maxLaunchDelay)
95	97	}
96	98	}
97	99	}

src/main/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiterActor.scala

1	1	package mesosphere.marathon
2	2	package core.launchqueue.impl
3	3
4	4	import akka.actor.{ Actor, ActorRef, Cancellable, Props }
5	5	import akka.event.LoggingReceive
6	6	import com.typesafe.scalalogging.StrictLogging
7		import mesosphere.marathon.core.launchqueue.impl.RateLimiterActor.{ AddDelay, DecreaseDelay, DelayUpdate, GetDelay, ResetDelay, ResetDelayResponse, ResetViableTasksDelays }
	7	import mesosphere.marathon.core.launchqueue.impl.RateLimiterActor._
8	8	import mesosphere.marathon.state.{ RunSpec, Timestamp }
9	9
10	10	import scala.concurrent.duration._
11	11
12	12	private[launchqueue] object RateLimiterActor {
13	13	def props(
14	14	rateLimiter: RateLimiter,
15	15	launchQueueRef: ActorRef): Props =
16	16	Props(new RateLimiterActor(
17	17	rateLimiter, launchQueueRef
18	18	))
19	19
20	20	case class DelayUpdate(runSpec: RunSpec, delayUntil: Timestamp)
21	21
22	22	case class ResetDelay(runSpec: RunSpec)
23	23	case object ResetDelayResponse
24	24
25	25	case class GetDelay(runSpec: RunSpec)
26	26	private[impl] case class AddDelay(runSpec: RunSpec)
27	27	private[impl] case class DecreaseDelay(runSpec: RunSpec)
	28	private[impl] case class AdvanceDelay(runSpec: RunSpec)
28	29
29	30	private case object ResetViableTasksDelays
30	31	}
31	32
32	33	private class RateLimiterActor private (
33	34	rateLimiter: RateLimiter,
34	35	launchQueueRef: ActorRef) extends Actor with StrictLogging {
35	36	var cleanup: Cancellable = _
Show All 28 Lines
64	65	private[this] def receiveDelayOps: Receive = {
65	66	case GetDelay(runSpec) =>
66	67	sender() ! DelayUpdate(runSpec, rateLimiter.getDeadline(runSpec))
67	68
68	69	case AddDelay(runSpec) =>
69	70	rateLimiter.addDelay(runSpec)
70	71	launchQueueRef ! DelayUpdate(runSpec, rateLimiter.getDeadline(runSpec))
71	72
72		case DecreaseDelay(runSpec) => // ignore for now
	73	case DecreaseDelay(_) => // ignore for now
73	74
	75	case AdvanceDelay(runSpec) =>
	76	rateLimiter.advanceDelay(runSpec)
	77	launchQueueRef ! DelayUpdate(runSpec, rateLimiter.getDeadline(runSpec))
	78
74	79	case ResetDelay(runSpec) =>
75	80	rateLimiter.resetDelay(runSpec)
76	81	launchQueueRef ! DelayUpdate(runSpec, rateLimiter.getDeadline(runSpec))
77		sender() ! ResetDelayResponse
78	82	}
79	83	}

src/main/scala/mesosphere/marathon/core/task/update/impl/steps/NotifyRateLimiterStepImpl.scala

1	1	package mesosphere.marathon
2	2	package core.task.update.impl.steps
3	3
4	4	import java.time.OffsetDateTime
5	5
6	6	import akka.Done
7	7	import com.google.inject.{ Inject, Provider }
8	8	import mesosphere.marathon.core.condition.Condition
9	9	import mesosphere.marathon.core.group.GroupManager
10	10	import mesosphere.marathon.core.instance.update.{ InstanceChange, InstanceChangeHandler }
11	11	import mesosphere.marathon.core.launchqueue.LaunchQueue
12		import mesosphere.marathon.state.PathId
	12	import mesosphere.marathon.state.{ PathId, RunSpec }
13	13
14	14	import scala.async.Async._
15	15	import scala.concurrent.Future
16	16
17	17	class NotifyRateLimiterStepImpl @Inject() (
18	18	launchQueueProvider: Provider[LaunchQueue],
19	19	groupManagerProvider: Provider[GroupManager]) extends InstanceChangeHandler {
20	20
21	21	import NotifyRateLimiterStep._
22	22	import mesosphere.marathon.core.async.ExecutionContexts.global
23	23
24	24	private[this] lazy val launchQueue = launchQueueProvider.get()
25	25	private[this] lazy val groupManager = groupManagerProvider.get()
26	26
27	27	override def name: String = "notifyRateLimiter"
28	28
29	29	override def process(update: InstanceChange): Future[Done] = {
30		if (limitWorthy(update.condition)) {
31		notifyRateLimiter(update.runSpecId, update.instance.runSpecVersion.toOffsetDateTime)
32		} else {
	30	update.condition match {
	31	case condition if limitWorthy(condition) =>
	32	notifyRateLimiter(update.runSpecId, update.instance.runSpecVersion.toOffsetDateTime, launchQueue.addDelay)
	33	case condition if advanceWorthy(condition) =>
	34	notifyRateLimiter(update.runSpecId, update.instance.runSpecVersion.toOffsetDateTime, launchQueue.advanceDelay)
	35	case _ =>
33	36	Future.successful(Done)
34	37	}
35	38	}
36	39
37	40	@SuppressWarnings(Array("all")) // async/await
38		private[this] def notifyRateLimiter(runSpecId: PathId, version: OffsetDateTime): Future[Done] = async {
	41	private[this] def notifyRateLimiter(runSpecId: PathId, version: OffsetDateTime, fn: RunSpec => Unit): Future[Done] =
	42	async {
39	43	val appFuture = groupManager.appVersion(runSpecId, version)
40	44	val podFuture = groupManager.podVersion(runSpecId, version)
41	45	val (app, pod) = (await(appFuture), await(podFuture))
42		app.foreach(launchQueue.addDelay)
43		pod.foreach(launchQueue.addDelay)
	46	app.foreach(fn)
	47	pod.foreach(fn)
44	48	Done
45	49	}
46	50	}
47	51
48	52	private[steps] object NotifyRateLimiterStep {
49	53	// A set of conditions that are worth rate limiting the associated runSpec
50	54	val limitWorthy: Set[Condition] = Set(
51	55	Condition.Dropped, Condition.Error, Condition.Failed, Condition.Gone, Condition.Finished
	56	)
	57
	58	// A set of conditions that are worth advancing an existing delay of the corresponding runSpec
	59	val advanceWorthy: Set[Condition] = Set(
	60	Condition.Staging, Condition.Starting, Condition.Running, Condition.Created
52	61	)
53	62	}

src/test/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiterActorTest.scala

20	20
21	21	private[this] implicit val timeout: Timeout = 3.seconds
22	22
23	23	case class Fixture(
24	24	launchQueueConfig: LaunchQueueConfig = new LaunchQueueConfig { verify() },
25	25	clock: SettableClock = new SettableClock(),
26	26	instanceTracker: InstanceTracker = mock[InstanceTracker],
27	27	updateReceiver: TestProbe = TestProbe()) {
28		val rateLimiter: RateLimiter = Mockito.spy(new RateLimiter(launchQueueConfig, clock))
	28	val rateLimiter: RateLimiter = Mockito.spy(new RateLimiter(clock))
29	29	val props = RateLimiterActor.props(rateLimiter, updateReceiver.ref)
30	30	val limiterRef = system.actorOf(props)
31	31	}
32	32
33	33	"RateLimiterActor" should {
34	34	"GetDelay gets current delay" in new Fixture {
35	35	rateLimiter.addDelay(app)
36	36

src/test/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiterTest.scala

1	1		package mesosphere.marathon
2	2		package core.launchqueue.impl
3	3
4	4		import mesosphere.UnitTest
5	5		import mesosphere.marathon.test.SettableClock
6			import mesosphere.marathon.core.launchqueue.LaunchQueueConfig
7	6		import mesosphere.marathon.state.PathId._
8	7		import mesosphere.marathon.state.{ AppDefinition, BackoffStrategy }
9	8
10	9		import scala.concurrent.duration._
11	10
12	11		class RateLimiterTest extends UnitTest {
13	12
14	13		val clock = SettableClock.ofNow()
15	14
16			private[this] val launchQueueConfig: LaunchQueueConfig = new LaunchQueueConfig {
17			verify()
18			}
19
20	15		"RateLimiter" should {
21	16		"addDelay" in {
22			val limiter = new RateLimiter(launchQueueConfig, clock)
	17		val limiter = new RateLimiter(clock)
23	18		val app = AppDefinition(id = "test".toPath, backoffStrategy = BackoffStrategy(backoff = 10.seconds))
24	19
25	20		limiter.addDelay(app)
26	21
27	22		limiter.getDeadline(app) should be(clock.now() + 10.seconds)
28	23		}
29	24
30	25		"addDelay for existing delay" in {
31			val limiter = new RateLimiter(launchQueueConfig, clock)
	26		val limiter = new RateLimiter(clock)
32	27		val app = AppDefinition(id = "test".toPath, backoffStrategy = BackoffStrategy(backoff = 10.seconds, factor = 2.0))
33	28
34	29		limiter.addDelay(app) // linter:ignore:IdenticalStatements
35	30		limiter.addDelay(app)
36	31
37	32		limiter.getDeadline(app) should be(clock.now() + 20.seconds)
38	33		}
39	34
40	35		"resetDelaysOfViableTasks" in {
		timcharper (Unsubmitted, Done): I'm finding myself wishing this test were clarified. What, exactly, is it testing (as in "does X thing after calling resetDelaysOfViableTasks")? Further, the `stillWaiting` variable name seems misleading. Presume `clock` is 00:00:00. `viable` has a backoff strategy of 10 seconds, with a max launch delay of 60 seconds. We invoke a delay. This presumably puts the deadline at 00:00:10. `stillWaiting` has a backoff strategy of 20 seconds, with a max launch delay of 70 seconds. We invoke a delay. This presumably puts the deadline at 00:00:20. We then advance 61 seconds. The clock is now 00:01:01. `viable`'s deadline is completely reset, so the current time is returned, because it has been in a delayed state for more than 60 seconds. This is sensible. `stillWaiting`'s deadline is unchanged from when delay was last called, which puts it 41 seconds in the past. I might be misreading this, but that does not seem "stillWaiting". Is this a realistic simulation of how the component will actually be used? Would it be better to simulate stepping through time, calling delay multiple times, perhaps 10 seconds at a time, and then asserting that a delay call followed by resetDelaysOfViableTasks ceases to have an effect once maxLaunchDelay is reached? I'm unsure if this would be a better way to express the test, but as it stands, the test is quite confusing.
		ichernetsky (Author, Unsubmitted, Not Done): Thanks for commenting on this. I agree that the variable names are confusing. The purpose of this test is to ensure that existing delays get deleted as time passes.
41	36		val time_origin = clock.now()
42			val limiter = new RateLimiter(launchQueueConfig, clock)
43			val threshold = launchQueueConfig.minimumViableTaskExecutionDuration
44			val viable = AppDefinition(id = "viable".toPath, backoffStrategy = BackoffStrategy(backoff = 10.seconds))
45			limiter.addDelay(viable)
46			val notYetViable = AppDefinition(id = "notYetViable".toPath, backoffStrategy = BackoffStrategy(backoff = 20.seconds))
47			limiter.addDelay(notYetViable)
48			val stillWaiting = AppDefinition(id = "test".toPath, backoffStrategy = BackoffStrategy(backoff = threshold + 20.seconds))
49			limiter.addDelay(stillWaiting)
	37		val limiter = new RateLimiter(clock)
	38		val threshold = 60.seconds
50	39
51			clock += threshold + 11.seconds
	40		val app1 = AppDefinition(
	41		id = "viable".toPath,
	42		backoffStrategy = BackoffStrategy(backoff = 10.seconds, maxLaunchDelay = threshold))
	43		limiter.addDelay(app1)
52	44
53			limiter.resetDelaysOfViableTasks()
	45		val app2 = AppDefinition(
	46		id = "test".toPath,
	47		backoffStrategy = BackoffStrategy(backoff = 20.seconds, maxLaunchDelay = threshold + 10.seconds))
	48		limiter.addDelay(app2)
54	49
55			limiter.getDeadline(viable) should be(clock.now())
56			limiter.getDeadline(notYetViable) should be(time_origin + 20.seconds)
57			limiter.getDeadline(stillWaiting) should be(time_origin + threshold + 20.seconds)
	50		// after advancing the clock by (threshold + 1), the existing delays
	51		// with maxLaunchDelay < (threshold + 1) should be gone
	52		clock += threshold + 1.seconds
	53		limiter.resetDelaysOfViableTasks()
	54		limiter.getDeadline(app1) should be(clock.now())
	55		limiter.getDeadline(app2) should be(time_origin + 20.seconds)
58	56		}
59	57
60	58		"resetDelay" in {
61			val limiter = new RateLimiter(launchQueueConfig, clock)
	59		val limiter = new RateLimiter(clock)
62	60		val app = AppDefinition(id = "test".toPath, backoffStrategy = BackoffStrategy(backoff = 10.seconds))
63	61
64	62		limiter.addDelay(app)
65
66	63		limiter.resetDelay(app)
67	64
68	65		limiter.getDeadline(app) should be(clock.now())
69	66		}
70	67		}
71	68		}

src/test/scala/mesosphere/marathon/integration/setup/MarathonTest.scala

94	94	"zk_connection_timeout" -> 20.seconds.toMillis.toString,
95	95	"zk_session_timeout" -> 20.seconds.toMillis.toString,
96	96	"mesos_authentication_secret_file" -> s"$secretPath",
97	97	"access_control_allow_origin" -> "*",
98	98	"reconciliation_initial_delay" -> 5.minutes.toMillis.toString,
99	99	"min_revive_offers_interval" -> "100",
100	100	"hostname" -> "localhost",
101	101	"logging_level" -> "debug",
102		"minimum_viable_task_execution_duration" -> "0",
103	102	"offer_matching_timeout" -> 10.seconds.toMillis.toString // see https://github.com/mesosphere/marathon/issues/4920
104	103	) ++ conf
105	104
106	105	val args = config.flatMap {
107	106	case (k, v) =>
108	107	if (v.nonEmpty) {
109	108	Seq(s"--$k", v)
110	109	} else {

			Path	Packages
M			docs/docs/command-line-flags.md (2 lines)
M			src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueue.scala (3 lines)
M			src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueueConfig.scala (8 lines)
M			src/main/scala/mesosphere/marathon/core/launchqueue/LaunchQueueModule.scala (2 lines)
M			src/main/scala/mesosphere/marathon/core/launchqueue/impl/LaunchQueueDelegate.scala (2 lines)
M			src/main/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiter.scala (80 lines)
M			src/main/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiterActor.scala (10 lines)
M			src/main/scala/mesosphere/marathon/core/task/update/impl/steps/NotifyRateLimiterStepImpl.scala (35 lines)
M			src/test/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiterActorTest.scala (2 lines)
M			src/test/scala/mesosphere/marathon/core/launchqueue/impl/RateLimiterTest.scala (45 lines)
M			src/test/scala/mesosphere/marathon/integration/setup/MarathonTest.scala (1 line)
