HomeMesosphereNo notifications. 4 unresolved issues.

clarify relationship between Condition and MesosTaskState
ClosedAll Users

Authored by timcharper on Dec 3 2016, 1:51 AM.

Details

Summary

I made these changes while working on https://phabricator.mesosphere.com/D263 . It’s an attempt to clarify the relationship between Condition and MesosTaskState, and use the type system more to describe this relationship.

Test Plan

n/a

Diff Detail

Repository
rMARATHON marathon
Lint
Automatic diff as part of commit; lint not applicable.
Unit
Automatic diff as part of commit; unit tests not applicable.
Changes from before your most recent comment are hidden. Show Older Changes
jeschkies requested changes to this revision.Dec 5 2016, 2:55 PM

Nice work! It's much cleaner. Overall we should have a different type for task states and instance conditions.

Also, a few weeks ago I've created a state machine to illustrate the state of an instance https://github.com/mesosphere/marathon/blob/master/task_status_fsm.svg. This might illustrate the terminal states better.

src/main/scala/mesosphere/marathon/core/condition/Condition.scala
90

Funny, it should actually be the other around. UnreachableInactive is lost and Unreachable is not. Then again it depends on how we interpret lost and what actions we take. Ideally we don't use the term at all any more since there used to be a TASK_LOST state which is gone with Mesos 1.1 and Marathon 1.4.

91–92

This is also a terminal state.

92–116

This is a terminal state.

src/main/scala/mesosphere/marathon/core/task/termination/impl/InstanceChangedPredicates.scala
7–12 ↗(On Diff #1039)

Why not use just Terminal? We seem to be inconsistent here.

This revision now requires changes to proceed.Dec 5 2016, 2:55 PM
timcharper updated this revision to Diff 1079.Dec 5 2016, 8:32 PM

integrate M.E.'s comments wrt Condition.UnreachableInactive Lost classification

meichstedt requested changes to this revision.Dec 5 2016, 8:35 PM
meichstedt added inline comments.
src/main/scala/mesosphere/marathon/core/condition/Condition.scala
18

I'd rather not have this as a member of condition after a second thought. Having it defined here implies that there's a correct conversion from condition to String. However, as discussed, this should be evaluated based on the given context (publishing an event might be different from serializing via the API e.g.)

Even more since this def has type String; it's not clear about the intended usage.

D263 introduces a conversion specific to the InstanceChangedEventsGenerator for that use case. I see two remaining usages of toReadableName: Formats (TaskWrites) and PodInstanceConversion. Maybe you can change these to specific conversions as well? WDYT about a Map[Condition->TaskState] or Map[Condition->String]? (It's easier to forget about added condition, but we can work around that by testing via pattern matches.)

90

I agree with @jeschkies: Please don't use the term lost – and I think an extra trait is not needed. The semantics of Unreachable and UnreachableInactive depend on the context as well: when scaling, Unreachable is still considered Active while instances in UnreachableInactive are subject to be replaced. When performing health checks or readiness checks, both will be skipped.

src/main/scala/mesosphere/marathon/core/task/termination/impl/InstanceChangedPredicates.scala
7–12 ↗(On Diff #1039)

The KillService is used e.g. when stopping an application. If we don't consider Unreachable and Reserved as terminal in this context, a StopApplication deployment will not finish for a long time.

@timcharper can you add a comment to this def explaining why we don't use Terminal or Active?

src/main/scala/mesosphere/marathon/state/TaskFailure.scala
124–138

Note the TODO in HistoryActor: that currently extracts TaskFailures from both MesosTaskStatusUpdateEvent and InstanceChanged. I'd like to get rid of internal MesosTaskStatusUpdateEvent consumers, but I'm also not a big fan of condition.mesosTaskState. Maybe we need an optional trigger in InstanceChanged like we have in InstanceChangedEventsGenerator? That'd be a bigger change though.

This revision now requires changes to proceed.Dec 5 2016, 8:35 PM
timcharper updated this revision to Diff 1080.Dec 5 2016, 8:44 PM

compiler issues fixed

jeschkies added inline comments.Dec 6 2016, 1:59 PM
src/main/scala/mesosphere/marathon/core/task/termination/impl/InstanceChangedPredicates.scala
13–14 ↗(On Diff #1080)

Could you create issues for these TODOs? I'm afraid we forget otherwise.

timcharper added inline comments.Dec 6 2016, 5:33 PM
src/main/scala/mesosphere/marathon/core/condition/Condition.scala
18

I don’t believe we should implement one for Condition -> String since toReadableName has been expected to return a mesos.TaskState string all along. https://github.com/mesosphere/marathon/blob/2cda52a3cc9735a735755910ed9222263c108884/src/main/scala/mesosphere/marathon/state/TaskFailure.scala#L119

timcharper added inline comments.Dec 6 2016, 5:34 PM
src/main/scala/mesosphere/marathon/core/condition/Condition.scala
92–116

Failure implies Terminal.

timcharper added inline comments.Dec 6 2016, 5:37 PM
src/main/scala/mesosphere/marathon/core/task/termination/impl/InstanceChangedPredicates.scala
13–14 ↗(On Diff #1080)

I will

timcharper updated this revision to Diff 1112.Dec 6 2016, 6:35 PM

rebased patch and integrated comments

Github issue for Condition suspended TODO recorded: https://github.com/mesosphere/marathon/issues/4789

timcharper updated this revision to Diff 1115.Dec 6 2016, 6:53 PM

fix tests

jdef accepted this revision.Dec 6 2016, 8:11 PM
jdef added a reviewer: jdef.
jdef added a subscriber: jdef.

lgtm. @meichstedt WDYT?

timcharper updated this object.Dec 7 2016, 6:32 AM
aquamatthias accepted this revision.Dec 7 2016, 8:54 AM

@timcharper Thanks! One question regarding a comment - otherwise LGTM.

src/main/scala/mesosphere/marathon/core/instance/update/InstanceChangedEventsGenerator.scala
40

Is this a todo, left over or clarification?

src/main/scala/mesosphere/marathon/state/TaskFailure.scala
124–138

@meichstedt not part of this PR: Since we emit InstanceChange and MesosStatusUpdateEvent for a failing task, the HistoryActor will store the same error twice, or?

jeschkies accepted this revision.Dec 7 2016, 11:10 AM
jeschkies added inline comments.
src/main/scala/mesosphere/marathon/core/condition/Condition.scala
92–116

Ah, missed that.

meichstedt accepted this revision.Dec 7 2016, 12:32 PM

Thanks!

src/main/scala/mesosphere/marathon/state/TaskFailure.scala
124–138

@aquamatthias yes, I'm working on cleaning this up

This revision is now accepted and ready to land.Dec 7 2016, 12:32 PM
This revision was automatically updated to reflect the committed changes.
timcharper added inline comments.Dec 7 2016, 2:58 PM
src/main/scala/mesosphere/marathon/core/instance/update/InstanceChangedEventsGenerator.scala
40

clarification