Re: snmpconf-pm-04 notes





Wes Hardaker wrote:
> 
> >>>>> On Wed, 28 Feb 2001 07:20:53 -0800, Steve Waldbusser <waldbusser@nextbeacon.com> said:
> 
> Steve> pmPolicyAbnormalTerminations OBJECT-TYPE
> Steve>          "The number of elements that, in their most recent filter or
> Steve>          action execution, have experienced a run-time exception and
> Steve>          terminated abnormally. Note that if a policy was experiencing
> Steve>          a run-time exception while processing a particular element
> Steve>          but on a subsequent invocation it runs normally, this number
> Steve>          can decline."
> Steve>     ::= { pmPolicyEntry 12 }
> 
> Steve> Let's say we have 20 elements. If, when policy P processes
> Steve> those elements, the first iteration through them 3 of them
> Steve> failed, this object would be set to 7.
> 
> Ah, that makes more sense.  Sorry.  (up until you mix 3 and 7, which I
> think was a typo).

I keep trying to throw you off the scent, but you figured it out anyway
:-)
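
To pin down the intended arithmetic, here is a minimal Python sketch of the semantics in the DESCRIPTION clause above. It is an illustration only: the dictionary, the element names, and the function name are hypothetical and appear nowhere in the draft.

```python
def abnormal_terminations(last_run_failed):
    """Count elements whose most recent filter/action execution failed."""
    return sum(1 for failed in last_run_failed.values() if failed)


# 20 hypothetical elements; on the first pass, 3 of them hit a run-time exception.
last_run_failed = {f"element{i}": False for i in range(20)}
for name in ("element2", "element5", "element11"):
    last_run_failed[name] = True
first_pass = abnormal_terminations(last_run_failed)

# On a later invocation element5 runs normally, so the value declines.
last_run_failed["element5"] = False
second_pass = abnormal_terminations(last_run_failed)
```

With these inputs first_pass is 3 and second_pass is 2, which is the "this number can decline" behavior the DESCRIPTION mentions.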

> Steve> What I'm trying to argue is that if we can guarantee an alert
> Steve> on the first 0 to 1 transition you're pretty much done.
> 
> True, though if the code changed I'd be tempted to want another one
> but it's certainly not necessary.
> 
> Now, if that trap goes un-received, though, then the broken script may
> not be noticed for a while.  I'd still be tempted to send them out
> every X number of seconds (defaulting to 60 minutes, say, as you would
> like).  If X is 0, only send one (note that this is a different
> configuration item than discussed below where throttling from toggling
> is discussed):

Let's only talk about informs.
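
A rough Python sketch of the behavior being proposed, as I read it: notify on the first 0 to 1 transition, then repeat every X seconds while the failure persists, with X = 0 meaning "send only one". The class and method names are hypothetical, re-arming when the count returns to 0 is my assumption, and the separate throttle for toggling is deliberately not modeled.

```python
class FailureNotifier:
    """Decides when to send a notification for one policy (hypothetical helper)."""

    def __init__(self, repeat_interval):
        self.repeat_interval = repeat_interval  # seconds; 0 means send only once
        self.last_sent = None  # time of the last notification in this failure episode

    def should_notify(self, abnormal_terminations, now):
        if abnormal_terminations == 0:
            # Assumption: clearing the failure re-arms the 0 to 1 transition.
            self.last_sent = None
            return False
        if self.last_sent is None:
            # First 0 to 1 transition: always notify.
            self.last_sent = now
            return True
        if self.repeat_interval > 0 and now - self.last_sent >= self.repeat_interval:
            # Still failing and the repeat interval has elapsed.
            self.last_sent = now
            return True
        return False
```

With repeat_interval=3600 a policy that stays broken produces one notification per hour; with repeat_interval=0 it produces exactly one until the count returns to 0.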

> Steve> Regarding overreporting, we just need some protection from
> Steve> floods. I say just pick a number. I realize it should be higher
> Steve> than I originally said. Maybe 30-60 minutes. Why would you ever
> Steve> set it lower?
> 
> Never underestimate the desire for a sysadmin to want something
> different than what the MIB designer figured would be good enough.

Mostly I don't want to estimate *at all* now (underestimate or
otherwise). We'll learn more about the finer points of usage and address
them later. In the meantime we'll base it on reasonable engineering
design that makes sure we 'do no harm'.

> IMHO, everything should be configurable unless there is a good reason
> to do otherwise and I don't see one here.  The reason for not making
> it configurable is that you don't see a value other than yours being
> useful (and it's doubtful you ever will, seeing as it's your choice)!

Statesmanlike as I am, I'm willing to let you choose!


Well, as I was going round and round on this, I came up with:
- Maybe we need a configurable time-threshold per policy
- Maybe we need to be able to shut up notifications from a policy that
  is broken on some elements but for which no fix is planned.
- Maybe we need to configure the "0" threshold (allow N to break?)
- Maybe we want a higher severity "N" threshold or "N%" threshold or
  "100%" threshold (as in failed on X% of elements).

Should EVERYTHING really be configurable? Let me really emphasize my
earlier statement about estimating user needs - I think it's too early
to tell what the needs will be for this feature. And I'm definitely
wishing we were hearing others' opinions and advice about this.


Maybe we should just defer this to the event MIB.


> Steve> In the unlikely case that your programmer has instant response
> Steve> time and fixes the problem in 10 minutes, why can't we just
> Steve> watch the error counters for another 50 minutes?

Something I forgot to throw in is that the solution I described is
called trap-directed polling, which is a generally accepted architectural
principle.
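
As a minimal sketch of what that means here: the manager does nothing until a notification arrives, then polls the error gauge for a bounded window to see whether the fix took. The function and parameter names below are hypothetical, and read_abnormal_terminations stands in for whatever SNMP GET the manager would really issue; the example uses canned readings and a no-op sleep so it runs instantly.

```python
import time


def poll_after_notification(read_abnormal_terminations, window_s, interval_s,
                            sleep=time.sleep):
    """Poll the gauge for window_s seconds after a notification arrives.

    Returns True if the last sample taken in the window was 0.
    """
    value = read_abnormal_terminations()
    elapsed = 0
    while elapsed < window_s:
        sleep(interval_s)
        elapsed += interval_s
        value = read_abnormal_terminations()
    return value == 0


# Canned readings: the fix lands partway through a 50-minute watch window.
readings = iter([3, 3, 1, 0, 0, 0])
cleared = poll_after_notification(lambda: next(readings),
                                  window_s=50 * 60, interval_s=10 * 60,
                                  sleep=lambda seconds: None)
```

The point of the pattern is that polling cost is paid only for the nodes that have already reported trouble, not for all of them all the time.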

> You could, and I'm not saying otherwise.  I'm saying it should be
> configurable.  The type of interface I like when managing 10000 nodes
> is for them to watch themselves, or nearby nodes, and notify me when
> something goes wrong.  I'd rather not run software that continually
> monitors something if I don't need the data itself (I'm only looking
> for errors not collecting statistics).  I'm likely to fix code and
> forget about it (while working on something new) until another error
> notification comes my way from the same node again.
> 
> Steve> It's easy to add objects now. It's easy to add them later. But
> Steve> it's impossible to lower the cost after the fact.
> 
> It's also annoying to have written code to support one architecture
> that could have been done differently after the later addition.

Agreed. 

> >> (side note: I'd also really like to see multiple actions be available
> >> per policy rule, and ideally fall-back actions as well.  Or can a code
> >> table call another code object in the same table? (hence making it
> >> possible to write a function X that calls Y and Z, which could also be
> >> called independently by another action)).
> 
> Steve> We only have one action. Fall-back actions are available
> Steve> through the use of precedence:
> 
> Right, I'd like to see more than one "successful" action.  (Take a
> look at what we did in the draft-ietf-ipsp-ipsec-conf-mib if it ever
> gets published (it hasn't made it through ID processing yet))

I'll take a look.


Steve