
Re: snmpconf A non-trivial policyAction example





Juergen Schoenwaelder wrote:

> >>>>> Steve Waldbusser writes:
>
> Steve> This API won't work in our environment because agents don't
> Steve> have MIB compilers and therefore don't know the type of a
> Steve> variable. For example, in the maxDesiredEntries varbinds above,
> Steve> it's not clear whether to send the integer 1000 or the 4
> Steve> character string "1000".
>
> Sure, I can write the same script using OIDs and having type
> information in there as well. The Tnm API allows you to do both.  I
> decided to go with the short version since you used descriptors rather
> than OIDs in your example as well. :-)

Sorry, I assumed that the API didn't specify the type since you didn't
specify the type in the example.
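
To make the ambiguity concrete, here is a minimal sketch in Python (chosen only for illustration). It is not the Tnm API and not the PM draft's accessor functions: the VarBind class, the make_varbind helper, the type names and the OID are all hypothetical. It only shows that the same four characters produce two different values on the wire unless the type travels with the varbind.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class VarBind:
    """Hypothetical (oid, type, value) 3-tuple; not a real library type."""
    oid: str
    type_name: str
    value: Union[int, bytes]


def make_varbind(oid: str, type_name: str, text: str) -> VarBind:
    """Build a varbind from script text, using an explicit type name."""
    if type_name == "Integer32":
        return VarBind(oid, type_name, int(text))
    if type_name == "OctetString":
        return VarBind(oid, type_name, text.encode("ascii"))
    raise ValueError(f"unsupported type: {type_name}")


# Placeholder OID for illustration only; it does not name a real object.
PLACEHOLDER_OID = "1.3.6.1.4.1.0.1"

as_integer = make_varbind(PLACEHOLDER_OID, "Integer32", "1000")
as_string = make_varbind(PLACEHOLDER_OID, "OctetString", "1000")
```

Without the middle element an agent that has no MIB compiler has no way to choose between as_integer and as_string.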

As a side note, the use of descriptors in my examples is part of a
convention described and encouraged in the PM draft in which humans read and
write scripts with descriptors and then management stations translate them
into dotted decimal before shipping them to agents.
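
A rough sketch of that translation step, again in Python for illustration only. The to_dotted function and the KNOWN_DESCRIPTORS table are hypothetical; a real management station would take the mapping from compiled MIB modules rather than from a hand-written table. The two entries are the standard MIB-II assignments for sysDescr and ifAdminStatus.

```python
# Hypothetical descriptor table; a real station would build this from
# compiled MIB modules.
KNOWN_DESCRIPTORS = {
    "sysDescr": "1.3.6.1.2.1.1.1",
    "ifAdminStatus": "1.3.6.1.2.1.2.2.1.7",
}


def to_dotted(name: str) -> str:
    """Translate 'descriptor.instance' into dotted decimal.

    Raises KeyError if the descriptor is not in the table.
    """
    descriptor, _, instance = name.partition(".")
    base = KNOWN_DESCRIPTORS[descriptor]
    return f"{base}.{instance}" if instance else base
```

The human writes "ifAdminStatus.5" in the script; the agent only ever sees the dotted decimal form.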

Another side note after looking at the Tnm API: I see there is some
optionality allowed for the parameters of varbind lists, but I'm surprised
to see the middle parameter of the 3-tuple omitted. That would seem to be
very difficult to parse correctly. Was this just a mistake in your example?

> >> This example makes it very obvious to me that actions will be
> >> complex over time and that a mechanism is needed to write
> >> "procedures" (like the one above) which you can call and reuse in
> >> actions. (I can easily imagine cases where a policy rule action
> >> instantiates many rows and the hlMatrixControlEntry is just one of
> >> them.)
>
> Steve> Interesting, I had the opposite reaction. I was struck by the
> Steve> fact that the function was well within a manageable size and
> Steve> that no functional decomposition seemed obvious.
>
> We are working on a language which will have primitives on a much
> higher level, e.g. a primitive to 'ensure that the following row
> exists in a table'.  Your code (and my code as well) was more part of
> the assembler code for what I consider to be a primitive in a decent
> policy language. But I know, we have different opinions on this.

I'm not sure we differ at all on this. If you look at the type of routines
I've added to the library, we seem to be on the same wavelength (i.e.
setRowStatus, searchColumn). In other words, I believe that there are
several high-level library routines that can immensely simplify script
writing.

One thing I brought up at the meeting was that it may be possible to further
simplify the addition of rows with a library function specifically designed
for this purpose. I may make a proposal soon.


> >> The fix is flawed as well since the oid in the scratchpad does not
> >> say which SNMP agent the action is executed on.
>
> Steve> This is handled automatically by the execution environment
> Steve> which knows the address of "this element".
>
> Address? EngineID? EngineID + ContextName? Is it specified anywhere
> what it is?

       "The agent will retrieve the instance in the same SNMP context
        in which the element resides. Note that no actual SNMP PDU
        needs to be generated and parsed when the policy MIB module
        resides on the same system as the managed elements."

This text could stand a bit of improvement, but I think it is basically clear.
Also, note that the last published draft has this text for 5 of the 6 SNMP
functions but doesn't have it for the snmpsend function. You'll see this fix
in the next draft.


> >> I am also concerned about the error handling - what happens if
> >> e.g. snmpsend(0, 5, OP_SET); fails with an SNMP error? Will the
> >> policy repeat in a loop to fire this action regularly??
>
> Steve> Yes, and I think this is desired behavior. A policy is trying
> Steve> to enforce a certain behavior. If the enforcement fails
> Steve> temporarily I think that it should continue to try. In this
> Steve> case, the maxLatency won't be too high (maybe run once per
> Steve> hour) and the (computational/operational) cost of failing is
> Steve> low. In another circumstance where the cost of failing was
> Steve> high, we could set something in the scratchpad to disable
> Steve> enforcement on this interface.
>
> There are errors where retrying simply does not make any sense (e.g.
> a VACM configuration which does not give you access to the target).
> Sure, it is much easier to ignore errors. But my experience is that
> systems that ignore errors are not useful once something goes wrong.
> Probably another point where we just have very different opinions.

I hope it was clear that I was just talking about this RMON2 example. There
are certainly situations where the cost of failure is high (too much CPU or
other operational impact) or where a remedial operation is desired. If error
recovery is mandated, then it is easy to code it in. I just don't feel it is
warranted in this case.

Nevertheless, I'm not sure your VACM example would make me change my mind.
From an operational point of view, if and when the VACM configuration on an
interface was fixed, I'd like the alMatrix entry to be created because it
moves me closer to my policy which is "all trunk ports are monitored with
alMatrix".

If we were to disable a policy on error (never to retry again), would a
momentary misconfiguration (of security, say) cause the element to forever
be out of policy? I don't think this is the right behavior.

The issue of error handling comes up when (1) failure is expensive or (2)
recovery is an option.

(1) An interesting example of expensive failure is when a log message is
generated whenever an alMatrix creation fails. The cost here is operational
(confusion, time, etc), not computational (CPU, disk, etc). In such a case,
I'd want to make sure the policy didn't retry more than once a day or so
after failure. I could accomplish this with a scratchpad variable that
recorded the last failure time and didn't try again within some window of
that time.

(2) Recovery is sometimes an option: it may be possible to modify the
request and try again, or to use another mechanism entirely. In such
circumstances I see nothing wrong with adding error recovery code. In this
RMON2 example, there isn't anything about the request that the agent could
complain about.
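
The retry window described in (1) could look roughly like the following. This is a sketch in Python for illustration, not policy script code from the draft: the scratchpad is modeled as a plain dict, and run_action, the "lastFailureTime" key and the one-day window are all assumptions of mine.

```python
RETRY_WINDOW_SECONDS = 24 * 60 * 60  # assumed window: retry at most once a day


def run_action(scratchpad: dict, now: float, attempt) -> str:
    """Run attempt() unless a failure was recorded within the window.

    scratchpad stands in for the per-element scratchpad; attempt is any
    callable that returns True on success and False on failure.
    """
    last_failure = scratchpad.get("lastFailureTime")
    if last_failure is not None and now - last_failure < RETRY_WINDOW_SECONDS:
        return "skipped"  # still inside the window, so stay quiet
    if attempt():
        scratchpad.pop("lastFailureTime", None)  # success clears the record
        return "ok"
    scratchpad["lastFailureTime"] = now  # remember when we last failed
    return "failed"
```

Note that the policy is never disabled for good: once the window has passed, the action is tried again, so an element whose configuration has since been fixed comes back into policy.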


Steve