[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: snmpconf-pm-04 notes


  Thanks a lot (!) for the extensive comments. I'll try to finish my
reply to your second note soon.

My comments are inline.

Wes Hardaker wrote:
> I haven't quite finished reading it, but I thought I'd jot these down
> for as far as I've gotten.  I'll try to mark the more serious issues
> (as opposed to editorial ones) with a '***'.
> 1)  section 4:
>     "Policies always express a notion of:
>        if (an object has certain characteristics) then (apply operation to
>        that object)"
>     should be replaced with:
>     "Policies always express a notion of:
>        if (an object has certain characteristics) then (apply an operation(s) to
>        an object)"                                            ^^
>        ^^
>   Operations can apply to objects other than the one with the
>   characteristic in question.  Now, since an action can contain
>   multiple operations, the following would probably be the best, IMHO:
>     "Policies always express a notion of:
>        if (an object has certain characteristics) then (apply a set of
>        operation(s) to a set of objects)"

What if it said:
"Policies are intended to express a notion of:"
and then we could use the original pseudo-code, which is designed to
show the intent of the policy-based management model rather than the
limits of what a *could* be done.

I think the "intent" also extends to:
... (apply operation to that object or a related object)
for example: "for every frame relay circuit with xxx characteristics,
apply config xxx to its containing interface/port"

Do you think this should be mentioned in this section?

> 2) section 4:
>   roles, whereas the percent utilization of a link would not be.
>      ->
>   roles, whereas "the percent utilization of a link" would not be.
>   (the other two examples in the paragraph are quoted)


> 3) section 4:
>   Roles are human defined strings that can be referenced by policy
>      ->
>   Roles in this model are human defined strings that can be referenced by policy


> 4) section 4:
>   Repeated use of the word 'download' is used to say that the manager
>   downloads policies to the managed devices.  IMHO, this is
>   "uploading".

Very interesting. You probably say uploading because the code is being
"pushed" from here to there. I say it is downloading because the code is
going from server to client (central to distributed).

I'm not sure I agree that it is "uploading" but it's clear that it's not
clear. I'm thinking using the word "install" instead. What do you think?

> ***
> 5) section 4:
>   There are security issues with not uploading policies to a managed
>   object until they are discovered.  This issue comes up repeatedly in
>   the document.  During the time that the policy is not in place it
>   obviously can't be enforced (which could be serious if financial or
>   highly secure information was transmitted (EG) during that time
>   window).  In theory, you'd think, that this shouldn't happen that
>   often and the window would be small.  But what if the
>   pmNewRoleNotification isn't received by the management station?

I agree. I'll make some mention of this as appropriate (at least in the
security section). How's this text?:

-- Note that using this algorithm to avoid installing "unnecessary"
-- policies may result in delays in having the policy available when
-- the policy becomes necessary. This delay could become extensive if
-- an interruption of communications prevents the notification from
-- being delivered and/or the policy from being downloaded, causing
-- the sytem to not be in compliance with policy for a period of
-- time. In particular, if the policy is enforcing security rules,
-- this could open up security vulnerabilities during this period of
-- time.

> 6) 5.1
>   therefore they can be modeled as one element type.
>    ->
>   therefore they can be defined in this model as one element type.
> 7) 5.2
>   filter the last time it was checked (or it was a newly-
>    ->
>   filter the last time it was checked (or it is a newly-
>   and just below it:
>   the last check, it will remain in the set of elements that
>   will whose policyAction will be run within the
>     ->
>   the last check, it will remain in the set of elements
>   whose policyAction will be run within the

Both are fixed.

> 8) 5.3
>    accessible from accessor functions, no state is remembered
>    from the policy filter evaluation, nor from the previous
>    filter/action invocation of this element nor from the previous
>    invocation of the policy filter or action). If any syntax or
>      ->
>    accessible from accessor functions, no state is remembered
>    from the policy filter evaluation, nor from the previous
>    filter/action invocation of this element. If any syntax or


> ***
> 9) 5.3 and others
>    If syntax or processing errors occur, the action will terminate
>    immediately for this element.  I think failures need to be dealt
>    with in a better way.  The document repeatedly references failing
>    actions and that processing stops.
>    First and most importantly, I think a failure notification needs to
>    be sent out (if configured to do so), as there are security
>    implications with policies that fail to run properly.

For each policy, there's a rollup of the total number of instances of
the policy that are failing now (a gauge, pmPolicyAbnormalTerminations).
Also, when I send an updated draft next week, check out
pmTrackingPolicyToElementInfo which contains info about errors and
exceptions on a per-execution-context basis (i.e. per policy/element
pair). Of course, there's also the debugging table.

So we've made it easy to poll but haven't achieved sub-second
notification (only possible with a notification). And it's here that I'm
a bit reluctant to say that a notification is a requirement. To be
honest, I really don't have a sense of this yet. One more notification 
isn't that big a deal but I'm most worried about getting into ratholes
about how we can make sure that floods of notifications don't occur.

All we would absolutely need would be to have a notification that is
sent whenever the abnormalTerminations gauge goes from zero to non-zero
and no more than 1 notification per 60 seconds (or some similar
non-configurable constant). If we could keep it simple like this it
might be worth it.

>    Second, having a policy die in mid stream is a pretty serious event
>    since half of the operations (or so) will have succeeded before the
>    error occurred (yet another security consideration).  I'm not
>    saying there is a way around this (there isn't), but it should be
>    noted somewhere (IMHO).
> ***
> 10) I certainly understand the reason for not including function
>     definitions, but the code duplication issues resulting from this
>     are going to be a pain.
> 11) section 6: define BNF & EBNF somewhere (or state a reference to
>     its definition).


> 12) section 6:
>     "The use of comments and newlines are allowed and encouraged
>     where they will promote readability of policy code."
>     ^^^^^^^^^^^^^^^
>     Oh sure, but I can't make comments with snide remarks in them?
>     How about:
>     "The use of comments and newlines are allowed and encouraged
>     in order to promote readability of policy code."


> 13) section 8.1:
>         Copies the value of b to a. After the copy a and
>         b are equal but do not share storage.
>     Compilers optimized for storage space could simply make a
>     reference until a data request is made...


> 14) section 8.1:
>     "Index (with range checking): a[i]"
>     What happens when its out of range?
> * (it's almost important)

Yes, I agree it's important. I'll specify.

> 15)
>    If getint() is retrieving a unsigned int, I suggest calling it
>    getunsigned() or getuint() instead.

Check this out in light of the loosely-typed variables in the next

> 16) repeatedly, starting in
>      The agent will retrieve the instance in the same SNMP context
>      in which the element resides.
>    What exactly does this mean?  In the SNMP notion of contexts, an
>    item can exist in multiple contexts.

It's supposed to mean the context in which the element was discovered
(by the pmElementTypeRegTable).

> 17) It's repeatedly stated that oids are specified as dotted-string of
>     numbers, but doesn't say what happens if the oid can't be parsed.
>     Action stops, like everything else?

Yes. I'll say if it isn't of the form below, it is a run-time-exception.

  integer [ . integer ]

> 18)
>     the optional argument "value" should be dropped (as someone else
>     pointed out).
>      a) C has no optional arguments
>      b) is not needed.


> ***
> 19) why have both longs and ints if they are the same size?  Just drop
>     "long"s, or make them 64 bits and drop "long longs"

While I agree, this has been overtaken by events. Check out the next

> ********
> 20) repeatedly:
>      Note that no actual SNMP PDU needs to be generated and parsed
>      when the policy MIB module resides on the same system as the
>      managed elements.
>     If this is the case, I can write policy scripts that will operate
>     on objects that I don't otherwise have permission to modify?  Is
>     there *no* access control checking going on here?  The
>     DISMAN-EVENT-MIB handles this by using the authentication
>     semantics of the person that performed the set operation putting
>     it into existence.  Maybe something similar should be done here?

Exactly. Access-control checking MUST be done even if no PDU is
generated. I've had a long-standing todo to write the appropriate text.
The plan is to lift as much text as appropriate from the disman event

> ***
> 21) 11.1.2
>     readVarbind() can return bogus values if nothing was ever written
>     to that slot location before.  This goes against everything
>     previously in the document where the default action was to stop
>     processing for situations like that.  Undefined values are
>     probably a very bad thing in a policy script.

This has been changed.