Estimating Transition Probabilities

William Waites ww at styx.org
Fri Sep 6 19:34:03 UTC 2002


>>> "sgorman" ==   <sgorman1 at gmu.edu> writes:

    sgorman> Also   it  might  be   easier  to   calculate  transition
    sgorman> probabilities by summing across  the rows of the adjaceny
    sgorman> matrix then dividing the row components by the sum.

I'm   not  sure    that  that   works.  That would   cause systems not
adjacent to i to contribute to the probability that a packet goes from
i  to j  in  one  step, if  I understand  correctly. There  is  also a
hidden  imbedded Markov chain in my formulation.

Firstly, to avoid confusion with terminology, i am using:

A_i  - adjacency  density vector  --  the number  of  systems adjacent
       to  i.
C_ij - connectivity  matrix -- representing presence or  absence of an
       interconnect  between i  and j  as  a 1  or 0.   C_ij does  not
       represent a Markov process.
P_ij - transition probability matrix -- probability that a packet, now
       in i will end up in j next. 

We have,

        A_i = Sum_j C_ij

In making P_ij, i think it  is necessary to calculate according to the
number  of interconnections of  j --  how big  the neighbour  is. This
looks right  intuitively: if i is  a smaller, local isp  or an non-isp
organization they will tend to have a few upstream providers and maybe
some  private peering.  The upstreams  will  tend to  be more  heavily
interconnected and  peered and their  component of A_i will  be larger
than that  of   the  private peering, or more of  your packets will go
to an upstream than to your  peers.

The  assumption may  or may  not  be valid;  it is  difficult to  know
without actual  traffic statistics for a  good sample of  links on the
internet. If  we had a close  complete data set  of traffic statistics
for the internet measured at the same instant, it would be possible to
calculate the P_ij directly:  

               pps on link connecting i and j
        P_ij = ------------------------------
               total pps leaving i

but this data is not available available in general.

(aside:  I belive  that because  of the  scoping property  of Little's
Formula this approach would also  be  valid for analyzing  packet flow
within   an  AS   where  i   and  j  represent   each  router  indexed
arbitrarily)

Consider
                             C_ij A_j
          P_ij = lambda_i --------------
                          Sum_k C_jk A_k

                             Sum_k C_ij C_jk
               = lambda_i ---------------------
                          Sum_k Sum_l C_jk C_kl

where lambda_i  is a normalizing  factor appropriate to make  the sums
across each row of P_ij equal to 1.
             
The product of C_ij with A_j  is necessary since we don't want to take
into account the  adjacency density of ASj if we  are not connected to
it -- it doesn't  matter how big it is, if i is  not connected to j, a
packet  has zero  chance of  making the  transition directly.  This is
taken care of by the fact that C_ij == 0 in that instance.

Likewise the right  inner product  of C_jk with A_k in the denominator
is  necessary  for the  same  reason, except  now  two  hops out.  Non
existent  links  are  never  taken  into account  in  calculating  the
transition probabilities. 

You might say that the  numerator is skewed towards the current state,
and the denominator  is skewed towards the next  state.  This suggests
that there is  an implicitly imbedded Markov chain  -- the denominator
used in calculating  the probability of going in one step  from i to j
takes into  account information  about what will  happen after  at the
next step when it leaves j.

-w
-- 
William Waites <ww at styx.org>
finger ww at styx.org for PGP keys
Idiosyntactix Research Laboratories
http://www.irl.styx.org -- NetBSD: ça marche!



More information about the NANOG mailing list