Estimating Transition Probabilities
William Waites
ww at styx.org
Fri Sep 6 19:34:03 UTC 2002
>>> "sgorman" == <sgorman1 at gmu.edu> writes:
sgorman> Also it might be easier to calculate transition
sgorman> probabilities by summing across the rows of the adjaceny
sgorman> matrix then dividing the row components by the sum.
I'm not sure that that works. That would cause systems not
adjacent to i to contribute to the probability that a packet goes from
i to j in one step, if I understand correctly. There is also a
hidden imbedded Markov chain in my formulation.
Firstly, to avoid confusion with terminology, i am using:
A_i - adjacency density vector -- the number of systems adjacent
to i.
C_ij - connectivity matrix -- representing presence or absence of an
interconnect between i and j as a 1 or 0. C_ij does not
represent a Markov process.
P_ij - transition probability matrix -- probability that a packet, now
in i will end up in j next.
We have,
A_i = Sum_j C_ij
In making P_ij, i think it is necessary to calculate according to the
number of interconnections of j -- how big the neighbour is. This
looks right intuitively: if i is a smaller, local isp or an non-isp
organization they will tend to have a few upstream providers and maybe
some private peering. The upstreams will tend to be more heavily
interconnected and peered and their component of A_i will be larger
than that of the private peering, or more of your packets will go
to an upstream than to your peers.
The assumption may or may not be valid; it is difficult to know
without actual traffic statistics for a good sample of links on the
internet. If we had a close complete data set of traffic statistics
for the internet measured at the same instant, it would be possible to
calculate the P_ij directly:
pps on link connecting i and j
P_ij = ------------------------------
total pps leaving i
but this data is not available available in general.
(aside: I belive that because of the scoping property of Little's
Formula this approach would also be valid for analyzing packet flow
within an AS where i and j represent each router indexed
arbitrarily)
Consider
C_ij A_j
P_ij = lambda_i --------------
Sum_k C_jk A_k
Sum_k C_ij C_jk
= lambda_i ---------------------
Sum_k Sum_l C_jk C_kl
where lambda_i is a normalizing factor appropriate to make the sums
across each row of P_ij equal to 1.
The product of C_ij with A_j is necessary since we don't want to take
into account the adjacency density of ASj if we are not connected to
it -- it doesn't matter how big it is, if i is not connected to j, a
packet has zero chance of making the transition directly. This is
taken care of by the fact that C_ij == 0 in that instance.
Likewise the right inner product of C_jk with A_k in the denominator
is necessary for the same reason, except now two hops out. Non
existent links are never taken into account in calculating the
transition probabilities.
You might say that the numerator is skewed towards the current state,
and the denominator is skewed towards the next state. This suggests
that there is an implicitly imbedded Markov chain -- the denominator
used in calculating the probability of going in one step from i to j
takes into account information about what will happen after at the
next step when it leaves j.
-w
--
William Waites <ww at styx.org>
finger ww at styx.org for PGP keys
Idiosyntactix Research Laboratories
http://www.irl.styx.org -- NetBSD: ça marche!
More information about the NANOG
mailing list