Request for a pointer - Linux modifying DSCP on replies?

Darren Bolding darren at
Fri Sep 4 01:27:21 UTC 2009

I wanted to follow up with what I figured out.
The easiest way to solve this problem turned out to be netfilter's CONNMARK
target and connmark match, which sadly are not available in some older but
still used Linux kernels.

The rules, in iptables-restore format (the DSCP target is only valid in the
mangle table), are as follows:
*mangle
# Set outgoing DSCP on connections to same as incoming DSCP
-A PREROUTING -m dscp --dscp 1 -j CONNMARK --set-mark 1
-A POSTROUTING -m connmark --mark 1 -j DSCP --set-dscp 1
-A PREROUTING -m dscp --dscp 2 -j CONNMARK --set-mark 2
-A POSTROUTING -m connmark --mark 2 -j DSCP --set-dscp 2
# ...
COMMIT

And this goes on for all 63 possible non-zero markings (DSCP is a 6-bit
field, so the non-zero values are 1 through 63).
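Rather than typing out all 126 rules by hand, they can be generated with a short loop. The following is a minimal sketch, assuming a POSIX shell and that the output is loaded with `iptables-restore`; the function name `gen_dscp_reflect_rules` is hypothetical. It emits the same PREROUTING/POSTROUTING pair for every DSCP value from 1 to 63, wrapped in `*mangle` ... `COMMIT` because the DSCP target only works in the mangle table.

```sh
#!/bin/sh
# Hypothetical helper: print an iptables-restore ruleset that copies the
# DSCP value seen on incoming packets onto outgoing packets of the same
# connection, using the connection mark as the carrier.
# Assumes the dscp/connmark matches and DSCP/CONNMARK targets are available,
# and that nothing else on the host uses connection marks 1-63.
gen_dscp_reflect_rules() {
    printf '%s\n' '*mangle'
    dscp=1
    while [ "$dscp" -le 63 ]; do
        # Remember the incoming DSCP value as the connection mark.
        printf '%s\n' "-A PREROUTING -m dscp --dscp $dscp -j CONNMARK --set-mark $dscp"
        # Stamp the same DSCP value on packets of connections with that mark.
        printf '%s\n' "-A POSTROUTING -m connmark --mark $dscp -j DSCP --set-dscp $dscp"
        dscp=$((dscp + 1))
    done
    printf '%s\n' 'COMMIT'
}
```

Run as root, something like `gen_dscp_reflect_rules | iptables-restore --noflush` should append these rules to the existing mangle table; without `--noflush`, `iptables-restore` replaces the table's current contents. Mapping each DSCP value one-to-one onto a connection mark keeps the rules trivial, but it will collide with anything else on the host that sets connection marks.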

This seems to have had negligible performance or memory impact on some very
busy hosts, so it looks like a viable solution.
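To check on the server whether replies really leave marked, note that DSCP occupies the upper six bits of the IPv4 TOS byte (the low two bits are ECN), so DSCP value n appears on the wire as a TOS byte of n shifted left by two (n x 4); DSCP 1, for example, is TOS 0x04. A minimal sketch of a helper (the name `dscp_filter` is hypothetical) that turns a DSCP value into a `tcpdump` filter expression, masking off the ECN bits:

```sh
#!/bin/sh
# Hypothetical helper: print a tcpdump/pcap filter expression matching IPv4
# packets that carry the given DSCP value (0-63). ip[1] is the TOS byte;
# masking with 0xfc drops the two ECN bits so only the DSCP field is compared.
dscp_filter() {
    dscp=$1
    # DSCP is the upper six bits of the TOS byte, hence the shift by two.
    tos=$((dscp << 2))
    printf 'ip and (ip[1] & 0xfc) == %d\n' "$tos"
}
```

For example, `tcpdump -n -v "$(dscp_filter 1)"` on the server should show the marked requests and, once the rules are loaded, the replies as well; with `-v`, tcpdump also prints the `tos` value of each IPv4 header.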


On Mon, Aug 17, 2009 at 4:08 PM, Darren Bolding <darren at> wrote:

> Steve,
> Perhaps it is outside the DS domain, and that is the issue.  It seems odd,
> however, that the behavior with ICMP/ping differs from that with TCP.
> I'm not sure which is technically correct, but I am going to follow up on some
> of the pointers I've gotten to try to learn that.  It just seems natural to
> me that connection-oriented traffic would have the same markings on both
> sides of the conversation unless explicitly told otherwise.
> I would love to be able to mark the traffic at the edge of the DS domain; I
> do this at ingress from one location.  The challenge I am trying to solve is
> that the DS edge switch will not reliably know how to policy-route traffic
> unless it has been previously marked.
> To clarify, as in many other environments, we have stateful devices such as
> firewalls and load balancers.  I want traffic that ingressed through one of
> these devices to egress through it as well.
> This is entirely solvable by splitting equipment functionally (a cluster of
> servers and associated network equipment, real or virtual, dedicated to a
> service) or by employing SNAT solutions.  However, for various reasons these
> solutions are not preferred in our environment, and I dare say I am not
> alone in that viewpoint.
> What I am trying to deploy now is a system where the stateful equipment (in
> this case a load balancer) has its traffic to the rest of the network tagged
> on ingress.  Since I am using Cisco 6500s with Sup720s, I can classify and
> mark the traffic with a DSCP setting via PFC/DFC hardware.  I then inspect
> traffic at the layer-3 edge for the various pools of servers.  Depending on
> the DSCP marking of the packet, I change the next hop.  Since this is
> implemented through an extended ACL in a route-map, it is handled in
> hardware (a good thing).  Research shows that I can implement similar
> functionality in hardware on L3 switching gear from Juniper, Foundry, etc.,
> so I am not boxing myself into a vendor.
> I don't believe Cisco supports using reflexive ACLs to apply policy
> routing, and even if they did, that would likely swamp our Sups' CPUs, so I
> don't believe maintaining a stateful filter on the switch is viable.
> This all works as expected for pings and the ICMP replies.  It breaks down
> for TCP (HTTP/MySQL) connections.
> It sounds like the correct (per-spec) solution may be to have the Linux
> servers track each incoming connection's DSCP setting and mark the outgoing
> packets belonging to that connection.  I am not at all certain this will not
> load the servers' CPUs more than desired or require more
> connection-tracking resources than the ones we already use via
> iptables.
> Is there some other design option I am not considering?
> Thanks to those of you who have replied so far; it is at least a start down
> some additional paths of research for me!  I haven't been involved in system
> networking internals since the days of BSDI, so I have been at a loss as to
> who to even ask!
> --D
> On Mon, Aug 17, 2009 at 2:44 PM, Steve Miller <stmille at> wrote:
>> Would not the end station be considered to be outside of the DS
>> domain?  It does not necessarily make sense (to me) for reply packets
>> to be marked unless they are appropriately classified and marked on the
>> return path at the point they re-enter the DS domain.
>> I would imagine that iptables and the DSCP target would do what you
>> wanted, yes.  I'd consider classifying and marking traffic at whatever
>> switch you would consider to be at the edge of the DS domain
>> (connected to this server.)
>> -Steve
>> 2009/8/17 Darren Bolding <darren at>:
>> > I believe this is operational content, but it may well be better asked
>> > somewhere else.  I would love to have a pointer to another list/website.
>> > I am looking to do some policy routing based on DSCP marking, and I have
>> > this all working inside the networking equipment.  I DSCP-mark some packets
>> > at ingress and I policy-route others based on ACLs matching those DSCP
>> > markings.  This should allow me to solve some problems in a rather elegant
>> > manner, if I do say so myself.
>> >
>> > And this works fine for some things: I have verified that pings to a host
>> > work as expected.  The ping shows up at the destination host DSCP-marked,
>> > and the ICMP reply leaves with the same DSCP marking.
>> >
>> > However, when I do this with Apache and MySQL connections (TCP 80/3306),
>> > the incoming packets are marked, but the replies are not.
>> >
>> > My research into the subject doesn't seem to suggest there is a standard
>> > for whether replies on a TCP connection are required to have the same DSCP
>> > marking, but it seems to make a lot of sense that they would.
>> >
>> > I've disabled iptables on the server host to no avail.  I've looked around
>> > for an Apache or Linux kernel setting and found nothing.
>> >
>> > At this point I'm looking for pointers, either to a way to solve this
>> > issue or to a better place to ask.
>> >
>> > I've started investigating writing iptables rules to match incoming
>> > connections that have DSCP marking and explicitly mark response traffic,
>> > but that seems, I don't know... wrong.
>> >
>> > The Linux kernel we are using is 2.6.9-67.ELsmp.
>> >
>> > Any help or pointers would be appreciated!
>> >
>> > --D
>> >
>> > --
>> > --  Darren Bolding                  --
>> > --  darren at           --
>> >
>> --
>> Steve Miller, CCIE #23977 (R&S), RHCE
>> Key fingerprint = 5CE3 A789 4CF5 666F 5CD6  2A8E 3132 77C7 483F 5F9D
> --
> --  Darren Bolding                  --
> --  darren at           --

--  Darren Bolding                  --
--  darren at           --

More information about the NANOG mailing list