FCoE/CNA Deployment w/ Nexus 5K, HP 580s, QLogic

Mon Feb 27 15:22:24 UTC 2012

Hi Everyone!

I had several requests for more feedback on our FCoE experience, based on
my comments from a thread last week, so I'm writing here with a bit more
background on our project in hopes that it saves some pain for others :-).

I'm with a sizable health insurance provider in the Midwest, and we've
typically focused on technology vs. headcount as an overall strategy.  Based
on that, we upgrade much more often than some of our peers in the industry
because technology is still cheaper than long-term staffing costs.

Last fall, we were faced with both power and rack capacity constraints in
our primary datacenter, which is just three years old now.  Various ideas
were on the table, including taking out a section of IT cubes to expand the
DC, but the most appealing one was to consolidate our server and network
infrastructure into what was coined our "High Density Row".

We transitioned from Cat6500s as access to a Nexus 5K deployment, using 5Ks
as both distribution and access for the new HD row.  We didn't like how
oversubscription is handled on 2K FEXs when it comes to 10G links, so for
the situation here all 5Ks made the most sense.  Our capacity needs
couldn't justify 7Ks and while they would have been cool to have, we didn't
want to blow money just because.

Our SAN is an EMC Symmetrix with Cisco MDS switches in between it and the
hosts (Fibre Channel).  In the new row, we deployed all hosts with CNAs
(converged network adapters), which combine both FCoE storage and network
in a single 10Gb connection.  Since FCoE was new to all of us, we used a
phased approach that the Nexus offered, where we brought straight Fibre
Channel connections into our distribution layer 5Ks and used the Nexus'
FCoE proxy functionality to convert between true FC and FCoE.

From the host's perspective, it was only aware of FCoE connectivity to the
Nexus.  VSANs had to be created on the Nexus to map back to the FC VSANs on
the MDS side, Virtual Fibre Channel (VFC) interfaces were created on the
Nexus side, and a few other settings had to be configured.

Overall, the config wasn't huge, but the biggest hurdle for us was that, as
the network guys, we had to learn the storage side to be able to properly
set this up.  That meant new terms like WWN (world wide name), FLOGI
database, VSAN (a VLAN for storage), etc.  Also, on the Nexus side, you
have to enable the FCoE feature, as NX-OS is very modular and leaves most
options disabled during the initial setup.
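
As a rough illustration of how those pieces relate, here is a short Python
sketch that prints the general shape of the NX-OS lines involved for one
host-facing port.  This is not our actual config: the VSAN/VLAN IDs, vfc
number, port name, and the function name are all made up, and it leaves out
the FC uplinks to the MDS and the Ethernet trunk settings.  Check the exact
commands against Cisco's FCoE configuration guide for your release.

```python
def fcoe_config_sketch(vsan_id, fcoe_vlan_id, vfc_id, eth_port):
    """Return NX-OS-style config lines for one FCoE host port.

    Illustrative only: all IDs are hypothetical placeholders.
    """
    return [
        "feature fcoe",                   # FCoE is disabled by default
        "vsan database",
        f"  vsan {vsan_id}",              # should match the VSAN used on the MDS side
        f"vlan {fcoe_vlan_id}",
        f"  fcoe vsan {vsan_id}",         # dedicated VLAN that carries this VSAN
        f"interface vfc{vfc_id}",
        f"  bind interface {eth_port}",   # tie the vfc to the CNA-facing 10G port
        "  no shutdown",
        "vsan database",
        f"  vsan {vsan_id} interface vfc{vfc_id}",  # place the vfc in the VSAN
    ]


if __name__ == "__main__":
    # Hypothetical values, chosen only to make the output readable.
    print("\n".join(fcoe_config_sketch(100, 100, 10, "Ethernet1/10")))
```

Once a host logs in through the vfc, its WWN should show up in the FLOGI
database on the Nexus, which is a handy first check that the mapping works.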

The painful part, which is probably of most interest here, is that we hit
a very strange and catastrophic issue specific to QLogic's 8242
copper-based (twinax) CNA.  As part of the burn-in testing, we were
working with our server team to simulate the loss of a link/card/switch
(all hosts were dual-connected with dual CNAs to separate 5Ks).  We were
using the Cisco-branded twinax cabling and QLogic's 8242 card (brand new
HP DL580s in this case, new card, new 5K, new cabling).  When a single
link was dropped/disconnected PHYSICALLY (a shut/no shut is not the same
here), the host's throughput on BOTH storage and network went to crap.

Our baseline was showing nearly 400 MB/s on storage (raw disk IO) tests
prior to a link drop and 1-8 MB/s after!  This situation would not recover
until you fully rebooted/power cycled the server.  We had the same results
across every HP DL580 tested, which was 5-6 of them I believe.  We
replaced CNAs, cables, and even moved ports across 5Ks.  It didn't matter
which cable, 5K, port, or card we used, all reacted the same!  The hosts
were all Windows 2008 Datacenter, similar hardware, Nexus 5K on current
code, twinax cabling.

This situation led to a sev 2 w/ Cisco, and the equivalent w/ HP, EMC, and
QLogic.  We used both the straight QLogic 8242 and the HP OEM'd version and
the results were identical.  QLogic acknowledged the issue but could not
resolve it due to not being able to grab a hardware-level trace of the
connection (that required some type of test equipment that they couldn't
provide and we didn't have).

As part of our trial-and-error testing, we had our reseller ship us the
fiber versions of the same QLogic cards, because we eventually got down to
a gut instinct of this being a copper/electrical anomaly.  That instinct
was dead-on.  Switching to the fiber versions, with fiber SFPs on the 5K
side, resolved the situation entirely.  We are now able to drop a link with
NO noticeable degradation, back and forth, and everything is consistent
again.

We originally went the twinax route because it was significantly cheaper
than the fiber, but in retrospect, as a whole, the danger posed was not
worth it.  You might ask, well... why would you intentionally drop the
cable?  Think about doing a code upgrade on the 5K: since it's not a
dual-sup box, you physically go through a reboot to upgrade it.  That
reboot right there would have hosed our entire environment (keep in mind,
the HD row's intent was to replace a significant portion of our production
environment).  You could also have a HW failure on a 5K.  It kind of
defeats the point of all this redundancy if your throughput goes to hell
when losing a single path.  As our storage guys best put it, "I'd rather
lose a path than have bad performance through it... based on how things
alert, I'd know right away if a path were down, but not if it were severely
degraded."
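
To make the storage guys' point concrete, here is a minimal Python sketch
of a check that treats a slow path as its own alert condition instead of
only alerting on up/down.  The function name and the 50% threshold are
assumptions for illustration, not something we actually ran; the 400 MB/s
and 8 MB/s figures are the ones from the tests described earlier.

```python
def classify_path(measured_mb_s, baseline_mb_s, degraded_ratio=0.5):
    """Classify a storage path as 'down', 'degraded', or 'healthy'.

    degraded_ratio is a hypothetical threshold: any throughput below
    that fraction of the baseline is flagged as degraded.
    """
    if baseline_mb_s <= 0:
        raise ValueError("baseline must be positive")
    if measured_mb_s <= 0:
        return "down"        # the case normal alerting already catches
    if measured_mb_s < baseline_mb_s * degraded_ratio:
        return "degraded"    # the case that would otherwise go unnoticed
    return "healthy"


def throughput_loss_pct(measured_mb_s, baseline_mb_s):
    """Percentage of baseline throughput that was lost."""
    return 100 * (baseline_mb_s - measured_mb_s) / baseline_mb_s


if __name__ == "__main__":
    baseline = 400  # MB/s before the link drop
    for measured in (390, 8, 1, 0):
        state = classify_path(measured, baseline)
        loss = throughput_loss_pct(measured, baseline)
        print(f"{measured:>4} MB/s -> {state} ({loss:.2f}% below baseline)")
```

With those numbers, the post-drop readings of 1-8 MB/s are a 98% or worse
loss, yet the path is technically still "up", which is exactly why it
would have slipped past up/down alerting.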

Btw, we've been rock solid on the fiber-connected CNAs ever since.  We're
still using copper on our connections to HP blade chassis though, which go
to FlexFabric cards, as we couldn't reproduce the problem on those.  For
those wondering, we did rebuild several of the DL580s from scratch (all of
this was a new deployment, thankfully!), and we also went through many
iterations of driver updates/changes/etc.

Lots of head-banging and teamwork eventually got us squared away!  This
situation is a good example of why network guys NEED to have a great
relationship with both server and storage guys (we're all really close
where I'm at).  Had there been tension/etc. between the teams, this would
have been significantly harder to resolve.

Hope this helps, sorry for the long-winded email :-), but I think those
interested will find it beneficial.

David.