An Easy way to build a server cluster without top of rack switches (MEMO)

Ken Chase math at sizone.org
Sat Feb 14 16:09:10 UTC 2015


We did something similar way back in the day (2001?) when GbE switches were
ridiculously expensive and we wanted many nodes instead of expensive gear.
The (deplorably hot!) NatSemi 83820 GbE cards, however, were a mere $40 or so.

The uplink for loading data via NFS and for control was the onboard FE (via
desktop 8-port Surecoms), but 2x GbE was used for inter-node traffic.
GROMACS, a molecular modeller, only talked to adjacent nodes, so we just
hooked the nodes up in a ring: A-B-C-D-E-F-A.
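
For illustration, a quick sketch of the wiring plan (node names and
interface assignments are made up, not our actual scripts) -- each node's
two GbE cards simply face its ring neighbours:

    # ring_plan.py - sketch of a 2-NIC ring wiring plan (illustrative only)
    nodes = ["node%02d" % i for i in range(6)]   # stand-ins for A..F
    n = len(nodes)
    for i, node in enumerate(nodes):
        prev_node = nodes[(i - 1) % n]   # one GbE card faces the previous node
        next_node = nodes[(i + 1) % n]   # the other card faces the next node
        print("%s: eth1 <-> %s, eth2 <-> %s" % (node, prev_node, next_node))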

With 40 nodes, though, some nodes had 3 cards in them so we could
effectively split the cluster into two separate smaller loops (A-B-C-A and
D-E-F-D, for example) without having to visit the machines and move cards
around.
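
The split itself is just a partition of the node list into independent
rings; a toy sketch (made-up names again):

    # split one 6-node ring into two 3-node loops (illustrative only)
    nodes = ["node%02d" % i for i in range(6)]
    for loop in (nodes[:3], nodes[3:]):          # A-B-C-A and D-E-F-D
        n = len(loop)
        print([(loop[i], loop[(i + 1) % n]) for i in range(n)])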

Perfectly reasonable where A talks only to its immediate neighbours (B and F
in the big ring, or B and C in a small loop). A ridiculous concept for A
talking to a non-adjacent node like C, however. Network latency was the big
factor in GROMACS' speed, of course (hence the GbE), so multihop traffic
would have been anathema.
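
Back-of-envelope on why (the per-hop figure here is assumed for
illustration, not something we measured): every extra hop adds a full
receive-and-forward pass through an intermediate node, so latency grows
linearly with hop count:

    # rough one-way latency vs. hop count (the 30 us figure is assumed)
    per_hop_us = 30.0            # assume ~30 us per GbE hop, NIC plus stack
    for hops in (1, 2, 3):
        print("%d hop(s): ~%.0f us one-way" % (hops, hops * per_hop_us))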

Our cluster ran each job about half as fast as the far pricier competing
quote's InfiniBand-based solution (with no net gain in speed beyond 16-20
nodes, as latency caught up with us), but their total of 8 nodes was no
match for our 40 running 5 jobs in parallel for the same cost :)
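
The arithmetic that made it a win (the ~2x per-job slowdown and the equal
cost are from above):

    # aggregate throughput at equal cost
    ib_throughput  = 1 * 1.0   # competing quote: 8 IB nodes, 1 job, full speed
    our_throughput = 5 * 0.5   # 40 nodes = five 8-node jobs at ~half speed
    print(our_throughput / ib_throughput)   # -> 2.5x the work for the money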

Considering the lab had multiple grad students in it, there was ample
opportunity to run multiple jobs at the same time. While this may have
thrashed the CPU cache (and increased our memory requirements slightly),
hurting pure compute efficiency, the end throughput-per-dollar and
happiness-per-grad-student were far higher.

Feel free to trawl the archives on beowulf-l ca. 2001-2 for more details of
dirt-cheap cluster design (or reply to me directly).

Here's some pics of the cluster, but please keep in mind we were young and
foolish. :)

http://sizone.org/m/i/velocet/cluster_construction/133-3371_IMG.JPG.html

/kc


On Fri, Feb 13, 2015 at 10:08:21PM +0000, Dan Eckert said:
  >I'm having a hard time seeing how this reduces cable costs or increases network durability.  Each individual server is well connected to 3-4 other servers in the rack, but the rack still only has two uplinks.  For many servers in the rack you're adding 3-4 routing hops between an end node and the rack uplink.
  >
  >Additionally, with only 2 external links tied to 2 specific nodes, you introduce more risks.  If one of the uplink nodes fails, you've got to re-route all of the nodes that were using it as the shortest path to now exit through the other uplink node -- the worst case in the example then increases from the original 4-hops-to-exit to now 7-hops-to-exit.
  >
  >As far as cable costs go, you might have slightly shorter cables but a far more complex wiring pattern -- so in essence you're trading a small saving in cable cost for a larger increase in installation and troubleshooting cost.
  >
  >Also, using this layout, you dramatically reduce the effective bandwidth available between devices, since per-device links now have to be used for backhaul/transport in addition to device-specific traffic.
  >
  >Finally, you have to manage per-server routing service configurations to make this work -- more points of failure and increased setup/troubleshooting cost.  In a ToR switch scenario, you do one config on one switch, plug in the cables, and you're done -- problems happen, you go to the one switch, not chasing a needle through a haystack of interconnected servers.
  >
  >If your RU count is worth more than the combination of increased installation, server configuration, troubleshooting, latency, and capacity costs, then this is a good solution.  Either way, it's a neat idea and a fun thought experiment to work through.
  >
  >Thanks!
  >Dan
  >
  >
  >-----Original Message-----
  >From: NANOG [mailto:nanog-bounces at nanog.org] On Behalf Of NAOTO MATSUMOTO
  >Sent: Wednesday, February 11, 2015 11:32 PM
  >To: nanog at nanog.org
  >Subject: FYI: An Easy way to build a server cluster without top of rack switches (MEMO)
  >
  >Hi all!
  >
  >We wrote up TIPS memo "an easy way to build a server cluster without top of rack switches" concept.
  >
  >This model reduces switch and cable costs and provides high network durability through a lightweight, simple configuration.
  >
  >If you're interested, please try this concept yourself ;-)
  >
  >
  >An Easy way to build a server cluster without top of rack switches (MEMO) http://slidesha.re/1EduYXM
  >
  >
  >Best regards,
  >--
  >Naoto MATSUMOTO

-- 
Ken Chase - ken at heavycomputing.ca skype:kenchase23 +1 416 897 6284 Toronto Canada
Heavy Computing - Clued bandwidth, colocation and managed linux VPS @151 Front St. W.


