<br><br><div class="gmail_quote">On Mon, Mar 31, 2008 at 8:24 AM, &lt;<a href="mailto:michael.dillon@bt.com">michael.dillon@bt.com</a>&gt; wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<div class="Ih2E3d"><br>

> Here is a little hint - most distributed applications in<br>

> traditional jobsets, tend to work best when they are close<br>

> together. Unless you can map those jobsets onto truly<br>

> partitioned algorithms that work on local copy, this is a<br>

> _non starter_.<br>

<br>

</div>Let's make it simple and say it in plain English. The users<br>

of services have made the decision that it is "good enough"<br>

to be a user of a service hosted in a data center that is<br>

remote from the client. Remote means in another building in<br>

the same city, or in another city.</blockquote><div><br>Try reading for comprehension. The users of services have made the decision that it is good enough to be a user of a service hosted in a datacenter, and thanks to the wonders of AJAX and pipelining, you can even get snappy performance. What the users haven't signed up for is the massive number of scatter-gathers that happen _behind_ the front end. E.g., I click on a web page to log in. The login process then kicks off a few authentication sessions with servers located halfway around the world. Then come the data gathering, the two-phase locks, and the distributed file systems with masters and lock servers all over the place. Your hellish user experience, let me SHOW YOU IT.<br>
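<br>To put rough numbers on that, here is a minimal back-of-envelope sketch. Every figure in it is an assumption picked for illustration, not a measurement: 0.5 ms round trips under one roof, 150 ms round trips to a server halfway around the world, and 8 back-end calls that each have to finish before the next one starts. The function name is made up for the example.<br>
<pre>
```python
# Back-of-envelope latency for a request that fans out behind the front end.
# All numbers are illustrative assumptions, not measurements.

def sequential_latency_ms(round_trips, rtt_ms):
    """Total wait when each back-end call must finish before the next starts."""
    return round_trips * rtt_ms

LOCAL_RTT_MS = 0.5     # assumed: servers under one roof
REMOTE_RTT_MS = 150.0  # assumed: servers halfway around the world
ROUND_TRIPS = 8        # assumed: auth sessions, lock acquisition, data gathering

local_total = sequential_latency_ms(ROUND_TRIPS, LOCAL_RTT_MS)
remote_total = sequential_latency_ms(ROUND_TRIPS, REMOTE_RTT_MS)

print(f"one roof:    {local_total:.1f} ms")
print(f"distributed: {remote_total:.1f} ms")
```
</pre>
Under those assumptions the same login costs milliseconds in one building and over a second once the back end is spread out, and no amount of AJAX on the front end hides that.<br>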

 <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

<br>

Now, given that context, many of these "good enough" applications<br>

will run just fine if the "data center" is no longer in one<br>

physical location, but distributed across many. Of course,<br>

as you point out, one should not be stupid when designing such<br>

distributed data centers or when setting up the applications<br>

in them.</blockquote><div><br><br>Other than that minor handwaving, we are all good. It turns out that designing such distributed datacenters and setting up the applications in them, the part you just handwaved away, is a bit harder than it looks. I eagerly await papers on distributed database transactions with cost estimates for a distributed datacenter model vs. a traditional model.<br>

 </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

<br>

I would assume that every data center has local storage available<br>

using some protocol like iSCSI and probably over a separate network<br>

from the external client access. That right there solves most of<br>

your problems of traditional jobsets. And secondly, I am not suggesting<br>

that everybody should shut down big data centers or that every<br>

application<br>

should be hosted across several of these distributed data centers.</blockquote><div><br>See above. That right there doesn't quite solve most of the problems of traditional jobsets, but it's kind of hard to hear with the wind in my ears.<br>

 </div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

There will always be some apps that need centralised scaling. But<br>

there are many others that can scale in a distributed manner, or<br>

at least use distributed mirrors in a failover scenario.</blockquote><div><br>Many many others indeed. <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

<div class="Ih2E3d"><br>

> No matter how much optical technology you have, it will tend<br>

> to be more expensive to run, have higher failure rates, and<br>

> use more power, than simply running fiber or copper inside<br>

> your datacenter. There is a reason most people, who are<br>

> backed up by sober accountants, tend to cluster stuff under one roof.<br>

<br>

</div>Frankly I don't understand this kind of statement. It seems<br>

obvious to me that high-speed metro fibre exists and corporate<br>

IT people already have routers and switches and servers in the<br>

building, connected to the metro fiber. Also, the sober accountants<br>

do tend to agree with spending money on backup facilities to<br>

avoid the risk of single points of failure. Why should company A<br>

operate two data centers, and company B operate two data centers,<br>

when they could outsource it all to ISP X running one data center<br>

in each of the two locations (Company A and Company B).</blockquote><div><br>I guess I can try to make it clearer by example: look at the cross-sectional bandwidth available inside a datacenter, then work out what it would take to provide that same bandwidth once you pull the datacenter apart by a few tens of miles, and run the cost comparison.<br>
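<br>A rough sketch of the first half of that exercise, with every input an assumption chosen only for illustration: 10,000 servers, 1 Gbps per server, the datacenter split into two equal halves, and 10 Gbps per metro wavelength. The function names are hypothetical, and no prices are included because those would have to come from real quotes.<br>
<pre>
```python
import math

# All inputs are illustrative assumptions, not figures from any real datacenter.
SERVERS = 10000          # assumed server count
NIC_GBPS = 1.0           # assumed line rate per server
WAVELENGTH_GBPS = 10.0   # assumed capacity of one metro wavelength

def bisection_gbps(servers, nic_gbps):
    """Bandwidth across the cut when half the servers talk to the other half at line rate."""
    return (servers / 2) * nic_gbps

def wavelengths_needed(total_gbps, wavelength_gbps):
    """Metro wavelengths required to carry total_gbps between the two halves."""
    return math.ceil(total_gbps / wavelength_gbps)

needed_gbps = bisection_gbps(SERVERS, NIC_GBPS)
waves = wavelengths_needed(needed_gbps, WAVELENGTH_GBPS)

print(f"bandwidth across the cut: {needed_gbps:.0f} Gbps")
print(f"metro wavelengths needed: {waves}")
```
</pre>
Inside one building that bandwidth is short runs of fiber or copper between switches. Pulled apart, each of those wavelengths needs optics and transport on both ends, which is where the cost comparison starts.<br>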

<br>/vijay<br><br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

<br>

In addition, there is a trend to commoditize the whole data center.<br>

Amazon EC2 and S3 is not the only example of a company who does<br>

not offer any kind of colocation, but you can host your apps out<br>

of their data centers. I believe that this trend will pick up<br>

steam and that as the corporate market begins to accept running<br>

virtual servers on top of a commodity infrastructure, there is<br>

an opportunity for network providers to branch out and not only<br>

be specialists in the big consolidated data centers, but also<br>

in running many smaller data centers that are linked by fast metro<br>

fiber.<br>

<br>

--Michael Dillon<br>

</blockquote></div><br>