mysidia at gmail.com
Thu Feb 20 03:44:22 UTC 2014
On Wed, Feb 19, 2014 at 2:06 PM, Jay Ashworth <jra at baylink.com> wrote:
> ----- Original Message -----
> > From: "Eugeniu Patrascu" <eugen at imacandi.net>
> My understanding of "cluster-aware filesystem" was "can be mounted at the
> physical block level by multiple operating system instances with complete
> safety". That seems to conflict with what you suggest, Eugeniu; am I
> missing something (as I often do)?
When one of the hosts has a virtual disk file open for write access on a
VMFS cluster-aware filesystem, it is locked to that particular host,
and a process on a different host is denied the ability to write to the
file, or even open the file for read access.
Another host cannot even read/write metadata about the file's directory.
Attempts to do so get rejected with an error.
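To make the locking behaviour above concrete, here is a toy sketch in
Python. It is not how VMFS is implemented; the class and method names
(ToyVolume, open_for_write, and so on) are made up for illustration. It
only models the rule that a file opened for write by one host cannot be
read or written from another host until it is closed.

```python
class LockedError(Exception):
    """Raised when a host touches a file that another host has locked."""


class ToyVolume:
    """Toy model only: one exclusive owner per file, tracked by host name."""

    def __init__(self):
        self.owners = {}  # file name -> host that holds the lock

    def open_for_write(self, host, name):
        # The first host to open the file becomes its owner.
        owner = self.owners.setdefault(name, host)
        if owner != host:
            raise LockedError(f"{name} is locked by {owner}")

    def open_for_read(self, host, name):
        # Even read access is refused while another host owns the file.
        owner = self.owners.get(name)
        if owner is not None and owner != host:
            raise LockedError(f"{name} is locked by {owner}")

    def close(self, host, name):
        # Only the owning host can drop the lock.
        if self.owners.get(name) == host:
            del self.owners[name]
```

With this model, a second host asking to read a file that the first host
has open for write gets a LockedError rather than stale or corrupt data.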
So you don't really have to worry all that much about "as long as you don't
access the same files", although you certainly should not try to, either.
Only the software in ESXi can access the VMFS --- there is no ability to
run arbitrary applications.
(Which is also why I like NFS more than shared block storage; you can
conceptually use the likes of a storage array feature such as FlexClone
to make a copy-on-write clone of a file, take a storage level snapshot,
and then do a granular restore of a specific VM, without having to
restore the entire volume as a unit.
You can't pull that off with a clustered filesystem on a block target!)
Also, the VMFS filesystem is cluster aware by means of exclusion (SCSI
reservations) and separate journaling.
Metadata locks are global in the VMFS cluster-aware filesystem. Only one
host at a time is allowed to write to any of the metadata on the entire
volume, which results in a performance bottleneck, unless you have the
VAAI VMFS extensions and your storage vendor supports ATS (atomic test
and set).
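The difference between a volume-wide reservation and ATS can be sketched
with a toy model. This is Python, not real SCSI, and every name in it
(ToyLun, reserve, atomic_test_and_set, lock_records) is hypothetical. A
reservation shuts every other host out of the whole LUN, while an ATS
style operation only contends on the one lock record being changed.

```python
import threading


class ToyLun:
    """Toy model of a LUN holding lock records; not real SCSI."""

    def __init__(self):
        self._mutex = threading.Lock()  # stands in for the array's atomicity
        self.reserved_by = None         # host holding the whole-LUN reservation
        self.lock_records = {}          # record id -> owning host (absent = free)

    def reserve(self, host):
        """Reservation style: succeed only if no other host holds the LUN."""
        with self._mutex:
            if self.reserved_by not in (None, host):
                return False
            self.reserved_by = host
            return True

    def release(self, host):
        """Drop the whole-LUN reservation if this host holds it."""
        with self._mutex:
            if self.reserved_by == host:
                self.reserved_by = None

    def atomic_test_and_set(self, record, expected, new):
        """ATS style: swap one record only if it still holds `expected`."""
        with self._mutex:
            if self.lock_records.get(record) != expected:
                return False
            self.lock_records[record] = new
            return True
```

In the reservation case a second host has to wait for release() before it
can touch any metadata at all. In the ATS case two hosts updating
different records never block each other; only a race on the same record
makes one of them fail and retry.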
For that reason, while VMFS is cluster aware, you cannot necessarily have
a large number of cluster nodes,
or more than a few dozen open files, before performance degrades due to
the metadata bottleneck.
Another consideration: in the event that you have a power outage that
simultaneously impacts your storage array and all your hosts, you may
very well be unable to regain access to any of your files until the
specific host that had that file locked comes back up, or you wait out
a ~30 to ~60 minute timeout period.
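The recovery rule above boils down to a small decision function. This is
a sketch under assumptions: may_open and LOCK_TIMEOUT_S are hypothetical
names, and the 30 minute value is just the low end of the range given
above, not a documented constant.

```python
LOCK_TIMEOUT_S = 30 * 60  # assumed: low end of the ~30 to ~60 minute range


def may_open(owner, requester, locked_at, now, timeout_s=LOCK_TIMEOUT_S):
    """Return True if `requester` may open a file whose lock names `owner`.

    `locked_at` and `now` are timestamps in seconds.
    """
    if owner is None or owner == requester:
        # Unlocked, or the host that held the lock is the one asking.
        return True
    # Any other host has to wait until the stale lock has timed out.
    return now - locked_at >= timeout_s
```

So after the outage, the host that held the lock gets straight back in,
while every other host is refused until the timeout has run out.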
> -- jra