Plasmafs_and_hdfs


Feature comparison of PlasmaFS and HDFS

HDFS is another user-space filesystem that was originally developed for map/reduce.

Feature				PlasmaFS		HDFS
---------------------------------------------------------------------------
Supported blocksizes		any			any
				recommended 64K-1M	recommended >= 64M

Blocksize can be set for
  each file separately		no			yes

System can allocate blocks
  in contiguous regions		yes (all blocks are	no (each block is a
     				stored in a single	separate file)
				file)

Number of datanodes is
  limited by RAM in namenode	yes			yes

Number of files is limited
  by RAM in namenode		no			yes

Replication can be set for
  each file separately		yes			yes

Client communicates directly
  with datanodes		yes			yes

Block checksums			no			yes

Random read access to files	yes			yes

Random write access to files	yes			no
				(blocks can only be	(at most, files can
				replaced but not	be appended to after
				overwritten) 		creation, and are other-
							wise immutable)

Directory hierarchy		yes			yes

Symbolic links			yes			yes

POSIX file semantics		yes (few exceptions,	no
      	   			see [1] below)

Authentication system		yes			partially

Encrypted data communication	optional		partially

Authorization system		yes			yes

Several namenode operations
  can be bundled in an
  atomic transaction		yes			no

Accesses to file contents
  can have ACID semantics	yes			no

Namenode crashes can lead to
  data loss			no (2-phase commit)	yes
       	     	       		
Datanode crashes are handled
  automatically (fail-over)	yes			yes

Namenode crashes are handled
  automatically (fail-over)	not yet			no
  				(but planned)
				so far: auto-selection
				of live coordinator at
				startup time

Datanode configuration can
  be changed w/o restart
  (e.g. add node, del node)	yes			no

Namenodes profit from SSDs	high			low

Filesystem can be mounted	yes (NFS bridge)	no
							(fuse? unclear)

Rebalancing    	  		not yet			yes
				(but planned)

Communication to local
  datanode servers via shared
  memory			yes			no

Primary access method		SunRPC from any		ad-hoc protocol
	       			language    		(undocumented)

Clients available		OCaml			Java
				Access from any
				language via NFS
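
The table rows on contiguous allocation and on random write access can be illustrated with a toy model. The following Python sketch is not PlasmaFS code and does not use any PlasmaFS API; the names ToyDatanode, write_new_block, read_block and replace_block are hypothetical. It assumes a fixed blocksize of 64K, keeps all blocks in one backing store so that block i starts at byte i * blocksize, and models a write as replacing a block with a freshly allocated one instead of overwriting it in place.

```python
import io

BLOCKSIZE = 64 * 1024  # assumed: 64K, the low end of the recommended range


class ToyDatanode:
    """Hypothetical model of a datanode that keeps all blocks in one file."""

    def __init__(self, nblocks):
        # An in-memory buffer stands in for the single on-disk data file.
        self.store = io.BytesIO(bytes(nblocks * BLOCKSIZE))
        self.free = list(range(nblocks))  # free block indices, lowest first

    def offset(self, block):
        # Block i starts at byte i * BLOCKSIZE of the backing store.
        return block * BLOCKSIZE

    def write_new_block(self, data):
        # Always write into a fresh block; existing blocks are never modified.
        if len(data) > BLOCKSIZE:
            raise ValueError("data does not fit into one block")
        block = self.free.pop(0)
        self.store.seek(self.offset(block))
        self.store.write(data.ljust(BLOCKSIZE, b"\0"))  # zero-pad to a full block
        return block

    def read_block(self, block):
        self.store.seek(self.offset(block))
        return self.store.read(BLOCKSIZE)


def replace_block(node, blocklist, index, data):
    """Return a new block list whose entry `index` points at a fresh block."""
    updated = list(blocklist)  # the old list stays valid for concurrent readers
    updated[index] = node.write_new_block(data)
    return updated


node = ToyDatanode(nblocks=8)
v1 = [node.write_new_block(b"aaa"), node.write_new_block(b"bbb")]
v2 = replace_block(node, v1, 1, b"BBB")  # v1 still refers to the old block
```

Because the old block is left untouched, a reader that still holds the old block list sees consistent data until the new list is installed. This is one plausible reason why replace-only writes combine well with transactional file access, but the sketch makes no claim about how PlasmaFS actually implements it.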


[1] POSIX semantics: PlasmaFS supports not only random reads and writes, but also more complicated aspects of POSIX. In particular:

There are, of course, also deviations and weaknesses:

There are also some points where PlasmaFS implements much more than POSIX demands: