Plasma Quickstart

This guide explains how to get a running system as quickly as possible. It also describes which parts of Plasma are needed for which types of applications.

Operating System

Working OS (tested):

Generally, Plasma requires a 64 bit CPU. It might run in 32 bit mode, but there may be issues.

Whether it runs on other operating systems is unknown. There is a chance it could work on

but it is totally untested. Other systems will certainly not work.

If you are new to Plasma, do not start with one of these untested systems. Stick to 64 bit Linux (all CPUs should work). If you are an experienced Plasma user, feedback about which problems occur on which OS (and which do not) is welcome.

Build it

Before you can do anything, you need to build Plasma, and install the resulting libraries and binaries.

If you are using a cluster of machines, note that you need to install the libraries and binaries on only a single machine. We call this machine the operator node. The programs must eventually be copied to the other machines as well, but this process is called deployment, and is supported by several scripts.

Essentially, there are three options for the build:

You may ask why there is no "normal" way of getting Plasma, such as a deb or rpm package. Plasma simply needs very recent prerequisites, which are not yet available in Linux distros. (Hopefully, this will change.)

The result of the build is that the software is installed under a certain path prefix <prefix>, especially:

When you use the GODI method for the build, there will also be unrelated software installed under <prefix> - this is just a side-effect of the build.

Things you should not do: Do not try to take shortcuts in the build. This creates more problems than it solves. For example, don't try to use the OCaml compiler that comes with your Linux distro. OCaml libraries built with different versions of the compiler cannot be mixed, and attempts to do so lead to checksum mismatches.

What do you need for which application

Trying out map/reduce without PlasmaFS

Since Plasma-0.6, it is possible to run map/reduce jobs without PlasmaFS. The data files are just stored in the local Unix filesystem. Of course, you are then restricted to a single computer. This mode exists mainly for trying out map/reduce for the first time.

So, if this applies to you, you can skip the PlasmaFS deployment.

Remember that the map/reduce configuration file must explicitly disable PlasmaFS. E.g. if your map/reduce program is called my_prog, there is a configuration file my_prog.conf, and it must conform to:

netplex {
  namenodes {
    disabled = true;                       (* required *)
  };
  mapred {
    node { addr = "localhost" };           (* only one node "localhost" *)
    ...                                    (* other settings *)
  };
  mapredjob {
    ...                                    (* other settings *)
  };
}

Caveat: There are many configuration files that look similar. We refer here to the file configuring the map/reduce job.

Read more about map/reduce in these two documents:

Using map/reduce with PlasmaFS

In this case, you should read the instructions in Plasmafs_deployment. In short, you need

The deployment document explains this in detail. Note that you do not need to configure NFS support in PlasmaFS for just running map/reduce.

For running a map/reduce job, you need to know two PlasmaFS settings:

The map/reduce configuration file must then look like:

netplex {
  namenodes {
    clustername = "the name of the PlasmaFS cluster";
    node { addr = "namenode host:namenode port" };
  };
  mapred {
    ...                                    (* other settings *)
  };
  mapredjob {
    ...                                    (* other settings *)
  };
}
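
As an illustration only: assume a PlasmaFS cluster named "testcluster" whose namenode runs on the host "nn1.example.com" and listens on port 2730. All three values are made up for this example; take the real ones from your own PlasmaFS deployment. The namenodes section would then be filled in like this:

```
netplex {
  namenodes {
    clustername = "testcluster";             (* hypothetical cluster name *)
    node { addr = "nn1.example.com:2730" };  (* hypothetical namenode host:port *)
  };
  mapred {
    ...                                    (* other settings *)
  };
  mapredjob {
    ...                                    (* other settings *)
  };
}
```

Note that addr is a single string in which the host and the port are separated by a colon.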

It is not necessary to configure anything on the computers running map/reduce tasks. They will automatically get the required settings together with the other task parameters.

Using PlasmaFS as network filesystem

This application allows you to store large files in a replicated way. Also, PlasmaFS is, to some degree, fault-tolerant, and gets you close to high availability. Finally, PlasmaFS can be configured to be highly secure.

This case is very similar to the previous application: read the instructions in Plasmafs_deployment.

Remember that there are several ways of accessing PlasmaFS:

The first three options use the PlasmaFS protocol to talk to the server nodes. In order to get access from a machine to the cluster, you need to install two things on this machine:

The NFS bridge makes it even simpler to access the PlasmaFS files: You can simply mount the filesystem and use normal file access functions. You can read how to do this here: Plasmafs_nfs.