PlasmaFS Authentication

This document describes how access to PlasmaFS is secured.

PlasmaFS consists of a number of daemons running on several systems, and an arbitrary number of clients accessing these daemons. All communication paths between these endpoints need to be secured in a reasonable way. As all communication is done via SunRPC, we can use the security options of this protocol to ensure that only permitted clients connect to servers and, optionally, that all SunRPC data is encrypted.

After establishing security for the RPC layer, the question remains how clients are identified to the filesystem, and which kinds of access are granted to them. The filesystem user ID may differ from the user ID in the RPC layer. We explain this idea in detail below.

The RPC layer

The RPC layer uses SCRAM (RFC 5802) for authentication, and optionally for integrity protection and privacy. The SCRAM method uses simple passwords which are safely checked by a challenge-response protocol. SCRAM is enabled for SunRPC via GSS-API - this gives us some additional flexibility, because it is relatively easy to switch the authentication mechanism (and e.g. enable Kerberos as an even more secure method).

We use only two user IDs, called "proot" and "pnobody":

The passwords for these two user IDs are installed in the etc/password_proot and etc/password_pnobody files on each node (with mode 600, so only the Unix user running the daemons can access these passwords).
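As a small illustration of the mode 600 requirement, the following Python sketch checks that a file grants no permissions to group or others. The helper name is_private is our own and not part of PlasmaFS, and the demonstration uses a throwaway file standing in for etc/password_proot.

```python
import os
import stat
import tempfile


def is_private(path):
    """Return True if neither group nor others have any permission on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


# Demonstration on a temporary file (a stand-in, not a real password file).
with tempfile.TemporaryDirectory() as tmpdir:
    demo = os.path.join(tmpdir, "password_proot")
    with open(demo, "w") as f:
        f.write("secret\n")
    os.chmod(demo, 0o600)
    print(is_private(demo))   # True
    os.chmod(demo, 0o644)
    print(is_private(demo))   # False
```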

You may ask why the "pnobody" user exists. Couldn't we simply allow clients to connect anonymously? The purpose of this user ID is to keep foreign hosts out of the PlasmaFS network. Also, RPC messages can only be encrypted when a user/password pair exists, and we want encryption at least for some communication paths.

Incidentally, SCRAM uses a SHA-1-based HMAC for authentication. For encryption, AES-128 is employed.
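To make the HMAC-based challenge-response more concrete, here is a minimal Python sketch of the SCRAM-SHA-1 proof computation as specified in RFC 5802. It only illustrates the algorithm from the RFC; it is not PlasmaFS code, it does not show how PlasmaFS wraps SCRAM in GSS-API, and the function name scram_sha1_proofs is our own. The input values are taken from the example exchange in RFC 5802, section 5.

```python
import base64
import hashlib
import hmac


def scram_sha1_proofs(password, salt, iterations, auth_message):
    """Return (client_proof, server_signature) following RFC 5802, section 3."""
    # Hi() in the RFC is PBKDF2 with HMAC-SHA-1 and a 20-byte output.
    salted = hashlib.pbkdf2_hmac("sha1", password, salt, iterations)
    client_key = hmac.new(salted, b"Client Key", hashlib.sha1).digest()
    stored_key = hashlib.sha1(client_key).digest()
    client_sig = hmac.new(stored_key, auth_message, hashlib.sha1).digest()
    # The proof hides the client key behind the signature via XOR.
    client_proof = bytes(a ^ b for a, b in zip(client_key, client_sig))
    server_key = hmac.new(salted, b"Server Key", hashlib.sha1).digest()
    server_sig = hmac.new(server_key, auth_message, hashlib.sha1).digest()
    return client_proof, server_sig


# Messages of the RFC 5802 example (user "user", password "pencil").
client_first_bare = b"n=user,r=fyko+d2lbbFgONRv9qkxdawL"
server_first = (b"r=fyko+d2lbbFgONRv9qkxdawL3rfcNHYJY1ZVvWVs7j,"
                b"s=QSXCR+Q6sek8bf92,i=4096")
client_final_no_proof = b"c=biws,r=fyko+d2lbbFgONRv9qkxdawL3rfcNHYJY1ZVvWVs7j"
auth_message = b",".join([client_first_bare, server_first,
                          client_final_no_proof])

proof, signature = scram_sha1_proofs(
    b"pencil", base64.b64decode("QSXCR+Q6sek8bf92"), 4096, auth_message)
print(base64.b64encode(proof).decode())
print(base64.b64encode(signature).decode())
```

The two printed values should correspond to the p= and v= attributes of the final messages in the RFC's example. Note that the password itself never crosses the wire; only the proof and the signature do.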

Example: The plasma command-line utility lets you set the RPC user ID with the -auth switch.

$ plasma ls / -auth proot

This command authenticates as proot, and one has to enter the password for it. As proot is superuser, it is possible to list /.

$ plasma ls / -auth pnobody                      # fails

This command authenticates as pnobody, again by providing the password. This results in an EPERM error code, because pnobody is not allowed to perform any filesystem operation.

Any other user ID will not be able to successfully authenticate (Auth_failed). If you omit -auth, the plasma utility falls back to the default behavior, which is to ask the authentication daemon for help.

The filesystem

The filesystem generally implements POSIX semantics, with only a few exceptions (and a few generalizations). For file accesses, one needs to have a user ID and a group ID.

Note that users and groups are always handled as names, and never as numeric IDs!

Each file has an owner, expressed as the owning user and the owning group. The file mode bits determine who can access the files. There are no ACLs.
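The usual POSIX rule for evaluating mode bits can be sketched in a few lines of Python. This is only an illustration of the general POSIX scheme with users and groups given as names; it is not taken from the PlasmaFS sources, the function name may_access is our own, and it models neither the superuser nor any PlasmaFS-specific exceptions.

```python
def may_access(mode, owner, group, user, groups, want):
    """POSIX-style check: select the owner, group, or other bits, then test.

    mode   -- permission bits, e.g. 0o755
    owner  -- owning user name; group -- owning group name
    user   -- name of the requesting user; groups -- set of its group names
    want   -- requested access as a 3-bit mask (4 = read, 2 = write, 1 = exec)
    """
    if user == owner:
        bits = (mode >> 6) & 7
    elif group in groups:
        bits = (mode >> 3) & 7
    else:
        bits = mode & 7
    return bits & want == want


# A directory with mode drwxr-xr-x owned by gerd:admin.
print(may_access(0o755, "gerd", "admin", "gerd", {"admin"}, 2))    # True
print(may_access(0o755, "gerd", "admin", "auser", {"admin"}, 2))   # False
print(may_access(0o755, "gerd", "admin", "auser", {"staff"}, 4))   # True
```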

Some points:

Examples: The file mode bits and the owner are shown by plasma ls:

$ plasma ls /
drwxr-xr-x gerd admin 0 2011-10-05 21:41 input 
drwxr-xr-x gerd admin 0 2011-10-05 22:09 log   
drwxr-xr-x gerd admin 0 2011-10-05 22:09 output
drwxr-xr-x gerd admin 0 2011-10-05 22:09 work  

You can change the mode bits with plasma chmod:

$ plasma chmod 777 /log

You can change the owning user and group with plasma chown:

$ plasma chown auser /work -auth proot
$ plasma chown :agroup /work
$ plasma chown auser:agroup /work -auth proot

Note that changing the owning user is restricted to the superuser, hence we have to add -auth proot. The group can be set without that, provided the user is a member of the group.

Impersonation

Impersonation is the process of becoming a regular user of the filesystem. There are three ways of setting the user and group for a session:

The impersonation is done by calling a special RPC procedure of the namenode.

Of course, the filesystem needs to know which users exist, and which groups are defined. For this reason, the files /etc/passwd and /etc/group also exist within PlasmaFS (stored in a database table). The versions in PlasmaFS have exactly the same format as those installed in /etc on each Unix system. One can only impersonate a user defined in passwd, and can only become a member of a group if the group file allows this. (These special files can be read and written with the plasma utility, see plasma admin_table.)

Pitfall: When authenticating as proot, nothing is said about who will be the owner of newly created files. Because of this,

$ plasma mkdir /dir -auth proot                   # fails

will fail (code EINVAL). The switches -user and -group can be given to specify an arbitrary owner:

$ plasma mkdir /dir -auth proot -user foo -group bar

Authentication tickets

Basically, an authentication ticket is a random number which is used as key in a special access-control table in the namenode. The number (called "verifier") is connected with a user name, a group name, and a list of supplementary groups.

Of course, "proot" can create such tickets arbitrarily. Normal users can only create such tickets for themselves. This is useful for passing the current access rights to further processes which can even be running on different machines (and for map/reduce we need this feature).

Many PlasmaFS programs accept such tickets in the environment variable PLASMAFS_AUTH_TICKET.

The lifetime of the tickets is limited (but can be extended).
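The idea of a ticket table can be sketched as follows in Python. This is a hypothetical in-memory model, not the namenode's implementation: the class name TicketTable, its methods, the 63-bit verifier size, and the default lifetime are all assumptions made for illustration.

```python
import secrets
import time


class TicketTable:
    """Hypothetical stand-in for the namenode's access-control table."""

    def __init__(self, lifetime=3600.0, clock=time.time):
        self.lifetime = lifetime
        self.clock = clock
        self.entries = {}  # verifier -> (user, group, supp_groups, expires)

    def create(self, user, group, supp_groups):
        """Register an identity under a fresh random verifier."""
        verifier = secrets.randbits(63)  # unpredictable number used as key
        expires = self.clock() + self.lifetime
        self.entries[verifier] = (user, group, tuple(supp_groups), expires)
        return verifier

    def lookup(self, verifier):
        """Return (user, group, supp_groups), or None if unknown or expired."""
        entry = self.entries.get(verifier)
        if entry is None:
            return None
        user, group, supp, expires = entry
        if self.clock() >= expires:
            del self.entries[verifier]
            return None
        return user, group, supp

    def extend(self, verifier):
        """Move the expiry of a still-valid ticket forward by one lifetime."""
        if self.lookup(verifier) is None:
            return False
        user, group, supp, _ = self.entries[verifier]
        self.entries[verifier] = (user, group, supp,
                                  self.clock() + self.lifetime)
        return True


table = TicketTable(lifetime=60.0)
v = table.create("gerd", "gerd", ["adm", "admin"])
print(table.lookup(v))   # ('gerd', 'gerd', ('adm', 'admin'))
```

Because the verifier is the only key into the table, anybody who learns it can act as the associated user until the ticket expires.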

Example: One can request a ticket with plasma auth_ticket:

$ plasma auth_ticket
SCRAM-SHA1:cG5vYm9keQ==:eHh4:Z2VyZA==:Z2VyZA==:YWRt,YWRtaW4=,Y2Ryb20=,ZGlhbG91dA==,Z2VyZA==,bGlidmlydGQ=,bHBhZG1pbg==,bXl0aHR2,cGx1Z2Rldg==,c2FtYmFzaGFyZQ==:-7882205748013259769
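The fields of the sample ticket above appear to be Base64-encoded strings separated by colons. The following Python sketch decodes it. The field names are our own guesses derived from the decoded values; they are not an official specification of the ticket format, and the function parse_ticket is hypothetical.

```python
import base64


def _dec(field):
    """Decode one Base64-encoded ticket field to text."""
    return base64.b64decode(field).decode()


def parse_ticket(ticket):
    """Split a ticket string into its fields (field names are assumptions)."""
    mech, rpc_user, rpc_password, user, group, supp, verifier = \
        ticket.split(":")
    return {
        "mechanism": mech,
        "rpc_user": _dec(rpc_user),
        "rpc_password": _dec(rpc_password),
        "user": _dec(user),
        "group": _dec(group),
        "supp_groups": [_dec(g) for g in supp.split(",") if g],
        "verifier": int(verifier),
    }


ticket = ("SCRAM-SHA1:cG5vYm9keQ==:eHh4:Z2VyZA==:Z2VyZA==:"
          "YWRt,YWRtaW4=,Y2Ryb20=,ZGlhbG91dA==,Z2VyZA==,bGlidmlydGQ=,"
          "bHBhZG1pbg==,bXl0aHR2,cGx1Z2Rldg==,c2FtYmFzaGFyZQ==:"
          "-7882205748013259769")
fields = parse_ticket(ticket)
print(fields["rpc_user"], fields["user"], fields["group"])  # pnobody gerd gerd
print(fields["supp_groups"][:3])           # ['adm', 'admin', 'cdrom']
```

Base64 is an encoding, not encryption: everything in the ticket, including the password field, is readable by anyone who sees the string.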

The ticket contains various parts. Two of these are valuable login data: first, the ticket includes the password of pnobody. Second, the ticket includes a so-called verifier. These data are worth being protected, so:

Example: We get a ticket, and transfer the ticket via ssh to another machine, and run plasma as the same user. This works even if the Unix user on the other machine is different, and if there is no authentication daemon on the other machine.

$ PLASMAFS_AUTH_TICKET=`plasma auth_ticket` \
    ssh user@machine -o "SendEnv PLASMAFS_AUTH_TICKET" \
      <path>/plasma ls / -namenode ... -cluster ...

(Substitute <path> with the directory of the plasma command on machine, and provide the namenode and cluster options - there is usually no ~/.plasmafs on remote machines.)

The authentication daemon

This daemon can be reached via a Unix Domain socket in /tmp. Such sockets can reveal who is connected as a client, i.e. we can get the user and group ID of the client. The daemon creates an authentication ticket for this identity, which can then be used by the client for impersonation.

Essentially, this means one can access PlasmaFS without a password on machines where this daemon is running. The authentication daemon is used by the plasma utility if the -auth switch is not given and the PLASMAFS_AUTH_TICKET variable does not contain a ticket.

Actually, the auth daemon is a broker that retrieves a new authentication ticket for the client, after it has checked the identity of the client.
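How a Unix Domain socket reveals the identity of its peer can be sketched in Python. This example is Linux-specific, as it relies on the SO_PEERCRED socket option; other systems offer different mechanisms. It is not the daemon's code, and the function name peer_credentials is our own.

```python
import os
import socket
import struct


def peer_credentials(conn):
    """Return (pid, uid, gid) of the process at the other end of conn.

    Linux only: reads the kernel's struct ucred via SO_PEERCRED.
    """
    fmt = "iII"  # pid_t pid, uid_t uid, gid_t gid
    raw = conn.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED,
                          struct.calcsize(fmt))
    return struct.unpack(fmt, raw)


# Demonstration with a connected socket pair inside a single process.
server_end, client_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
pid, uid, gid = peer_credentials(server_end)
print(uid == os.geteuid(), gid == os.getegid())
server_end.close()
client_end.close()
```

The credentials are filled in by the kernel, so the client cannot forge them. Since PlasmaFS handles users and groups by name, a daemon working this way would still have to map the numeric IDs to names, for instance with the system's passwd and group databases.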

Protecting the datanodes

The clients create separate connections to the datanodes, and the question is how these connections are protected.

On the RPC level, these connections authenticate as user "pnobody" for normal file I/O (and as "proot" for administrative operations).

The namenode.conf file includes a directive security_level. By default it is set to "auth", meaning that the client has to provide the "pnobody" password to connect to the datanode, but otherwise the data are unprotected. One can set this to:

Of course, enabling the "int" or even "priv" level makes data accesses a lot slower, and this is why this is disabled by default.

In addition to the protection on RPC level, there is a separate ticket system for authorizing data accesses, using special datanode tickets. Essentially, the namenode generates such tickets, and hands these out to the client. The client can only run the data I/O operations for which the client has permission, as evident from the ticket. The permissions are granted for each data block separately.

Conceptual limitations

The described authentication system has a few weaknesses:

Basically, the system is safe against external intrusion attempts, but has some issues when restricting the access rights of legitimate users to what these users should be allowed to do.

In addition to this, the current implementation may have further weaknesses or even large security holes. Please refer to the release notes for details! Also, there are configuration options affecting the level of security. For example, one can turn off encryption for datanode accesses.