Module Mapred_streaming

module Mapred_streaming: sig .. end
Support for streaming


Streaming means that the task server does not execute the tasks internally, but starts subprocesses for this purpose. These processes read their input data from stdin and must write their output data to stdout.

The following additional job configs are interpreted:

The job config task_files is a convenient way to install the executable for the map and reduce commands on the task nodes. For example:

       task_files = "my_command";
       map_exec = "./my_command -map arg1 arg2 ...";
       reduce_exec = "./my_command -reduce arg1 arg2 ...";
    

The command is started with its working directory set to exactly the directory where the task_files directive installs the files.
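
To make the stdin/stdout contract concrete, here is a minimal sketch of what a command such as my_command could look like when written in OCaml. It is only an illustration: the word-count logic, the helper names (map_line, parse_record, reduce_lines, read_lines), and above all the record format are assumptions, not taken from the Plasma documentation. The sketch assumes that records are text lines and that the reduce command receives "key<TAB>count" lines with equal keys adjacent to each other.

```ocaml
(* Hypothetical my_command: a word-count filter for a streaming job.
   ASSUMPTION (not from the Plasma docs): records are text lines, and
   the reducer sees "key<TAB>count" lines with equal keys adjacent. *)

(* Map step: split a line into words and emit "word<TAB>1" for each. *)
let map_line line =
  String.split_on_char ' ' line
  |> List.filter (fun w -> w <> "")
  |> List.map (fun w -> w ^ "\t1")

(* Parse "key<TAB>count"; a missing or malformed count is treated as 0. *)
let parse_record line =
  match String.index_opt line '\t' with
  | None -> (line, 0)
  | Some i ->
    let key = String.sub line 0 i in
    let value = String.sub line (i + 1) (String.length line - i - 1) in
    (match int_of_string_opt value with
     | Some n -> (key, n)
     | None -> (key, 0))

(* Reduce step: sum the counts of consecutive lines sharing a key. *)
let reduce_lines lines =
  let flush acc = function
    | None -> acc
    | Some (k, n) -> Printf.sprintf "%s\t%d" k n :: acc
  in
  let step (acc, cur) line =
    let (k, n) = parse_record line in
    match cur with
    | Some (k', total) when k' = k -> (acc, Some (k, total + n))
    | _ -> (flush acc cur, Some (k, n))
  in
  let (acc, cur) = List.fold_left step ([], None) lines in
  List.rev (flush acc cur)

(* Read all lines from a channel (kept in memory for simplicity). *)
let read_lines ic =
  let rec loop acc =
    match input_line ic with
    | line -> loop (line :: acc)
    | exception End_of_file -> List.rev acc
  in
  loop []

(* Dispatch on the first argument: input on stdin, output on stdout. *)
let () =
  match Array.to_list Sys.argv with
  | _ :: "-map" :: _ ->
    List.iter
      (fun l -> List.iter print_endline (map_line l))
      (read_lines stdin)
  | _ :: "-reduce" :: _ ->
    List.iter print_endline (reduce_lines (read_lines stdin))
  | _ -> prerr_endline "usage: my_command (-map | -reduce)"
```

Compiled to an executable named my_command and listed in task_files, such a program would be started as ./my_command -map or ./my_command -reduce by the map_exec and reduce_exec settings shown above.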

The following environment variables are also set:

Stderr is redirected to a log file.

val job : unit -> Mapred_def.mapred_job
The streaming job.

The Plasma distribution already includes a program, mr_streaming, that runs this job via Mapred_main.