Dataflow implementation in Java
DataRush is sufficiently sophisticated (or at least different) that understanding it takes several passes. I am writing a series of posts aimed at exploring what DataRush is and is not. This should give the passing programmer a better feel for what DataRush might do for them and how it fits into the broader scheme of concurrent programming techniques.
So, back to the question at hand: is DataRush dataflow? The answer is obvious: it depends upon what "dataflow" is.
According to [CTMCP], dataflow behavior results when expressions contain unbound variables. When execution reaches such an expression, the program simply pauses, awaiting a value. If at some point in the future another thread binds a value to the variable, the program picks up where it left off. They call variables with these characteristics dataflow variables. A simple example in Oz (from page 60) follows:
    local X Y Z in
       X=10
       if X>=Y then Z=X else Z=Y end
    end
Note that X is declared, then immediately bound to a value. The purpose of the expression is to bind Z. However, that leaves Y unbound. From the expression above, we simply cannot tell what value Y should take and, unlike some other programming languages, Oz does not arbitrarily specify a default value. Instead, execution suspends at the comparison X>=Y until some other thread binds Y.
So what good are dataflow variables? Well, they facilitate the concurrent technique of dataflow programming. Combined with a single-assignment store (variables may be bound at most one time), they lead to the nice property that it doesn't matter in what order we evaluate simultaneously executing expressions. You can view dataflow variables as one-to-many channels allowing the thread in which the variable is bound to send a message ("wake up, the variable's value is ready!") to any waiting threads.
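To make the idea concrete, here is a minimal sketch of a dataflow variable in plain Java. The DataflowVar class and its bind() and get() methods are names I made up for illustration; they come from neither Oz nor DataRush. The sketch assumes a latch is an acceptable stand-in for the "wake up" message: binding releases every thread blocked in get(), and a second bind is rejected to keep the store single-assignment.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical single-assignment variable: bind once, reads block until bound. */
final class DataflowVar<T> {
    private final AtomicReference<T> value = new AtomicReference<>();
    private final CountDownLatch bound = new CountDownLatch(1);

    /** Binds the variable; a second bind is an error (single assignment). */
    void bind(T v) {
        if (v == null) throw new NullPointerException("value must not be null");
        if (!value.compareAndSet(null, v)) {
            throw new IllegalStateException("already bound");
        }
        bound.countDown(); // releases every thread blocked in get()
    }

    /** Blocks the caller until some thread has bound a value. */
    T get() {
        try {
            bound.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return value.get();
    }
}

final class DataflowVarDemo {
    /** Mirrors the Oz snippet: X is bound at once, Y is bound later by another thread. */
    static int computeZ(int yValue) {
        DataflowVar<Integer> x = new DataflowVar<>();
        DataflowVar<Integer> y = new DataflowVar<>();
        DataflowVar<Integer> z = new DataflowVar<>();
        x.bind(10);

        Thread worker = new Thread(() -> {
            int xv = x.get();
            int yv = y.get(); // suspends here until y is bound
            z.bind(xv >= yv ? xv : yv);
        });
        Thread binder = new Thread(() -> y.bind(yValue));

        worker.start(); // started first; it simply waits for y
        binder.start();
        return z.get(); // the caller waits on z in the same way
    }

    public static void main(String[] args) {
        System.out.println(computeZ(3));  // prints 10
        System.out.println(computeZ(42)); // prints 42
    }
}
```

Starting the binder before the worker gives the same answers, which is the order-independence property described above.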
Now you can certainly dig a little deeper into the various meanings of dataflow. You'll find the dataflow variable of [CTMCP] is a concurrent logic variable, which is a promise updated by unification. The name derives from the variable representing the promise of a value to come, which may be fulfilled by any thread within the program. Good old Java provides an interface and implementations for a related idea, a future, in java.util.concurrent.
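As a rough illustration of the relationship, the Oz expression can be approximated with java.util.concurrent.CompletableFuture. This is only a sketch: the class name FutureAsDataflow and the helper defineZ are mine, and a CompletableFuture is completed explicitly rather than bound by unification.

```java
import java.util.concurrent.CompletableFuture;

final class FutureAsDataflow {
    /** Defines Z in terms of Y before Y has a value. */
    static CompletableFuture<Integer> defineZ(int x, CompletableFuture<Integer> y) {
        return y.thenApply(yv -> x >= yv ? x : yv);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> y = new CompletableFuture<>();
        CompletableFuture<Integer> z = defineZ(10, y);
        System.out.println(z.isDone()); // false: Y has no value yet

        // Any thread may fulfil the promise.
        new Thread(() -> y.complete(25)).start();
        System.out.println(z.join()); // waits for Y, then prints 25

        // complete() takes effect only once, so the first value sticks.
        System.out.println(y.complete(99)); // false
        System.out.println(y.join());       // 25
    }
}
```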
You'll also find dataflow architecture refers to a non-von Neumann way of coordinating the processing of instructions in a computer. Dataflow architectures don't iterate across instructions with an instruction pointer but fire off units of computation only when their inputs become ready. Mapping these units of computation and their inputs to the threads and dataflow variables discussed above, you can see the relationship between the two: you aren't controlling the order of execution of computations but merely interrelating them by the data they reference.
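The firing rule can be imitated in software, too. The sketch below wires up (a + b) * c with CompletableFuture.thenCombine; the class name FiringRule and the method wire are my own inventions for illustration. No instruction pointer decides when the addition or the multiplication runs: each unit fires once both of its inputs have arrived, whatever order they arrive in.

```java
import java.util.concurrent.CompletableFuture;

final class FiringRule {
    /** Wires (a + b) * c; each unit runs only once both of its inputs are available. */
    static CompletableFuture<Integer> wire(CompletableFuture<Integer> a,
                                           CompletableFuture<Integer> b,
                                           CompletableFuture<Integer> c) {
        CompletableFuture<Integer> sum = a.thenCombine(b, Integer::sum);
        return sum.thenCombine(c, (s, cv) -> s * cv);
    }

    public static void main(String[] args) {
        CompletableFuture<Integer> a = new CompletableFuture<>();
        CompletableFuture<Integer> b = new CompletableFuture<>();
        CompletableFuture<Integer> c = new CompletableFuture<>();
        CompletableFuture<Integer> result = wire(a, b, c);

        // Supply the inputs in an arbitrary order.
        c.complete(4);
        a.complete(2);
        System.out.println(result.isDone()); // false: b is still missing
        b.complete(3);
        System.out.println(result.join());   // 20
    }
}
```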
And this (finally!) brings us to a discussion of the ways in which DataRush is an implementation of dataflow in Java. Dataflow variables are a rather implicit and clean way of constructing what amounts to channels passing data amongst disparate threads in your program. The dataflow of DataRush is far more explicit: you implement and connect nodes (like the threads of Oz or computation units of a dataflow architecture) by writing Java classes. The following snippet of code mimics the behavior of the Oz expression above:
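    public class BindProcess extends DataflowNodeBase {

        private IntInput y;   // input port carrying the stream of Y values
        private IntOutput z;  // output port carrying the stream of Z values
        private int x;        // fixed value, playing the role of X in the Oz example

        public BindProcess(IntFlow source, int x) {
            this.x = x;
            y = newIntInput(source, "y");
            z = newIntOutput("z");
        }

        public void execute() {
            // stepNext() blocks while y has no data; the loop ends at end of data
            while (y.stepNext()) {
                if (x >= y.asInt()) {
                    z.push(x);
                } else {
                    z.push(y.asInt());
                }
            }
            // tell downstream nodes that the stream is complete
            z.pushEndOfData();
        }
    }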
Like the dataflow of Oz, all downstream nodes listening to the same output channel block upon asking for data. Unlike Oz, the input cannot simply be used in an expression but must be explicitly stepped in each node, then polled for a value. Here, the stepNext() method causes the thread running an instance of BindProcess to block if y contains no data. Once data accumulates, the executor wakes the thread and asInt() retrieves the value at the head of the queue. The results of the node's computation must then be explicitly pushed to the z output.
Unlike dataflow variables, which are bound at most once, DataRush ports transfer batches of data best thought of as a continuous stream flowing through the network. DataRush thus allows what amounts to multiple assignment to the port, and in exchange requires the programmer to explicitly manage the iteration through the incoming values.
Bear in mind that as long as the instance of BindProcess steps y to end of data (or detaches from it), pushes end of data on z, and meets the cardinality requirements of downstream nodes receiving z, the execute() method may do whatever the programmer desires. Thus the dataflow of DataRush is a sort of dataflow by convention: the while loop, or some mechanism like it for stepping through a process's inputs, is central to the flow of data through the network.
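The same convention can be sketched without DataRush at all. The following plain-Java analogue uses a BlockingQueue for each port and an empty Optional as a stand-in for end of data; StreamNodeSketch, maxNode, and run are hypothetical names of mine and are not part of the DataRush API. It assumes one thread per node and unbounded queues, which is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class StreamNodeSketch {
    /** Node body: reads y values until end of data and emits the larger of x and y. */
    static Runnable maxNode(int x,
                            BlockingQueue<Optional<Integer>> in,
                            BlockingQueue<Optional<Integer>> out) {
        return () -> {
            try {
                Optional<Integer> y;
                // take() blocks while the input queue is empty
                while ((y = in.take()).isPresent()) {
                    int yv = y.get();
                    out.put(Optional.of(x >= yv ? x : yv));
                }
                out.put(Optional.empty()); // propagate end of data downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    /** Feeds ys through one node on its own thread and collects the results. */
    static List<Integer> run(int x, int... ys) {
        BlockingQueue<Optional<Integer>> in = new LinkedBlockingQueue<>();
        BlockingQueue<Optional<Integer>> out = new LinkedBlockingQueue<>();
        new Thread(maxNode(x, in, out)).start();

        List<Integer> results = new ArrayList<>();
        try {
            for (int y : ys) {
                in.put(Optional.of(y));
            }
            in.put(Optional.empty()); // upstream signals end of data
            Optional<Integer> z;
            while ((z = out.take()).isPresent()) {
                results.add(z.get());
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 3, 42, 10, 17)); // [10, 42, 10, 17]
    }
}
```

If the node forgot to put the empty Optional on its output, the collecting loop in run() would block forever, which is why pushing end of data is part of the convention rather than an optional nicety.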
DataRush is not a dataflow programming language, but a framework for constructing software according to a dataflow architecture. In this respect, it is far more similar to Morrison's flow-based programming than to the dataflow of Oz.
[CTMCP] Van Roy, P. and Haridi, S. Concepts, Techniques, and Models of Computer Programming. MIT Press, 2004.





