What is Pervasive DataRush? 

Pervasive DataRush (PDR) is a 100% Java framework that lets developers quickly build highly parallel, data-intensive applications that take full advantage of multicore SMP platforms. No specialized knowledge of threading, concurrent memory access, deadlock detection, data workload partitioning and buffering, or any other complex aspect of parallel thread execution is required: developers can target today's multicore hardware without dealing with threading libraries, deadlock detection algorithms, or concurrent process design issues.

PDR comes with a rich library of out-of-the-box Java components that can be assembled into a series of data flow operations. Where custom components need to be added or extended, developers simply use the Pervasive DataRush SDK to quickly build and extend their Pervasive DataRush application.

Rich Component Library

  • Foundational library that includes I/O, sort, merge, join and other operators
  • Extensibility is provided by Java components that can be re-used and extended into higher-order components and/or assemblies

Parallel Processing Engine

  • The Pervasive DataRush execution environment (DRE) allows for on-the-fly optimization of data processing applications
  • Any number of custom operators can be easily included by extending the execution classpath of DRE to include your JAR files
  • Debugging is made simpler with DRE execution statistics that provide component-level runtime statistics

PDR is a Java implementation of dataflow. Wikipedia (wikipedia.org) describes the approach as follows:

Dataflow Programming: a simple, natural, powerful approach for programming multicore 

"Dataflow languages contrast with the majority of programming languages, which use the imperative programming model. In imperative programming the program is modeled as a series of operations, the data being effectively invisible. This distinction may seem minor, but the paradigm shift is fairly dramatic, and allows dataflow languages to be spread out across multicore, multiprocessor systems for free."

"Dataflow languages promote the data to become the main concept behind any program. The data is now explicit. Operations consist of "black boxes" with inputs and outputs, all of which are always explicitly defined. They run as soon as all of their inputs become valid, as opposed to when the program encounters them."

"Whereas a traditional program essentially consists of a series of statements saying "do this, now do this," a dataflow program is more like a series of workers on an assembly line, who will do their assigned task as soon as the materials arrive. This is why dataflow languages are inherently parallel; the operations have no hidden state to keep track of, and the operations are all 'ready' at the same time."
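The assembly-line picture above can be sketched in plain Java. This is only an illustration built on the standard java.util.concurrent classes, not the Pervasive DataRush API; the class name DataflowSketch, the method runPipeline, and the END marker are all hypothetical names chosen for this example. Each operator is a "black box" running on its own thread, and it does its work as soon as an item arrives on its input queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class DataflowSketch {
    // Hypothetical end-of-stream marker used only by this sketch.
    private static final int END = Integer.MIN_VALUE;

    /** Runs source -> square -> sink, one thread per operator, and returns what the sink collected. */
    static List<Integer> runPipeline(int count) {
        // Bounded queues act as the "conveyor belts" between operators.
        BlockingQueue<Integer> numbers = new ArrayBlockingQueue<>(4);
        BlockingQueue<Integer> squares = new ArrayBlockingQueue<>(4);
        List<Integer> results = new ArrayList<>();

        // Source operator: emits 1..count, then the end marker.
        Thread source = new Thread(() -> {
            try {
                for (int i = 1; i <= count; i++) {
                    numbers.put(i);
                }
                numbers.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Transform operator: squares each value as soon as it arrives.
        Thread square = new Thread(() -> {
            try {
                for (int v = numbers.take(); v != END; v = numbers.take()) {
                    squares.put(v * v);
                }
                squares.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Sink operator: collects values until the end marker arrives.
        Thread sink = new Thread(() -> {
            try {
                for (int v = squares.take(); v != END; v = squares.take()) {
                    results.add(v);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        square.start();
        sink.start();
        try {
            // join() makes the sink's writes to 'results' visible to the caller.
            source.join();
            square.join();
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runPipeline(5)); // prints [1, 4, 9, 16, 25]
    }
}
```

Even in this three-operator toy, the queue wiring, end-of-stream signalling, and thread lifecycle take up most of the code. That plumbing is the kind of work a dataflow framework is meant to take off the developer's hands.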

 

 

A brief history of why Pervasive DataRush is needed ...

Chip vendors such as Azul Systems, Sun Microsystems, AMD, IBM and Intel have transformed the multiprocessor SMP server landscape with their recent introductions of 2-, 4-, 8- and even 48-core processors. Most predict 80+ cores per chip by the year 2010. This means "commodity SMP" servers with 16-32 cores are now available.

For years, chip designers raised performance mainly by increasing clock speeds. Due to heat dissipation issues and "real-estate shortages" on the chip, most designers agreed this strategy could not be sustained, and the multicore CPU industry was born. As a result, hardware platforms have completely outpaced the ability of current software designs to take advantage of their new-found compute power.

The problem? For decades, most applications built to perform data-intensive processing were not architected with 2, 4, 8 or even tens of cores in mind. Legacy programming languages such as COBOL and Fortran do not provide easy-to-use concurrent programming frameworks, nor do today's most common languages such as Java, C++, Perl and Python.

How can today's software developers, unaccustomed to the complexities of concurrent programming, build applications that are multicore-aware?

Software developers must change their implementation methodologies now, before the disconnect between hardware capabilities and software design leads to massively underutilized computing power in the datacenter.