Technical Information 

Pervasive DataRush (PDR) is a 100% Java platform that allows developers to quickly build highly parallel, data-intensive applications that take full advantage of multicore platforms. No specialized knowledge of threading, concurrent memory access, deadlock detection, data workload partitioning and buffering, or any other complex aspect of parallel thread execution is required. Developers can target today's multicore hardware without dealing with threading libraries, deadlock detection algorithms, or concurrent process design issues.

PDR comes with a rich library of out-of-the-box Java components that can be assembled into a series of data flow operations. Where custom components need to be added or extended, developers simply use the Pervasive DataRush SDK to quickly build and extend their Pervasive DataRush application.

Rich Component Library

  • Foundational library that includes I/O, sort, merge, join and other operators
  • Extensibility provided by Java components that can be re-used and extended into higher-order components and/or assemblies

Parallel Processing Engine

  • The Pervasive DataRush execution environment (DRE) allows for on-the-fly optimization of data processing applications
  • Any number of custom operators can be easily included by extending the execution class path of DRE to include your JAR files
  • Debugging is made simpler with DRE execution statistics that provide component-level run time statistics

PDR is a Java implementation of dataflow.

Dataflow Programming: a simple, natural, powerful approach for programming multicore 

Dataflow languages contrast with the majority of programming languages, which use the imperative programming model. In imperative programming the program is modeled as a series of operations, the data being effectively invisible. This distinction may seem minor, but the paradigm shift is fairly dramatic, and allows dataflow languages to be spread out across multicore, multiprocessor systems for free.

Dataflow languages promote the data to become the main concept behind any program. The data is now explicit. Operations consist of "black boxes" with inputs and outputs, all of which are always explicitly defined. They run as soon as all of their inputs become valid, as opposed to when the program encounters them.

Whereas a traditional program essentially consists of a series of statements saying "do this, now do this", a dataflow program is more like a series of workers on an assembly line, who will do their assigned task as soon as the materials arrive. This is why dataflow languages are inherently parallel; the operations have no hidden state to keep track of, and the operations are all "ready" at the same time.

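The assembly-line behavior described above can be sketched in plain Java. This is a minimal illustration that uses only java.util.concurrent, not the Pervasive DataRush API. The class name DataflowSketch, the operator helper, the queue capacity, and the use of Optional.empty() as an end-of-stream marker are all assumptions made for this example. Each operator is a "black box" that runs on its own thread and fires as soon as an input item arrives on its queue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

public class DataflowSketch {

    // Starts one operator on its own thread. It takes an item as soon as one
    // arrives, applies fn, and passes the result downstream.
    // Optional.empty() is this sketch's (hypothetical) end-of-stream marker.
    static <I, O> Thread operator(BlockingQueue<Optional<I>> in,
                                  BlockingQueue<Optional<O>> out,
                                  Function<I, O> fn) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Optional<I> item = in.take();          // blocks until input is available
                    if (!item.isPresent()) {
                        out.put(Optional.empty());         // forward end-of-stream
                        return;
                    }
                    out.put(Optional.of(fn.apply(item.get())));
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Wires source -> trim -> length -> sink and returns what the sink collected.
    static List<Integer> run(List<String> records) throws InterruptedException {
        BlockingQueue<Optional<String>> raw = new ArrayBlockingQueue<>(4);
        BlockingQueue<Optional<String>> trimmed = new ArrayBlockingQueue<>(4);
        BlockingQueue<Optional<Integer>> lengths = new ArrayBlockingQueue<>(4);

        Thread trim = operator(raw, trimmed, String::trim);
        Thread measure = operator(trimmed, lengths, String::length);

        // Source: feeds records into the first queue, then signals end-of-stream.
        Thread source = new Thread(() -> {
            try {
                for (String r : records) {
                    raw.put(Optional.of(r));
                }
                raw.put(Optional.empty());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        source.start();

        // Sink: the calling thread drains the last queue until end-of-stream.
        List<Integer> result = new ArrayList<>();
        while (true) {
            Optional<Integer> item = lengths.take();
            if (!item.isPresent()) {
                break;
            }
            result.add(item.get());
        }
        source.join();
        trim.join();
        measure.join();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(Arrays.asList(" a ", "bb", "  ccc")));
    }
}
```

With the three records shown in main, the sketch prints [1, 2, 3]. All stages run concurrently: while the second operator measures one record, the first can already be trimming the next. No operator shares state with another beyond the queues that connect them.
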
PDR Provides a New Approach to Analytics

Pervasive DataRush is a ground-breaking technology enabling massively parallel processing of analytic applications. It takes full advantage of the processing power of widely available commodity multicore processors. It incorporates leading-edge algorithms, runtime capabilities and architectures that enable highly scalable analytic applications, and it provides unparalleled performance in transforming “dirty data” into “usable data”. It makes parallel programming of analytic applications practical for the masses.

 

Why Pervasive DataRush?

The Sheer Magnitude of Data!
Data volumes are exploding, the number of data sources is increasing, and data sets measured in gigabytes or terabytes are common. For example, WalMart records 20 million transactions and AT&T collects 275 million call records every day.

The Proliferation of Multicore Hardware.
The entire data mining industry is unprepared to fully exploit multicore capability. The vast majority of existing code is either single-threaded or parallel in very limited ways.

Data Complexity (“The Curse of Dimensionality”)
A large number of variables is required to perform adequate analytics. For example, Retail uses hundreds of variables; Financial Services and Telecommunications use thousands. Their datasets include millions of rows or instances.

The Market Is Presently Being Challenged
Working with large, complex datasets is extremely hard. Dirty data is a reality and makes the job even harder. Accessing and preparing the data takes too much time (up to 70% of the total project effort is not uncommon). Traditional analytics depend on memory-resident computational models. Slow model development and assessment hurts revenue. Models that are finally built are often not refreshed frequently enough because of the difficulty of preparing the data. Finding qualified, expert analysts is difficult, and tools that analysts with less expertise can use are needed.

Want more information?  Read about dataflow and data mining operators.