No rush for Pervasive Software's DataRush, but the time is right
Event summary:
Pervasive Software has been using the DataRush parallel Java framework in its data-profiling tool for some time. DataRush has been in beta for more than a year; general availability was originally planned for last year. Successful scalability tests with 'lighthouse' customers have paved the way for a production release in 2008.
The 451 take:
The world is going multicore, but parallel programming skills are thin on the ground. The market is slow to adopt new programming models, and some of the traditional parallel programming tools are (possibly unfairly) seen as old hat. With DataRush, a Java framework for highly parallel, data-intensive applications, Pervasive Software could be in the right place at the right time. There are no special tools to help the developer analyze an application (and hence optimize its parallel performance), but Java Management Extensions (JMX) does expose performance data once a DataRush application is up and running.
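As a rough illustration of the kind of runtime data JMX exposes, the sketch below reads a few standard JVM platform MBeans from inside a running process. It uses only the `java.lang.management` API that ships with the JDK. The class name `JvmStats` is ours, and any DataRush-specific MBeans are left out because they are not described here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class: reads standard JVM platform MBeans only.
public class JvmStats {

    /** Number of live threads in this JVM, as reported by the thread MBean. */
    public static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /** Number of processors the JVM believes it can use. */
    public static int availableProcessors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        return os.getAvailableProcessors();
    }

    /** Heap memory currently in use, in bytes. */
    public static long heapUsedBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    public static void main(String[] args) {
        System.out.println("live threads:         " + liveThreads());
        System.out.println("available processors: " + availableProcessors());
        System.out.println("heap used (bytes):    " + heapUsedBytes());
    }
}
```

The same MBeans can be browsed remotely with a JMX client such as JConsole, which is the more usual way to watch a long-running application.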
Details:
Pervasive Software's evolutionary plan has been to first build its own products, then to add value by exploiting DataRush internally, and finally to make DataRush available as a general-purpose parallel programming framework for Java environments. DataRush supports parallelism in symmetric multiprocessing systems – not clusters or grids.
Parallel programming can be complicated. To get the best performance, a developer often has to understand what is happening at a low level in order to handle cache management, threading issues and performance tuning. That is hard to do in Java, because the Java Virtual Machine (JVM) handles thread affinity and leaves the developer little room to dig deeper. While this is a limitation for an expert wanting to wring out the last drop of scalability and performance, it is still a very appropriate approach for taking multicore and parallel processing to the masses. (Note that Intel was issued a patent last year titled 'Flexible acceleration of Java thread synchronization on multiprocessor computers,' which addresses the issue of Java thread affinity, although no product implementation is available yet.)
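The point about thread affinity can be seen in plain Java. The sketch below is not DataRush code; it is a minimal data-parallel sum written against the standard `java.util.concurrent` library, with the class and method names (`ChunkedSum`, `parallelSum`) invented for illustration. The developer chooses how many worker threads to create and how to split the data, but the standard library offers no call for pinning a thread to a particular core, so placement is left to the JVM and the operating system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: splits an array into chunks and sums them in parallel.
public class ChunkedSum {

    /** Sums data using the given number of worker threads (must be at least 1). */
    public static long parallelSum(long[] data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (data.length + workers - 1) / workers; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(data.length, w * chunk);
                final int to = Math.min(data.length, from + chunk);
                // Each task sums its own slice; which core runs it is not our decision.
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += data[i];
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that slice is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i;
        }
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(data, workers));
    }
}
```

Sizing the pool from `availableProcessors()` is a common default, not a tuning recommendation; a framework can make that kind of decision on the developer's behalf.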
DataRush tests on synthetic parallel applications have delivered a 28-fold speedup on 32 cores. This is excellent, considering that the JVM hides processor affinity and cache management from the application.
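To put that figure in context, the arithmetic can be sketched as follows. It assumes the 28x result is measured against a single-core run of the same workload, which is not stated. Under that assumption, parallel efficiency is the speedup divided by the core count, and the Karp-Flatt metric estimates the serial fraction that would account for the shortfall. The class and method names are ours.

```java
// Hypothetical helper class for back-of-the-envelope scaling arithmetic.
public class Scaling {

    /** Parallel efficiency: fraction of ideal linear speedup actually achieved. */
    public static double efficiency(double speedup, int cores) {
        return speedup / cores;
    }

    /**
     * Karp-Flatt experimentally determined serial fraction:
     * e = (1/S - 1/p) / (1 - 1/p), defined for p > 1.
     */
    public static double serialFraction(double speedup, int cores) {
        if (cores <= 1) {
            throw new IllegalArgumentException("cores must be greater than 1");
        }
        double invP = 1.0 / cores;
        return (1.0 / speedup - invP) / (1.0 - invP);
    }

    public static void main(String[] args) {
        double speedup = 28.0; // figure reported for the synthetic tests
        int cores = 32;
        System.out.printf("efficiency:      %.1f%%%n", 100 * efficiency(speedup, cores));
        System.out.printf("serial fraction: %.2f%%%n", 100 * serialFraction(speedup, cores));
    }
}
```

For 28x on 32 cores this gives an efficiency of 87.5% and an implied serial fraction of 1/217, or roughly 0.46%, which is why the result is notable for a framework that cannot control thread placement.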
Competitive landscape:
The Java Grande Forum was established a decade ago to encourage the development of Java language design for parallel, data-intensive applications. Both that effort and Java OpenMP petered out after a few years, despite initial industry interest. Pervasive says it doesn't see much competition out there for parallel Java frameworks, and we agree.
The main competition for the exploitation of multicore systems is parallel-processing environments for C++, not Java. The open standard in this space is OpenMP, and all the major compiler companies have good implementations. OpenMP's focus is more on hot computational kernels, while Pervasive's is on building scalable, data-intensive applications. But there's a big overlap between the two.
Others take different approaches. RapidMind has language extensions to C++ that allow the developer to express parallelism in a way that can more easily be exploited by a variety of parallel architectures, while Connective Logic Systems uses graphical tools to describe the structure of parallel applications.




