Crunching Big Data with Java: One Team, One Month, One JVM

Author: Jim Falgout
Publication: Java Developers Journal
Date: 2008-03-25

"We have shown that a successful approach for data analytic problems is to take a data-oriented view of the solution. This type of analysis leads to designs that are transferred easily to implementations using dataflow techniques. It was demonstrated during the course of this article how a small team of three developers created a fuzzy matching application in a short amount of time using these techniques. The debugging and profiling tools built into Java and the dataflow framework used were instrumental in optimizing the application not only for minimal runtime, but for optimal resource (CPU) utilization. The development focus on parallelization and utilization led to good scalability, allowing the application to show much faster runtimes on configurations with more processing cores. This is important because we don't want to have to re-code as more cores arrive on the scene.

The productivity of Java, its excellent IDE support, and the wide variety of Java libraries available make it an excellent platform for software development. My team's work on the fuzzy matching application demonstrates that Java can also be used for applications that are a mix of data-intensive and compute-intensive elements. The Java platform provides an excellent mix of design-time and runtime performance and scalability. With new architectural approaches and dataflow library extensions, Java can be turned into a formidable data-crunching machine."