Linear Scaling

We continue to run scalability tests on every platform we can get access to, particularly as we learn of bigger and bigger boxes. We have had an interesting time running DataRush on different processor architectures, operating systems, and JVMs, never mind that each box has a different amount of memory, a different clock rate, and a different disk system.

See a listing of these variations here.

Results so far are exciting, and this whole testing effort has turned out to be far more important than we first thought it would be. It has been remarkable to see how excited people are about the results, and how quickly they understand our framework’s purpose once we show them the graphs. We also learn more each time we run the tests. For example, we were happy to get HP's HP-UX JVM team involved and gained some important insight there, whereas on another vendor’s system we broke their JVM: they had never seen an app that wanted all the cores, all at once, all at 100%. They are now using DataRush as part of their test suite.

At this point, our tests are very simple and embarrassingly parallel, and they focus on a single attribute, scalability: how much additional performance you get when you add resources. The Holy Grail is linear scalability, defined as doubling performance when you double the resources; in other words, demonstrating that adding cores adds only minimal overhead.
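To make the definition concrete, here is a minimal Python sketch of how speedup and parallel efficiency can be computed from run times. The function name `scaling_report` and the timings in the usage note are hypothetical and purely illustrative; they are not DataRush measurements or part of any DataRush API.

```python
def scaling_report(timings):
    """Compute speedup and efficiency from {core_count: elapsed_seconds}.

    The run with the fewest cores is used as the baseline. Perfectly
    linear scaling gives an efficiency of 1.0 at every core count.
    """
    core_counts = sorted(timings)
    base_cores = core_counts[0]
    base_time = timings[base_cores]

    report = []
    for cores in core_counts:
        speedup = base_time / timings[cores]   # how much faster than the baseline
        ideal = cores / base_cores             # speedup that linear scaling would give
        efficiency = speedup / ideal           # 1.0 means no overhead from added cores
        report.append((cores, speedup, efficiency))
    return report
```

With made-up timings such as `{1: 120.0, 2: 60.0, 4: 30.0, 8: 16.0}`, the 8-core run has a speedup of 7.5 against an ideal of 8, an efficiency of about 0.94, which is what "near linear" looks like in numbers.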

We are pleased and proud to say that we are achieving great results: near-linear scaling on every platform we have tested. This hasn't been easy, and in some cases we have had to spend a lot of time with monitoring and profiling tools, but the smart people we are working with are starting to recognize the real achievement that DataRush represents.

The magic is that we don't have to touch a line of code in the application to do it. Because a developer using the DataRush platform is no longer responsible for manually designing and coding for a specific number of cores, a DataRush-based application can easily be deployed across machines of different capabilities. This is especially important to ISVs and commercial software partners who want robust, data-intensive analytic applications to scale and deliver faster performance as the number of cores multiplies.
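The general idea of not coding for a specific number of cores can be sketched in a few lines of Python. This is not DataRush code and does not reflect its API; the helper names `split_range` and `sum_of_squares` are hypothetical. The sketch only shows the pattern of discovering the core count at run time and partitioning the work accordingly.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_range(n_items, parts):
    """Split range(n_items) into `parts` contiguous (start, stop) pairs."""
    base, extra = divmod(n_items, parts)
    bounds = []
    start = 0
    for i in range(parts):
        # The first `extra` chunks take one additional item each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def sum_of_squares(n_items, workers=None):
    """Sum i*i for i in range(n_items), one chunk per worker."""
    # Nothing here is hard-coded to a core count: ask the machine at run time.
    workers = workers or os.cpu_count() or 1

    def partial(bounds):
        start, stop = bounds
        return sum(i * i for i in range(start, stop))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial, split_range(n_items, workers)))
```

The same call, `sum_of_squares(1_000_000)`, runs unchanged on a 2-core laptop or a 64-core server. Note that threads in standard CPython do not run CPU-bound work in parallel, so this shows the structure only; real scaling requires a runtime that spreads work across cores.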
