Pervasive provides two core libraries to help the developer quickly deliver their application and take advantage of the Pervasive DataRush™ Parallel Dataflow Engine:
While these libraries are often sufficient to build an application, the Pervasive DataRush Java SDK also lets developers build custom operators. In fact, the Pervasive DataRush Core Libraries are themselves built with this same SDK.
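This document does not show the DataRush operator API, so the following is only a plain-Java sketch of the pipelined dataflow idea behind custom operators. It assumes that each operator runs on its own thread and exchanges records with its neighbors through bounded queues. `DataflowSketch`, `operator`, and `run` are hypothetical names, not DataRush SDK classes or methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

// Hypothetical plain-Java sketch of a pipelined dataflow; NOT the DataRush SDK API.
final class DataflowSketch {
    // Sentinel object that marks the end of a record stream on a queue.
    private static final Object EOS = new Object();

    // Starts one "operator": a thread that reads records from `in`,
    // applies `f`, and writes results to `out` until it sees EOS.
    static Thread operator(BlockingQueue<Object> in, BlockingQueue<Object> out,
                           Function<Integer, Integer> f) {
        Thread t = new Thread(() -> {
            try {
                for (Object item = in.take(); item != EOS; item = in.take()) {
                    out.put(f.apply((Integer) item));
                }
                out.put(EOS); // propagate end-of-stream downstream
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    // Chains the stages with bounded queues so all stages run concurrently,
    // each working on a different record at the same time (pipeline parallelism).
    static List<Integer> run(List<Integer> input, List<Function<Integer, Integer>> stages) {
        try {
            BlockingQueue<Object> head = new ArrayBlockingQueue<>(16);
            BlockingQueue<Object> in = head;
            List<Thread> threads = new ArrayList<>();
            for (Function<Integer, Integer> f : stages) {
                BlockingQueue<Object> out = new ArrayBlockingQueue<>(16);
                threads.add(operator(in, out, f));
                in = out;
            }
            // Source thread feeds input records into the first queue.
            Thread source = new Thread(() -> {
                try {
                    for (Integer x : input) head.put(x);
                    head.put(EOS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            source.start();
            // Sink: the calling thread drains the last queue.
            List<Integer> result = new ArrayList<>();
            for (Object item = in.take(); item != EOS; item = in.take()) {
                result.add((Integer) item);
            }
            source.join();
            for (Thread t : threads) t.join();
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("pipeline interrupted", e);
        }
    }
}
```

For example, `run(List.of(1, 2, 3), stages)` with the stages `x -> x + 1` and `x -> x * 10` returns `[20, 30, 40]`. Because the queues are bounded, a fast upstream operator blocks instead of buffering the whole input, which keeps memory use flat regardless of input size.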
Figure: DataRush Product Architecture (diagram label: High Performance Data-intensive Application)
Capabilities
For Data Preparation
- A full array of data preparation operators, including standard data processing functions such as sort, join, aggregation (data grouping), and transformations.
- Operators support connectivity to delimited text, fixed-width text, databases (JDBC), and proprietary Pervasive DataRush data-staging files.
- The means to stage data to disk in a highly efficient format that supports parallel reads and writes. This is useful for staging data between phases of execution and for passing large data sets between software components.
- A complete library of data profiling operators, including the means to define a complex set of metrics to evaluate against input data.
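To make the aggregation (data grouping) operation above concrete, here is a minimal plain-Java sketch of a parallel group-and-sum, using the standard `Collectors.groupingByConcurrent` collector on a parallel stream. It only illustrates the operation; it is not how DataRush implements aggregation, and `GroupBySketch`, `Row`, and `sumByKey` are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

// Hypothetical sketch of a parallel grouped aggregation; NOT a DataRush API.
final class GroupBySketch {
    // One input record: a grouping key and a numeric value.
    static final class Row {
        final String key;
        final long value;

        Row(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    // Groups rows by key and sums their values, splitting the work across
    // threads via a parallel stream that accumulates into a concurrent map.
    static Map<String, Long> sumByKey(List<Row> rows) {
        ConcurrentMap<String, Long> sums = rows.parallelStream()
            .collect(Collectors.groupingByConcurrent(
                r -> r.key,
                Collectors.summingLong(r -> r.value)));
        // Copy into a TreeMap so the result is ordered by key.
        return new TreeMap<>(sums);
    }
}
```

For example, the rows `("east", 5)`, `("west", 3)`, `("east", 2)` aggregate to `{east=7, west=3}`. A dataflow engine would typically also partition rows by key across workers so each group is summed in one place, but that step is omitted here.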
For Analytics
- Core set of parallelized data mining algorithms built on the Pervasive DataRush engine.
- Algorithms are data-scalable and work with data of any size, from a few thousand rows to many billions; the data does not need to be loaded entirely into memory.
- Classification algorithms for predicting the class of a record: Decision Trees, Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machines (SVM).
- Clustering algorithms for customer segmentation: K-Means.
- Unsupervised learning algorithms for discovering previously unknown patterns in data: Association Rule Mining (ARM), Neural Networks.
- Trend analysis algorithms for understanding and predicting future growth: Linear, Logistic, Polynomial, and Multivariable Regression.
- Feature Selection algorithms for discovering strong correlations: Principal Component Analysis (PCA).
- Support for PMML (Predictive Model Markup Language) models.
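The claim that the algorithms need not load all data into memory can be illustrated with K-Means. Below is a minimal plain-Java sketch of one standard Lloyd iteration that reads points one at a time from an `Iterator` and keeps only k running sums and counts. This is a generic textbook formulation, not DataRush's K-Means implementation, and `StreamingKMeansStep`, `step`, and `nearest` are hypothetical names.

```java
import java.util.Iterator;

// Hypothetical sketch of one streaming k-means (Lloyd) iteration; NOT a DataRush API.
final class StreamingKMeansStep {
    // Assigns each streamed point to its nearest centroid and returns the new
    // centroids. Memory use is O(k * dim), independent of the number of points,
    // so the iterator could be backed by a file read row by row.
    static double[][] step(Iterator<double[]> points, double[][] centroids) {
        int k = centroids.length;
        int dim = centroids[0].length;
        double[][] sums = new double[k][dim];
        long[] counts = new long[k];
        while (points.hasNext()) {
            double[] p = points.next();
            int best = nearest(p, centroids);
            for (int d = 0; d < dim; d++) sums[best][d] += p[d];
            counts[best]++;
        }
        double[][] updated = new double[k][dim];
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dim; d++) {
                // Keep the old centroid if no point was assigned to it.
                updated[c][d] = counts[c] == 0 ? centroids[c][d] : sums[c][d] / counts[c];
            }
        }
        return updated;
    }

    // Index of the centroid with the smallest squared Euclidean distance to p.
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }
}
```

With the 1-D points 0, 1, 10, 11 and starting centroids 0 and 10, one step moves the centroids to 0.5 and 10.5. The per-centroid sums and counts can also be computed independently on separate data partitions and then added together, which is what makes this step easy to parallelize.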