ProDA: An End-to-End Wavelet-Based Progressive Data Analysis System


Online Scientific Applications (OSA) require statistical analysis of large multidimensional datasets. These applications would see a significant performance boost if most of the data analysis could be moved close to the data rather than executed at the application side. In addition, OSA users prefer to use the visualization and reporting tools already provided by the underlying client platform (e.g., a spreadsheet application). We therefore develop a 3-tier end-to-end solution for OSA data analysis, built on wavelets, web services and smart clients.


In EDBT'02 and PODS'02, we presented our basic idea of adapting wavelets as a database-friendly tool for the analysis of multidimensional data. We argued that, unlike other approaches, we use wavelets to approximate queries rather than data. Moreover, we broadened the concept of a query to encompass several statistical functions by introducing a formal method of expressing analytical queries as polynomials on data dimensions.
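The idea of approximating the query rather than the data can be illustrated with a minimal sketch. It is not the implementation from the cited papers. It assumes a one-dimensional frequency distribution, an orthonormal Haar transform chosen only for brevity, and hypothetical helper names (haar, poly_query, progressive_range_sum). A range query of polynomial degree k is written as a vector q with q[x] = x^k inside the range and 0 outside, so COUNT, SUM and the sum of squares correspond to degrees 0, 1 and 2. Because an orthonormal transform preserves inner products, the answer can be accumulated from wavelet coefficients, taking the largest query coefficients first. Each partial sum is a progressively refined estimate, and the last one is the exact answer.

```python
import math


def haar(values):
    """Orthonormal Haar transform; len(values) must be a power of two."""
    out = [float(v) for v in values]
    n = len(out)
    while n > 1:
        half = n // 2
        # Scaled pairwise sums (smooth part) and differences (detail part).
        avg = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        dif = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = avg + dif
        n = half
    return out


def poly_query(size, lo, hi, degree):
    """Query vector: x**degree for lo <= x <= hi, and 0 elsewhere."""
    return [float(x) ** degree if lo <= x <= hi else 0.0 for x in range(size)]


def progressive_range_sum(freq, lo, hi, degree):
    """Return running estimates of sum(x**degree * freq[x]) over [lo, hi].

    Query coefficients are consumed in decreasing order of magnitude,
    so early estimates use the most significant terms of the query.
    """
    q_hat = haar(poly_query(len(freq), lo, hi, degree))
    d_hat = haar(freq)  # in a wavelet database this would be precomputed
    order = sorted(range(len(q_hat)), key=lambda i: -abs(q_hat[i]))
    estimates, total = [], 0.0
    for i in order:
        total += q_hat[i] * d_hat[i]
        estimates.append(total)
    return estimates


if __name__ == "__main__":
    # freq[x] = number of records whose attribute value equals x (made-up data).
    freq = [2, 0, 1, 3, 4, 0, 1, 1]
    count = progressive_range_sum(freq, 2, 5, 0)[-1]
    total = progressive_range_sum(freq, 2, 5, 1)[-1]
    sumsq = progressive_range_sum(freq, 2, 5, 2)[-1]
    mean = total / count
    print("mean:", mean, "variance:", sumsq / count - mean ** 2)
```

In this sketch, statistics such as the mean and variance are derived from the degree-0, degree-1 and degree-2 range sums, which is one way to read "analytical queries as polynomials on data dimensions".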


We then built a system called ProDA, which wraps the wavelet-based query/analysis functions as web services (demonstrated in SIGMOD'05 and SciData'04). In ProDA, as in other pure wavelet-based databases, data is stored as wavelets instead of in its raw form, which requires manipulating data directly in the wavelet domain. Hence, in our SIGMOD'05 paper, we introduced two novel operations for wavelet-transformed data, SHIFT and SPLIT, which work directly on wavelets.


Currently, under a project supported by Microsoft, we are working on the client tier of ProDA to equip it with a high-fidelity UI, XML/SOAP-based data exchange and remote/offline capabilities. We demonstrate the significance of our 3-tier solution by applying it to real-world scientific data analysis applications of our sponsors, JPL and Chevron.


