CSCI 599 List of Suggested Projects

CSCI 599 Fall 2001 - Suggested Projects

Details on each project are posted below the list of projects.

Project Name	Group Leader
Efficient query processing/evaluation using progressive OLAP	Rolfe Schmidt
Nearest neighbor queries for mobile object databases	Mohammad Reza Kolahdouzan
Range queries in spatio-temporal databases	Snehal Thakkar
Spatio-temporal data minning of haptic data	Soham Mehta
Resource Management for Distributed Continuous Media Servers (DCMS)	Farnoush Banaei-Kashani
Multi-dimensional Data Clustering/Outlier detection	Sheng Shi

Project Descriptions

Progressive Evaluation of OLAP Queries
We will implement a 3-tier application that uses progressive algorithms to evaluate batches of range aggregate queries as efficiently as possible. The details of the application are yet to be decided, but two possibilities include:
- a service to provide arbitrary scale moving average plots of stock prices.
- an application providing user configurable summary views of a large climate dataset.
The client application will obtain all data from a middleware web service. We will define the interface for this service in a way that allows it to be used for a wide variety of applications and builds on recent research.
The web service may be partially written in Java, but the core algorithms will be implemented in C++, building on our existing code base. Embedded SQL will be used to access persistent data. Our algorithms will provide progressive OLAP by using successively better wavelet expansions of the batch of queries. These approximations can be evaluated efficiently as long as we store the wavelet transform of the data.
Work on the data tier of this application will include designing the relational store for transformed data and implementing batch load and update algorithms so that data can be streamed in to an operational database.
GOALS FOR PARTICIPANTS:
- Understand basic OLAP challenges and strategies
- Learn how to deliver progressive query evaluation effectively
- Gain experience designing and deploying web services and 3-tier applications
Range Queries in Spatio-temporal databases
We will design a system that allows us to efficiently query spatio-temporal data related to moving objects, i.e. trains , trucks, cars etc. Several types of queries on moving object data have been described in M4. We will implement experiments described in paper S4 using different data model. Provided that we have enough time at the end of semester we might come up with a system similar to one described in the paper or download DOMINO(Database Of MovINg Objects) and build a system to visualize moving vehicle along the roads. We will be learning about sptio-temporal databases, types of spatio-temporal queries, Moving Object databases, moving object data modelling and caching for moving object related databases.
Efficient Minning of Haptic Data We have three options for projects on haptic data:
- Given the data from the cyberglove for specific hand-motion, build a system to restore and redisplay the same motion on the screen using minimal number of samples. Extend it to real-time.
  Main challenges:
  1) Come up with a good method/algorithm to either select meaningful samples good enough to display the motion correctly or come up with some compression algorithm to reduce the data-size.
  2) Given the data, project the hand on screen.
  Things we will learn:
  How to work with spatio-temporal data; how to extract spatial features from data and present them graphically; issues involved in storing and extracting temporal information; issues involved in dealing with spatio-temporal data in real-time.
  It would be good to have atleast one student with background in graphics for this project.
- Given multiple data-sets(data from more than one persons) from cyberglove for specific hand-motion, to mine the data set and extract meaningful information/features from within. Build a system of detailed comparison (more than a statistical compare) between two similar motions of hand, based on these features.
  Main Challenges:
  1) Study the data characteristics and come up with a set of good meaningful features of a hand in motion.
  2) Mine the data to extract these features/information
  3) Come up with a measure of comparing these features for two similar hand-motions.
  Things we will learn:
  Data mining fundamentals and issues.
  It would be good to have atleast one student with background in data-mining for this project.
- We are trying to build a system to recognize colors in American Sign Language. The colors in ASL are all single-handed dynamic signs. We already have a system that recognizes static signs. Given the data from cyberglove, we will be trying to train a classifier (Proposed: Fuzzy neural network) to recognize dynamic signs.
  Students with background in Neural nets/fuzzy systems preferred.
  We intend to achieve significant progress in atleast 2 of these projects this semester. Choice of the project(s) would depend primarily on students' interest.
Resource Management for Distributed Continuous Media Servers (DCMS)
Consider a network of storage and streaming nodes that serve continuous media objects such as video, audio, etc. [1]. Servers are scattered globally according to locality of users and interconnected via high-speed dedicated links (see [2] for a typical service model). A user connects to some local server and requests for an object: we will implement a middle-ware to locate the replicas of the object in the network, determine the path with minimum cost to stream the object to the user across the network, and suggest an appropriate caching strategy to the servers along the path.
Considering the variable cost of the paths, use your imagination to visualize a DCMS network as a large set of moving objects (the servers and the media objects) interconnected via a bunch of springs (the links) according to a specific topology; i.e. similar to the funny models for molecular networks, as we have all seen and played with at our physics/chemistry labs! The loads imposed by users to the network at different points of time are the forces that cause the moving objects to vibrate. In such a dynamic environment, our middle-ware is supposed to monitor the spatial state of the objects and respond range and nearest neighbor queries in reply to local severs that try to reserve resources as users demand for objects. Distance scan queries can also come handy when servers along a path need to know if they are to cache the streamed objects; decision is to be made according to users' access pattern that has temporal attributes.
The details of the middle-ware are yet to be discussed, but generally it should be implemented as a directory service [3]. Implementation of the main functionalities will be in C++, although for visualization of the functionalities we can use Java.
Challenges:
1) Large number of moving objects
2) Rapidly changing environment
3) Real-time constraints for query response, to minimize latency
4) Distributed database design, if we decide to implement a distributed directory service
Who should join the project:
If you want to gain some experience with distributed computing, network resource management, continuous media streaming, and most importantly if you want to challenge yourself with designing an efficient spatio-temporal database, this is the project you should join.
Some References:
[1] Cyrus Shahabi, Mohammad Alshayeji and Shimeng Wang. A Redundant Hierarchical Structure for a Distributed Continuous Media Server , In Proceedings of the IDMS'97, September 1997 (available at http://dimlab.usc.edu/Research.html#II).
[2] http://www.akamai.com
[3] Andrew D. Birrell, Roy Levin, Michael D. Schroeder and Roger M. Needham. Grapevine: an exercise in distributed computing . Commun. ACM 25, 4 (Apr. 1982), Pages 260 - 274.
Multi-dimensional Data Clustering/Outlier detection
We are to implement an application, as effectively as possible, for Clustering (or Trend)/Outlier (or Surprise) detection in multi-dimensional database, and of course, this application should support different levels of resolutions (say, within last week, last month, last year and so on). Basically we will be using the wavelet transforms to achieve this goal. One example would be the 2D TSA-tree in spatial database (http://dimlab.usc.edu/DocsDemos/ssdbm.ps.gz).
The details are yet to be determined, but for the time being, some possible points of concerns might be the following. First, what on earth is the exact mathematical definition of SURPRISE? In fact, this really matters; a proper definition could save us a lot of work. And also, should we apply wavelet transforms on every dimension? The answer is not necessary. So there's one optimization problem here in the algorithm concern. Also, how sparse can we expect from the transform result and do we have other strategy (besides the 2D OTSA) to save the storage space This application will be implemented in C++, and one goal for this project is for you to gain experience in the multi-dimensional data mining.