CSCI 599 Fall 2001 - Suggested Projects

Details on each project are posted below the list of projects.



Project Name Group Leader
Efficient query processing/evaluation using progressive OLAP Rolfe Schmidt
Nearest neighbor queries for mobile object databases Mohammad Reza Kolahdouzan
Range queries in spatio-temporal databases Snehal Thakkar
Spatio-temporal data minning of haptic data Soham Mehta
Resource Management for Distributed Continuous Media Servers (DCMS) Farnoush Banaei-Kashani
Multi-dimensional Data Clustering/Outlier detection Sheng Shi

Project Descriptions

  1. Progressive Evaluation of OLAP Queries

    We will implement a 3-tier application that uses progressive algorithms to evaluate batches of range aggregate queries as efficiently as possible. The details of the application are yet to be decided, but two possibilities include:

    The client application will obtain all data from a middleware web service. We will define the interface for this service in a way that allows it to be used for a wide variety of applications and builds on recent research.

    The web service may be partially written in Java, but the core algorithms will be implemented in C++, building on our existing code base. Embedded SQL will be used to access persistent data. Our algorithms will provide progressive OLAP by using successively better wavelet expansions of the batch of queries. These approximations can be evaluated efficiently as long as we store the wavelet transform of the data.

    Work on the data tier of this application will include designing the relational store for transformed data and implementing batch load and update algorithms so that data can be streamed in to an operational database.

    GOALS FOR PARTICIPANTS:

  2. Range Queries in Spatio-temporal databases

    We will design a system that allows us to efficiently query spatio-temporal data related to moving objects, i.e. trains , trucks, cars etc. Several types of queries on moving object data have been described in M4. We will implement experiments described in paper S4 using different data model. Provided that we have enough time at the end of semester we might come up with a system similar to one described in the paper or download DOMINO(Database Of MovINg Objects) and build a system to visualize moving vehicle along the roads. We will be learning about sptio-temporal databases, types of spatio-temporal queries, Moving Object databases, moving object data modelling and caching for moving object related databases.

  3. Efficient Minning of Haptic Data We have three options for projects on haptic data:

  4. Resource Management for Distributed Continuous Media Servers (DCMS)

    Consider a network of storage and streaming nodes that serve continuous media objects such as video, audio, etc. [1]. Servers are scattered globally according to locality of users and interconnected via high-speed dedicated links (see [2] for a typical service model). A user connects to some local server and requests for an object: we will implement a middle-ware to locate the replicas of the object in the network, determine the path with minimum cost to stream the object to the user across the network, and suggest an appropriate caching strategy to the servers along the path.
    Considering the variable cost of the paths, use your imagination to visualize a DCMS network as a large set of moving objects (the servers and the media objects) interconnected via a bunch of springs (the links) according to a specific topology; i.e. similar to the funny models for molecular networks, as we have all seen and played with at our physics/chemistry labs! The loads imposed by users to the network at different points of time are the forces that cause the moving objects to vibrate. In such a dynamic environment, our middle-ware is supposed to monitor the spatial state of the objects and respond range and nearest neighbor queries in reply to local severs that try to reserve resources as users demand for objects. Distance scan queries can also come handy when servers along a path need to know if they are to cache the streamed objects; decision is to be made according to users' access pattern that has temporal attributes.
    The details of the middle-ware are yet to be discussed, but generally it should be implemented as a directory service [3]. Implementation of the main functionalities will be in C++, although for visualization of the functionalities we can use Java.

    Challenges:
    1) Large number of moving objects
    2) Rapidly changing environment
    3) Real-time constraints for query response, to minimize latency
    4) Distributed database design, if we decide to implement a distributed directory service

    Who should join the project:
    If you want to gain some experience with distributed computing, network resource management, continuous media streaming, and most importantly if you want to challenge yourself with designing an efficient spatio-temporal database, this is the project you should join.

    Some References:
    [1] Cyrus Shahabi, Mohammad Alshayeji and Shimeng Wang. A Redundant Hierarchical Structure for a Distributed Continuous Media Server , In Proceedings of the IDMS'97, September 1997 (available at http://dimlab.usc.edu/Research.html#II).
    [2] http://www.akamai.com
    [3] Andrew D. Birrell, Roy Levin, Michael D. Schroeder and Roger M. Needham. Grapevine: an exercise in distributed computing . Commun. ACM 25, 4 (Apr. 1982), Pages 260 - 274.

  5. Multi-dimensional Data Clustering/Outlier detection

    We are to implement an application, as effectively as possible, for Clustering (or Trend)/Outlier (or Surprise) detection in multi-dimensional database, and of course, this application should support different levels of resolutions (say, within last week, last month, last year and so on). Basically we will be using the wavelet transforms to achieve this goal. One example would be the 2D TSA-tree in spatial database (http://dimlab.usc.edu/DocsDemos/ssdbm.ps.gz).

    The details are yet to be determined, but for the time being, some possible points of concerns might be the following. First, what on earth is the exact mathematical definition of SURPRISE? In fact, this really matters; a proper definition could save us a lot of work. And also, should we apply wavelet transforms on every dimension? The answer is not necessary. So there's one optimization problem here in the algorithm concern. Also, how sparse can we expect from the transform result and do we have other strategy (besides the 2D OTSA) to save the storage space This application will be implemented in C++, and one goal for this project is for you to gain experience in the multi-dimensional data mining.