Details on each project are posted below the list of projects.
Project Name | Group Leader |
Efficient query processing/evaluation using progressive OLAP | Rolfe Schmidt |
Nearest neighbor queries for mobile object databases | Mohammad Reza Kolahdouzan |
Range queries in spatio-temporal databases | Snehal Thakkar |
Spatio-temporal data minning of haptic data | Soham Mehta |
Resource Management for Distributed Continuous Media Servers (DCMS) | Farnoush Banaei-Kashani |
Multi-dimensional Data Clustering/Outlier detection | Sheng Shi |
Project Descriptions
We will implement a 3-tier application that uses progressive algorithms to evaluate batches of range aggregate queries as efficiently as possible. The details of the application are yet to be decided, but two possibilities include:
The client application will obtain all data from a middleware web service. We will define the interface for this service in a way that allows it to be used for a wide variety of applications and builds on recent research.
The web service may be partially written in Java, but the core algorithms will be implemented in C++, building on our existing code base. Embedded SQL will be used to access persistent data. Our algorithms will provide progressive OLAP by using successively better wavelet expansions of the batch of queries. These approximations can be evaluated efficiently as long as we store the wavelet transform of the data.
Work on the data tier of this application will include designing the relational store for transformed data and implementing batch load and update algorithms so that data can be streamed in to an operational database.
GOALS FOR PARTICIPANTS:
We will design a system that allows us to efficiently query spatio-temporal data related to moving objects, i.e. trains , trucks, cars etc. Several types of queries on moving object data have been described in M4. We will implement experiments described in paper S4 using different data model. Provided that we have enough time at the end of semester we might come up with a system similar to one described in the paper or download DOMINO(Database Of MovINg Objects) and build a system to visualize moving vehicle along the roads. We will be learning about sptio-temporal databases, types of spatio-temporal queries, Moving Object databases, moving object data modelling and caching for moving object related databases.
Main challenges:
1) Come up with a good method/algorithm to either select meaningful
samples good enough to display the motion correctly or come up with some
compression algorithm to reduce the data-size.
2) Given the data, project the hand on screen.
Things we will learn:
How to work with spatio-temporal data; how to extract spatial features
from data and present them graphically; issues involved in storing and
extracting temporal information; issues involved in dealing with
spatio-temporal data in real-time.
It would be good to have atleast one student with background in graphics
for this project.
Main Challenges:
1) Study the data characteristics and come up with a set of good
meaningful features of a hand in motion.
2) Mine the data to extract these features/information
3) Come up with a measure of comparing these features for two similar
hand-motions.
Things we will learn:
Data mining fundamentals and issues.
It would be good to have atleast one student with background in
data-mining for this project.
Consider a network of storage and streaming nodes that serve continuous
media objects such as video, audio, etc. [1]. Servers are scattered
globally according to locality of users and interconnected via high-speed
dedicated links (see [2] for a typical service model). A user connects to
some local server and requests for an object: we will implement a
middle-ware to locate the replicas of the object in the network, determine
the path with minimum cost to stream the object to the user across the
network, and suggest an appropriate caching strategy to the servers along
the path.
Considering the variable cost of the paths, use your imagination to
visualize a DCMS network as a large set of moving objects (the servers and
the media objects) interconnected via a bunch of springs (the
links) according to a specific topology; i.e. similar to the funny models
for molecular networks, as we have all seen and played with at our
physics/chemistry labs! The loads imposed by users to the network at
different points of time are the forces that cause the moving objects to
vibrate. In such a dynamic environment, our middle-ware is supposed to
monitor the spatial state of the objects and respond range and nearest
neighbor queries in reply to local severs that try to reserve resources as
users demand for objects. Distance scan queries can also come handy when
servers along a path need to know if they are to cache the streamed
objects; decision is to be made according to users' access pattern that
has temporal attributes.
The details of the middle-ware are yet to be discussed, but generally it
should be implemented as a directory service [3]. Implementation of the
main functionalities will be in C++, although for visualization of the
functionalities we can use Java.
Challenges:
1) Large number of moving objects
2) Rapidly changing environment
3) Real-time constraints for query response, to minimize latency
4) Distributed database design, if we decide to implement a distributed directory service
Who should join the project:
If you want to gain some experience with distributed computing, network
resource management, continuous media streaming, and most importantly if
you want to challenge yourself with designing an efficient spatio-temporal
database, this is the project you should join.
Some References:
[1] Cyrus Shahabi, Mohammad Alshayeji and Shimeng Wang. A Redundant
Hierarchical Structure for a Distributed Continuous Media Server , In
Proceedings of the IDMS'97, September 1997 (available at
http://dimlab.usc.edu/Research.html#II).
[2] http://www.akamai.com
[3] Andrew D. Birrell, Roy Levin, Michael D. Schroeder and Roger
M. Needham. Grapevine: an exercise in distributed computing . Commun. ACM
25, 4 (Apr. 1982), Pages 260 - 274.
We are to implement an application, as effectively as possible, for Clustering (or Trend)/Outlier (or Surprise) detection in multi-dimensional database, and of course, this application should support different levels of resolutions (say, within last week, last month, last year and so on). Basically we will be using the wavelet transforms to achieve this goal. One example would be the 2D TSA-tree in spatial database (http://dimlab.usc.edu/DocsDemos/ssdbm.ps.gz).
The details are yet to be determined, but for the time being, some possible points of concerns might be the following. First, what on earth is the exact mathematical definition of SURPRISE? In fact, this really matters; a proper definition could save us a lot of work. And also, should we apply wavelet transforms on every dimension? The answer is not necessary. So there's one optimization problem here in the algorithm concern. Also, how sparse can we expect from the transform result and do we have other strategy (besides the 2D OTSA) to save the storage space This application will be implemented in C++, and one goal for this project is for you to gain experience in the multi-dimensional data mining.