In many OLAP and decision support environments, it is often desirable to answer complex, long-running aggregate database queries approximately, provided that an estimate of the error is also given. For example, a user who asks "give me the aggregate rate of serial murders, grouped by US state" is probably not interested in answers that are exact to the last digit. We approach this difficult problem using statistical sampling-based techniques. Interesting security and privacy issues arise in such a query-answering system. For example, it may be acceptable for unauthorized users to obtain information such as the average number of policemen deployed in a suburb, but retrieving information at lower levels of granularity (e.g., the qualifications of the main agent involved) may not be permissible. We are investigating solutions based on data perturbation, in which higher-level (coarser) aggregates can be retrieved reliably, but very specific aggregations are inaccurate.
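The two ideas above can be illustrated with a minimal sketch. It is not the project's actual method. It assumes uniform row-level sampling with a normal-approximation confidence interval for a grouped average, and zero-mean Gaussian noise as the perturbation. The function names `approx_group_avg` and `perturb`, the parameters `fraction`, `z`, and `scale`, and the synthetic data are all hypothetical.

```python
import math
import random
from collections import defaultdict


def approx_group_avg(rows, fraction, rng, z=1.96):
    """Estimate AVG(value) per group from a uniform random sample of rows.

    rows is a list of (group, value) pairs. Returns a dict mapping each
    sampled group to (estimate, half_width), where half_width is the
    half-width of a normal-approximation confidence interval (assumed here).
    """
    # Keep each row independently with probability `fraction`.
    sample = [row for row in rows if rng.random() < fraction]
    by_group = defaultdict(list)
    for group, value in sample:
        by_group[group].append(value)

    result = {}
    for group, values in by_group.items():
        n = len(values)
        mean = sum(values) / n
        if n > 1:
            variance = sum((v - mean) ** 2 for v in values) / (n - 1)
            half_width = z * math.sqrt(variance / n)
        else:
            half_width = float("inf")  # one sampled row gives no error estimate
        result[group] = (mean, half_width)
    return result


def perturb(rows, scale, rng):
    """Add independent zero-mean Gaussian noise to every value (assumed scheme)."""
    return [(group, value + rng.gauss(0.0, scale)) for group, value in rows]


# Synthetic data: two hypothetical groups with different true averages.
rng = random.Random(0)
rows = [("TX", rng.gauss(50.0, 5.0)) for _ in range(20000)]
rows += [("CA", rng.gauss(80.0, 5.0)) for _ in range(20000)]

# Approximate answer computed from roughly 5% of the rows.
for group, (estimate, half_width) in sorted(approx_group_avg(rows, 0.05, rng).items()):
    print(f"{group}: {estimate:.2f} +/- {half_width:.2f}")

# Perturbed copy: individual values are distorted, but the noise
# largely cancels when many rows are averaged.
noisy = perturb(rows, scale=10.0, rng=rng)
```

In this sketch, averaging over a whole group of perturbed rows stays close to the true group average because the noise has zero mean, while any single perturbed row can be far from its original value. That is the behavior the paragraph describes: coarse aggregates remain reliable and very specific ones do not.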
© 2002 - 2005, The University of Texas at Arlington.