In many OLAP
and decision support environments, it is often desirable
to answer complex long-running aggregate database queries
approximately, provided some estimate of the error is
also given. For example, when a user asks, "Give me the
aggregate rate of serial murders, grouped by US state,"
she or he probably does not need answers accurate to
the last digit. We approach this difficult problem
using statistical sampling-based techniques. Interesting
security- and privacy-related issues arise in such a query-answering
system. For example, it may be acceptable for unauthorized users to
obtain information such as the average number of police officers
deployed in a suburb, while information at lower levels
of granularity (e.g., the qualifications of the main agent
involved) may not be permissible to release. We
are investigating solutions based on data perturbation,
in which coarse-grained aggregates can still be retrieved
reliably but very specific aggregates are inaccurate.
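A minimal sketch of sampling-based approximate aggregation for the grouped-average example, assuming a uniform random sample of rows drawn without replacement and a normal (central limit theorem) approximation for the error bound. The function name `approximate_group_avg` and the `(group_key, value)` row layout are hypothetical, and this is one standard estimator rather than the specific technique the project uses.

```python
import math
import random
import statistics
from collections import defaultdict


def approximate_group_avg(rows, sample_size, z=1.96, seed=None):
    """Approximate SELECT key, AVG(value) ... GROUP BY key from a row sample.

    rows: list of (group_key, value) pairs (hypothetical layout).
    Returns {group_key: (estimate, half_width)}, where the true group mean
    lies in estimate +/- half_width with roughly 95% confidence for z=1.96,
    under the normal approximation.
    """
    rng = random.Random(seed)
    # Uniform random sample of rows, without replacement.
    sample = rng.sample(rows, sample_size)

    groups = defaultdict(list)
    for key, value in sample:
        groups[key].append(value)

    result = {}
    for key, vals in groups.items():
        estimate = statistics.fmean(vals)
        if len(vals) >= 2:
            # Standard error of the mean; the finite-population correction
            # is omitted, which only makes the interval wider.
            half_width = z * statistics.stdev(vals) / math.sqrt(len(vals))
        else:
            # A single sampled row gives no variance estimate.
            half_width = math.inf
        result[key] = (estimate, half_width)
    return result


if __name__ == "__main__":
    gen = random.Random(0)
    # Synthetic data: 100,000 rows spread over three hypothetical groups.
    rows = [(gen.choice(["CA", "NY", "TX"]), gen.gauss(5.0, 2.0))
            for _ in range(100_000)]
    answer = approximate_group_avg(rows, 2_000, seed=1)
    for key, (est, hw) in sorted(answer.items()):
        print(f"{key}: {est:.2f} +/- {hw:.2f}")
```

Only 2% of the rows are read, yet every group comes back with an explicit error bar. Groups that receive no sampled rows are simply absent from the result, and groups with a single sampled row get an infinite half-width, so rare groups are exactly where such a sketch would need stratified sampling or a larger sample.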
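The perturbation idea can be illustrated with one common scheme, additive zero-mean Gaussian noise applied once to each stored value. This is an assumption for illustration, not necessarily the perturbation the project adopts, and `perturb` and `residual_noise_sd` are hypothetical names. Because the noise is independent across records, an AVG over n perturbed records carries residual noise with standard deviation σ/√n: a large group's average stays close to the truth, whereas an aggregate over one or a few records remains about as noisy as σ itself.

```python
import math
import random
import statistics


def perturb(values, noise_sd, seed=None):
    """Return a copy of values with independent N(0, noise_sd^2) noise added
    to each record (applied once, before any queries are answered)."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, noise_sd) for v in values]


def residual_noise_sd(noise_sd, group_size):
    """Standard deviation of the noise left in an AVG over group_size
    independently perturbed records."""
    return noise_sd / math.sqrt(group_size)


if __name__ == "__main__":
    gen = random.Random(0)
    true_values = [gen.uniform(0.0, 100.0) for _ in range(10_000)]
    noisy = perturb(true_values, noise_sd=20.0, seed=1)

    # Coarse aggregate: average over all 10,000 records.
    print("large group error:",
          abs(statistics.fmean(noisy) - statistics.fmean(true_values)))
    print("expected residual sd:", residual_noise_sd(20.0, 10_000))  # 0.2
    # Fine-grained "aggregate": a single record.
    print("single record error:", abs(noisy[0] - true_values[0]))
    print("expected residual sd:", residual_noise_sd(20.0, 1))  # 20.0
```

The noise is drawn once per stored record rather than per query: if it were redrawn on every query, a user could repeat the same fine-grained query and average the answers to cancel the noise.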