Measures can be organized into three categories based on the kind of
aggregate functions
used:
- distributive,
- algebraic,
- holistic.
Distributive.
An aggregate function is distributive if it can be computed in a distributed manner. Suppose the data are partitioned into n sets. We apply the function to each partition, resulting in n aggregate values. If the result derived by applying the function to the n aggregate values is the same as that derived by applying the function to the entire data set (without partitioning), the function can be computed in a distributed manner.
For example, count() can be computed for a data cube by first partitioning
the cube into a set of subcubes, computing count() for each subcube, and then
summing up the counts obtained for each subcube. Hence, count() is a
distributive
aggregate function. For the same reason, sum(), min(), and max() are distributive aggregate functions.
A measure is distributive if it is obtained by applying a distributive
aggregate function. Distributive measures can be computed efficiently because
the computation can be partitioned and the partial results combined.
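The partition-then-combine idea can be sketched in a few lines of Python. This is only an illustration under assumed data: the three partitions below are hypothetical stand-ins for subcubes, and the variable names are not from the original text.

```python
# Hypothetical data split into three partitions (stand-ins for subcubes).
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Step 1: compute one subaggregate per partition.
partial_counts = [len(part) for part in partitions]
partial_sums = [sum(part) for part in partitions]
partial_mins = [min(part) for part in partitions]
partial_maxs = [max(part) for part in partitions]

# Step 2: combine the subaggregates. Partial counts are combined by
# summing them, as described for count() in the text.
total_count = sum(partial_counts)
total_sum = sum(partial_sums)
overall_min = min(partial_mins)
overall_max = max(partial_maxs)

# The combined results match the aggregates over the unpartitioned data.
assert total_count == len(all_values)
assert total_sum == sum(all_values)
assert overall_min == min(all_values)
assert overall_max == max(all_values)
```

Each partition contributes a single value per function, so the storage needed for the subaggregates does not grow with the size of the partitions.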
Algebraic.
An aggregate function is algebraic if it can be computed by an algebraic function with m arguments (where m is a bounded positive integer), each of which is obtained by applying a distributive aggregate function.
For example, avg() (average) can be computed by sum()/count(), where both
sum() and count() are distributive
aggregate functions. Similarly, it can be shown that min_N() and max_N() (which find the N minimum and N maximum values, respectively, in a given set) and standard_deviation() are algebraic aggregate functions.
A measure is algebraic if it is obtained by applying an algebraic
aggregate function.
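As a rough sketch of the algebraic case, the Python below derives the average and the standard deviation from three distributive subaggregates (count, sum, and sum of squares), and merges per-partition "N smallest" lists. The helper names `summarize`, `merge`, and `min_n` and the sample data are hypothetical. The population form of the standard deviation is an assumption, and the sum-of-squares formula is chosen for brevity rather than numerical robustness.

```python
import heapq
import math

def summarize(values):
    """Return (count, sum, sum of squares) for one partition."""
    return (len(values), sum(values), sum(v * v for v in values))

def merge(a, b):
    """Combine two subaggregate tuples component by component."""
    return tuple(x + y for x, y in zip(a, b))

def min_n(values, n):
    """Return the n smallest values in ascending order."""
    return heapq.nsmallest(n, values)

# Hypothetical partitions of a data set.
partitions = [[2.0, 4.0, 4.0], [4.0, 5.0], [5.0, 7.0, 9.0]]

# m = 3 distributive arguments are enough for avg() and the
# (population) standard deviation.
total = (0, 0.0, 0.0)
for part in partitions:
    total = merge(total, summarize(part))
count, total_sum, total_sq = total

avg = total_sum / count
std = math.sqrt(total_sq / count - avg ** 2)

# min_N with N = 2: keep at most N values per partition, then take
# the N smallest of the merged candidates.
candidates = [v for part in partitions for v in min_n(part, 2)]
two_smallest = min_n(candidates, 2)
```

In both cases the subaggregate kept per partition has a fixed size (three numbers, or at most N values), which is what the bound m in the definition expresses.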
Holistic.
An aggregate function is holistic if there is no constant bound on the storage size needed to describe a subaggregate. That is, there does not exist an algebraic function with m arguments (where m is a constant) that characterizes the computation.
Common examples of holistic functions include median(), mode(), and
rank().
A measure is holistic if it is obtained by applying a holistic aggregate
function.
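A small Python sketch can show why median() resists the partition-then-combine approach: in general, the median of the partition medians is not the median of the full data set. The partitions below are hypothetical and chosen so that the two values differ.

```python
from statistics import median

# Hypothetical partitions of unequal size.
partitions = [[1, 2, 3], [4, 5, 6, 7, 100], [8]]
all_values = [v for part in partitions for v in part]

# Attempt to treat median() like a distributive function:
# one median per partition, then the median of those medians.
partition_medians = [median(part) for part in partitions]
median_of_medians = median(partition_medians)

# The exact median requires looking at the full data set.
true_median = median(all_values)

# The two results disagree for this data.
assert median_of_medians != true_median
```

A single number per partition is therefore not enough here. Computing the exact median this way would, in general, require keeping a subaggregate whose size grows with the partition.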
<Sources: https://andyblg.wordpress.com/2010/05/05/categorization-of-measures/>