Graduate Thesis Or Dissertation
 

A Statistical Analysis and Interpretation of the HyperLogLog Algorithm

Public Deposited

Downloadable Content

Download PDF
https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/2f75rg754

Descriptions

Attribute NameValues
Creator
Abstract
  • The HyperLogLog (HLL) algorithm is used to estimate the cardinality of large sets. This thesis gives a novel analysis of the HyperLogLog algorithm by using techniques from statistics and probability. Initially, closed form bounds for the mean and variance of the max of n independent and identically distributed geometric random variables are developed. Surprisingly, the beta, digamma and trigamma functions show up in integral ways in the bounds discovered. These bounds and other techniques from probability are used to investigate the asymptotic behavior of the estimator associated with the HyperLogLog algorithm. The maximum likelihood estimator of the HyperLogLog algorithm is also derived and compared to the original HyperLogLog algorithm. Finally, some preliminary work is included that abstracts the idea of random interval subdivision to random area splitting for triangular regions. This is a generalization of the Kakutani-splitting process.
License
Resource Type
Date Issued
Degree Level
Degree Name
Degree Field
Degree Grantor
Commencement Year
Advisor
Committee Member
Academic Affiliation
Rights Statement
Publisher
Peer Reviewed
Language

Relationships

Parents:

This work has no parents.

In Collection:

Items