In multi-instance multi-label (MIML) learning, datasets are given in the form of bags, each of which contains multiple instances
and is associated with multiple labels. This paper considers a novel instance clustering problem in MIML learning, where the
bag labels are used as background knowledge to help group instances into...
The detection and determination of clusters have been of special interest among
researchers from different fields for a long time. In particular, assessing whether the clusters
are significant is a question that has been asked by a number of experimenters. In Fuentes
and Casella (2009), the authors put forth...
In the field of machine learning, clustering and classification are two fundamental tasks.
Traditionally, clustering is an unsupervised method, where no supervision about the
data is available for learning; classification is a supervised task, where fully-labeled data
are collected for training a classifier. In some scenarios, however, we may not...
This paper studies cluster ensembles for high-dimensional data clustering. We examine three different approaches to constructing cluster ensembles. To address high dimensionality, we focus on ensemble construction methods that build on two popular dimension reduction techniques, random projection and principal component analysis (PCA). We present evidence showing that ensembles...
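As a rough illustration of the random-projection idea mentioned in the abstract above, the sketch below projects the data onto a few random Gaussian directions, clusters each projection with a plain k-means, and records how often each pair of points lands in the same cluster. The co-association matrix, the use of k-means, and all function names and parameters (`random_projection`, `kmeans`, `co_association`, `out_dim`, `runs`) are assumptions made for this example; the abstract does not say how the authors build or combine their ensembles.

```python
import random


def random_projection(data, out_dim, rng):
    """Project each row onto out_dim random Gaussian directions."""
    in_dim = len(data[0])
    proj = [[rng.gauss(0.0, 1.0) for _ in range(out_dim)] for _ in range(in_dim)]
    return [
        [sum(row[i] * proj[i][j] for i in range(in_dim)) for j in range(out_dim)]
        for row in data
    ]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def co_association(data, k, out_dim, runs, seed=0):
    """Fraction of runs in which each pair of points shares a cluster."""
    rng = random.Random(seed)
    n = len(data)
    counts = [[0] * n for _ in range(n)]
    for _ in range(runs):
        labels = kmeans(random_projection(data, out_dim, rng), k, rng)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / runs for c in row] for row in counts]
```

The resulting matrix can be treated as a similarity measure and passed to any final clustering step; that choice is left open here.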
This paper presents the use of data clustering methods applied to the analysis results of a design-stage, functional failure reasoning tool. A system simulation using qualitative descriptions of component behaviors and a functional reasoning tool are used to identify the functional impact of a large set of potential single and...
Identifying coherent sub-graphs in networks is important
in many applications. In power systems, large systems
are divided into areas and zones to aid in planning and control
applications. But not every partitioning is equally good for
all applications; different applications have different goals, or
attributes, against which solutions should be...
DNA methylation is a well-recognized epigenetic mechanism that has been the subject of a growing
body of literature typically focused on the identification and study of profiles of DNA methylation and their
association with human diseases and exposures. In recent years, a number of unsupervised clustering algorithms,
both parametric and...
In organizational management, researchers and managers study separations or faultlines that occur in diverse teams when members
form subgroups based on the alignment of multiple demographic characteristics. The team faultline concept is operationalized
using multivariate cluster analysis—analysts use faultline measures to identify subgroups/clusters in a team and to quantify how...
We present a generalized estimating equations approach for estimating the concordance correlation coefficient and the
kappa coefficient from sample survey data. The estimates and their accompanying standard errors need to correctly account
for the sampling design. Weighted measures of the concordance correlation coefficient and the kappa coefficient, along with
the...
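To make the weighted concordance correlation coefficient in the last abstract concrete, here is a minimal sketch of a weighted moment estimator based on Lin's definition, 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²). It is not the generalized estimating equations approach the abstract describes and it does not produce design-based standard errors; the function name `weighted_ccc` and the treatment of `w` as sampling weights are assumptions for illustration only.

```python
def weighted_ccc(x, y, w=None):
    """Weighted concordance correlation coefficient (plain moment estimator).

    x, y : paired measurements
    w    : optional non-negative weights (assumed here to be sampling weights);
           equal weights are used when w is None.
    """
    n = len(x)
    if w is None:
        w = [1.0] * n
    if not (len(y) == n == len(w)):
        raise ValueError("x, y and w must have the same length")
    total = sum(w)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Weighted means.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total

    # Weighted variances and covariance (normalized by the total weight).
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / total
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / total

    denom = sxx + syy + (mx - my) ** 2
    if denom == 0:
        raise ValueError("CCC is undefined when both variables are constant and equal")
    return 2.0 * sxy / denom
```

With equal weights, identical series give a value of 1, and shifting one series by a constant lowers the value even though the Pearson correlation stays at 1: `weighted_ccc([1, 2, 3], [2, 3, 4])` evaluates to 4/7.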