Showing 1 to 10 of 15 matching Articles

## Hybrid k-Means: Combining Regression-Wise and Centroid-Based Criteria for QSAR

### Selected Contributions in Data Analysis and Classification (January 01, 2007): 225-233

This paper further extends the ‘kernel’-based approach to clustering proposed by E. Diday in the early 1970s. According to this approach, a cluster’s centroid can be represented by the parameters of any analytical model, such as a linear regression equation, built over the cluster. We address the problem of producing regression-wise clusters that are separated in the input variable space by building a hybrid clustering criterion that combines the regression-wise clustering criterion with the conventional centroid-based one.
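As an illustration of the idea (a minimal sketch, not the paper's exact algorithm), the hybrid criterion can be written so that each point is charged a weighted sum of its squared regression residual and its squared distance to the cluster centroid; the mixing weight `alpha` and the alternating update scheme below are assumptions for the sketch.

```python
import numpy as np

def hybrid_kmeans(X, y, k, alpha=0.5, n_iter=50, seed=0):
    """Alternate (1) fitting a linear regression and a centroid per cluster
    and (2) reassigning each point to the cluster where its cost
    alpha * residual^2 + (1 - alpha) * distance-to-centroid^2 is smallest."""
    rng = np.random.default_rng(seed)
    n = len(X)
    labels = rng.integers(0, k, n)
    Xd = np.column_stack([X, np.ones(n)])  # design matrix with intercept
    for _ in range(n_iter):
        coefs, centroids = [], []
        for j in range(k):
            m = labels == j
            if not m.any():  # re-seed an empty cluster with a random point
                m = np.isin(np.arange(n), rng.integers(0, n, 1))
            coefs.append(np.linalg.lstsq(Xd[m], y[m], rcond=None)[0])
            centroids.append(X[m].mean(axis=0))
        cost = np.empty((n, k))
        for j in range(k):
            resid = y - Xd @ coefs[j]                     # regression-wise part
            dist = ((X - centroids[j]) ** 2).sum(axis=1)  # centroid-based part
            cost[:, j] = alpha * resid ** 2 + (1 - alpha) * dist
        new_labels = cost.argmin(axis=1)
        if (new_labels == labels).all():
            break
        labels = new_labels
    return labels
```

Setting `alpha` near 1 recovers purely regression-wise clusters; near 0, conventional K-Means on the input variables.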

## Hierarchy as a Clustering Structure

### Mathematical Classification and Clustering (January 01, 1996) 11: 329-397

Directions for representing and comparing hierarchies are discussed.

Clustering methods that are invariant under monotone dissimilarity transformations are analyzed.

The most recent theories and methods concerning such concepts as the ultrametric, tree metric, Robinson matrix, pyramid, and weak hierarchy are presented.

A linear theory for binary hierarchies is proposed that allows decomposing the data entries, as well as covariances, over the clusters.
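The ultrametric condition mentioned above can be checked directly: in an ultrametric, the two largest distances in every triple of points are equal. A small sketch:

```python
import itertools
import numpy as np

def is_ultrametric(D, tol=1e-9):
    """Check d(i,k) <= max(d(i,j), d(j,k)) for every triple in a
    symmetric dissimilarity matrix D, i.e. that the two largest
    distances of each triple coincide (up to tol)."""
    n = len(D)
    for i, j, k in itertools.combinations(range(n), 3):
        a, b, c = sorted((D[i, j], D[j, k], D[i, k]))
        if c > b + tol:  # largest strictly exceeds second-largest
            return False
    return True
```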

## Clustering Algorithms: a Review

### Mathematical Classification and Clustering (January 01, 1996) 11: 109-168

A review of clustering concepts and algorithms is provided emphasizing: (a) output cluster structure, (b) input data kind, and (c) criterion.

A dozen cluster structures are considered, including those used in supervised learning, unsupervised learning, or both.

The techniques discussed cover such algorithms as nearest neighbor, K-Means (moving centers), agglomerative clustering, conceptual clustering, EM-algorithm, high-density clustering, and back-propagation.

Interpretation is considered as achieving clustering goals, in part via presenting the same data in both extensional and intensional forms of cluster structures.
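A minimal sketch of K-Means ("moving centers") as covered in the review, alternating nearest-centroid assignment with centroid recomputation until the centroids stop moving:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-Means: start from k random data points as centroids,
    then alternate assignment and centroid update steps."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        # squared distance of every point to every centroid
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            if (labels == j).any():
                new_centroids[j] = X[labels == j].mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids
```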

## Partition: Rectangular Data Table

### Mathematical Classification and Clustering (January 01, 1996) 11: 285-327

Bilinear clustering for mixed (quantitative, nominal, and binary) variables is proved to be a theory-motivated extension of the K-Means method.

Decomposition of the data scatter into “explained” and “residual” parts is provided for each of two norms: the sum of squares and the sum of moduli.

Contribution weights are derived to attack machine learning problems (conceptual description, selecting and transforming the variables, and knowledge discovery).

The explained parts of the data scatter related to nominal variables turn out to coincide with Pearson’s chi-squared coefficient and some other popular indices as well.

Approximation (bi)-partitioning for contingency tables substantiates and extends some popular clustering techniques.
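The least-squares decomposition mentioned above can be illustrated directly: the data scatter, the sum of squared entries, splits exactly into an "explained" part (cluster sizes times squared centroid norms) plus a "residual" within-cluster part. A small sketch for the sum-of-squares norm only (the moduli-norm version is analogous but not shown):

```python
import numpy as np

def scatter_decomposition(X, labels):
    """Split total scatter sum_i ||x_i||^2 into explained
    (sum_k N_k * ||c_k||^2) plus residual (within-cluster) parts."""
    total = (X ** 2).sum()
    explained = 0.0
    residual = 0.0
    for j in np.unique(labels):
        S = X[labels == j]
        c = S.mean(axis=0)          # cluster centroid
        explained += len(S) * (c ** 2).sum()
        residual += ((S - c) ** 2).sum()
    return total, explained, residual
```

The identity total = explained + residual holds for any partition, which is what makes per-cluster contribution weights well defined.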

## Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads

### Journal of Classification (March 01, 2010) 27: 3-40

The issue of determining “the right number of clusters” in K-Means has attracted considerable interest, especially in recent years. Cluster intermix appears to be the factor that most affects clustering results. This paper proposes an experimental setting for comparing different approaches on data generated from Gaussian clusters with controlled parameters of between- and within-cluster spread to model cluster intermix. The setting allows for evaluating centroid recovery on a par with the conventional evaluation of cluster recovery. The subjects of our interest are two versions of the “intelligent” *K*-Means method, *iK*-Means, which find the “right” number of clusters by extracting “anomalous patterns” from the data one by one. We compare them with seven other methods, including Hartigan’s rule, the averaged Silhouette width, and the Gap statistic, under different between- and within-cluster spread-shape conditions. Several consistent patterns emerge from our experiments, such as that the right *K* is reproduced best by Hartigan’s rule, but not the clusters or their centroids. This leads us to propose an adjusted version of *iK*-Means, which performs well in the current experimental setting.
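For concreteness, Hartigan's rule as commonly stated picks the smallest K with H(K) = (W_K / W_{K+1} - 1)(N - K - 1) below a threshold of about 10, where W_K is the within-cluster sum of squares at K clusters. A sketch with a plain K-Means used to estimate W_K; the fixed threshold and the single random start are simplifying assumptions, not the paper's experimental protocol:

```python
import numpy as np

def _kmeans_W(X, k, n_iter=100, seed=0):
    """Within-cluster sum of squares W_K from one plain K-Means run."""
    rng = np.random.default_rng(seed)
    c = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(n_iter):
        lab = ((X[:, None] - c[None]) ** 2).sum(2).argmin(1)
        new = np.array([X[lab == j].mean(0) if (lab == j).any() else c[j]
                        for j in range(k)])
        if np.allclose(new, c):
            break
        c = new
    return ((X - c[lab]) ** 2).sum()

def hartigan_k(X, k_max=10, threshold=10.0):
    """Smallest K with (W_K / W_{K+1} - 1) * (N - K - 1) <= threshold."""
    n = len(X)
    W = [_kmeans_W(X, k) for k in range(1, k_max + 2)]
    for k in range(1, k_max + 1):
        H = (W[k - 1] / W[k] - 1.0) * (n - k - 1)
        if H <= threshold:
            return k
    return k_max
```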

## Back Matter - Mathematical Classification and Clustering

### Mathematical Classification and Clustering (January 01, 1996): 11

## Mathematical Classification and Clustering

### Mathematical Classification and Clustering (January 01, 1996): 11

## Geometry of Data Sets

### Mathematical Classification and Clustering (January 01, 1996) 11: 59-107

An entity-to-variable data table can be represented geometrically in three different settings, of which one (row points) pertains to conventional clustering, another (column vectors) to conceptual clustering, and the third (matrix space) to approximation clustering.

Two principles for standardizing the conditional data tables are suggested as related to the data scatter.

Standardizing aggregable data is suggested, based on the flow index concept introduced here.

Graph-theoretic concepts related to clustering are considered.

Low-rank approximation of data, including the popular Principal Component and Correspondence Analysis techniques, is discussed and extended into a general sequential fitting procedure, SEFIT, which is employed for approximation clustering.
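A sketch of the sequential-fitting idea in the least-squares case: repeatedly take the best rank-1 approximation of the current residual matrix and subtract it. Here each rank-1 fit is computed via SVD for brevity, which makes the result coincide with truncated SVD; this illustrates only the sequential scheme, not the book's general SEFIT procedure.

```python
import numpy as np

def sequential_fit(X, n_components):
    """Extract rank-1 components one at a time from the residual matrix,
    accumulating (weight, left factor, right factor) triples."""
    R = X.astype(float).copy()
    components = []
    for _ in range(n_components):
        U, s, Vt = np.linalg.svd(R, full_matrices=False)
        components.append((s[0], U[:, 0], Vt[0]))
        R -= s[0] * np.outer(U[:, 0], Vt[0])  # deflate the residual
    return components, R
```

As with clusters in the scatter decomposition, each component contributes its squared weight to the data scatter, and the contributions plus the residual scatter add up to the total.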

## Partition: Square Data Table

### Mathematical Classification and Clustering (January 01, 1996) 11: 229-284

Forms of representing and comparing partitions are reviewed.
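One standard way to compare two partitions, shown here for illustration, is the Rand index: the share of point pairs on which the two partitions agree, counting pairs placed together in both or apart in both.

```python
from itertools import combinations

def rand_index(a, b):
    """Rand index of two partitions given as label sequences:
    fraction of pairs (i, j) on which a and b agree."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)
```

Identical partitions score 1.0 even if the label values differ, since only the together/apart pattern matters.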

Mathematical analysis of some of the agglomerative clustering axioms is presented.

Approximation clustering methods for aggregating square data tables are suggested along with associated mathematical theories:

- uniform partitioning, based on a “soft” similarity threshold;
- structured partitioning (along with the structure of between-class associations);
- aggregation of mobility and other aggregable interaction data, based on the chi-squared criterion and underlying substantive modeling.
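As a simplified illustration of threshold-based partitioning (using a fixed threshold, whereas the "soft" threshold above is data-driven): link entities whose similarity exceeds the threshold and take the connected components of the resulting graph as clusters.

```python
import numpy as np

def threshold_partition(S, t):
    """Partition entities by connected components of the graph that
    joins i and j whenever similarity S[i, j] > t."""
    n = len(S)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        stack = [i]              # depth-first search from an unlabeled entity
        labels[i] = current
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] < 0 and S[u, v] > t:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```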