
## A Cautionary Note on Likelihood Ratio Tests in Mixture Models

### Annals of the Institute of Statistical Mathematics (2000-09-01) 52: 481-487

We show that iterative methods for maximizing the likelihood in a mixture of exponentials model depend strongly on their particular implementation. Different starting strategies and stopping rules yield completely different estimators of the parameters. This is demonstrated for the likelihood ratio test of homogeneity against two-component exponential mixtures, when the test statistic is calculated by the EM algorithm.
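The sensitivity described in this abstract can be explored with a minimal sketch, assuming a two-component exponential mixture parameterised by a mixing weight `p` and component means `m1`, `m2`. The function names, starting values, tolerances, and simulated data below are illustrative assumptions and are not taken from the article.

```python
import math
import random


def mixture_loglik(x, p, m1, m2):
    """Log-likelihood of a two-component exponential mixture with means m1, m2."""
    return sum(
        math.log(p * math.exp(-xi / m1) / m1 + (1 - p) * math.exp(-xi / m2) / m2)
        for xi in x
    )


def em_exponential_mixture(x, p, m1, m2, tol=1e-8, max_iter=500):
    """Run EM from the start (p, m1, m2); return estimates and the log-likelihood trace."""
    trace = [mixture_loglik(x, p, m1, m2)]
    for _ in range(max_iter):
        # E-step: posterior probability that each observation came from component 1.
        r = []
        for xi in x:
            a = p * math.exp(-xi / m1) / m1
            b = (1 - p) * math.exp(-xi / m2) / m2
            r.append(a / (a + b))
        s1 = sum(r)
        s2 = len(x) - s1
        if s1 < 1e-12 or s2 < 1e-12:
            break  # one component has emptied out; stop rather than divide by zero
        # M-step: closed-form updates for the mixing weight and component means.
        p = s1 / len(x)
        m1 = sum(ri * xi for ri, xi in zip(r, x)) / s1
        m2 = sum((1 - ri) * xi for ri, xi in zip(r, x)) / s2
        trace.append(mixture_loglik(x, p, m1, m2))
        # Stopping rule (an assumption): stop when the log-likelihood gain is below tol.
        if trace[-1] - trace[-2] < tol:
            break
    return p, m1, m2, trace


def lrt_statistic(x, start, tol=1e-8, max_iter=500):
    """2 * (mixture log-likelihood reached by EM - homogeneous log-likelihood)."""
    n = len(x)
    xbar = sum(x) / n
    loglik0 = -n * math.log(xbar) - n  # maximised log-likelihood of one exponential
    *_, trace = em_exponential_mixture(x, *start, tol=tol, max_iter=max_iter)
    return 2 * (trace[-1] - loglik0)


random.seed(1)
data = [random.expovariate(1.0) for _ in range(100)]  # homogeneous sample
xbar = sum(data) / len(data)
starts = [(0.5, 0.5 * xbar, 2.0 * xbar), (0.1, 0.2 * xbar, xbar), (0.9, xbar, 5.0 * xbar)]
for start in starts:
    for tol in (1e-3, 1e-10):
        print(start, tol, round(lrt_statistic(data, start, tol=tol), 4))
```

Comparing the printed values across starting points and tolerances gives a rough sense of how much the computed statistic can depend on implementation choices. Starting exactly at the homogeneous solution (`m1 == m2`) leaves EM at that point, so the statistic stays at zero.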

## Model-based clustering and classification with non-normal mixture distributions

### Statistical Methods & Applications (2013-11-01) 22: 427-454

Non-normal mixture distributions have received increasing attention in recent years. Finite mixtures of multivariate skew-symmetric distributions, in particular, the skew normal and skew $t$-mixture models, are emerging as promising extensions to the traditional normal and $t$-mixture models. Most of these parametric families of skew distributions are closely related, and can be classified into four forms under a recently proposed scheme, namely, the restricted, unrestricted, extended, and generalised forms. In this paper, we consider some of these existing proposals of multivariate non-normal mixture models and illustrate their practical use in several real applications. We first discuss the characterizations along with a brief account of some distributions belonging to the above classification scheme, then references for software implementation of EM-type algorithms for the estimation of the model parameters are given. We then compare the relative performance of restricted and unrestricted skew mixture models in clustering, discriminant analysis, and density estimation on six real datasets from flow cytometry, finance, and image analysis. We also compare the performance of mixtures of skew normal and $t$-component distributions with other non-normal component distributions, including mixtures with multivariate normal-inverse-Gaussian distributions, shifted asymmetric Laplace distributions and generalized hyperbolic distributions.

## Piecewise Linear Approximations for Cure Rate Models and Associated Inferential Issues

### Methodology and Computing in Applied Probability (2016-12-01) 18: 937-966

Cure rate models offer a convenient way to model time-to-event data by allowing a proportion of individuals in the population to be completely cured so that they never face the event of interest (say, death). The most studied cure rate models can be defined through a competing cause scenario in which the random variables corresponding to the time-to-event for each competing cause are conditionally independent and identically distributed while the actual number of competing causes is a latent discrete random variable. The main interest is then in the estimation of the cured proportion as well as in developing inference about failure times of the susceptibles. The existing literature consists of parametric and non/semi-parametric approaches, while the expectation maximization (EM) algorithm offers an efficient tool for the estimation of the model parameters due to the presence of right censoring in the data. In this paper, we study the cases wherein the number of competing causes is either a binary or Poisson random variable and a piecewise linear function is used for modeling the hazard function of the time-to-event. Exact likelihood inference is then developed based on the EM algorithm and the inverse of the observed information matrix is used for developing asymptotic confidence intervals. The Monte Carlo simulation study demonstrates the accuracy of the proposed non-parametric approach compared to the results attained from the true parametric model. The proposed model and the inferential method are finally illustrated with a data set on cutaneous melanoma.
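The competing cause construction in this abstract leads to a standard expression for the population survival function. The notation below is an assumption introduced for illustration: $M$ is the latent number of competing causes with probability generating function $G_M$, $S(t)$ is the common (proper) survival function of the cause-specific times, $Y$ is the observed time-to-event (infinite when $M = 0$), and $p_0$ is the cured proportion.

```latex
S_{\mathrm{pop}}(t) = \Pr(Y > t) = \mathbb{E}\left[ S(t)^{M} \right] = G_M\bigl(S(t)\bigr),
\qquad
p_0 = \lim_{t \to \infty} S_{\mathrm{pop}}(t) = \Pr(M = 0).

M \sim \mathrm{Bernoulli}(p): \quad
S_{\mathrm{pop}}(t) = 1 - p + p\,S(t), \qquad p_0 = 1 - p.

M \sim \mathrm{Poisson}(\theta): \quad
S_{\mathrm{pop}}(t) = \exp\bigl\{ -\theta\,[\,1 - S(t)\,] \bigr\}, \qquad p_0 = e^{-\theta}.
```

The binary case gives the familiar mixture cure model, and the Poisson case gives the promotion time cure model.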

## Adaptive importance sampling in general mixture classes

### Statistics and Computing (2008-12-01) 18: 447-459

In this paper, we propose an adaptive algorithm that iteratively updates both the weights and component parameters of a mixture importance sampling density so as to optimise the performance of importance sampling, as measured by an entropy criterion. The method, called M-PMC, is shown to be applicable to a wide class of importance sampling densities, which includes in particular mixtures of multivariate Student *t* distributions. The performance of the proposed scheme is studied on both artificial and real examples, highlighting in particular the benefit of a novel Rao-Blackwellisation device which can be easily incorporated in the updating scheme.

## Application of Mixture Models to Large Datasets

### Big Data Analytics (2016-01-01): 57-74

Mixture distributions are commonly being applied for modelling and for discriminant and cluster analyses in a wide variety of situations. We first consider normal and *t*-mixture models. As they are highly parameterized, we review methods to enable them to be fitted to large datasets involving many observations and variables. Attention is then given to extensions of these mixture models to mixtures with skew normal and skew *t*-distributions for the segmentation of data into clusters of non-elliptical shape. The focus is then on the latter models in conjunction with the JCM (joint clustering and matching) procedure for an automated approach to the clustering of cells in a sample in flow cytometry where a large number of cells and their associated markers have been measured. For a class of multiple samples, we consider the use of JCM for matching the sample-specific clusters across the samples in the class and for improving the clustering of each individual sample. The supervised classification of a sample is also considered in the case where there are different classes of samples corresponding, for example, to different outcomes or treatment strategies for patients undergoing medical screening or treatment.

## Inference on finite population categorical response: nonparametric regression-based predictive approach

### AStA Advances in Statistical Analysis (2012-01-01) 96: 69-98

Suppose that a finite population consists of $N$ distinct units. Associated with the $i$th unit is a polychotomous response vector, $d_i$, and a vector of auxiliary variables, $x_i$. The values $x_i$ are known for the entire population, but the $d_i$ are known only for the units selected in the sample. The problem is to estimate the finite population proportion vector $P$. One of the fundamental questions in finite population sampling is how to make use of the complete auxiliary information effectively at the estimation stage. In this article a predictive estimator is proposed which incorporates the auxiliary information at the estimation stage by invoking a superpopulation model. However, the use of such estimators is often criticized since the working superpopulation model may not be correct. To protect the predictive estimator from possible model failure, a nonparametric regression model is considered in the superpopulation. The asymptotic properties of the proposed estimator are derived, and a bootstrap-based hybrid re-sampling method for estimating the variance of the proposed estimator is developed. Results of a simulation study are reported on the performance of the predictive estimator and its re-sampling-based variance estimator from the model-based viewpoint. Finally, data from a survey of the opinions of 686 individuals on the cause of addiction are used in an empirical study to investigate the performance of the nonparametric predictive estimator from the design-based viewpoint.

## Analysis of rounded data in mixture normal model

### Statistical Papers (2012-11-01) 53: 895-914

Rounding errors have a considerable impact on statistical inference, especially when the sample size is large, and the finite normal mixture model is important in many applied statistical problems, such as those arising in bioinformatics. In this article, we investigate the statistical impact of rounding errors on the finite normal mixture model with a known number of components, and develop a new estimation method that yields consistent and asymptotically normal estimates of the unknown parameters based on rounded data drawn from this kind of model.
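One common way to account for rounding, which is not necessarily the estimation method developed in the article, is to treat each recorded value as an interval and use the probability the mixture assigns to that interval. The sketch below assumes values are rounded to the nearest multiple of a known width `h`; all function and parameter names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def rounded_mixture_prob(y, h, weights, means, sds):
    """Probability that a normal-mixture draw falls in the rounding interval around y."""
    lo, hi = y - h / 2.0, y + h / 2.0
    return sum(
        w * (normal_cdf((hi - m) / s) - normal_cdf((lo - m) / s))
        for w, m, s in zip(weights, means, sds)
    )


def rounded_loglik(ys, h, weights, means, sds):
    """Log-likelihood of rounded observations ys under the interval model."""
    return sum(math.log(rounded_mixture_prob(y, h, weights, means, sds)) for y in ys)


# Hypothetical example: data recorded to the nearest 0.5.
observed = [0.0, 0.5, 0.5, 3.0, 3.0, 3.5, 2.5]
print(rounded_loglik(observed, 0.5, [0.3, 0.7], [0.0, 3.0], [1.0, 0.5]))
```

Maximising `rounded_loglik` over the weights, means, and standard deviations, instead of the usual density-based likelihood, is one way to avoid treating rounded values as exact.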

## Finite Mixture Modeling of Gaussian Regression Time Series with Application to Dendrochronology

### Journal of Classification (2016-10-01) 33: 412-441

Finite mixture modeling is a popular statistical technique capable of accounting for various shapes in data. One popular application of mixture models is model-based clustering. This paper considers the problem of clustering regression autoregressive moving average time series. Two novel estimation procedures for the considered framework are developed. The first one yields the conditional maximum likelihood estimates, which can be used when the length of the time series is substantial. Simple analytical expressions make fast parameter estimation possible. The second method incorporates the Kalman filter and yields the exact maximum likelihood estimates. The procedure for assessing variability in the obtained estimates is discussed. We also show that the Bayesian information criterion can be successfully used to choose the optimal number of mixture components and correctly assess time series orders. The performance of the developed methodology is evaluated in simulation studies. An application to the analysis of tree ring data is thoroughly considered. The results are very promising as the proposed approach overcomes the limitations of other methods developed so far.

## Additive mixed models with approximate Dirichlet process mixtures: the EM approach

### Statistics and Computing (2016-01-01) 26: 73-92

We consider additive mixed models for longitudinal data with a nonlinear time trend. As the random effects distribution, an approximate Dirichlet process mixture is proposed that is based on the truncated version of the stick-breaking representation of the Dirichlet process and provides a Gaussian mixture with a data-driven choice of the number of mixture components. The main advantage of the specification is its ability to identify clusters of subjects with a similar random effects structure. For the estimation of the trend curve, the mixed model representation of penalized splines is used. An Expectation-Maximization algorithm is given that solves the estimation problem and that exhibits advantages over Markov chain Monte Carlo approaches, which are typically used when modeling with Dirichlet processes. The method is evaluated in a simulation study and applied to theophylline data and to body mass index profiles of children.
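The truncated stick-breaking construction mentioned in this abstract can be illustrated with a minimal sketch. It draws mixture weights only and assumes the usual Beta(1, alpha) break proportions; the concentration parameter `alpha`, the truncation level `K`, and the function name are illustrative assumptions, not the article's estimation procedure.

```python
import random


def truncated_stick_breaking(alpha, K, rng=random):
    """Draw K mixture weights from a stick-breaking construction truncated at K."""
    weights = []
    remaining = 1.0  # length of the stick not yet assigned
    for k in range(K):
        # The last break takes the whole remainder, so the weights sum to one.
        v = 1.0 if k == K - 1 else rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


rng = random.Random(0)
for w in truncated_stick_breaking(alpha=2.0, K=8, rng=rng):
    print(round(w, 4))
```

With a small `alpha`, most of the mass tends to fall on the first few components, which is how a truncated mixture can behave as if the number of effective components were chosen by the data.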

## A multilevel finite mixture item response model to cluster examinees and schools

### Advances in Data Analysis and Classification (2016-03-01) 10: 53-70

Within the educational context, a key goal is to assess students’ acquired skills and to cluster students according to their ability level. In this regard, a relevant element to be accounted for is the possible effect of the school that students come from. To this end, we provide a methodological tool which takes into account the multilevel structure of the data (i.e., students in schools) and allows us to cluster both students and schools into homogeneous classes of ability and effectiveness, and to assess the effect of certain student and school characteristics on the probability of belonging to such classes. The proposed approach relies on an extended class of multidimensional latent class IRT models characterised by: (i) latent traits defined at student and school level, (ii) latent traits represented through random vectors with a discrete distribution, (iii) the inclusion of covariates at student and school level, and (iv) a two-parameter logistic parametrisation for the conditional probability of a correct response given the ability. The approach is applied to the analysis of data collected by two national tests administered in Italy to middle school students in June 2009: the INVALSI Language Test and the Mathematics Test.