Showing 1 to 10 of 126 matching Articles

## Variable selection for generalized varying coefficient models with longitudinal data

### Statistical Papers 57: 115-132, March 01, 2016

In this paper, we apply the penalized quadratic inference function to perform variable selection and estimation simultaneously for generalized varying coefficient models with longitudinal data. The proposed approach is based on basis function approximations and the group SCAD penalty, which can incorporate information on the correlation structure within the same subject to achieve an efficient estimator. Furthermore, we discuss the asymptotic theory of our proposed procedure under suitable conditions, including consistency in variable selection and the oracle property in estimation. Finally, Monte Carlo simulations and a real data analysis are conducted to examine the finite sample performance of the proposed procedure.
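
The group SCAD penalty applies the scalar SCAD function of Fan and Li (2001) to the norm of each coefficient group. A minimal sketch of that scalar penalty (not the authors' code; `lam` is an illustrative tuning value, and `a = 3.7` is the conventional default):

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty p_lam(|t|): linear near zero, quadratic blend in the
    middle, and constant beyond a*lam (so large coefficients are not biased)."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2
```

For a coefficient group with norm `t`, the group penalty is `scad_penalty(t, lam)`; unlike the Lasso penalty, it is flat for `t > a * lam`, which is what yields the oracle property discussed above.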

## The predictive Lasso

### Statistics and Computing 22: 1069-1084, September 01, 2012

We propose a shrinkage procedure for simultaneous variable selection and estimation in generalized linear models (GLMs) with an explicit predictive motivation. The procedure estimates the coefficients by minimizing the Kullback-Leibler divergence of a set of predictive distributions to the corresponding predictive distributions for the full model, subject to an *l*_{1} constraint on the coefficient vector. This results in selection of a parsimonious model with similar predictive performance to the full model. Because it takes a form similar to the original Lasso problem for GLMs, our procedure can benefit from available *l*_{1}-regularization path algorithms. Simulation studies and real data examples confirm the efficiency of our method in terms of predictive performance on future observations.
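
The *l*_{1}-regularization path algorithms mentioned above typically rest on a soft-thresholding update inside a coordinate descent loop. A self-contained sketch for the ordinary (Gaussian) Lasso, assuming each column of `X` has mean zero and average squared value one; this illustrates the generic machinery, not the predictive-Lasso objective itself:

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1,
    assuming standardized columns: mean 0, (1/n) * sum_i x_ij^2 = 1."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residuals, since b starts at zero
    for _ in range(n_sweeps):
        for j in range(p):
            # correlation of column j with the partial residual
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, lam)
            delta = new_bj - b[j]
            if delta:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
    return b
```

Running `lam` over a decreasing grid and warm-starting `b` from the previous solution gives the regularization path these algorithms compute.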

## Penalized empirical likelihood inference for sparse additive hazards regression with a diverging number of covariates

### Statistics and Computing 27: 1347-1364, September 01, 2017

High-dimensional sparse modeling with censored survival data is of great practical importance, as exemplified by applications in high-throughput genomic data analysis. In this paper, we propose a class of regularization methods, integrating both the penalized empirical likelihood and pseudoscore approaches, for variable selection and estimation in sparse and high-dimensional additive hazards regression models. When the number of covariates grows with the sample size, we establish asymptotic properties of the resulting estimator and the oracle property of the proposed method. It is shown that the proposed estimator is more efficient than that obtained from the non-concave penalized likelihood approach in the literature. Based on a penalized empirical likelihood ratio statistic, we further develop a nonparametric likelihood approach for testing linear hypotheses about the regression coefficients and, consequently, constructing confidence regions. Simulation studies are carried out to evaluate the performance of the proposed methodology, and two real data sets are analyzed.

## Weighting and selection of variables for cluster analysis

### Journal of Classification 12: 113-136, March 01, 1995

One of the thorniest aspects of cluster analysis continues to be the weighting and selection of variables. This paper reports on the performance of nine methods on eight “leading case” simulated and real sets of data. The results demonstrate shortcomings of weighting based on the standard deviation or range as well as other more complex schemes in the literature. Weighting schemes based upon carefully chosen estimates of within-cluster and between-cluster variability are generally more effective. These estimates do not require knowledge of the cluster structure. Additional research is essential: worry-free approaches do not yet exist.
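
To make the idea concrete, here is a deliberately simplified sketch that weights each variable by its between-cluster to within-cluster variance ratio. The cluster labels are taken as given purely for illustration; the schemes the paper recommends estimate within- and between-cluster variability without knowing the cluster structure:

```python
def cluster_variable_weights(X, labels):
    """Weight each variable by between-cluster / within-cluster sum of squares.
    Illustrative only: real use requires estimating these quantities
    without known labels, as the paper emphasizes."""
    n, p = len(X), len(X[0])
    weights = []
    for j in range(p):
        col = [row[j] for row in X]
        grand = sum(col) / n
        within = between = 0.0
        for c in set(labels):
            vals = [col[i] for i in range(n) if labels[i] == c]
            m = sum(vals) / len(vals)
            within += sum((v - m) ** 2 for v in vals)
            between += len(vals) * (m - grand) ** 2
        weights.append(between / within if within > 0 else 0.0)
    return weights
```

A variable that separates the clusters gets a large weight; a variable whose variation is all within-cluster noise gets a weight near zero, which is exactly the failure mode of weighting by the overall standard deviation or range.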

## Bayesian covariance matrix estimation using a mixture of decomposable graphical models

### Statistics and Computing 19: 303-316, September 01, 2009

We present a Bayesian approach to estimating a covariance matrix by using a prior that is a mixture over all decomposable graphs, with the probability of each graph size specified by the user and graphs of equal size assigned equal probability. Most previous approaches assume that all graphs are equally probable. We show empirically that the prior that assigns equal probability over graph sizes outperforms the prior that assigns equal probability over all graphs in more efficiently estimating the covariance matrix. The prior requires knowing the number of decomposable graphs for each graph size and we give a simulation method for estimating these counts. We also present a Markov chain Monte Carlo method for estimating the posterior distribution of the covariance matrix that is much more efficient than current methods. Both the prior and the simulation method to evaluate the prior apply generally to any decomposable graphical model.

## Simultaneous estimation and variable selection in median regression using Lasso-type penalty

### Annals of the Institute of Statistical Mathematics 62: 487-514, June 01, 2010

We consider median regression with a LASSO-type penalty term for variable selection. With a fixed number of variables in the regression model, a two-stage method is proposed for simultaneous estimation and variable selection, where the degree of penalization is chosen adaptively. A Bayesian information criterion type approach is proposed and used to obtain a data-driven procedure that is proved to automatically select asymptotically optimal tuning parameters. It is shown that the resulting estimator achieves the so-called oracle property. The combination of median regression and the LASSO penalty is computationally easy to implement via standard linear programming. A random perturbation scheme can be used to obtain a simple estimator of the standard error. Simulation studies are conducted to assess the finite-sample performance of the proposed method. We illustrate the methodology with a real example.

## Sparse optimal discriminant clustering

### Statistics and Computing 26: 629-639, May 01, 2016

In this manuscript, we reinvestigate an existing clustering procedure, optimal discriminant clustering (ODC; Zhang and Dai in Adv Neural Inf Process Syst 23(12):2241–2249, 2009), and propose to use cross-validation to select the tuning parameter. Furthermore, because in high-dimensional data many of the features may be non-informative for clustering, we develop a variation of ODC, sparse optimal discriminant clustering (SODC), by adding a group-lasso type of penalty to ODC. We also demonstrate that both ODC and SODC can be used as a dimension reduction tool for data visualization in cluster analysis.
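
Cross-validated selection of a tuning parameter can be sketched generically as follows; `fit` and `loss` are placeholders for any model-fitting and evaluation functions, and this is the generic scheme rather than the SODC-specific criterion used in the paper:

```python
def cv_select(lambdas, fit, loss, X, y, k=5):
    """Return the tuning parameter in `lambdas` minimizing k-fold
    cross-validated loss. `fit(X, y, lam)` returns a fitted model;
    `loss(model, X, y)` returns its held-out error."""
    n = len(X)
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    best_lam, best_err = None, float("inf")
    for lam in lambdas:
        err = 0.0
        for fold in folds:
            hold = set(fold)
            Xtr = [X[i] for i in range(n) if i not in hold]
            ytr = [y[i] for i in range(n) if i not in hold]
            Xte = [X[i] for i in fold]
            yte = [y[i] for i in fold]
            err += loss(fit(Xtr, ytr, lam), Xte, yte)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam
```

In the clustering setting the "loss" would be a cluster-quality criterion on the held-out points rather than a prediction error.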

## Sliced inverse regression for survival data

### Statistical Papers 55: 209-220, February 01, 2014

We apply univariate sliced inverse regression to survival data. Our approach differs from the other papers on this subject. Right-censored observations are taken into account during the slicing of the survival times by assigning each of them, with equal weight, to all of the slices corresponding to longer survival. We test this method with different distributions for the two main survival data models, the accelerated lifetime model and Cox’s proportional hazards model. In both cases, and under different conditions of sparsity, sample size and dimension of parameters, this non-parametric approach recovers the data structure and can be viewed as a variable selector.
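
The weighting of censored observations can be sketched as follows. The slice boundaries `cuts` and the convention that a censored time spreads its mass over its own slice and all later ones are illustrative assumptions, since the abstract does not pin down the exact convention:

```python
def find_slice(t, cuts):
    """Index of the slice containing time t, for boundaries in `cuts`."""
    k = 0
    while k < len(cuts) and t >= cuts[k]:
        k += 1
    return k

def slice_weights(times, censored, cuts):
    """W[i][k] = mass of observation i in slice k. An observed event puts
    all its mass in its own slice; a right-censored time is spread with
    equal weight over its slice and all slices of longer survival
    (one plausible reading of the scheme described above)."""
    K = len(cuts) + 1
    W = []
    for t, c in zip(times, censored):
        k = find_slice(t, cuts)
        row = [0.0] * K
        if not c:
            row[k] = 1.0
        else:
            share = 1.0 / (K - k)
            for j in range(k, K):
                row[j] = share
        W.append(row)
    return W
```

The weighted slice means of the covariates then feed into the usual sliced inverse regression eigen-decomposition.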

## Partial linear modelling with multi-functional covariates

### Computational Statistics 30: 647-671, September 01, 2015

This paper contributes to the current literature on semi-parametric regression modelling for statistical samples composed of multi-functional data. A new kind of partially linear model (the so-called MFPLR model) is proposed. It allows for more than one functional covariate, incorporates both continuous and discrete effects of functional variables, and models these effects either nonparametrically or linearly. Exploiting the continuous nature of functional data, a new method is proposed for variable selection (the so-called PVS method). In addition, from this procedure, new estimates of the various parameters involved in the partial linear model are constructed. A simulation study illustrates the finite-sample behavior of the PVS procedure for selecting the influential variables. Through some real data analysis, it is shown how the method achieves the three main objectives of any semi-parametric procedure. First, the flexibility of the nonparametric component of the model yields good predictive behavior; second, the linear component of the model yields interpretable outputs; third, the low computational cost ensures easy applicability. Although the method is intended for multi-functional problems, we briefly discuss how it can also be used in uni-functional problems as a boosting tool for improving predictive power. Finally, note that the main contribution of this paper is applied in nature, but some basic asymptotics are also stated in a final “Appendix”.

## Variable Selection for Clustering and Classification

### Journal of Classification 31: 136-153, July 01, 2014

As data sets continue to grow in size and complexity, effective and efficient techniques are needed to target important features in the variable space. Many of the variable selection techniques that are commonly used alongside clustering algorithms are based upon determining the best variable subspace according to model fitting in a stepwise manner. These techniques are often computationally intensive and can require extended periods of time to run; in fact, some are prohibitively computationally expensive for high-dimensional data. In this paper, a novel variable selection technique is introduced for use in clustering and classification analyses that is both intuitive and computationally efficient. We focus largely on applications in mixture model-based learning, but the technique could be adapted for use with various other clustering/classification methods. Our approach is illustrated on both simulated and real data, highlighted by contrasting its performance with that of other comparable variable selection techniques on the real data sets.