## SEARCH

#### Institution

##### (see all 466)

- University of Pennsylvania (15)
- Cornell University (14)
- Probability, Statistics & Information (14)
- Centre National de la Recherche Scientifique (12)
- George Mason University (12)

#### Author

##### (see all 996)

- Eckstein, Peter P. (25)
- Foster, Dean P. (15)
- Stine, Robert A. (15)
- Waterman, Richard P. (15)
- Lebart, Ludovic (14)

#### Publication

##### (see all 26)

- Data Science, Classification, and Related Methods (95)
- COMPSTAT (76)
- Journal of Medical Systems (44)
- Annals of the Institute of Statistical Mathematics (43)
- Statistical Papers (40)

#### Subject

##### (see all 29)

- Statistics (620, selected)
- Statistics, general (343)
- Statistics for Business/Economics/Mathematical Finance/Insurance (218)
- Probability Theory and Stochastic Processes (119)
- Data Structures (95)

## CURRENTLY DISPLAYING:

Showing 1 to 10 of 620 matching Articles

## Model-Selection Uncertainty with Examples

### Model Selection and Inference (1998-01-01): 118-158 , January 01, 1998

The understanding of model-selection uncertainty requires that one consider the process that generates the sample data we observe. For a given field, laboratory, or computer simulation study, data are observed on some process or system. If a second, independent, data set could be observed on the same process or system under nearly identical conditions, the new data set would differ somewhat from the first. Clearly, both data sets would contain information about the process, but the information would likely be slightly different, by chance. An obvious goal of data analysis is to make an inference about the process based on the data observed. A less obvious goal of data analysis is to make inferences about the process that are not overly specific with respect to the (single) data set observed. That is, we would like our inferences to be robust, with respect to the particular data set observed, in such a way that we tend to avoid problems associated with over-fitting (overinterpreting) the limited data we have. Thus, we would like some ability to make inferences about the process as if a large number of other data sets were also available. The interpretation of a confidence interval is similar; i.e., in repeated samples from the process, 95% of the data sets will generate a confidence interval that includes the true parameter value. This idea extends to the idea of generating a confidence (sub) set of the models considered such that with high relative frequency, over samples, that set of models contains the actual K-L best model of the set of models considered, while being as small a subset as possible (analogous to short confidence intervals).
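The closing idea, a confidence set of models, is commonly operationalized through Akaike weights: rank the candidate models by AIC, convert the AIC differences into weights, and accumulate models until a target probability is reached. The following is a minimal Python sketch of that recipe, not the book's own procedure, and the AIC values are invented purely for illustration.

```python
import math

def akaike_weights(aics):
    """Convert AIC values to Akaike weights (normalized relative likelihoods)."""
    best = min(aics)
    rel = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(rel)
    return [r / total for r in rel]

def confidence_set(aics, level=0.95):
    """Smallest set of model indices whose cumulative weight reaches `level`,
    taking models in order of increasing AIC."""
    w = akaike_weights(aics)
    order = sorted(range(len(aics)), key=lambda i: aics[i])
    chosen, cum = [], 0.0
    for i in order:
        chosen.append(i)
        cum += w[i]
        if cum >= level:
            break
    return chosen

# Hypothetical AIC values for five candidate models
aics = [102.3, 100.0, 105.7, 101.1, 110.2]
print(confidence_set(aics))  # indices of the 95% confidence set of models
```

Just as a short confidence interval is preferred, a small confidence set of models indicates that the data discriminate sharply among the candidates.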

## Analyzing Geostatistical Data

### S+SpatialStats (1998-01-01): 67-109 , January 01, 1998

This chapter introduces functions available in *S-Plus* and *S+SpatialStats* for analyzing geostatistical data. Geostatistical data, also termed random field data, consist of measurements taken at fixed locations. For a complete description of geostatistical data see chapter 1. Specifically, this chapter discusses methods related to variogram analysis and kriging. Variogram estimation and kriging were originally introduced as geostatistical methods for use in mining applications. In recent years, these methods have been applied to many disciplines including meteorology, forestry, agriculture, cartography, climatology, and fisheries.

In this chapter you will learn about the following topics:

Estimating Variograms (section 4.1).

Fitting Theoretical Variogram Models (section 4.2).

Performing Ordinary and Universal Kriging (section 4.3).

Simulating Geostatistical Data (section 4.4).
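The chapter itself works with *S-Plus* functions; as a language-neutral illustration of the first topic, the following Python sketch computes the classical method-of-moments (Matheron) empirical variogram, binning squared differences of paired observations by separation distance. The data and bin edges are invented for illustration.

```python
import math

def empirical_variogram(coords, values, bin_edges):
    """Matheron estimator: gamma(h) = sum (z_i - z_j)^2 / (2 N(h)),
    taken over pairs whose separation distance falls in each bin."""
    sums = [0.0] * (len(bin_edges) - 1)
    counts = [0] * (len(bin_edges) - 1)
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            for b in range(len(bin_edges) - 1):
                if bin_edges[b] <= h < bin_edges[b + 1]:
                    sums[b] += (values[i] - values[j]) ** 2
                    counts[b] += 1
                    break
    # Return None for empty bins rather than dividing by zero
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]

# Tiny made-up data set: four measurement locations on a transect
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
values = [1.0, 2.0, 4.0, 7.0]
print(empirical_variogram(coords, values, [0.0, 1.5, 2.5, 3.5]))
```

A theoretical model (spherical, exponential, Gaussian) would then be fitted to these binned estimates before kriging.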

## A method for classifying unaligned biological sequences

### Data Science, Classification, and Related Methods (1998-01-01) , January 01, 1998

### Summary

The importance of classifying protein sequences in molecular biology needs no emphasis. Various classification methods are currently used by biologists (Landès et al. 1992), but most require the sequences to be prealigned (and thus of equal length) using one of the several multiple alignment algorithms available, so that site-by-site comparison of sequences becomes possible. Two LLA-based approaches for classifying prealigned sequences have already been proposed (Lerman et al. 1994a), whose results compared favourably with most currently used methods. The first approach used the "preordonnance" coding, and the second the idea of "significant windows". The authors also suggested new directions of research leading towards a clustering method free from this somewhat strong constraint. The present paper gives an account of recent developments of our research: a new method that, thanks to the "significant windows" approach, circumvents the sequence-comparison problem raised by unaligned sequences.
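The appeal of window-based comparison is that fixed-length subwords can be matched without any alignment, so sequences of different lengths are directly comparable. The toy Python sketch below (not the authors' LLA method, and with no notion of statistical significance) scores two unaligned sequences by the overlap of their length-k window sets.

```python
def windows(seq, k):
    """Set of all length-k subwords (windows) of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def window_similarity(a, b, k=3):
    """Jaccard similarity of the length-k window sets of two sequences;
    no alignment is needed, so the sequences may differ in length."""
    wa, wb = windows(a, k), windows(b, k)
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

# Two short invented peptide sequences of unequal length
print(window_similarity("MKTAYIAKQR", "MKTAYIAKQQR"))
```

Such pairwise scores could then feed any standard clustering algorithm.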

## Sampling Theory (Stichprobentheorie)

### Repetitorium Statistik (1998-01-01): 252-268 , January 01, 1998

### Summary

Sampling theory is the branch of inferential statistics that provides the theoretical foundations and the mathematical-statistical procedures for selecting a particular set of units from a population, for the purpose of drawing conclusions from the part about the whole.

## Efficiency Evaluation of Skilled Nursing Facilities

### Journal of Medical Systems (1998-08-01) 22: 211-224 , August 01, 1998

*This study employs Data Envelopment Analysis (DEA) to determine the technical efficiency of skilled nursing facilities in the United States, based on a 10% national sample of 324 skilled nursing facilities stratified by ownership and size cluster groupings. Results show that nonprofit and for-profit firms operate under significantly different modes of production, allowing the best of the for-profits to achieve a level of technical efficiency 0.86 times higher than the most efficient nonprofit homes. The best larger nursing homes are 0.89 times more efficient than the best smaller facilities, also indicating a difference in production goals and technologies. A rationale for these differences is sought through an analysis of DEA-generated slacks and a logistic regression. Controlling for size and ownership in the DEA, a higher percentage of Medicare patients leads to lower efficiency, while higher occupancy and a greater percentage of Medicaid patients lead to greater efficiency. Regional characteristics do not impact efficiency. It is concluded that reimbursement policies should account for differences in organizational goals created by size and ownership differentials. The great variations in efficiency demonstrate tremendous potential for cost savings through imitation of efficient firms.*
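DEA scores each decision-making unit against a best-practice frontier built from the observed units. In the simplest special case (one input, one output, constant returns to scale), the CCR efficiency score reduces to each unit's output/input ratio divided by the best observed ratio. A toy Python sketch of that special case, with invented nursing-home data, not the study's multi-input model:

```python
def dea_ccr_single(inputs, outputs):
    """CCR (constant-returns) efficiency for the one-input, one-output case:
    each DMU's productivity ratio scaled by the best observed ratio."""
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical facilities: staff hours (input) vs. resident-days (output)
staff_hours = [100.0, 120.0, 150.0, 90.0]
resident_days = [800.0, 840.0, 900.0, 810.0]
print(dea_ccr_single(staff_hours, resident_days))
```

With multiple inputs and outputs, as in the study, the score instead comes from solving a small linear program per facility; the frontier-relative idea is the same.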

## Exploring Textual Data

### Exploring Textual Data (1998-01-01): 4 , January 01, 1998

## Dependence Between Order Statistics in Samples from Finite Population and its Application to Ranked Set Sampling

### Annals of the Institute of Statistical Mathematics (1998-03-01) 50: 49-70 , March 01, 1998

Let X1, X2,..., Xm, Y1, Y2,..., Yn be a simple random sample without replacement from a finite population and let X(1) ≤ X(2) ≤...≤ X(m) and Y(1) ≤ Y(2) ≤...≤ Y(n) be the order statistics of X1, X2,..., Xm and Y1, Y2,..., Yn, respectively. It is shown that the joint distribution of X(i) and X(j) is positively likelihood ratio dependent and Y(j) is negatively regression dependent on X(i). Using these results, it is shown that when samples are drawn without replacement from a finite population, the relative precision of the ranked set sampling estimator of the population mean, relative to the simple random sample estimator with the same number of units quantified, is bounded below by 1.
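The relative-precision claim can be checked numerically. The Monte Carlo sketch below, in Python with an invented finite population and set size, compares the variance of a simple random sample (SRS) mean against a ranked set sampling (RSS) mean that quantifies the same number of units, drawing each RSS set without replacement.

```python
import random
import statistics

def rss_mean(population, k, rng):
    """One ranked-set sample of size k: draw k sets of k units (each set
    without replacement), rank each set, and keep the i-th order statistic
    of the i-th set."""
    quantified = []
    for i in range(k):
        ranked_set = sorted(rng.sample(population, k))
        quantified.append(ranked_set[i])
    return statistics.mean(quantified)

def relative_precision(population, k, reps, seed=0):
    """Var(SRS mean) / Var(RSS mean) estimated over `reps` replications."""
    rng = random.Random(seed)
    srs = [statistics.mean(rng.sample(population, k)) for _ in range(reps)]
    rss = [rss_mean(population, k, rng) for _ in range(reps)]
    return statistics.variance(srs) / statistics.variance(rss)

population = list(range(1, 101))  # finite population of units 1..100
print(relative_precision(population, k=3, reps=2000))
```

Consistent with the lower bound in the paper, the estimated relative precision comes out above 1: ranking costs nothing in precision and typically gains substantially.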

## Absorption Probabilities of a Brownian Motion in a Triangular Domain

### Advances in Stochastic Models for Reliability, Quality and Safety (1998-01-01): 197-210 , January 01, 1998

The absorption probabilities of a two-dimensional Brownian motion with independent components in a triangular domain are evaluated for special parameter cases. They are obtained from a known random walk result.

## Idaresa — a Tool for Construction, Description and Use of Harmonised Datasets from National Surveys

### COMPSTAT (1998-01-01): 299-304 , January 01, 1998

The Centre for Educational Sociology (CES) is a research centre at the University of Edinburgh. Research projects carried out at the centre are often based around secondary analysis of large and complex datasets and can require the construction of harmonised datasets for comparative analysis. VTLMT is a two-year project funded under the Leonardo da Vinci Programme of the European Union. It is co-ordinated by the Economic and Social Research Institute (ESRI) in Dublin with partners in the Netherlands, France and Scotland. The analysis is based around a subset of school leavers and requires the construction of a dataset combining data from school-leaver surveys carried out in the four countries. 'Home Internationals' is a two-year project funded by the Economic and Social Research Council of the UK. It is based at CES and involves the analysis of a dataset combining data from the different parts of the UK.