Research

(Figure: R output for various statistical analyses)

My professional experience in academia, industry, and government has exposed me to a wide range of statistical methods coupled with fascinating real-world data problems. As a result, I have developed an interest in a unique blend of research topics, some of which overlap. My work is applicable to diverse research problems, especially in areas where the principled use of innovative statistical approaches is essential. This is central to my future research plans, which include both interdisciplinary and intradisciplinary collaboration. Below I discuss some of my active research and general interests in each area.

Mixture Models: When sampled data arise from a population that consists of several homogeneous subpopulations, one can use (finite) mixture models to characterize the data. Mixtures of regressions are used when, within each subpopulation, a response is functionally related to a set of predictors. My research focuses on developing semiparametric extensions to the standard mixture-of-linear-regressions model to allow more flexibility in the modeling process. I also actively maintain the R package mixtools, which I developed as a graduate student.

Tolerance Regions: Tolerance regions are statistical regions that are expected to contain at least a specified proportion of the sampled population at a given confidence level. They are heavily utilized in clinical and industrial applications, for quality control, and for environmental monitoring. Tolerance intervals for univariate data are, perhaps, the most widely studied. However, closed-form tolerance limits are available for very few distributions. My research focuses on the development of improved tolerance intervals for discrete distributions, tolerance intervals (both pointwise and joint) for regression settings, and nonparametric multivariate tolerance regions. I am also the sole developer and maintainer of the R package tolerance.
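As a small illustration of the "proportion at a given confidence level" idea, the sketch below uses the classical distribution-free result for continuous data: the interval from the sample minimum to the sample maximum contains at least a proportion P of the population with confidence 1 - n P^(n-1) + (n-1) P^n. This is a textbook construction chosen for illustration, not the method implemented in the tolerance package, and the function names are hypothetical.

```python
def coverage_confidence(n, proportion):
    """Confidence that [min, max] of n i.i.d. continuous observations
    contains at least `proportion` of the population."""
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0.0 < proportion < 1.0:
        raise ValueError("proportion must lie strictly between 0 and 1")
    # The coverage of (X_(1), X_(n)) follows a Beta(n - 1, 2) distribution;
    # this is one minus its CDF evaluated at `proportion`.
    return 1.0 - n * proportion ** (n - 1) + (n - 1) * proportion ** n


def min_sample_size(proportion, confidence):
    """Smallest n for which [min, max] is a (proportion, confidence) interval."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    n = 2
    while coverage_confidence(n, proportion) < confidence:
        n += 1
    return n


def nonparametric_tolerance_interval(sample, proportion):
    """Return (lower, upper, achieved confidence) using the sample extremes."""
    return min(sample), max(sample), coverage_confidence(len(sample), proportion)


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    sample = [9.8, 10.1, 10.4, 9.6, 10.0, 10.3, 9.9, 10.2]
    print(nonparametric_tolerance_interval(sample, 0.50))
    print(min_sample_size(0.95, 0.95))
```

With only a handful of observations the achieved confidence for a high proportion is low, which is one reason parametric and improved nonparametric constructions are of interest.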

Data Depth: Data depth is a method that provides a center-outward ranking of multivariate data. Statistical data depth as a tool in multivariate analysis has seen considerable growth in recent years, and data depth problems have yielded numerous collaborative projects at the intersection of mathematics, statistics, and computational geometry. My research focuses on novel applications of data depth, such as constructing non-elliptical statistical regions (e.g., prediction and tolerance regions) and informing strategies for trimming or Winsorizing data.
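To make the center-outward ranking concrete, here is a minimal sketch using Mahalanobis depth, D(x) = 1 / (1 + (x - m)' S^(-1) (x - m)), for bivariate data, where m and S are the sample mean and covariance. Mahalanobis depth is only one of many depth functions (and an elliptical one), picked here because it is short to write down; the function name and the data are hypothetical.

```python
def mahalanobis_depth_2d(points):
    """Mahalanobis depth of each bivariate point relative to the sample."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Sample covariance matrix entries (divisor n - 1).
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0.0:
        raise ValueError("sample covariance matrix is singular")
    depths = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Squared Mahalanobis distance via the closed-form 2x2 inverse.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        depths.append(1.0 / (1.0 + d2))
    return depths


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    data = [(0, 0), (2, 0), (-2, 0), (0, 2), (0, -2), (1, 1), (-1, -1)]
    depths = mahalanobis_depth_2d(data)
    # Sort from deepest (most central) to shallowest (most outlying).
    for depth, point in sorted(zip(depths, data), reverse=True):
        print(point, round(depth, 3))
```

Sorting by depth gives the center-outward ordering; trimming would drop the shallowest points.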

Zero-Inflated Models: Zero-inflated models were first studied as a way to characterize a manufacturing process that moves randomly back and forth between a perfect state (where defects are extremely rare) and an imperfect state (where defects are possible, but not inevitable). The imperfect state is usually characterized using a Poisson or negative binomial distribution. Besides manufacturing, zero-inflated models are commonly used in ecology, demography, and biology. I am interested in the development of more flexible zero-inflated regression models, such as multivariate analogues to the traditional zero-inflated regression model.
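The two-state description corresponds to a simple probability mass function. The sketch below writes it out for the zero-inflated Poisson case, under the usual simplifying assumption that the perfect state produces exactly zero defects: P(Y = 0) = pi + (1 - pi) exp(-lambda) and P(Y = k) = (1 - pi) Poisson(k; lambda) for k > 0. The names `pi_zero` and `lam` are illustrative, and no covariates are included.

```python
import math


def zip_pmf(k, lam, pi_zero):
    """Zero-inflated Poisson probability of observing k events.

    lam     -- Poisson mean in the imperfect state (must be > 0)
    pi_zero -- probability of being in the perfect (always-zero) state
    """
    if lam <= 0.0:
        raise ValueError("lam must be positive")
    if not 0.0 <= pi_zero <= 1.0:
        raise ValueError("pi_zero must lie in [0, 1]")
    if k < 0:
        return 0.0
    # Poisson pmf computed on the log scale for numerical stability.
    poisson = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    point_mass = pi_zero if k == 0 else 0.0
    return point_mass + (1.0 - pi_zero) * poisson


if __name__ == "__main__":
    lam, pi_zero = 2.0, 0.3  # hypothetical parameter values
    for k in range(5):
        print(k, round(zip_pmf(k, lam, pi_zero), 4))
```

Zeros therefore arise from two sources, which is why the observed proportion of zeros exceeds what a plain Poisson model with the same mean would predict.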

Computational Statistics: Computational statistics is an essential component of any statistician's work. My research discussed above depends heavily upon statistical algorithms and computationally intensive procedures. For example, many of the optimization algorithms developed for my mixture models research are called EM-like algorithms (and not true EM algorithms) since they do not have a provable ascent property; i.e., they do not guarantee an increase in any objective function at each iteration. Depending on the error structure used for the semiparametric mixture-of-regressions model, there are numerous candidates for objective functions that one could use, each of which could have its own numerical complexities.
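For contrast with the EM-like algorithms mentioned above, the following sketch runs a true EM algorithm on a fully parametric two-component normal mixture and records the log-likelihood at every iteration, so the ascent property can be checked numerically. It is a generic textbook example, not code from mixtools; the data, starting values, and function names are made up for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def log_likelihood(data, weight, mu, sigma):
    """Log-likelihood of a two-component normal mixture."""
    return sum(
        math.log(weight * normal_pdf(x, mu[0], sigma[0])
                 + (1.0 - weight) * normal_pdf(x, mu[1], sigma[1]))
        for x in data
    )


def em_two_normals(data, weight, mu, sigma, n_iter=25):
    """Run EM and return the final parameters plus the log-likelihood path."""
    n = len(data)
    history = [log_likelihood(data, weight, mu, sigma)]
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        resp = []
        for x in data:
            a = weight * normal_pdf(x, mu[0], sigma[0])
            b = (1.0 - weight) * normal_pdf(x, mu[1], sigma[1])
            resp.append(a / (a + b))
        # M-step: weighted maximum-likelihood updates.
        n0 = sum(resp)
        n1 = n - n0
        mu0 = sum(r * x for r, x in zip(resp, data)) / n0
        mu1 = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
        var0 = sum(r * (x - mu0) ** 2 for r, x in zip(resp, data)) / n0
        var1 = sum((1.0 - r) * (x - mu1) ** 2 for r, x in zip(resp, data)) / n1
        weight, mu, sigma = n0 / n, (mu0, mu1), (math.sqrt(var0), math.sqrt(var1))
        history.append(log_likelihood(data, weight, mu, sigma))
    return weight, mu, sigma, history


if __name__ == "__main__":
    # Hypothetical data with two visible clusters.
    data = [-2.1, -1.9, -2.4, -1.5, -2.0, -1.2, 1.8, 2.2, 2.5, 1.6, 2.9, 2.0]
    weight, mu, sigma, history = em_two_normals(data, 0.5, (-1.0, 1.0), (1.0, 1.0))
    # Each step should not decrease the log-likelihood (up to rounding error).
    print(all(b >= a - 1e-9 for a, b in zip(history, history[1:])))
```

When the M-step is replaced by a nonparametric update, as in semiparametric mixtures, this monotonicity guarantee is generally lost, and the same kind of trace becomes a diagnostic rather than a theorem.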

Non/Semiparametric Methods: Nonparametric and semiparametric approaches are used when difficulties arise with the corresponding parametric approach. With the increasing amount of big data being collected, the inherent complexities of such data often necessitate the use of non/semiparametric methods. Much of my research incorporates non/semiparametric extensions to achieve greater flexibility. For example, I have developed semiparametric mixtures-of-regressions models and I am currently developing an approach to construct hyperrectangular tolerance regions nonparametrically.

Astrostatistics: Astrostatistics is the discipline concerning statistical analysis of astrophysical data. While an old discipline, it has seen exponential growth in recent years. Much credit is due to the Statistical Challenges in Modern Astronomy conferences and Summer Schools in Statistics for Astronomers, which have been organized by Penn State Professors G. Jogesh Babu and Eric Feigelson. The summer schools began in 2005, and in several years since then I have been involved either as a conference assistant or as a lecturer for the Astrostatistics R Tutorials. I have also seen some of the fascinating data challenges facing today's astronomers. I am interested in the analysis of gamma-ray bursts, which are explosions of intense gamma radiation that have been observed in distant galaxies. Gamma-ray burst data are typically analyzed using piecewise linear regression models. I am interested in the development of novel regression models (e.g., imposing a mixture structure on the errors) to better characterize gamma-ray burst data.
