
Estimation of Individual Admixture: Analytical and Study Design Considerations

Hua Tang¹, Jie Peng², Pei Wang², Neil J. Risch³

¹Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109;

²Department of Statistics, and ³Department of Genetics, Stanford University, Stanford, California 94305

ABSTRACT:

The genome of an admixed individual represents a mixture of alleles from different ancestries. In the United States, the two largest minority groups, African Americans and Hispanics, are both admixed. An understanding of the admixture proportion at an individual level (individual admixture, or IA) is valuable for both population geneticists and epidemiologists who conduct case-control association studies in these groups. Here we present an extension of a previously described frequentist (maximum likelihood, or ML) approach to estimating individual admixture that allows for uncertainty in ancestral allele frequencies. We compare this approach both to earlier partial-likelihood-based methods and to more recently described Bayesian MCMC methods. Our full ML method demonstrates increased robustness when compared to an existing partial ML approach. Simulations also suggest that this frequentist estimator achieves efficiency, as measured by the mean squared error criterion, similar to that of Bayesian methods, but requires just a tiny fraction of the computational time to produce point estimates, allowing for extensive analysis (e.g., simulations) not feasible with Bayesian methods. Our simulation results demonstrate that inclusion of ancestral populations or their surrogates in the analysis is required by any method of IA estimation to obtain reasonable results.

Keywords: admixture, EM algorithm, maximum likelihood estimate.
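
As a rough illustration of the kind of estimator the abstract refers to, the sketch below runs an EM algorithm for one individual's admixture proportions while treating the ancestral allele frequencies as known and fixed. This corresponds to the simpler partial-likelihood setting, not to the full ML extension presented here, which also accounts for uncertainty in those frequencies. The function name `estimate_admixture`, its arguments, the biallelic-marker coding, and the clipping constant are all assumptions made for this example.

```python
import numpy as np

def estimate_admixture(genotypes, freqs, n_iter=500, tol=1e-8):
    """EM estimate of individual admixture with FIXED ancestral frequencies.

    genotypes: length-J array, count (0, 1, or 2) of the reference allele
               at each biallelic marker (hypothetical coding).
    freqs:     K x J array, reference-allele frequency of each marker in
               each of K ancestral populations, assumed known.
    Returns a length-K vector of admixture proportions summing to 1.
    """
    g = np.asarray(genotypes, dtype=float)
    # Keep frequencies away from 0 and 1 so no division by zero occurs.
    p = np.clip(np.asarray(freqs, dtype=float), 1e-6, 1 - 1e-6)
    K, J = p.shape
    q = np.full(K, 1.0 / K)  # start from equal ancestry proportions

    for _ in range(n_iter):
        # E-step: posterior probability that a reference (or alternate)
        # allele copy at each marker came from each ancestral population.
        ref = q[:, None] * p
        ref /= ref.sum(axis=0)
        alt = q[:, None] * (1.0 - p)
        alt /= alt.sum(axis=0)

        # M-step: average the posteriors over all 2J allele copies.
        q_new = (ref * g + alt * (2.0 - g)).sum(axis=1) / (2.0 * J)

        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    return q
```

Each iteration costs on the order of K times J operations, which gives a sense of why point estimates of this kind are cheap relative to MCMC sampling.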