Estimation of Individual Admixture: Analytical and Study Design Considerations

Hua Tang¹, Jie Peng², Pei Wang², Neil J. Risch³

¹Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington 98109;

²Department of Statistics and ³Department of Genetics, Stanford University, Stanford, California 94305

ABSTRACT:

The genome of an admixed individual represents a mixture of alleles from different ancestries. In the United States, the two largest minority groups, African Americans and Hispanics, are both admixed. An understanding of the admixture proportion at the individual level (individual admixture, or IA) is valuable both to population geneticists and to epidemiologists who conduct case-control association studies in these groups. Here we present an extension of a previously described frequentist (maximum likelihood, or ML) approach to estimating individual admixture that allows for uncertainty in ancestral allele frequencies. We compare this approach both to earlier partial-likelihood methods and to more recently described Bayesian Markov chain Monte Carlo (MCMC) methods. Our full ML method is more robust than an existing partial ML approach. Simulations also suggest that this frequentist estimator achieves efficiency similar to that of Bayesian methods, as measured by mean squared error, while requiring only a small fraction of the computational time to produce point estimates; this makes feasible extensive analyses (e.g., simulations) that are not practical with Bayesian methods. Our simulation results demonstrate that, for any method of IA estimation, including the ancestral populations or their surrogates in the analysis is required to obtain reasonable results.

Keywords: admixture, EM algorithm, maximum likelihood estimate.
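
To make the EM-based ML idea concrete, the following is a minimal sketch of the classical EM update for one individual's admixture proportions, under simplifying assumptions that are ours rather than the paper's: biallelic, unlinked markers; the two allele copies at each marker drawn independently; and ancestral allele frequencies treated as known and fixed. It therefore corresponds roughly to the fixed-frequency setting and does not model the uncertainty in ancestral allele frequencies that the full ML method accounts for. The function names `estimate_admixture_em` and `simulate_genotypes` are hypothetical.

```python
import random
from typing import List, Sequence


def estimate_admixture_em(
    genotypes: Sequence[int],
    freqs: Sequence[Sequence[float]],
    max_iter: int = 500,
    tol: float = 1e-8,
) -> List[float]:
    """EM estimate of one individual's admixture proportions q_1..q_K (sketch).

    genotypes[j] -- copies (0, 1 or 2) of the reference allele at marker j.
    freqs[k][j]  -- reference-allele frequency at marker j in ancestral
                    population k; ASSUMED known and fixed (no uncertainty).
    """
    n_pop = len(freqs)
    eps = 1e-12  # guards against division by zero when theta reaches 0 or 1
    q = [1.0 / n_pop] * n_pop  # start from equal ancestry
    n_copies = 2 * len(genotypes)  # two allele copies per marker
    for _ in range(max_iter):
        expected = [0.0] * n_pop  # expected allele copies from each population
        for j, g in enumerate(genotypes):
            # Probability that one allele copy at marker j is the reference allele.
            theta = sum(q[k] * freqs[k][j] for k in range(n_pop))
            theta = min(max(theta, eps), 1.0 - eps)
            for k in range(n_pop):
                # E-step: posterior probability that a reference copy (g of them)
                # or an alternative copy (2 - g of them) descends from population k.
                expected[k] += g * q[k] * freqs[k][j] / theta
                expected[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / (1.0 - theta)
        # M-step: q_k becomes the expected share of allele copies from population k.
        new_q = [e / n_copies for e in expected]
        converged = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if converged:
            break
    return q


def simulate_genotypes(q, freqs, rng):
    """Draw genotypes for one admixed individual (hypothetical helper)."""
    genotypes = []
    for j in range(len(freqs[0])):
        g = 0
        for _ in range(2):  # two allele copies per marker
            k = rng.choices(range(len(q)), weights=q)[0]  # ancestry of this copy
            g += rng.random() < freqs[k][j]  # 1 if the copy is the reference allele
        genotypes.append(g)
    return genotypes


if __name__ == "__main__":
    rng = random.Random(1)
    n_markers = 2000
    freqs = [[rng.uniform(0.05, 0.95) for _ in range(n_markers)] for _ in range(2)]
    geno = simulate_genotypes([0.8, 0.2], freqs, rng)
    print(estimate_admixture_em(geno, freqs))  # expected to be near [0.8, 0.2]
```

Each iteration allocates every allele copy fractionally to the ancestral populations (E-step) and then sets each proportion to its expected share of copies (M-step), so the estimates stay non-negative and sum to one without explicit constraints. Because the sketch plugs in fixed frequencies, error in those frequencies propagates directly into the estimate; handling that uncertainty is what distinguishes the full ML approach from this simplified version.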