Advanced Biostatistical Methods

  • Dr Gustavo de los Campos & Dr Ana Vazquez


    Methods and Software for Large Scale Genomic Analysis

    The QuantGen group, led by Drs. de los Campos and Vazquez, has developed several methods and software packages for the analysis of large-scale genomic data sets. The packages developed and maintained by the group include:

    • BGLR is an R package for high-dimensional Bayesian regression; it implements various shrinkage and variable-selection methods and is optimized for the analysis of very large genomic data sets. Its development and maintenance have been funded by R01 GM101219 (PI de los Campos, 2 cycles). [ R | GitHub | publication ]
    • BGData is a suite of R packages for handling extremely large genomic data sets (millions of samples and millions of features). It implements memory mapping of .bed files, linked arrays, and many functions for genomic data analysis, including kinship computation and GWAS. Development and maintenance of the software were supported by R01 GM101219 (PI de los Campos). [ R | GitHub | publication ]
    • MTM is software for multi-trait analysis of complex traits. Its functionality has been incorporated (in a much more efficient and complete implementation) into the BGLR R package. [ GitHub ]
    • pedigreemm enables fitting mixed-effects models that include pedigree information. The companion package pedigreeTools implements various operations on pedigrees, including pruning, editing, and computation of additive relationships and functions thereof (e.g., Cholesky decomposition). [ R | GitHub ]
    • pleiotest is an R package for multi-trait GWAS and pleiotropy analysis. The package is largely written in C++ and is highly optimized for the analysis of very large genomic data sets. [ GitHub ]
    • PedigreeTools: a suite of functions for pedigree analysis, including sorting and editing pedigree data and computing inbreeding, additive relationships, and functions thereof.
    • MOSS (Multi-Omic integration with Sparse Singular value decomposition) integrates multiple large, high-dimensional omic data layers. [ R | GitHub ]
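The BGData entry above mentions kinship computation. As a rough illustration of what that entails, the sketch below builds a genomic relationship matrix G = ZZ'/p from a small genotype matrix coded as 0/1/2 allele counts. This is a toy pure-Python version written for this page, not BGData's implementation, which relies on memory-mapped files and optimized linear algebra; the function name and coding choices are mine.

```python
# Toy sketch of a genomic relationship ("kinship") matrix: G = Z Z' / p,
# where Z is the column-standardized genotype matrix (0/1/2 allele counts)
# and p is the number of markers. Illustration only; not BGData's code.
from math import sqrt

def kinship(genotypes):
    n = len(genotypes)          # individuals (rows)
    p = len(genotypes[0])       # markers (columns)
    # standardize each marker to mean 0, variance 1
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [genotypes[i][j] for i in range(n)]
        mu = sum(col) / n
        sd = sqrt(sum((x - mu) ** 2 for x in col) / n) or 1.0  # guard monomorphic markers
        for i in range(n):
            Z[i][j] = (col[i] - mu) / sd
    # G[i][k] = (1/p) * sum_j Z[i][j] * Z[k][j]
    return [[sum(Z[i][j] * Z[k][j] for j in range(p)) / p
             for k in range(n)] for i in range(n)]
```

With this standardization the diagonal of G averages 1, a common sanity check on real data.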

     

    Analysis and prediction of complex human traits with biobank data

    The QuantGen group, led by Drs. de los Campos and Vazquez, has developed and implemented methods for complex-trait prediction using DNA information. This research has been funded by NIH grants R01 GM099992 and R01 GM101219 (PI de los Campos) and spans the development of parametric and semi-parametric methods, algorithms, and applications involving biobank-sized data. Selected publications: de los Campos et al. (2010), Makowsky et al. (2011), de los Campos et al. (2013), de los Campos et al. (2015), Kim et al. (2017), Bellot et al. (2018), Lello et al. (2019).

     

    Incorporating sex and ethnic differences in genomic models

    Classical genomic models assume that the effects of genes on disease risk are homogeneous across the subjects of a population. However, genomic research has shown that sex and ethnic differences can modulate the effects of genes. Drs. de los Campos and Vazquez have developed whole-genome regression models that incorporate sex and ethnic differences, as well as high-dimensional environmental information. Selected publications: Jarquin et al. (2015), Veturi et al. (2018), Funkhouser et al. (2020).

     

    ORIGINS (PI Rebecca Knickmeyer)

    The QuantGen group contributes statistical and genetic expertise to the ORIGINs working group of the ENIGMA consortium. The prenatal and early postnatal period represents the foundational phase of human brain development. This working group focuses on (1) identifying genetic factors contributing to early brain development, (2) developing predictive models for cognitive ability and emotional functioning using genetic variation, environmental risk factors, and neuroimaging phenotypes, and (3) clarifying how genetic risk for psychiatric disease manifests in infancy and early childhood.

     

  • Dr Chenxi Li

    Faculty Page

     

    Genetic/genomic survival association and risk prediction

    Most genetic association studies of human diseases use case-control phenotypes (diseased vs. non-diseased). However, time-to-disease traits are more informative for gene-disease association and are better suited to building risk-prediction models. We are developing robust and efficient statistical methods to detect genetic associations and predict disease risk with omics data for various types of survival outcomes and models. This project is led by Dr. Chenxi Li.
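To make the case-control vs. time-to-disease contrast concrete, the sketch below runs a two-group log-rank test comparing time to disease between carriers and non-carriers of a variant, which uses the event times and censoring that a case-control analysis would discard. The data and function name are invented for illustration; real analyses in this project would use dedicated survival software, not this toy code.

```python
# Toy two-group log-rank test for a time-to-disease trait.
# times:  follow-up time per subject; events: 1 = disease, 0 = censored;
# group:  1 = variant carrier, 0 = non-carrier.
def logrank(times, events, group):
    event_times = sorted({t for t, e in zip(times, events) if e})
    O = E = V = 0.0  # observed/expected carrier events and variance
    for t in event_times:
        at_risk = [i for i in range(len(times)) if times[i] >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if group[i] == 1)          # carriers at risk
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk
                 if times[i] == t and events[i] and group[i] == 1)
        O += d1
        E += d * n1 / n
        if n > 1:
            V += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    # chi-square with 1 df under the null of no association
    return (O - E) ** 2 / V
```

Large values of the statistic indicate that carriers experience disease earlier (or later) than expected under no association.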

     

  • Dr Xiaoyu Liang
    Faculty Page

     

    Methods Development for Gene Selection of Complex Diseases

    Identifying genetic variants associated with complex traits or diseases requires data with larger sample sizes and more powerful tests, especially when causal genetic variants have weak effects. However, acquiring data of sufficient scale can be challenging. We have developed gene-based association tests that leverage publicly available GWAS summary statistics. These methods have been successfully employed to pinpoint genes associated with conditions such as schizophrenia and fasting glucose levels.
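To illustrate the general idea of a gene-based test built from summary statistics (not the group's specific method), the sketch below sums squared per-SNP z-scores within a gene; under the null, and assuming (unrealistically) independent SNPs, the sum is chi-square with df equal to the number of SNPs. Published methods additionally model linkage disequilibrium between SNPs; this toy version ignores it.

```python
# Toy gene-based test from GWAS summary statistics: Q = sum of squared
# z-scores, chi-square with df = #SNPs if SNPs were independent (they are
# not in practice; real methods account for LD).
from math import exp

def gene_test(z_scores):
    q = sum(z * z for z in z_scores)
    df = len(z_scores)
    return q, df

def chi2_sf_even_df(q, df):
    """Chi-square survival function, closed form for even df = 2k:
    P(Q > q) = exp(-q/2) * sum_{i=0}^{k-1} (q/2)^i / i!"""
    k = df // 2
    term, s = 1.0, 1.0
    for i in range(1, k):
        term *= (q / 2) / i
        s += term
    return exp(-q / 2) * s
```

A small survival-function value flags the gene as associated; the closed form above only covers even df and is included just to keep the sketch self-contained.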


    Methods Development for Joint Analysis of Multiple Phenotypes in Association Studies

    The association between a genetic variant and any single phenotype is usually weak. It is increasingly recognized that joint analysis of multiple phenotypes can be more powerful than univariate analysis and can shed new light on the underlying biological mechanisms of complex diseases. Our research primarily focuses on the development and application of innovative methods for jointly analyzing multiple phenotypes in genome-wide and epigenome-wide association studies, aimed at enhancing our ability to uncover the genetic underpinnings of complex diseases such as chronic obstructive pulmonary disease, schizophrenia, and rheumatoid arthritis.
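A minimal sketch of why joint analysis can beat univariate tests: combine the per-phenotype z-scores for one variant into an omnibus statistic T = z' R⁻¹ z, where R is the null correlation of the test statistics; T is chi-square with df equal to the number of phenotypes. The two-phenotype case below, with R inverted analytically, is a generic textbook-style illustration, not the specific methods developed in this research, which handle many phenotypes and estimate R from the data.

```python
# Toy omnibus test for one variant against two correlated phenotypes.
# z1, z2: univariate association z-scores; r: null correlation of the
# two test statistics. Returns T = z' R^{-1} z ~ chi-square(2) under H0.
def joint_test_2pheno(z1, z2, r):
    det = 1.0 - r * r  # determinant of R = [[1, r], [r, 1]]
    return (z1 * z1 - 2.0 * r * z1 * z2 + z2 * z2) / det
```

Note how two z-scores of opposite sign become more significant, not less, when the statistics are positively correlated, which is one way joint tests gain power over univariate ones.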

  • Dr Zhehui Luo

    Faculty page

     

    Causal inference using observational data

    Many real-world interventions are not randomly assigned. Even in randomized controlled trials, intercurrent events may prevent valid inference of efficacy: non-compliance and non-random drop-out can lead to selection bias and compromise the internal validity of a study. Dr. Zhehui Luo collaborates with epidemiologists, sociologists, clinicians, and other scientists to find appropriate methods for dealing with the complications of causal inference from observational data. Currently she is exploring the different impacts of non-compliance in placebo-controlled versus active-controlled trials and examining the potential of spatial-temporal data to mimic randomized trials. Applications include trials for reproductive health and birth outcomes.
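One standard textbook adjustment for non-compliance (included here as general background, not as Dr. Luo's specific method) is the Wald instrumental-variable estimator: under the usual IV assumptions, the complier average causal effect is the intention-to-treat effect divided by the difference in treatment-uptake rates between arms. All numbers in the example are invented.

```python
# Wald/IV estimator of the complier average causal effect (CACE) under
# non-compliance, assuming randomization is a valid instrument.
def cace(y_assigned, y_control, uptake_assigned, uptake_control):
    itt = y_assigned - y_control                   # intention-to-treat effect
    compliance = uptake_assigned - uptake_control  # net difference in uptake
    return itt / compliance
```

For example, an ITT risk difference of 0.10 with 80% uptake in the assigned arm and none in control implies a CACE of 0.125 among compliers.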

      

    Health Services Research

    Using large administrative claims data in research has advantages and disadvantages, and demands a thorough understanding of the structure and limitations of such databases. Working closely with experts in generating, modifying, and storing these data, and drawing on her expertise in causal inference, Dr. Zhehui Luo provides insights into the design, analysis, and interpretation of several program-evaluation studies with policy implications for improving population health. Currently she is investigating the impact on health service utilization of extending Medicaid benefits to persons affected by the Flint water crisis, and the impact on birth outcomes of implementing community health worker home-visiting programs.

     

  • Dr Hongxiang (David) Qiu

    Faculty page

     

    Causal inference

    Causal effects answer questions like "What would happen to the outcome if an intervention were implemented?" Without randomized trials, associations may differ from causal effects, and causal-effect estimation requires careful reasoning about confounding bias, selection bias, and other sources of bias, while avoiding unrealistic assumptions about the population. Dr. Qiu works on developing optimal statistical procedures for causal inference under minimal assumptions. The methods apply to perinatal & pediatric epidemiology.
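As general background on the kind of estimator such work builds on (not Dr. Qiu's specific procedures), the sketch below shows inverse probability weighting (IPW): under no unmeasured confounding, weighting each treated unit by 1/P(treated | covariates) and each untreated unit by 1/P(untreated | covariates) recovers the average causal effect. Propensities are taken as known here; in practice they are estimated, which is where careful statistical theory matters.

```python
# Toy inverse-probability-weighted (IPW) estimate of the average causal
# effect of a binary treatment, assuming no unmeasured confounding and
# known propensity scores. Illustration only.
def ipw_mean_difference(outcomes, treated, propensity):
    n = len(outcomes)
    # weighted mean outcome under treatment
    t = sum(y * a / e for y, a, e in zip(outcomes, treated, propensity)) / n
    # weighted mean outcome under control
    c = sum(y * (1 - a) / (1 - e)
            for y, a, e in zip(outcomes, treated, propensity)) / n
    return t - c
```

When treatment is assigned at random with probability 0.5, the estimator reduces to the usual difference in arm means, a useful sanity check.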


    Coarsened data & data fusion

    Data are often coarsened; for example, they may be missing or censored. Researchers may also wish to leverage data from different sources to obtain more accurate estimators and better machine learning models. Dr. Qiu works on developing optimal statistical procedures to fully extract information from coarsened data and data from different sources. The methods apply to patient-reported outcomes, perinatal & pediatric epidemiology, vaccine trials, machine learning, and prediction sets.
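The "more accurate estimators from multiple sources" idea, in its simplest textbook form (shown only as background, not as Dr. Qiu's method), is inverse-variance weighting of independent estimates of the same quantity, which minimizes the variance of the combined estimate. Real data-fusion methods must additionally handle dependence and differing study designs.

```python
# Inverse-variance weighting: combine independent estimates of one quantity,
# weighting each by the reciprocal of its variance. The combined variance
# is 1 / sum(weights), never larger than the best single source.
def combine(estimates, variances):
    w = [1.0 / v for v in variances]
    est = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    var = 1.0 / sum(w)
    return est, var
```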
