Advanced Biostatistical Methods

  • Dr Gustavo de los Campos & Dr Ana Vazquez


    Methods and Software for Large Scale Genomic Analysis

    The QuantGen group, led by Drs. de los Campos and Vazquez, has developed several methods and software packages for analyzing large-scale genomic data sets. The packages developed and maintained by the group include:

    • BGLR is an R package for high-dimensional Bayesian regression; it implements various shrinkage and variable selection methods and is optimized for analysis of very large genomic data sets. The development and maintenance of the package have been funded by R01 GM101219 (PI de los Campos, 2 cycles). [ R | GitHub | publication ]
    • BGData is a suite of R packages for handling extremely large genomic data sets (millions of samples and millions of features). The package implements memory mapping of PLINK .bed files, linked arrays, and many functions for genomic data analysis, including kinship computation and GWAS. The development and maintenance of the software were supported by R01 GM101219 (PI de los Campos). [ R | GitHub | publication ]
    • MTM is a software package for multi-trait analysis of complex traits. Its functionality has been incorporated, with a much more efficient and complete implementation, into the BGLR R package. [ GitHub ]
    • pedigreemm enables fitting mixed-effects models that include pedigree information. The package pedigreeTools implements various operations on pedigrees, including sorting, pruning, and editing pedigree data and computing inbreeding coefficients, additive relationships, and functions thereof (e.g., the Cholesky decomposition). [ R | GitHub ]
    • pleiotest is an R package for multi-trait GWAS and pleiotropy analysis. It is written largely in C++ and is highly optimized for analysis of very large genomic data sets. [ GitHub ]
    • MOSS (multi-omic integration with Sparse Singular Value Decomposition) integrates multiple large, high-dimensional omic data layers. [ R | GitHub ]
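    The additive-relationship computations listed for the pedigree packages above can be illustrated with the classical tabular method. The sketch below is a plain-Python illustration, not the pedigreeTools implementation or API; it assumes a pedigree sorted so that parents precede offspring, with unknown parents coded as None, and the helper name `additive_relationship` is hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical pedigree encoding: (sire, dam) per individual, None = unknown.
Pedigree = List[Tuple[Optional[int], Optional[int]]]


def additive_relationship(pedigree: Pedigree) -> List[List[float]]:
    """Additive relationship matrix A by the tabular method (hypothetical helper).

    pedigree[i] = (sire, dam): 0-based indices of earlier individuals,
    or None when a parent is unknown. Parents must precede their offspring.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Diagonal: 1 + F_i, where the inbreeding coefficient F_i is half the
        # relationship between the parents (0 if either parent is unknown).
        if sire is not None and dam is not None:
            A[i][i] = 1.0 + 0.5 * A[sire][dam]
        else:
            A[i][i] = 1.0
        # Off-diagonals: half the sum of j's relationships with i's known parents.
        for j in range(i):
            a = 0.0
            if sire is not None:
                a += 0.5 * A[j][sire]
            if dam is not None:
                a += 0.5 * A[j][dam]
            A[i][j] = A[j][i] = a
    return A


# Founders 0 and 1, full sibs 2 and 3, and individual 4 from mating 2 x 3.
ped: Pedigree = [(None, None), (None, None), (0, 1), (0, 1), (2, 3)]
A = additive_relationship(ped)
for row in A:
    print(" ".join(f"{x:.2f}" for x in row))
```

    Individual 4 gets a diagonal of 1.25 (inbreeding coefficient 0.25) because its parents are full sibs. A dense matrix grows quadratically with pedigree size, which is one reason pedigree software relies on factorizations such as the Cholesky decomposition.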


    Analysis and prediction of complex human traits with biobank data

    The QuantGen group, led by Drs. de los Campos and Vazquez, has developed and implemented methods for predicting complex traits from DNA information. This research, funded by NIH grants R01GM0999992 and R01GM101219 (PI de los Campos), spans the development of parametric and semi-parametric methods, algorithms, and applications involving biobank-sized data. Selected publications: de los Campos et al. (2010), Makowsky et al. (2011), de los Campos et al. (2013), de los Campos et al. (2015), Kim et al. (2017), Bellot et al. (2018), Lello et al. (2019).


    Incorporating sex and ethnic differences in genomic models

    Classical genomic models assume that the effects of genes on disease risk are homogeneous across subjects in a population. However, genomic research has shown that sex and ethnic differences can modulate the effects of genes. Drs. de los Campos and Vazquez have developed whole-genome regression models that incorporate sex and ethnic differences as well as high-dimensional environmental information. Selected publications: Jarquin et al. (2015), Veturi et al. (2018), Funkhouser et al. (2020).


    ORIGINS (PI Rebecca Knickmeyer)

    The QuantGen group contributes statistical and genetic expertise to the ORIGINs working group of the ENIGMA consortium. The prenatal and early postnatal period represents the foundational phase of human brain development. This working group focuses on (1) identifying genetic factors that contribute to early brain development, (2) developing predictive models for cognitive ability and emotional functioning using genetic variation, environmental risk factors, and neuroimaging phenotypes, and (3) clarifying how genetic risk for psychiatric disease manifests in infancy and early childhood.


  • Dr Joseph Gardiner



    Improving Diabetic Patients’ Adherence to Treatment and Prevention of Cardiovascular Disease

    An NHLBI-sponsored study led by Dr. Ade Olomu seeks to deliver an intervention that improves medication adherence, with the goal of preventing cardiovascular disease in patients with diabetes. The study adopts a cluster randomized design in which at least 12 practice teams at federally qualified health centers in Alcona, Lansing, and Saginaw will engage low-SES populations over a 12-month intervention. The control condition is the standard of care supplemented with patient education. The intervention applies best practices in delivering preventive services and meeting selected health care goals through a customized 15-week text-message program. The 5-year project will enroll over 300 patients. Dr. Joseph Gardiner serves as co-investigator and Study Biostatistician on a research team that includes experts in clinical medicine, healthcare communication, medical anthropology, and health services research.
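    In a cluster randomized design, outcomes of patients within the same practice team tend to be correlated, so the trial carries less information than the raw patient count suggests. The standard design effect, 1 + (m - 1) x ICC, where m is the average cluster size and ICC the intracluster correlation, quantifies this inflation. The sketch below is illustrative only: the even split of 300 patients over 12 teams and the ICC of 0.05 are assumptions, not the study's design parameters, and the function names are hypothetical.

```python
def design_effect(mean_cluster_size: float, icc: float) -> float:
    """Variance inflation due to cluster randomization: 1 + (m - 1) * ICC."""
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_patients: int, mean_cluster_size: float, icc: float) -> float:
    """Number of independent patients that would carry the same information."""
    return n_patients / design_effect(mean_cluster_size, icc)


# Hypothetical inputs: 300 patients spread evenly over 12 practice teams, ICC = 0.05.
m = 300 / 12
deff = design_effect(m, icc=0.05)
n_eff = effective_sample_size(300, m, icc=0.05)
print(f"design effect = {deff:.2f}, effective sample size = {n_eff:.1f}")
```

    Under these assumed values the design effect is 2.2, so 300 clustered patients carry roughly the information of 136 independent ones, which is why cluster trials are powered on the number of clusters as well as the number of patients.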


    Treating Brain Swelling in Pediatric Cerebral Malaria

    An NIAID-sponsored study led by Dr. Terrie Taylor seeks to examine strategies for treating children with cerebral malaria (CM). Malawian children with CM and severely increased brain volume on screening brain magnetic resonance imaging will be randomized to one of three study arms: standard of care (SOC); SOC supplemented by intravenous hypertonic saline; or SOC supplemented by early intubation and mechanical ventilation. The primary outcome is failure of the first treatment to which the child is assigned or death, whichever comes first. Practical considerations in conducting the study and the annual 6-month malaria season were incorporated into a group sequential trial design with early stopping for either efficacy or futility. Dr. Joseph Gardiner serves as co-investigator and Study Biostatistician on this study.
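    A group sequential design controls the overall type I error by deciding in advance how much of it to "spend" at each interim analysis. As one common choice, the sketch below uses the Lan-DeMets O'Brien-Fleming-type spending function alpha(t) = 2 - 2 Phi(z_{1 - alpha/2} / sqrt(t)), where t is the fraction of total information observed. The two-sided alpha of 0.05, the four equally spaced looks, and the function name `obf_spending` are assumptions for illustration; the trial's actual efficacy and futility boundaries are not described here.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def obf_spending(t: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent by information fraction t (0 < t <= 1).

    Lan-DeMets O'Brien-Fleming-type spending: 2 - 2 * Phi(z_{1 - alpha/2} / sqrt(t)).
    """
    z = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return 2 - 2 * STD_NORMAL.cdf(z / sqrt(t))


# Hypothetical schedule: four equally spaced analyses.
fractions = [0.25, 0.5, 0.75, 1.0]
cumulative = [obf_spending(t) for t in fractions]
# Alpha newly spent at each look is the increment in cumulative spending.
increments = [c - p for c, p in zip(cumulative, [0.0] + cumulative[:-1])]
for t, c, inc in zip(fractions, cumulative, increments):
    print(f"t = {t:.2f}: cumulative alpha = {c:.5f}, spent at this look = {inc:.5f}")
```

    Very little alpha is spent early, so stopping for efficacy at an early look requires overwhelming evidence. Turning these increments into z-statistic boundaries requires recursive numerical integration over the joint distribution of the interim statistics, which dedicated group sequential software handles.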

  • Dr Chenxi Li



    Genetic/genomic survival association and risk prediction

    Most genetic association studies of human diseases use case-control phenotypes (diseased vs. non-diseased). However, time-to-disease traits carry more information about gene-disease associations and are better suited to building risk prediction models. We are developing robust and efficient statistical methods to detect genetic associations and predict disease risk from omics data for various types of survival outcomes and models. This project is led by Dr. Chenxi Li.
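    As a minimal illustration of testing a genetic association with a time-to-disease outcome, the sketch below computes the classical log-rank statistic comparing carriers and non-carriers of a variant. It is a textbook test chosen for illustration, not the robust and efficient methods under development; the data and the `log_rank` name are hypothetical.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def log_rank(times: Sequence[float], events: Sequence[int],
             group: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test.

    times:  follow-up time per subject.
    events: 1 = disease onset observed, 0 = censored.
    group:  1 = variant carrier, 0 = non-carrier.
    Returns (chi-square statistic with 1 df, p-value).
    """
    o_minus_e = 0.0  # observed minus expected onsets among carriers
    var = 0.0        # hypergeometric variance summed over event times
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(group[i] for i in at_risk)
        d = sum(1 for i in at_risk if times[i] == t and events[i] == 1)
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] == 1 and group[i] == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    # Survival function of chi-square(1): P(Z^2 > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Toy data: carriers (group 1) tend to develop disease earlier.
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 1, 0, 1, 0, 1, 0]
group = [1, 1, 1, 1, 0, 0, 0, 0]
stat, p = log_rank(times, events, group)
print(f"log-rank chi-square = {stat:.3f}, p = {p:.4f}")
```

    The same score-type logic generalizes to the Cox model, where a variant's genotype enters as a covariate and additional covariates can be adjusted for.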


  • Dr Xiaoyu Liang


    Methods Development for Gene Selection of Complex Diseases

    To identify genetic variants associated with complex traits or diseases, researchers need larger samples and more powerful tests, especially when causal variants have weak effects. However, acquiring data of sufficient scale can be challenging. We have developed gene-based association tests that leverage publicly available GWAS summary statistics. These methods have been used successfully to pinpoint genes associated with traits such as schizophrenia and fasting glucose levels.
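    Gene-based tests built on summary statistics typically combine the per-SNP z-scores in a gene while accounting for linkage disequilibrium (LD) among them, using an LD correlation matrix R estimated from a reference panel. The simplest example is a burden-style test: under the null hypothesis the vector of z-scores is approximately N(0, R), so their sum has variance equal to the sum of all entries of R, and the squared standardized sum is chi-square with 1 degree of freedom. This is shown only to illustrate the idea and is not necessarily one of the group's tests; the `burden_test` name, the z-scores, and the LD values are made up.

```python
from math import erfc, sqrt
from typing import Sequence, Tuple


def burden_test(z: Sequence[float], ld: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Burden-style gene test from GWAS summary statistics (illustrative).

    z:  per-SNP z-scores for the SNPs mapped to one gene.
    ld: SNP-by-SNP LD (correlation) matrix from a reference panel.
    Under H0, sum(z) ~ N(0, sum of all entries of ld).
    Returns (chi-square statistic with 1 df, p-value).
    """
    total_z = sum(z)
    null_var = sum(sum(row) for row in ld)
    stat = total_z ** 2 / null_var
    # Survival function of chi-square(1).
    return stat, erfc(sqrt(stat / 2))


# Hypothetical gene with three SNPs in moderate LD.
z_scores = [2.1, 1.8, 2.4]
ld_matrix = [[1.0, 0.6, 0.3],
             [0.6, 1.0, 0.5],
             [0.3, 0.5, 1.0]]
stat, p = burden_test(z_scores, ld_matrix)
print(f"chi-square = {stat:.3f}, p = {p:.4f}")
```

    Because this statistic sums signed z-scores, it loses power when variants in a gene act in opposite directions. Quadratic statistics such as the sum of squared z-scores avoid that, at the cost of a null distribution that is a weighted sum of chi-square(1) variables with weights given by the eigenvalues of R.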

    Methods Development for Joint Analysis of Multiple Phenotypes in Association Studies

    The association between a genetic variant and any single phenotype is usually weak. It is increasingly recognized that jointly analyzing multiple phenotypes can be more powerful than univariate analysis and can shed new light on the biological mechanisms underlying complex diseases. Our research focuses primarily on developing and applying innovative methods for jointly analyzing multiple phenotypes in genome-wide and epigenome-wide association studies, with the aim of uncovering the genetic underpinnings of complex diseases such as chronic obstructive pulmonary disease, schizophrenia, and rheumatoid arthritis.
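    One simple way to test a variant jointly against several phenotypes is the multivariate Wald-type statistic T = z' Sigma^{-1} z, where z holds the variant's per-trait z-scores and Sigma is their correlation under the null (often estimated from the phenotypic correlation or from null SNPs). Under the null, T is chi-square with K degrees of freedom for K traits. The sketch below handles K = 2, where both the matrix inverse and the chi-square tail have closed forms; it illustrates the general idea rather than reproducing the group's methods, and the function name and inputs are hypothetical.

```python
from math import exp
from typing import Tuple


def joint_two_trait_test(z1: float, z2: float, rho: float) -> Tuple[float, float]:
    """Joint association test for one variant and two correlated phenotypes.

    z1, z2: the variant's z-scores for each phenotype.
    rho:    null correlation between z1 and z2.
    T = z' Sigma^{-1} z with Sigma = [[1, rho], [rho, 1]]; T ~ chi-square(2) under H0.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    stat = (z1 ** 2 - 2 * rho * z1 * z2 + z2 ** 2) / (1 - rho ** 2)
    # Chi-square with 2 df has survival function exp(-x / 2).
    return stat, exp(-stat / 2)


# Hypothetical variant with opposite-sign effects on two positively correlated traits.
stat, p = joint_two_trait_test(2.5, -2.0, rho=0.4)
print(f"T = {stat:.2f}, p = {p:.2e}")
```

    In this example neither univariate z-score would reach genome-wide significance, yet the joint statistic gives p of about 2e-4, because opposite-sign effects on positively correlated traits are unlikely under the null.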

  • Dr Zhehui Luo



    Causal inference using observational data

    Many real-world interventions are not randomly assigned. Even in randomized controlled trials, intercurrent events may prevent valid inference about efficacy: non-compliance and non-random drop-out may introduce selection bias and compromise the internal validity of a study. Dr. Zhehui Luo collaborates with epidemiologists, sociologists, clinicians, and other scientists to find appropriate methods for handling these complications of causal inference with observational data. She is currently exploring how the impact of non-compliance differs between placebo-controlled and active-controlled trials and examining the potential of spatial-temporal data to mimic randomized trials. Applications include trials in reproductive health and birth outcomes.
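    Non-compliance, one of the complications mentioned above, is often handled with an instrumental-variable argument: under the standard assumptions (randomized assignment, exclusion restriction, and monotonicity, i.e., no defiers), the intention-to-treat effect divided by the difference in treatment uptake between arms estimates the complier average causal effect (CACE). The sketch below computes this Wald estimator; it is a textbook method shown for illustration, and the `wald_cace` name and the trial data are made up.

```python
from statistics import mean
from typing import Sequence


def wald_cace(assigned: Sequence[int], received: Sequence[int],
              outcome: Sequence[float]) -> float:
    """Complier average causal effect via the Wald (instrumental-variable) estimator.

    assigned: 1 if randomized to the active arm, else 0 (the instrument).
    received: 1 if the active treatment was actually taken, else 0.
    outcome:  observed outcome for each participant.
    """
    arm1 = [i for i, a in enumerate(assigned) if a == 1]
    arm0 = [i for i, a in enumerate(assigned) if a == 0]
    # Intention-to-treat effect of assignment on the outcome.
    itt_y = mean(outcome[i] for i in arm1) - mean(outcome[i] for i in arm0)
    # Effect of assignment on uptake (estimated share of compliers).
    itt_d = mean(received[i] for i in arm1) - mean(received[i] for i in arm0)
    if itt_d == 0:
        raise ValueError("assignment does not change uptake; CACE is not identified")
    return itt_y / itt_d


# Hypothetical trial: one non-taker in the active arm, one crossover in the control arm.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 0, 1, 0, 0, 0, 1]
outcome = [5, 6, 2, 7, 2, 3, 1, 4]
cace = wald_cace(assigned, received, outcome)
print(f"CACE estimate = {cace:.2f}")
```

    Here the intention-to-treat effect is 2.5, but assignment changes uptake for only half of the participants, so the estimated effect among compliers is twice as large.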


    Health Services Research

    Using large administrative claims data in research has advantages and disadvantages and demands a thorough understanding of the structure and limitations of such databases. Working closely with experts in generating, modifying, and storing these data, and drawing on her expertise in causal inference, Dr. Zhehui Luo provides insights into the design, analysis, and interpretation of several program evaluation studies with policy implications for improving population health. She is currently investigating the impact of extending Medicaid benefits to persons affected by the Flint water crisis on health service utilization, and the impact of implementing community health worker home-visiting programs on birth outcomes. Dr. Zhehui Luo