Genotype imputation (or simply imputation, in the context of our database) is the estimation of genotypes that were not directly observed in a dataset, including the filling-in of missing genotype calls. Our imputation service is designed to serve three kinds of imputation requests:
  • achieving the best imputation results for Han Chinese data, using a reference panel built from our NGS datasets of Han Chinese;
  • carrying out classical imputation tasks with public reference panels of global populations;
  • estimating and filling in missing genotypes in users' own data.
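To make the idea of "estimating unobserved genotypes from a reference panel" concrete, the sketch below copies missing alleles from the reference haplotype that best matches the target at its observed sites. It is a deliberately simplified nearest-haplotype illustration under stated assumptions (alleles coded 0/1, `None` for missing, aligned sites). It is not the hidden Markov model approach used by IMPUTE2, Minimac3 or Beagle, and the function name `impute_nearest_haplotype` is hypothetical.

```python
from typing import List, Optional

# A haplotype is a list of alleles coded 0 (reference) or 1 (alternate);
# None marks an unobserved/missing allele in the target haplotype.
Haplotype = List[Optional[int]]


def impute_nearest_haplotype(target: Haplotype, reference: List[List[int]]) -> List[int]:
    """Hypothetical toy imputer: fill missing alleles in `target` by copying
    from the reference haplotype that agrees with it at the most observed sites.

    Assumes all haplotypes cover the same sites in the same order. Unlike real
    tools, it ignores recombination and mutation entirely.
    """
    if not reference:
        raise ValueError("reference panel is empty")
    observed = [i for i, allele in enumerate(target) if allele is not None]

    def matches(ref_hap: List[int]) -> int:
        # Number of observed sites where this reference haplotype agrees with the target
        return sum(1 for i in observed if ref_hap[i] == target[i])

    best = max(reference, key=matches)  # first best-matching haplotype wins ties
    # Keep observed alleles; take the best match's allele at missing sites
    return [best[i] if allele is None else allele for i, allele in enumerate(target)]


if __name__ == "__main__":
    panel = [
        [0, 0, 1, 1, 0],
        [1, 1, 0, 0, 1],
        [0, 1, 1, 0, 0],
    ]
    target = [1, None, 0, None, 1]
    print(impute_nearest_haplotype(target, panel))  # [1, 1, 0, 0, 1]
```

Here the second reference haplotype matches the target at all three observed sites, so its alleles fill the two gaps. Production tools instead weigh all reference haplotypes probabilistically and let the best-matching template switch along the chromosome.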

Our imputation service is implemented with commonly used tools: SHAPEIT4, IMPUTE2, Minimac3, Beagle5 and PBWT (only Beagle4 and PBWT can impute genotypes without a reference panel). Five reference panels are available in our imputation service. Currently, imputation is limited to biallelic SNV data.
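Because the service accepts only biallelic SNVs, users may want to pre-check their VCF records before submission. The sketch below is a minimal illustration that assumes standard tab-separated VCF data lines (CHROM, POS, ID, REF, ALT, ...). The helper `is_biallelic_snv` is hypothetical and is not the service's own filter.

```python
def is_biallelic_snv(vcf_line: str) -> bool:
    """Hypothetical pre-check: True if a VCF data line is a biallelic SNV.

    Assumes tab-separated VCF columns CHROM, POS, ID, REF, ALT, ...
    Header/meta lines (starting with '#') are never SNVs.
    """
    if vcf_line.startswith("#"):
        return False  # header or meta-information line
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 5:
        raise ValueError("VCF data line has fewer than 5 columns")
    ref, alt = fields[3].upper(), fields[4].upper()
    bases = {"A", "C", "G", "T"}
    # REF must be one base and ALT exactly one different base; a multiallelic
    # ALT such as "A,G", an indel such as "TA", or a missing ALT "." fails
    return ref in bases and alt in bases and ref != alt


lines = [
    "1\t10177\trs1\tA\tC\t.\tPASS\t.",    # biallelic SNV
    "1\t10235\trs2\tT\tTA\t.\tPASS\t.",   # insertion (indel)
    "1\t10352\trs3\tT\tA,G\t.\tPASS\t.",  # multiallelic SNV
]
print([is_biallelic_snv(line) for line in lines])  # [True, False, False]
```

In practice, dedicated VCF tools are a more robust way to apply this filter to whole files. The function above only shows which records the restriction excludes.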

Software and references:

Reference panels:

Reference panel from the SHAPEIT2 and IMPUTE2 websites, based on 1000 Genomes Phase 3 data (26 global populations, 2,504 individuals, 81,706,022 variants)

Reference panel based on the CONVERGE dataset, keeping only the sites that passed the filters recommended by the authors of “11,670 whole genome sequences representative of the Han Chinese population from the CONVERGE project” (10,640 Han females, 5,814,870 variants)

Reference panel of the Haplotype Reference Consortium (HRC) release 1.1 (22,691 individuals on chromosome 1, 27,165 individuals on the other chromosomes, 39,131,578 variants)

Reference panel based on the merged Fermikit variant calls of the SGDP dataset (“The Simons Genome Diversity Project: 300 genomes from 142 diverse populations”) (263 individuals, 29,543,030 variants)

Reference panel of Han Chinese genomes only, combining high-coverage WGS of 319 Han individuals, low-coverage WGS of 11,878 Han individuals, and data from 102,586 individuals with 8,056,973 variants that were genotyped or partially imputed.