In the database section, PGG.Han provides visualizations of the fine-scale genetic structure of the Han Chinese population and of genome-wide allele frequencies in its genetic and geographical sub-populations.
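This page does not describe how the per-sub-population allele frequencies are derived. The sketch below shows the standard calculation from a genotype matrix. It assumes genotypes are coded as alternate-allele counts (0, 1, 2) with -1 for a missing call. The function name `allele_frequencies` and the example labels are illustrative and are not part of PGG.Han.

```python
import numpy as np

def allele_frequencies(genotypes, labels):
    """Alternate-allele frequency of each SNP within each sub-population.

    genotypes: individuals x SNPs, alt-allele counts 0/1/2, -1 = missing (assumed coding)
    labels:    one sub-population label per individual (e.g. a province)
    """
    g = np.array(genotypes, dtype=float)  # copy, so the caller's data is untouched
    g[g < 0] = np.nan  # missing calls must not count as 0 alt alleles
    labels = np.asarray(labels)
    freqs = {}
    for pop in np.unique(labels):
        sub = g[labels == pop]
        called = np.sum(~np.isnan(sub), axis=0)  # genotyped individuals per SNP
        alt = np.nansum(sub, axis=0)  # alt alleles among them
        with np.errstate(invalid="ignore", divide="ignore"):
            freqs[str(pop)] = alt / (2 * called)  # NaN where nobody was genotyped
    return freqs

freqs = allele_frequencies(
    [[0, 2], [1, -1], [2, 2], [0, 1]],
    ["Beijing", "Beijing", "Guangdong", "Guangdong"],
)
print(freqs)  # Beijing -> [0.25, 1.0], Guangdong -> [0.5, 0.75]
```

Missing calls are excluded from the denominator rather than counted as reference alleles. Counting them as reference alleles would bias frequencies downward at poorly genotyped sites.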

Genetic structure
1. Genetic Affinity

This page shows the genetic relationships within the Han Chinese population at two levels. The first half groups samples into sub-populations by province, and the second half groups them by genetic structure. Click a sub-population on the map to display its genetic relationship with the other sub-populations on the right.
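The page does not say which statistic measures the genetic relationship between sub-populations. As one common choice, shown purely as an illustration, pairwise FST can rank the other sub-populations from closest to most distant. This sketch uses Hudson's estimator, summed over SNPs as a ratio of sums. The function names and the input layout are assumptions: allele frequencies per SNP, plus the number of sampled chromosomes in each sub-population.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Hudson's FST between two sub-populations, as a ratio of sums over SNPs.

    p1, p2: allele frequencies per SNP; n1, n2: sampled chromosomes (2 x individuals).
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num.sum() / den.sum()

def rank_by_affinity(target, freqs, n_chrom):
    """Other sub-populations sorted from lowest FST (closest) to highest."""
    scores = {
        pop: hudson_fst(freqs[target], freqs[pop], n_chrom[target], n_chrom[pop])
        for pop in freqs
        if pop != target
    }
    return sorted(scores.items(), key=lambda item: item[1])

freqs = {"A": [0.10, 0.50, 0.90], "B": [0.12, 0.48, 0.88], "C": [0.60, 0.20, 0.30]}
n_chrom = {"A": 200, "B": 200, "C": 200}
print([pop for pop, _ in rank_by_affinity("A", freqs, n_chrom)])  # ['B', 'C']
```

Values near zero indicate close affinity. Because of the sample-size correction, very close pairs can come out slightly negative.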

2. Population Structure

This part shows the genetic coordinates of the Han Chinese population, grouped both by province and by genetic structure. Click a sub-population on the map to display its genetic coordinates on the right.
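The text does not define "genetic coordinates". A common reading is principal-component scores. Under that assumption, the minimal sketch below computes PC scores from a genotype matrix of alternate-allele counts with no missing data. It then averages the scores per sub-population, roughly what a map-linked coordinate view would display. Both function names are hypothetical.

```python
import numpy as np

def pca_coordinates(genotypes, n_components=2):
    """Principal-component scores per individual.

    genotypes: individuals x SNPs alt-allele counts (0/1/2), no missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    p = g.mean(axis=0) / 2.0  # allele frequency per SNP
    keep = (p > 0) & (p < 1)  # monomorphic SNPs carry no information
    g, p = g[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # centre and scale each SNP
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # scores = U * singular values

def centroids(coords, labels):
    """Mean coordinates of each sub-population."""
    labels = np.asarray(labels)
    return {str(lab): coords[labels == lab].mean(axis=0) for lab in np.unique(labels)}

geno = [[0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1],
        [2, 2, 0, 0], [2, 1, 0, 0], [2, 2, 0, 1]]
coords = pca_coordinates(geno)
print(centroids(coords, ["north"] * 3 + ["south"] * 3))
```

The sign of each PC is arbitrary, so only relative positions between sub-populations are meaningful.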

3. Ancestry

This part shows the genetic composition of the Han Chinese population in the context of worldwide populations. Each individual is represented by a single line broken into K colored segments, whose lengths are proportional to the K inferred components (Cs). Population IDs are shown around the outside of the circular plot. To view the results for a different K, select it from the drop-down menu.
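The segment layout described above can be computed directly from a matrix of ancestry proportions, with one row per individual and K columns. An ADMIXTURE `.Q` file, for example, has this layout. The sketch below turns one row into segment boundaries and orders individuals by their major component, a common but assumed sorting convention. The function names are hypothetical. In a circular plot, the same boundaries would map onto a radius or an arc.

```python
import numpy as np

def segment_bounds(q_row, bar_length=1.0):
    """Split one individual's bar into K segments proportional to its ancestry.

    q_row: K ancestry proportions (one row of a Q matrix).
    Returns a list of (start, end) pairs along a bar of length bar_length.
    """
    q = np.asarray(q_row, dtype=float)
    q = q / q.sum()  # absorb rounding so the segments exactly fill the bar
    ends = np.cumsum(q) * bar_length
    starts = np.concatenate(([0.0], ends[:-1]))
    return [(float(s), float(e)) for s, e in zip(starts, ends)]

def order_by_major_component(q_matrix):
    """Group individuals by largest component, strongest members first."""
    q = np.asarray(q_matrix, dtype=float)
    major = q.argmax(axis=1)
    share = q.max(axis=1)
    return np.lexsort((-share, major))  # primary key: major; then descending share

q = [[0.1, 0.9], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]]
print(order_by_major_component(q).tolist())  # [1, 2, 0, 3]
print(segment_bounds(q[1]))
```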

Variant

High-quality genome-wide SNP genotype data can be queried on this page, either by genomic position or by rsID. Besides a map of the frequency distribution and a data table, the page provides external links to other databases.
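The two query modes can be told apart from the query string itself. This is a minimal sketch. The accepted position syntax (for example `chr1:12345` or `1:12345-67890`) is an assumption, not the server's documented format, and `parse_query` is a hypothetical name.

```python
import re

RSID = re.compile(r"^rs\d+$", re.IGNORECASE)
# Optional "chr" prefix, chromosome 1-22/X/Y, a position, and an optional end position.
POSITION = re.compile(r"^(?:chr)?([1-9]|1\d|2[0-2]|X|Y):(\d+)(?:-(\d+))?$", re.IGNORECASE)

def parse_query(text):
    """Classify a query as an rsID lookup or a position/range lookup."""
    text = text.strip()
    if RSID.match(text):
        return {"type": "rsid", "rsid": text.lower()}
    m = POSITION.match(text)
    if m:
        start = int(m.group(2))
        end = int(m.group(3)) if m.group(3) else start  # single position = 1-bp range
        if end < start:
            raise ValueError("end position precedes start")
        return {"type": "position", "chrom": m.group(1).upper(), "start": start, "end": end}
    raise ValueError(f"unrecognised query: {text!r}")

print(parse_query("rs123"))
print(parse_query("chr1:1000-2000"))
```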

In the analysis section, PGG.Han provides: 1) nested panels of ancestry-informative markers (AIMs) for detecting and controlling population stratification in medical and evolutionary studies; 2) a population-structure-aware shared control for genotype-phenotype association studies (e.g., GWAS); 3) a Han-Chinese-specific reference panel for genotype imputation. The computational tools are implemented in PGG.Han, and an online user interface is provided for data analysis and results visualization.
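The criteria PGG.Han uses to select AIMs are not given here. One simple way to obtain nested panels, shown only as an illustration, is to rank SNPs once and then take the top k for increasing k. The ranking here uses the absolute allele-frequency difference (δ) between two sub-populations. Every smaller panel is then automatically contained in every larger one. The function and its inputs are hypothetical.

```python
import numpy as np

def nested_aim_panels(freq_a, freq_b, snp_ids, sizes):
    """Build nested marker panels by ranking SNPs on allele-frequency difference.

    freq_a, freq_b: allele frequencies of the same SNPs in two sub-populations.
    sizes: panel sizes; each smaller panel is a prefix of each larger one.
    """
    delta = np.abs(np.asarray(freq_a, dtype=float) - np.asarray(freq_b, dtype=float))
    order = np.argsort(-delta, kind="stable")  # most informative SNPs first
    ranked = [snp_ids[i] for i in order]
    return {k: ranked[:k] for k in sorted(sizes)}

panels = nested_aim_panels([0.1, 0.5, 0.9, 0.4], [0.7, 0.5, 0.2, 0.3],
                           ["rs1", "rs2", "rs3", "rs4"], sizes=[3, 1])
print(panels)  # {1: ['rs3'], 3: ['rs3', 'rs1', 'rs4']}
```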

Data security

Any data you upload are strictly protected and readable only by you. You can delete your data from our servers at any time, and we do not use them in our own analyses.

Using the server
1. Prepare your data

Acceptable input: compressed VCF (*.vcf.gz), PLINK 1 binary (*.bed, *.bim, *.fam), or EIGENSTRAT (*.geno, *.ind, *.snp).

The following information is required:
a. All alleles on the forward strand;
b. Positions in GRCh37 coordinates.
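Before uploading, a quick local check can confirm that a file set matches one of the accepted formats and that no companion file is missing. This is a hypothetical helper, not part of the server. The forward-strand and GRCh37 requirements cannot be verified from file names and still need to be checked separately.

```python
from pathlib import Path

# Companion files required for each accepted multi-file format.
REQUIRED = {
    "PLINK": (".bed", ".bim", ".fam"),
    "EIGENSTRAT": (".geno", ".ind", ".snp"),
}

def detect_format(paths):
    """Return "VCF", "PLINK" or "EIGENSTRAT", or raise if the set is incomplete."""
    names = [Path(p).name for p in paths]
    if len(names) == 1 and names[0].endswith(".vcf.gz"):
        return "VCF"
    stems = {Path(n).stem for n in names}
    if len(stems) != 1:
        raise ValueError("all files must share one prefix")
    suffixes = {Path(n).suffix for n in names}
    for fmt, exts in REQUIRED.items():
        if suffixes == set(exts):
            return fmt
    raise ValueError(f"unsupported or incomplete file set: {sorted(names)}")

print(detect_format(["study.bed", "study.bim", "study.fam"]))  # PLINK
```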

2. Registration and Login

Registration is required before first use of the PGG.Han Analysis Server. Once logged in, you can use the service free of charge.

3. Upload data & QC

Data can be uploaded in two ways. Small datasets (<100 MB) can be uploaded directly through the web page over HTTP. Larger datasets must be uploaded to our server via FTP.

Before starting an analysis, you need to run a data check. Only data that have completed the data check can be used in other analyses.
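The checks the server runs are not listed here. The illustrative sketch below shows a typical first QC step: it computes per-sample and per-SNP call rates and flags those that pass. The -1 missing-genotype encoding and the 0.95 default thresholds are assumptions, and `call_rate_filter` is a hypothetical name.

```python
import numpy as np

def call_rate_filter(genotypes, min_sample_call_rate=0.95, min_snp_call_rate=0.95):
    """Flag samples, then SNPs, whose genotype call rate meets the thresholds.

    genotypes: individuals x SNPs, alt-allele counts, -1 = missing (assumed coding).
    Returns (sample_ok, snp_ok) boolean arrays.
    """
    called = np.asarray(genotypes) >= 0
    sample_ok = called.mean(axis=1) >= min_sample_call_rate
    # SNP call rates are computed on the samples that survived the first filter.
    snp_ok = called[sample_ok].mean(axis=0) >= min_snp_call_rate
    return sample_ok, snp_ok

g = [[0, 1, -1], [2, 1, 0], [1, -1, -1], [0, 0, 0]]
sample_ok, snp_ok = call_rate_filter(g, min_sample_call_rate=0.6, min_snp_call_rate=0.7)
print(sample_ok.tolist(), snp_ok.tolist())  # [True, True, False, True] [True, True, False]
```

Filtering samples first stops a few badly genotyped individuals from dragging down the call rate of otherwise good SNPs.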

4. Pipelines

a. Ancestry inference
Various commonly used algorithms and analyses for ancestry inference are applied to dissect the ancestry composition and genetic affinity of an individual of interest.
b. Imputation
Our imputation service is designed to meet three different imputation needs: 1) achieving the best imputation results for Han population data, using a reference panel built from our NGS datasets of Han Chinese; 2) carrying out classical imputation tasks with public reference panels of global populations; 3) estimating and replacing missing genotypes in users’ data.
c. GWAS
We provide a platform for GWAS analysis, together with the largest control set for the Han Chinese population (Han100K). Users only need to provide genotype data plus covariate and phenotype files.
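The GWAS platform accepts covariates, so its model is presumably a regression. The exact method is not documented here. For orientation only, the sketch below shows the simplest single-SNP case-control test: a 1-df allelic chi-square on a 2x2 table of allele counts. The function name is hypothetical.

```python
import math

def allelic_chi2(case_alleles, control_alleles):
    """1-df allelic chi-square test.

    case_alleles, control_alleles: (alt count, ref count) for one SNP.
    Returns (chi-square statistic, p-value).
    """
    table = [list(case_alleles), list(control_alleles)]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(rows)
    if 0 in rows or 0 in cols:
        raise ValueError("empty group or monomorphic SNP: test undefined")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p

print(allelic_chi2((60, 40), (40, 60)))  # statistic 8.0, p ~ 0.0047
```

The p-value uses the identity P(X > x) = erfc(sqrt(x / 2)) for a chi-square variable with one degree of freedom, which avoids a SciPy dependency.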

5. Results

All analysis results are provided as downloadable files and as visual online reports.

For imputation with a reference panel, we recommend the “SHAPEIT2 + IMPUTE2” combination, which achieves the highest imputation precision and sensitivity among all tool combinations. Beagle4 and PBWT are the only tools that can impute without a reference panel. They are therefore the only choices for recalling missing genotypes in users’ data when no reference panel is used.

Choose a reference panel according to your imputation goal:
  • If your data contain only Han samples and you are interested only in the Han population, the “Han100K” reference panel outperforms every other reference panel in our imputation service.
  • If your data contain samples from other populations, the “1KG”, “HRC” and “SGDP” reference panels produce results that can be analyzed together with global populations. “HRC” includes “1KG”; its advantage over “1KG” lies mainly in its better representation of European populations. Although “SGDP” covers more populations than “1KG” and “HRC”, its relatively small sample size limits its application.
  • If you only want to recall missing genotypes in your data rather than add new variant sites, run imputation without a reference panel.
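The imputation guidance on this page can be condensed into a small decision helper. The function is hypothetical and only restates this page's recommendations, using the tool and panel names from the text.

```python
def recommend_imputation(use_reference_panel, han_only=False):
    """Suggest tools and reference panels following the guidance on this page.

    use_reference_panel: False if you only want to recall missing genotypes.
    han_only: True if the data contain only Han samples and only Han matters.
    """
    if not use_reference_panel:
        # Only Beagle4 and PBWT can impute without a reference panel.
        return {"tools": ["Beagle4", "PBWT"], "panels": []}
    tools = ["SHAPEIT2 + IMPUTE2"]  # recommended combination for reference-based imputation
    if han_only:
        return {"tools": tools, "panels": ["Han100K"]}
    # Global panels: HRC (which includes 1KG) better represents Europeans;
    # SGDP covers more populations but has a small sample size.
    return {"tools": tools, "panels": ["1KG", "HRC", "SGDP"]}

print(recommend_imputation(use_reference_panel=True, han_only=True))
# {'tools': ['SHAPEIT2 + IMPUTE2'], 'panels': ['Han100K']}
```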

Currently the imputation service handles only biallelic autosomal SNPs. Beyond that, which sites are imputed depends on whether a reference panel is used. Without a reference panel, all variants in your data are included in imputation. With a reference panel, the input data must first undergo quality control and strand recalibration, so some sites may be discarded before imputation.
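The imputation input restriction can be expressed as a per-site filter on VCF-style CHROM, REF and ALT fields. The second function flags A/T and C/G SNPs, whose strand cannot be resolved from the alleles alone. Whether the server discards such sites during strand recalibration is not stated, so that part is an assumption. Both functions are hypothetical.

```python
AUTOSOMES = {str(i) for i in range(1, 23)}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_imputable_snp(chrom, ref, alt):
    """True for a biallelic autosomal SNP: single-base REF, exactly one single-base ALT."""
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    alts = alt.split(",")  # VCF lists multiple ALT alleles comma-separated
    return (chrom in AUTOSOMES and len(alts) == 1
            and ref.upper() in COMPLEMENT and alts[0].upper() in COMPLEMENT
            and ref.upper() != alts[0].upper())

def is_strand_ambiguous(ref, alt):
    """A/T and C/G SNPs read the same on both strands, so flips cannot be detected."""
    return COMPLEMENT.get(ref.upper()) == alt.upper()

print(is_imputable_snp("chr1", "A", "G"), is_strand_ambiguous("A", "T"))  # True True
```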