Introduction
Why pg_gpu?
pg_gpu is a GPU-accelerated population genetics library, loosely
modeled on scikit-allel but
with CuPy-powered CUDA kernels and an expanded set of statistics
(windowed analyses, lostruct, moments-LD, Patterson’s F-statistics,
accessible masks, block jackknife / bootstrap, and more). If a workflow
runs in scikit-allel, the equivalent in pg_gpu typically returns
the same numbers much faster.
The library is aimed at users who need to run analyses at the scale of
modern resequencing data – whole chromosomes, hundreds of thousands of
variants, dozens to hundreds of windows – without writing custom CUDA.
Furthermore, current simulation-based inference methods (e.g. ABC,
SMC-ABC, and deep learning) require many thousands of simulations, and
pg_gpu can be used to compute statistics on simulated data at scale.
What’s in the box
Diversity – \(\pi\), Watterson’s \(\theta\), Tajima’s D, Fay-Wu’s H, Zeng’s E, Achaz framework theta estimators, heterozygosity, AFS.
Divergence – FST (Hudson / Weir-Cockerham / Nei), dxy, da, PBS, Snn, Gmin, dd, Zx.
Selection – iHS, nSL, XP-EHH, XP-nSL, EHH decay, Garud’s H.
LD – pairwise r2, ZnS, omega, sigma_D2, windowed LD decay, two-population moments-LD compatible with moments.LD.
Admixture – Patterson’s F2 / F3 / D, with block-jackknife wrappers.
Structure – PCA, randomized PCA, PCoA, GRM, local PCA / lostruct.
Resampling – general-purpose block_jackknife and block_bootstrap (including the ratio-of-sums case).
Windowed analysis – a single windowed_analysis entry point that fuses many of the above into a single GPU kernel pass.
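To make the ratio-of-sums case mentioned under Resampling concrete, here is a minimal pure-Python sketch of a delete-one block jackknife. It is not the pg_gpu API: the function name block_jackknife_ratio and its arguments are hypothetical, and it uses the simple equal-weight jackknife variance rather than a block-size-weighted variant. Each block contributes a numerator and a denominator (for example, summed pairwise differences and accessible sites per block), and the statistic is the ratio of the two totals.

```python
import math


def block_jackknife_ratio(numerators, denominators):
    """Delete-one block jackknife for a ratio-of-sums statistic.

    Hypothetical helper for illustration only (not part of pg_gpu).
    Returns (estimate, standard_error) using the equal-weight jackknife.
    """
    if len(numerators) != len(denominators):
        raise ValueError("numerators and denominators must have equal length")
    n_blocks = len(numerators)
    if n_blocks < 2:
        raise ValueError("need at least two blocks")

    total_num = sum(numerators)
    total_den = sum(denominators)
    estimate = total_num / total_den

    # Recompute the ratio with each block left out in turn.
    leave_one_out = [
        (total_num - n) / (total_den - d)
        for n, d in zip(numerators, denominators)
    ]
    loo_mean = sum(leave_one_out) / n_blocks

    # Standard delete-one jackknife variance: (g - 1) / g * sum of squared deviations.
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - loo_mean) ** 2 for x in leave_one_out
    )
    return estimate, math.sqrt(variance)


# Three blocks with equal denominators: the ratio is the mean of the block values.
est, se = block_jackknife_ratio([1.0, 2.0, 3.0], [10.0, 10.0, 10.0])
```

With equal denominators the jackknife standard error reduces to the ordinary standard error of the mean of the per-block ratios, which gives a quick sanity check on the sketch.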
Where to go next
Installation – system requirements and the (very short) pixi install steps.
Quick Start Guide – one short code block per major feature; the best place to skim the API surface.
Missing Data Handling – accessible-site masks, span normalization, and the include/exclude modes that affect every per-site statistic.
Examples – longer end-to-end demos, including reproducible scripts shipped under examples/.
API Reference – the autogenerated API reference for the public modules.