Introduction

Why pg_gpu?

pg_gpu is a GPU-accelerated population genetics library, loosely modeled on scikit-allel but with CuPy-powered CUDA kernels and an expanded set of statistics (windowed analyses, lostruct, moments-LD, Patterson’s F-statistics, accessible masks, block jackknife / bootstrap, and more). If a workflow runs in scikit-allel, the equivalent in pg_gpu typically returns the same numbers much faster.

The library is aimed at users who need to run analyses at the scale of modern resequencing data – whole chromosomes, hundreds of thousands of variants, dozens to hundreds of windows – without writing custom CUDA. Simulation-based inference methods (e.g. ABC, SMC-ABC, and deep learning) also require many thousands of simulations, and pg_gpu can compute summary statistics on that simulated data at scale.

What’s in the box

  • Diversity – \(\pi\), Watterson’s \(\theta\), Tajima’s D, Fay-Wu’s H, Zeng’s E, Achaz framework theta estimators, heterozygosity, AFS.

  • Divergence – FST (Hudson / Weir-Cockerham / Nei), dxy, da, PBS, Snn, Gmin, dd, Zx.

  • Selection – iHS, nSL, XP-EHH, XP-nSL, EHH decay, Garud’s H.

  • LD – pairwise r2, ZnS, omega, sigma_D2, windowed LD decay, two-population moments-LD compatible with moments.LD.

  • Admixture – Patterson’s F2 / F3 / D, with block-jackknife wrappers.

  • Structure – PCA, randomized PCA, PCoA, GRM, local PCA / lostruct.

  • Resampling – general-purpose block_jackknife and block_bootstrap (including the ratio-of-sums case).

  • Windowed analysis – a single windowed_analysis entry point that fuses many of the above into one GPU kernel pass.
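For readers new to these statistics, the sketch below shows what two of the diversity estimators and a delete-one block jackknife for a ratio of sums compute, using plain NumPy on the CPU. It is an illustration only and does not use the pg_gpu API: the function names (`per_site_pi`, `watterson_theta`, `block_jackknife_ratio`) and the toy haplotype matrix are hypothetical, the sites are assumed biallelic with no missing data, and the jackknife is the simple unweighted form with equal-sized blocks.

```python
import numpy as np


def per_site_pi(haplotypes):
    """Per-site nucleotide diversity for a (variants, haplotypes) 0/1 matrix."""
    n = haplotypes.shape[1]
    derived = haplotypes.sum(axis=1)  # derived-allele count at each site
    # Fraction of haplotype pairs that differ at a biallelic site:
    # k * (n - k) differing pairs out of n * (n - 1) / 2 total pairs.
    return 2.0 * derived * (n - derived) / (n * (n - 1))


def watterson_theta(haplotypes):
    """Watterson's theta: segregating sites divided by the harmonic number a1."""
    n = haplotypes.shape[1]
    derived = haplotypes.sum(axis=1)
    segregating = np.count_nonzero((derived > 0) & (derived < n))
    a1 = np.sum(1.0 / np.arange(1, n))  # 1 + 1/2 + ... + 1/(n - 1)
    return segregating / a1


def block_jackknife_ratio(numer, denom, n_blocks):
    """Unweighted delete-one block jackknife for sum(numer) / sum(denom).

    Returns the full-data estimate and its jackknife standard error.
    """
    num_blocks = np.array([b.sum() for b in np.array_split(numer, n_blocks)])
    den_blocks = np.array([b.sum() for b in np.array_split(denom, n_blocks)])
    full = num_blocks.sum() / den_blocks.sum()
    # Leave-one-block-out estimates: subtract each block from both sums.
    loo = (num_blocks.sum() - num_blocks) / (den_blocks.sum() - den_blocks)
    variance = (n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2)
    return full, np.sqrt(variance)


# Toy data (hypothetical): 4 sites by 4 haplotypes, 1 = derived allele.
haps = np.array([
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 1, 1, 1],  # fixed derived: not segregating
    [0, 0, 0, 0],  # fixed ancestral: not segregating
])

pi_sites = per_site_pi(haps)
theta_w = watterson_theta(haps)
# Per-site pi as a ratio of sums: total pi over the number of sites.
pi_mean, pi_se = block_jackknife_ratio(pi_sites, np.ones(len(pi_sites)), n_blocks=2)
```

Taking the ratio of block sums, rather than averaging per-block ratios, keeps the estimate stable when blocks contain different numbers of accessible sites. That is the usual motivation for the ratio-of-sums case.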

Where to go next

  • Installation – system requirements and the (very short) pixi install steps.

  • Quick Start Guide – one short code block per major feature; the best place to skim the API surface.

  • Missing Data Handling – accessible-site masks, span normalization, and the include / exclude modes that affect every per-site statistic.

  • Examples – longer end-to-end demos, including reproducible scripts shipped under examples/.

  • API Reference – the autogenerated API reference for the public modules.