pg_gpu Documentation

GPU-accelerated population genetics statistics for Python.

Overview

pg_gpu provides GPU-accelerated computation of population genetics statistics using CuPy. It covers linkage disequilibrium, diversity, divergence, selection scans, site frequency spectra, admixture statistics, and dimensionality reduction (PCA, PCoA, local PCA / lostruct).

Key Features

  • Fast GPU computation using CuPy with fused CUDA kernels for compute-intensive operations

  • Comprehensive statistics: LD (D, D-squared, Dz, pi2, r/r-squared); diversity (pi, theta, Tajima’s D, heterozygosity, Fay and Wu’s H); divergence (FST Hudson/Weir-Cockerham/Nei, Dxy, Da, Snn, Gmin, dd, dd_rank, Zx); selection scans (iHS, XP-EHH, nSL, XP-nSL, Garud’s H, EHH decay); SFS (unfolded, folded, joint, scaled); admixture (Patterson’s F2, F3, D)

  • Fused windowed analysis: compute all statistics across all genomic windows in a single GPU pass – up to 60x faster than scikit-allel

  • Automatic missing data handling across all modules

  • Quality-aware filtering – load VCF FORMAT / INFO arrays (GQ, DP, MQ, …) with fields=, mask variants and genotypes from them, and round-trip the survivors into a clean VCZ. See Quality-Aware Filtering: GQ, DP, MQ from VCF/VCZ.

  • Multi-population analyses with flexible population specification

  • 8 theta estimators (pi, theta_w, theta_h, theta_l, eta1, eta1_star, minus_eta1, minus_eta1_star) and 4 neutrality tests (Tajima’s D, Fay and Wu’s H, Zeng’s E, DH)

  • Validated against scikit-allel – 29 statistics verified to machine precision on real Ag1000G data

  • Biobank-scale streaming – VCZ stores too large to fit in GPU memory are opened as a streaming view that walks the chromosome chunk by chunk; every per-window, SFS, moments-LD, and pairwise-relatedness kernel dispatches transparently. See Biobank-Scale Streaming from VCZ.
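
To make the diversity estimators listed above concrete, here is a minimal CPU sketch in NumPy of what pi, Watterson's theta, and Tajima's D measure on a 0/1 haplotype array. It is only a reference illustration, not pg_gpu's implementation: the function name diversity_summary is hypothetical, and the sketch assumes biallelic sites, no missing data, and values summed over sites rather than normalized per base.

```python
import numpy as np


def diversity_summary(hap):
    """Return (pi, theta_w, tajimas_d) for a 0/1 haplotype array.

    hap has shape (n_haplotypes, n_variants). Hypothetical helper for
    illustration only; assumes biallelic sites and no missing data.
    """
    hap = np.asarray(hap)
    n = hap.shape[0]

    # Count of the 1 allele at each site; keep segregating sites only.
    count = hap.sum(axis=0)
    seg = (count > 0) & (count < n)
    count = count[seg]
    s = int(seg.sum())

    # pi: expected number of pairwise differences, summed over sites.
    pi = float(np.sum(2.0 * count * (n - count) / (n * (n - 1))))

    # Watterson's theta: segregating sites scaled by the harmonic number a1.
    i = np.arange(1, n)
    a1 = float(np.sum(1.0 / i))
    theta_w = s / a1

    if s == 0:
        return pi, theta_w, float("nan")

    # Tajima's D: (pi - theta_w) divided by its estimated standard deviation.
    a2 = float(np.sum(1.0 / i**2))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    tajimas_d = (pi - theta_w) / np.sqrt(e1 * s + e2 * s * (s - 1))
    return pi, theta_w, float(tajimas_d)


# Four haplotypes, three segregating sites.
example = np.array([[0, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 0]])
pi, theta_w, tajimas_d = diversity_summary(example)
```

A small reference like this can be handy for spot-checking GPU results on toy inputs, keeping in mind that a library may normalize per base or handle missing genotypes differently.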

Installation

pixi install
pixi shell

Quick Example

from pg_gpu import HaplotypeMatrix, diversity, selection

# Load data
h = HaplotypeMatrix.from_vcf("data.vcf")

# Diversity
pi_val = diversity.pi(h)
tajd = diversity.tajimas_d(h)

# Selection scans
ihs_scores = selection.ihs(h)

# LD r-squared
r2 = h.pairwise_r2()
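
For readers unfamiliar with the statistic, the following NumPy sketch shows what a pairwise r-squared matrix contains for phased 0/1 haplotypes. It is an assumption-laden CPU illustration, not the code behind h.pairwise_r2(): the function name pairwise_r2_reference is hypothetical, missing data is not handled, and the return layout of the pg_gpu method is not specified here.

```python
import numpy as np


def pairwise_r2_reference(hap):
    """Return the (n_variants, n_variants) matrix of haplotype r-squared.

    hap has shape (n_haplotypes, n_variants) with 0/1 alleles.
    Hypothetical helper for illustration; no missing data handling.
    """
    hap = np.asarray(hap, dtype=float)
    n = hap.shape[0]

    # Frequency of the 1 allele at each site.
    p = hap.mean(axis=0)

    # Frequency of haplotypes carrying the 1 allele at both sites.
    p_ab = hap.T @ hap / n

    # D = P(AB) - P(A) P(B); r^2 = D^2 / (pA (1 - pA) pB (1 - pB)).
    d = p_ab - np.outer(p, p)
    denom = np.outer(p * (1 - p), p * (1 - p))

    # Pairs involving a monomorphic site have a zero denominator: report NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        r2 = np.where(denom > 0, d**2 / denom, np.nan)
    return r2


# Sites 0 and 1 are identical and site 2 is their complement,
# so every pair is in complete LD.
example = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 1, 0]])
r2_example = pairwise_r2_reference(example)
```

The dense matrix grows quadratically with the number of variants, which is one reason windowed or chunked evaluation matters at scale.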
