pg_gpu Documentation
====================

GPU-accelerated population genetics statistics for Python.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   introduction
   installation
   features
   quickstart
   api
   missing_data
   examples
   tutorials
   workflows
   changelog

Overview
--------

pg_gpu provides GPU-accelerated computation of population genetics statistics
using CuPy. It covers linkage disequilibrium, diversity, divergence, selection
scans, site frequency spectra, admixture statistics, and dimensionality
reduction (PCA, PCoA, local PCA / lostruct).

Key Features
~~~~~~~~~~~~

* **Fast GPU computation** using CuPy with fused CUDA kernels for
  compute-intensive operations
* **Comprehensive statistics**: LD (D, D-squared, Dz, pi2, r/r-squared),
  diversity (pi, theta, Tajima's D, heterozygosity, Fay & Wu's H),
  divergence (FST Hudson/Weir-Cockerham/Nei, Dxy, Da, Snn, Gmin, dd, dd_rank,
  Zx), selection scans (iHS, XP-EHH, nSL, XP-nSL, Garud's H, EHH decay),
  SFS (unfolded, folded, joint, scaled), admixture (Patterson's F2, F3, D)
* **Fused windowed analysis**: compute all statistics across all genomic
  windows in a single GPU pass -- up to 60x faster than scikit-allel
* **Automatic missing data handling** across all modules
* **Quality-aware filtering** -- load VCF FORMAT / INFO arrays (``GQ``,
  ``DP``, ``MQ``, ...) with ``fields=``, mask variants and genotypes from
  them, and round-trip the survivors into a clean VCZ. See
  :doc:`tutorials/qc_fields`.
* **Multi-population analyses** with flexible population specification
* **8 theta estimators and 4 neutrality tests** (pi, theta_w, theta_h,
  theta_l, eta1, eta1_star, minus_eta1, minus_eta1_star, Tajima's D,
  Fay-Wu's H, Zeng's E, DH)
* **Validated against scikit-allel** -- 29 statistics verified at machine
  precision using real Ag1000G data
* **Biobank-scale streaming** -- VCZ stores too large to fit on the GPU open
  as a streaming view that walks the chromosome chunk by chunk; every
  per-window / SFS / moments-LD / pairwise relatedness kernel dispatches
  transparently. See :doc:`tutorials/biobank_streaming`.

Installation
------------

.. code-block:: bash

   pixi install
   pixi shell

Quick Example
-------------

.. code-block:: python

   from pg_gpu import HaplotypeMatrix, diversity, selection

   # Load data
   h = HaplotypeMatrix.from_vcf("data.vcf")

   # Diversity
   pi_val = diversity.pi(h)
   tajd = diversity.tajimas_d(h)

   # Selection scans
   ihs_scores = selection.ihs(h)

   # LD r-squared
   r2 = h.pairwise_r2()

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
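
CPU Reference Sketch
====================

For readers who want to sanity-check the diversity statistics named in the
Quick Example (``pi`` and Tajima's D) on a small input, the following is a
minimal NumPy sketch of the textbook definitions. It is not pg_gpu's
implementation. The function names ``pi_sum`` and ``tajimas_d_ref`` are
hypothetical, the input is assumed to be a complete biallelic 0/1 matrix of
shape ``(n_haplotypes, n_variants)`` with no missing calls, and ``pi_sum``
returns the sum over sites without dividing by a sequence span, which may
differ from the normalization pg_gpu applies.

.. code-block:: python

   import numpy as np

   def pi_sum(hap):
       """Mean pairwise differences, summed over sites (no span division)."""
       hap = np.asarray(hap)
       n = hap.shape[0]                   # number of haplotypes
       k = hap.sum(axis=0)                # derived-allele count per site
       n_pairs = n * (n - 1) / 2
       return float((k * (n - k)).sum() / n_pairs)

   def tajimas_d_ref(hap):
       """Tajima's D from the standard 1989 constants; NaN if no site segregates."""
       hap = np.asarray(hap)
       n = hap.shape[0]
       k = hap.sum(axis=0)
       s = int(np.count_nonzero((k > 0) & (k < n)))   # segregating sites
       if s == 0:
           return float("nan")
       i = np.arange(1, n)
       a1 = (1.0 / i).sum()
       a2 = (1.0 / i**2).sum()
       b1 = (n + 1) / (3 * (n - 1))
       b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
       c1 = b1 - 1 / a1
       c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
       e1 = c1 / a1
       e2 = c2 / (a1**2 + a2)
       theta_w = s / a1                   # Watterson's estimator
       return float((pi_sum(hap) - theta_w) / np.sqrt(e1 * s + e2 * s * (s - 1)))

   # Toy input: 4 haplotypes x 3 variants (hypothetical data)
   hap = np.array([[0, 0, 1],
                   [0, 1, 1],
                   [1, 1, 0],
                   [0, 0, 0]])
   pi_value = pi_sum(hap)          # (3 + 4 + 4) / 6 pairwise differences
   d_value = tajimas_d_ref(hap)

On real data, values from a reference like this are only comparable to a GPU
result when both sides use the same site filtering, missing-data handling,
and normalization.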