pg_gpu Documentation
====================

GPU-accelerated population genetics statistics for Python.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   introduction
   installation
   features
   quickstart
   api
   missing_data
   examples
   tutorials
   workflows
   changelog

Overview
--------

pg_gpu provides GPU-accelerated computation of population genetics statistics
using CuPy. It covers linkage disequilibrium, diversity, divergence, selection
scans, site frequency spectra, admixture statistics, and dimensionality
reduction (PCA, PCoA, local PCA / lostruct).

Key Features
~~~~~~~~~~~~

* **Fast GPU computation** using CuPy with fused CUDA kernels for compute-intensive operations
* **Comprehensive statistics**:

  * LD: D, D-squared, Dz, pi2, r/r-squared
  * Diversity: pi, theta, Tajima's D, heterozygosity, Fay & Wu's H
  * Divergence: FST (Hudson, Weir-Cockerham, Nei), Dxy, Da, Snn, Gmin, dd, dd_rank, Zx
  * Selection scans: iHS, XP-EHH, nSL, XP-nSL, Garud's H, EHH decay
  * SFS: unfolded, folded, joint, scaled
  * Admixture: Patterson's F2, F3, D

* **Fused windowed analysis**: compute all statistics across all genomic windows in a single GPU pass -- up to 60x faster than scikit-allel
* **Automatic missing data handling** across all modules
* **Quality-aware filtering** -- load VCF FORMAT / INFO arrays (``GQ``, ``DP``, ``MQ``, ...) with ``fields=``, mask variants and genotypes from them, and round-trip the survivors into a clean VCZ. See :doc:`tutorials/qc_fields`.
* **Multi-population analyses** with flexible population specification
* **8 theta estimators and 4 neutrality tests** (pi, theta_w, theta_h, theta_l, eta1, eta1_star, minus_eta1, minus_eta1_star, Tajima's D, Fay & Wu's H, Zeng's E, DH)
* **Validated against scikit-allel** -- 29 statistics verified to machine precision on real Ag1000G data
* **Biobank-scale streaming** -- VCZ stores too large to fit in GPU memory open as a streaming view that walks the chromosome chunk by chunk; the per-window, SFS, moments-LD, and pairwise-relatedness kernels all dispatch to it transparently. See :doc:`tutorials/biobank_streaming`.
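
As a point of reference for what two of the diversity estimators listed above
measure, the following is a minimal CPU sketch in plain NumPy. It is not
pg_gpu's implementation: the function name ``pi_and_theta_w`` is hypothetical,
the input is assumed to be a complete (no missing data) 0/1 haplotype matrix of
shape ``(n_haplotypes, n_sites)``, and both values are returned as totals over
the region rather than per-base-pair rates.

.. code-block:: python

   import numpy as np

   def pi_and_theta_w(hap):
       """Hypothetical reference helper, not part of pg_gpu."""
       hap = np.asarray(hap)
       n = hap.shape[0]                      # number of haplotypes
       derived = hap.sum(axis=0)             # derived-allele count per site

       # pi: mean number of pairwise differences, summed over sites.
       # A site with k derived alleles differs in k * (n - k) of the
       # n * (n - 1) / 2 haplotype pairs.
       n_pairs = n * (n - 1) / 2
       pi = (derived * (n - derived) / n_pairs).sum()

       # Watterson's theta: segregating sites divided by the harmonic
       # number a1 = sum_{i=1}^{n-1} 1/i.
       n_seg = np.count_nonzero((derived > 0) & (derived < n))
       a1 = (1.0 / np.arange(1, n)).sum()
       theta_w = n_seg / a1
       return pi, theta_w

   hap = np.array([[0, 0, 1, 0],
                   [0, 1, 1, 0],
                   [1, 1, 0, 0],
                   [0, 1, 0, 0]])
   pi, theta_w = pi_and_theta_w(hap)

Tajima's D compares these two quantities: under the standard neutral model
they estimate the same parameter, so a large gap between them is informative.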

Installation
------------

.. code-block:: bash

   pixi install
   pixi shell

Quick Example
-------------

.. code-block:: python

   from pg_gpu import HaplotypeMatrix, diversity, selection

   # Load data
   h = HaplotypeMatrix.from_vcf("data.vcf")

   # Diversity
   pi_val = diversity.pi(h)
   tajd = diversity.tajimas_d(h)

   # Selection scans
   ihs_scores = selection.ihs(h)

   # LD r-squared
   r2 = h.pairwise_r2()
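
For readers unfamiliar with the LD statistic in the last step, here is a small
NumPy sketch of the textbook haplotype-based r-squared,
``D**2 / (pA * (1 - pA) * pB * (1 - pB))``. The helper name
``pairwise_r2_reference`` is hypothetical, the input is assumed to be a 0/1
haplotype matrix with no missing calls, and whether ``pairwise_r2()`` returns
exactly this layout (for example, how it treats monomorphic sites) is an
assumption to check against the API reference.

.. code-block:: python

   import numpy as np

   def pairwise_r2_reference(hap):
       """Hypothetical CPU reference, not part of pg_gpu."""
       hap = np.asarray(hap, dtype=float)
       n = hap.shape[0]
       p = hap.mean(axis=0)                  # derived-allele frequency per site
       p_ab = hap.T @ hap / n                # joint frequency of derived alleles
       d = p_ab - np.outer(p, p)             # LD coefficient D for every site pair
       var = p * (1.0 - p)
       denom = np.outer(var, var)
       # Monomorphic sites have zero variance; report NaN for those pairs.
       with np.errstate(divide="ignore", invalid="ignore"):
           r2 = np.where(denom > 0, d ** 2 / denom, np.nan)
       return r2

   hap = np.array([[0, 0, 0, 0],
                   [0, 0, 1, 0],
                   [1, 1, 0, 0],
                   [1, 1, 1, 0]])
   r2_ref = pairwise_r2_reference(hap)

In this toy matrix the first two sites are identical (r-squared of 1), the
third is independent of them (r-squared of 0), and the fourth is monomorphic,
so its entries are undefined.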

Indices and tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`