Changelog ========= v0.1.0 (Current) ----------------- First public release of pg_gpu. Core Data Structures ~~~~~~~~~~~~~~~~~~~~ * ``HaplotypeMatrix`` -- phased haplotype data (0/1 with -1 for missing). Loaders: ``from_vcf`` (with ``region=`` and ``samples=`` subsetting), ``from_zarr`` (auto-detects VCZ / scikit-allel / chromosome-grouped layouts), ``from_ts``, and direct NumPy construction. ``to_zarr`` writes VCZ by default. ``vcf_to_zarr`` provides multicore VCF-to-zarr conversion. Sample names from VCFs are preserved; ``load_pop_file('pops.txt')`` assigns populations using stored sample names. * ``GenotypeMatrix`` -- diploid genotypes (0/1/2). Same loaders and zarr round-trip as ``HaplotypeMatrix``. Many public functions auto-dispatch on input type (haplotype vs genotype). Linkage Disequilibrium ~~~~~~~~~~~~~~~~~~~~~~ * Core statistics: ``r``, ``r_squared``, ``dd`` (D-squared), ``dz``, ``pi2`` (Ragsdale & Gravel 2019), ``zns`` (Kelly), ``omega`` (Kim & Nielsen), ``mu_ld`` (RAiSD). * LD pruning: ``locate_unlinked``; windowed :math:`r^2` decay: ``windowed_r_squared``. * Two-population LD via ``compute_ld_statistics_gpu_two_pops`` with chunked GPU execution. Diversity Statistics ~~~~~~~~~~~~~~~~~~~~ * Theta estimators: ``pi``, ``theta_w``, ``theta_h``, ``theta_l``, ``eta1``, ``eta1_star``, ``minus_eta1``, ``minus_eta1_star``. * Neutrality tests: ``tajimas_d``, ``fay_wus_h``, ``normalized_fay_wus_h``, ``zeng_e``, ``zeng_dh``. * Heterozygosity / inbreeding: ``heterozygosity_expected``, ``heterozygosity_observed``, ``inbreeding_coefficient``. * Haplotype-level: ``haplotype_diversity``, ``haplotype_count``, ``daf_histogram``, ``diplotype_frequency_spectrum``. * ``FrequencySpectrum`` class for custom weight functions, SFS projection, and the Achaz (2009) variance framework. * All statistics accept ``missing_data='include' | 'exclude'`` and a unified ``span_normalize`` parameter that auto-detects the best denominator (accessible bases if mask set, else genomic span). Divergence Statistics ~~~~~~~~~~~~~~~~~~~~~ * FST estimators: ``fst_hudson`` (ratio of averages), ``fst_weir_cockerham``, ``fst_nei``; ``pairwise_fst`` for multiple populations. * Absolute / net divergence: ``dxy``, ``da``. * Population Branch Statistic: ``pbs`` (normalized PBSn1). * Distance-based two-population statistics (Schrider et al. 2018 and related): ``snn``, ``dxy_min``, ``gmin``, ``dd``, ``dd_rank``, ``zx``. Callers can pre-compute ``pairwise_distance_matrix`` once and pass it to multiple stats, or use the combined ``distance_based_stats``. Selection Scans ~~~~~~~~~~~~~~~ * Haplotype-based: ``ihs`` (fused CUDA kernel, bitmask pair tracking, block-level EHH reductions), ``nsl``, ``xpehh``, ``xpnsl``, ``ehh_decay``. * Garud's H: ``garud_h`` (H1, H12, H123, H2/H1) via GPU dot-product hashing of haplotypes; ``moving_garud_h`` uses cumulative prefix sums for O(1) per-window hash computation. * Standardization: ``standardize``, ``standardize_by_allele_count``. * Diploid variants: ``zns_diploid``, ``omega_diploid``, ``garud_h_diploid``, ``daf_histogram_diploid``. Site Frequency Spectrum ~~~~~~~~~~~~~~~~~~~~~~~ * Unfolded and folded SFS: ``sfs``, ``sfs_folded``, ``sfs_scaled``, ``sfs_folded_scaled``. * Two-population joint SFS: ``joint_sfs``, ``joint_sfs_folded``, ``joint_sfs_scaled``, ``joint_sfs_folded_scaled``. * Folding utilities: ``fold_sfs``, ``fold_joint_sfs``. Admixture / F-Statistics ~~~~~~~~~~~~~~~~~~~~~~~~ * Per-variant: ``patterson_f2`` (F2 branch length), ``patterson_f3`` (admixture test), ``patterson_d`` (ABBA-BABA). * Windowed: ``moving_patterson_f3``, ``moving_patterson_d``. * Block-jackknife with standard error: ``average_patterson_f3``, ``average_patterson_d``. Resampling ~~~~~~~~~~ * Public ``pg_gpu.resampling`` module with ``block_jackknife`` and ``block_bootstrap`` for block-resampled standard errors / CIs on any scalar genome-wide statistic (genome-wide mean Tajima's D, ratio-of-sums estimators, etc.). Promotes the previously private ``_jackknife`` helper from ``admixture``. The weighted jackknife follows the Busing et al. (1999) delete-:math:`m_j` formulation for unequal block sizes. * ``examples/sweep_tajimas_d_bootstrap.py`` -- 95% bootstrap CI on Tajima's D under a completed sweep, showing sweep-local vs distal mean difference CIs that exclude zero. Dimensionality Reduction and Distance ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * ``pca`` -- GPU-accelerated SVD PCA. * ``randomized_pca`` -- truncated-SVD approximation for large datasets. * ``pairwise_distance`` -- GPU-accelerated with memory-safe batching. * ``pcoa`` -- classical MDS from a distance matrix. * ``local_pca`` -- GPU port of Li & Ralph (2019) lostruct for detecting regions where population structure differs from the chromosome-wide pattern. Per-window top-k eigendecomposition via a single batched ``cp.linalg.eigh`` over a stacked ``(n_windows, n_samples, n_samples)`` tensor. * ``pc_dist`` -- Frobenius distance between per-window low-rank covariance reps via the trace identity (no cov-matrix re-materialization). L1, L2, or no normalization. * ``corners`` -- extreme-cluster selection in a 2D MDS embedding via Welzl's minimum enclosing circle. * ``local_pca_jackknife`` -- delete-1 block jackknife standard error of local PCs, also GPU-batched with sign-aligned replicates. * ``LocalPCAResult`` dataclass with ``.windows`` / ``.eigvals`` / ``.eigvecs`` / ``.sumsq`` plus ``.to_lostruct_matrix()`` for compatibility with the R ``lostruct::eigen_windows`` layout. Relatedness and Kinship ~~~~~~~~~~~~~~~~~~~~~~~ * ``grm`` -- Genetic Relationship Matrix (Yang et al. 2011). * ``ibs`` -- pairwise Identity-By-State proportions. Fused Windowed Analysis ~~~~~~~~~~~~~~~~~~~~~~~ The ``windowed_analysis()`` convenience function routes through fused CUDA kernels (one kernel launch for all windows) when using non-overlapping windows with ``missing_data='include'``: * Single-population: ``pi``, ``theta_w``, ``tajimas_d``, ``segregating_sites``, ``singletons``. * Two-population: ``fst``, ``fst_hudson``, ``fst_wc``, ``dxy``, ``da``. * Selection: ``garud_h1``, ``garud_h12``, ``garud_h123``, ``garud_h2h1``, ``mean_nsl``. * Structure: ``local_pca`` (returns a ``LocalPCAResult``; scalar stats requested alongside are merged onto ``result.windows``). * Structure: ``local_pca_jackknife`` computes delete-1 block jackknife standard error and populates ``LocalPCAResult.jackknife_se``. When both are requested together, per-window matrix preparation is shared. Lower-level windowed entry points: ``windowed_statistics`` (scatter-add aggregation) and ``windowed_statistics_fused`` (custom bin edges, one thread block per window). Distance Distribution Statistics ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * ``pairwise_diffs`` -- Hamming distance distributions (haploid or diploid). * ``dist_var``, ``dist_skew``, ``dist_kurt`` -- moments of the pairwise-distance distribution (Schrider et al. 2018). * ``dist_moments`` -- all three in one call. diploSHIC / RAiSD Statistics ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * ``mu_var``, ``mu_sfs`` -- SNP density and SFS edge fraction (RAiSD). * ``max_daf`` -- maximum derived allele frequency. Visualization (``pg_gpu.plotting``) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ * SFS: ``plot_sfs``, ``plot_joint_sfs``. * LD: ``plot_pairwise_ld``, ``plot_ld_decay``. * PCA / structure: ``plot_pca``, ``plot_pairwise_distance``. * Windowed statistics: ``plot_windowed``, ``plot_windowed_panel``. * Haplotypes: ``plot_haplotype_frequencies``, ``plot_variant_locator``. Missing Data Handling ~~~~~~~~~~~~~~~~~~~~~ * Missing values are encoded as ``-1`` (haplotype) or ``-1`` sentinel (genotype). * Every statistic accepts ``missing_data='include'`` (per-site valid data, default) or ``missing_data='exclude'`` (only fully genotyped sites). Simulation testing confirms ``include`` is unbiased under MCAR. * LD projection estimator available via ``estimator='sigma_d2'`` on ``zns`` / ``omega``. The default is ``estimator='auto'``, which resolves to ``'sigma_d2'`` on ``HaplotypeMatrix`` inputs (the recommended path; uses the unbiased Ragsdale & Gravel 2019 estimators) and falls back to ``'r2'`` for pre-computed r² arrays or ``GenotypeMatrix`` inputs. ``windowed_analysis`` follows the same default for windowed ``zns`` and ``omega``. Moments Integration ~~~~~~~~~~~~~~~~~~~ ``pg_gpu.moments_ld.compute_ld_statistics`` is a GPU drop-in for ``moments.LD.Parsing.compute_ld_statistics``. Returns the 15 two-population LD statistics and 3 heterozygosity statistics in the exact layout moments expects for demographic inference. Requires the ``moments`` pixi environment: ``pixi install -e moments``. Validation ~~~~~~~~~~ * Cross-validation script (``tests/validate_against_allel.py``) comparing 31 statistics against scikit-allel on real Ag1000G data (1M variants, 200 haplotypes). Divergence, diversity, and selection statistics match scikit-allel at machine precision; a timing table is included. * Local PCA (lostruct) outputs validated against the R ``lostruct`` package via frozen JSON references committed under ``tests/data/``. R is **not** a dependency of the pixi env or CI -- the comparison runs against the committed JSON. An optional ``requires_r`` test regenerates the references via rpy2 when R + lostruct are available locally. Performance ~~~~~~~~~~~ All statistics run on CuPy with custom CUDA kernels for compute-bound paths. Scalar statistics at 1M variants, 200 haplotypes: .. list-table:: :header-rows: 1 :widths: 30 15 15 15 * - Statistic - allel (s) - pg_gpu (s) - Speedup * - Weir-Cockerham FST - 9.85 - 0.02 - **468x** * - Patterson F2 - 0.15 - 0.009 - **18x** * - nSL (255k variants) - 8.1 - 0.56 - **15x** * - Patterson F3 - 0.14 - 0.016 - **9x** * - EHH decay (255k) - 0.06 - 0.008 - **8x** * - Hudson FST - 0.12 - 0.017 - **7x** * - iHS (255k variants) - 9.9 - 1.5 - **7x** * - Dxy - 0.07 - 0.016 - **4x** Windowed statistics at 5.3M variants, 100kb windows, 200 haplotypes: .. list-table:: :header-rows: 1 :widths: 30 15 15 15 * - Statistic - allel (s) - pg_gpu (s) - Speedup * - pi + theta_w + tajimas_d - 0.81 - 0.013 - **60x** * - All 5 single-pop stats - 0.81 - 0.013 - **60x** * - FST (Hudson) - 0.59 - 0.18 - **3x** * - All 12 stats together - n/a - 0.66 - single call Examples ~~~~~~~~ End-to-end demo scripts in ``examples/``: * ``pg_gpu_tour.ipynb`` -- interactive tour using Anopheles gambiae X chromosome data. * ``admixture_detection.py`` -- block-jackknife ABBA-BABA on simulated null and admixed msprime scenarios. * ``accessibility_mask.py`` -- windowed :math:`\pi` with and without an accessibility mask over a low-:math:`\mu` "exon" region. * ``ld_blocks.py`` -- LD-block partitioning via :math:`r^2` bridging scores. * ``local_pca.py`` -- lostruct pipeline on a simulated partial selective sweep (``SweepGenicSelection`` with end frequency 0.5). Infrastructure ~~~~~~~~~~~~~~ * pixi-based environment management; ``moments`` integration lives in a separate pixi feature. * Shared ``_utils.py`` module for population extraction. * Public API returns NumPy arrays (not CuPy) -- no need to call ``.get()`` on results.