Features

pg_gpu provides GPU-accelerated computation of population genetics statistics using CuPy. All statistics return NumPy arrays and handle missing data automatically. Below is a comprehensive catalog of every implemented statistic.

Diversity Statistics

Function	Description	Reference
`pi`	Nucleotide diversity	Nei & Li (1979)
`theta_w`	Watterson’s theta	Watterson (1975)
`tajimas_d`	Tajima’s D neutrality test	Tajima (1989)
`fay_wus_h`	Fay & Wu’s H (excess high-frequency derived alleles)	Fay & Wu (2000)
`normalized_fay_wus_h`	Normalized H (H*)	Zeng et al. (2006)
`theta_h`	Fay & Wu’s theta_H	Fay & Wu (2000)
`theta_l`	Theta_L	Zeng et al. (2006)
`zeng_e`	Zeng’s E neutrality test	Zeng et al. (2006)
`zeng_dh`	Zeng’s DH joint test	Zeng et al. (2006)
`segregating_sites`	Count of segregating sites
`singleton_count`	Count of singletons
`haplotype_diversity`	Haplotype diversity (1 - sum of squared frequencies)
`haplotype_count`	Number of distinct haplotypes
`heterozygosity_expected`	Expected heterozygosity (gene diversity) per variant
`heterozygosity_observed`	Observed heterozygosity per variant
`inbreeding_coefficient`	Wright’s F per variant	Wright (1951)
`allele_frequency_spectrum`	Allele frequency spectrum
`max_daf`	Maximum derived allele frequency
`daf_histogram`	Derived allele frequency histogram
`diplotype_frequency_spectrum`	Diplotype (multi-locus genotype) frequency spectrum
`diversity_stats`	All core diversity statistics in one call

Divergence Statistics

Function	Description	Reference
`fst_hudson`	Hudson’s FST	Hudson et al. (1992)
`fst_weir_cockerham`	Weir & Cockerham’s FST (method of moments)	Weir & Cockerham (1984)
`fst_nei`	Nei’s GST	Nei (1973)
`dxy`	Absolute divergence (mean pairwise differences between pops)	Nei (1987)
`da`	Net divergence (Dxy minus mean within-pop pi)	Nei & Li (1979)
`pbs`	Population Branch Statistic	Yi et al. (2010)
`pairwise_fst`	Pairwise FST matrix for multiple populations

Distance-Based Two-Population Statistics

Function	Description	Reference
`snn`	Nearest-neighbor statistic	Hudson (2000)
`dxy_min`	Minimum pairwise distance between populations	Geneva et al. (2015)
`gmin`	Gmin ratio (Dxy_min / Dxy_mean)	Geneva et al. (2015)
`dd`	Relative minimum divergence (dd1, dd2)	Schrider et al. (2018)
`dd_rank`	Rank of minimum between-pop distance in within-pop distribution	Schrider et al. (2018)
`zx`	ZnS ratio (within-pop LD / total LD)	Schrider et al. (2018)

Linkage Disequilibrium

Function	Description	Reference
`r`	Pearson correlation between variant pairs
`r_squared`	Squared correlation (r-squared) between variant pairs
`dd` (LD)	D-squared (two-locus LD statistic)	Ragsdale & Gravel (2019)
`dz`	Dz statistic (multi-population LD)	Ragsdale & Gravel (2019)
`pi2`	Two-locus nucleotide diversity	Ragsdale & Gravel (2019)
`zns`	Kelly’s ZnS (mean pairwise LD); defaults to the unbiased \(\sigma_D^2\) estimator on `HaplotypeMatrix` inputs, falls back to naive \(r^2\) for pre-computed arrays.	Kelly (1997); Ragsdale & Gravel (2019)
`omega`	Kim & Nielsen’s Omega (partitioned LD); same default policy as `zns`.	Kim & Nielsen (2004); Ragsdale & Gravel (2019)
`mu_ld`	Haplotype pattern exclusivity (RAiSD LD component)	Alachiotis & Pavlidis (2018)

Selection Scans

Function	Description	Reference
`ihs`	Integrated Haplotype Score	Voight et al. (2006)
`nsl`	Number of Segregating Sites by Length	Ferrer-Admetlla et al. (2014)
`xpehh`	Cross-population Extended Haplotype Homozygosity	Sabeti et al. (2007)
`xpnsl`	Cross-population nSL	Szpiech et al. (2021)
`garud_h`	Garud’s H1, H12, H123, H2/H1	Garud et al. (2015)
`moving_garud_h`	Garud’s H in moving windows	Garud et al. (2015)
`ehh_decay`	Extended Haplotype Homozygosity decay	Sabeti et al. (2002)

Site Frequency Spectrum

Function	Description	Reference
`sfs`	Unfolded SFS
`sfs_folded`	Folded SFS (minor allele counts)
`sfs_scaled`	Scaled unfolded SFS
`sfs_folded_scaled`	Scaled folded SFS
`joint_sfs`	Joint SFS (two populations)
`joint_sfs_folded`	Folded joint SFS
`joint_sfs_scaled`	Scaled joint SFS
`joint_sfs_folded_scaled`	Scaled folded joint SFS
`fold_sfs`	Fold an unfolded SFS
`fold_joint_sfs`	Fold a joint SFS

Admixture and F-Statistics

Function	Description	Reference
`patterson_f2`	F2 branch length between two populations	Patterson et al. (2012)
`patterson_f3`	F3 admixture test	Patterson et al. (2012)
`patterson_d`	Patterson’s D (ABBA-BABA)	Patterson et al. (2012)
`moving_patterson_f3`	Windowed F3	Patterson et al. (2012)
`moving_patterson_d`	Windowed D	Patterson et al. (2012)
`average_patterson_f3`	F3 with block-jackknife standard error	Patterson et al. (2012)
`average_patterson_d`	D with block-jackknife standard error	Patterson et al. (2012)

Resampling (Block Jackknife and Bootstrap)

Function	Description	Reference
`block_jackknife`	Delete-1 block jackknife standard error; supports unequal block sizes	Busing et al. (1999)
`block_bootstrap`	Block bootstrap standard error and replicate distribution	Efron & Tibshirani (1993)

Both operate on pre-binned per-block values and a user-supplied statistic, so any scalar aggregate (genome-wide mean Tajima’s D, per-population \(\pi\), ratio-of-sums estimators like normed F3 / D) can get a calibrated standard error / CI with a single call.

FrequencySpectrum (Power-User SFS Interface)

The FrequencySpectrum class provides direct access to SFS-based estimation for custom weight functions, SFS projection, and the general Achaz (2009) variance framework.

Method	Description	Reference
`FrequencySpectrum.theta`	Any theta estimator as weighted SFS dot product	Achaz (2009)
`FrequencySpectrum.neutrality_test`	Generalized neutrality test from any two theta estimators	Achaz (2009)
`FrequencySpectrum.project`	SFS projection via hypergeometric sampling	Gutenkunst et al. (2009)

Built-in estimators: pi, watterson, theta_h, theta_l, eta1, eta1_star, minus_eta1, minus_eta1_star. Custom weight functions are also supported.

Dimensionality Reduction and Distance

Function	Description	Reference
`pca`	Principal Component Analysis (GPU-accelerated SVD)	Patterson et al. (2006)
`randomized_pca`	Randomized PCA (truncated SVD approximation)	Halko et al. (2011)
`pairwise_distance`	Pairwise genetic distance (Euclidean, cityblock, etc.)
`pcoa`	Principal Coordinate Analysis (classical MDS)
`local_pca`	Per-window PCA (lostruct); GPU-batched `eigh` over stacked per-window Gram matrices	Li & Ralph (2019)
`local_pca_jackknife`	Delete-1 block jackknife standard error of local PCs (batched)	Li & Ralph (2019)
`pc_dist`	Frobenius distance between per-window low-rank covariance reps	Li & Ralph (2019)
`corners`	Extreme-cluster selection in a 2D MDS embedding (Welzl MEC)	Li & Ralph (2019)

Relatedness and Kinship

Function	Description	Reference
`grm`	Genetic Relationship Matrix	Yang et al. (2011)
`ibs`	Pairwise Identity by State proportions

Distance Distribution Statistics

Function	Description	Reference
`pairwise_diffs`	Pairwise Hamming distances (haploid or diploid)
`dist_var`	Variance of pairwise distance distribution	Schrider et al. (2018)
`dist_skew`	Skewness of pairwise distance distribution	Schrider et al. (2018)
`dist_kurt`	Excess kurtosis of pairwise distance distribution	Schrider et al. (2018)
`dist_moments`	Variance, skewness, and kurtosis in one call	Schrider et al. (2018)

Biobank-Scale Streaming

A VCZ store is a Zarr encoding of a VCF stored on disk: the genotype matrix is split into compressed chunks, each chunk a small array of samples by variants. pg_gpu reads VCZ stores; if your data is in VCF you can convert it with the bio2zarr tools (vcf2zarr explode then vcf2zarr encode). The streaming codepath needs that VCZ layout because it relies on a fast per-chunk decode, an operation that can’t be done on a VCF. See Biobank-Scale Streaming from VCZ for the VCF→VCZ conversion and a worked end-to-end example.

A VCZ store too large for the GPU (tens to hundreds of thousands of haplotypes) opens via HaplotypeMatrix.from_zarr / GenotypeMatrix.from_zarr as a streaming view that walks the chromosome chunk by chunk; every kernel listed below dispatches on the streaming object the same way it would on a fully loaded matrix – the calling code is identical to the in-memory path.

What runs on a streaming matrix:

Entry point	How it streams
`windowed_analysis`	Each chunk’s windows are computed on the GPU as the chunk arrives, then rows are concatenated; window grids are aligned to the chunk grid so no window straddles a chunk boundary.
`sfs.sfs`, `sfs.joint_sfs`, `sfs.sfs_folded`, `sfs.joint_sfs_folded`	Each chunk contributes its own per-site frequency-spectrum counts; the chromosome-wide answer is the sum across chunks.
`sfs.project_joint_sfs`	Same per-chunk accumulation, but the joint SFS is projected (via hypergeometric sampling) to a small target grid as it is built, so the full `(n1+1, n2+1)` histogram is never materialized.
`HaplotypeMatrix.compute_ld_statistics_gpu_single_pop` / `_two_pops`	Variant-pair statistics within `max_bp_dist` are summed per chunk into the bp bins. Pairs that fall on opposite sides of a chunk boundary would be missed by naive per-chunk sums, so the last `max_bp_dist` of one chunk is carried forward and paired with the start of the next, counted exactly once. Returns moments-LD `DD`, `Dz`, `pi²` (3 stats single-pop, 15 stats two-pop).
`relatedness.ibs`, `relatedness.grm`	Streamed along the variant axis; the individual axis is tiled into row blocks. `(n_ind, n_ind)` accumulators live on host so the output can exceed GPU memory. `grm` is a two-pass operation (first to calculate chromosome-wide allele frequencies, second to accumulate a per-chunk outer product).
`StreamingHaplotypeMatrix.materialize(region, sample_subset)`	Pulls one sub-region (and optionally a subset of haplotypes) of the chromosome into GPU memory for kernels that need every variant simultaneously – `pairwise_r2`, Garud’s H, or any custom recipe.
`zarr_io.allel_zarr_to_vcz`	Streaming converter from scikit-allel layout to VCZ for stores that pre-date bio2zarr.

Fused Windowed Statistics

The windowed_analysis() function computes statistics across all genomic windows in a single GPU pass via fused CUDA kernels.

Statistic	Description
`pi`	Nucleotide diversity per window
`theta_w`	Watterson’s theta per window
`tajimas_d`	Tajima’s D per window
`segregating_sites`	Segregating site count per window
`singletons`	Singleton count per window
`fst`	Hudson’s FST per window
`fst_wc`	Weir-Cockerham FST per window
`dxy`	Absolute divergence per window
`da`	Net divergence per window
`garud_h1`, `garud_h12`, `garud_h123`, `garud_h2h1`	Garud’s H statistics per window
`mean_nsl`	Mean nSL per window
`local_pca`	Per-window local PCA (lostruct); returns a `LocalPCAResult` with eigvals, eigvecs, and window metadata