Read PLINK files into Pandas data frames, with support for reading multiple BED
files at once.
It is as simple as:
from pandas_plink import read_plink
(bim, fam, G) = read_plink('/path/to/data')
assuming that you have the files /path/to/data.bed, /path/to/data.bim, and /path/to/data.fam.
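Because read_plink is given only the shared path prefix, a missing or misnamed member of the fileset is a common source of errors. The following is a minimal sketch for checking that all three files exist before loading. The helpers plink_paths and missing_plink_files are hypothetical and are not part of pandas_plink.

```python
from pathlib import Path


# Hypothetical helper (not part of pandas_plink): map each extension of a
# PLINK binary fileset to its full path for a given prefix.
def plink_paths(prefix):
    base = Path(prefix)
    # Append the extension to the file name instead of using with_suffix,
    # so that prefixes containing dots (e.g. "data.chr1") stay intact.
    return {ext: base.with_name(base.name + "." + ext) for ext in ("bed", "bim", "fam")}


# Hypothetical helper: list the members of the fileset that are not on disk.
def missing_plink_files(prefix):
    return [str(p) for p in plink_paths(prefix).values() if not p.is_file()]
```

If missing_plink_files("/path/to/data") returns an empty list, all three files are present and read_plink('/path/to/data') can be called.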
The returned matrix G holds the genotype of each sample at each variant, encoded as:
0    Homozygous for the first allele in the .bim file
1    Heterozygous
2    Homozygous for the second allele in the .bim file
NaN  Missing genotype
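The 0/1/2/NaN values come from the 2-bit genotype codes of the PLINK 1 binary .bed format. The sketch below illustrates that format and is not pandas_plink's implementation. It assumes a variant-major file (magic bytes 0x6C 0x1B 0x01) and that the numbers of variants and samples are already known, for example from the row counts of the .bim and .fam files. decode_bed, BED_MAGIC and CODE_TO_DOSAGE are hypothetical names.

```python
import numpy as np

# PLINK 1 .bed files start with these magic bytes; the third byte (0x01)
# marks the variant-major (SNP-major) layout assumed here.
BED_MAGIC = bytes([0x6C, 0x1B, 0x01])

# Lookup table indexed by the 2-bit code:
# 0b00 -> 0 (homozygous first allele), 0b01 -> NaN (missing),
# 0b10 -> 1 (heterozygous), 0b11 -> 2 (homozygous second allele).
CODE_TO_DOSAGE = np.array([0.0, np.nan, 1.0, 2.0])


def decode_bed(raw, n_variants, n_samples):
    """Decode variant-major .bed bytes into an (n_variants, n_samples) float array."""
    if raw[:3] != BED_MAGIC:
        raise ValueError("not a variant-major PLINK .bed file")
    # Each byte stores four genotypes; every variant is padded to whole bytes.
    bytes_per_variant = (n_samples + 3) // 4
    body = np.frombuffer(raw, dtype=np.uint8, offset=3)
    body = body[: n_variants * bytes_per_variant].reshape(n_variants, bytes_per_variant)
    # Extract the four 2-bit fields of each byte, lowest-order bits first.
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (body[:, :, None] >> shifts) & 0b11
    # Flatten the fields per variant and drop the padding samples.
    codes = codes.reshape(n_variants, -1)[:, :n_samples]
    return CODE_TO_DOSAGE[codes]
```

For instance, decode_bed(BED_MAGIC + bytes([24]), 1, 3) yields a 1 × 3 array holding 0, 1 and NaN, since the byte 24 (0b00011000) stores the codes 00, 10 and 01 for the first three samples, with the top two bits as padding.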
G is a Dask array rather than a regular NumPy array.
This allows lazy loading of large datasets that would not be able to fit