Read PLINK files into Pandas data frames, with support for mapping multiple BED
files at once.
It is as simple as:
from pandas_plink import read_plink
(bim, fam, G) = read_plink('/path/to/data')
assuming that you have the files /path/to/data.bed, /path/to/data.bim, and /path/to/data.fam.
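Because read_plink is given a single prefix rather than three paths, it can help to check that all three files exist before loading. The helpers below are hypothetical (they are not part of pandas_plink) and simply append the .bed, .bim and .fam suffixes to the prefix:

```python
from pathlib import Path

# The three PLINK binary files share one prefix, e.g. /path/to/data.
PLINK_SUFFIXES = (".bed", ".bim", ".fam")


def plink_files(prefix):
    """Return the .bed, .bim and .fam paths for a PLINK prefix (hypothetical helper)."""
    # Plain string concatenation, so prefixes that already contain dots
    # (such as "data.chr1") are not truncated the way Path.with_suffix would.
    return tuple(str(prefix) + suffix for suffix in PLINK_SUFFIXES)


def missing_plink_files(prefix):
    """List which of the three expected files are not present on disk."""
    return [path for path in plink_files(prefix) if not Path(path).is_file()]
```

For example, missing_plink_files('/path/to/data') returns an empty list when the trio is complete, so it can serve as a guard before calling read_plink('/path/to/data').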
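The returned matrix G holds one of the following values for each genotype:
0 Homozygous for the first allele (first allele column of bim)
1 Heterozygous
2 Homozygous for the second allele (second allele column of bim)
NaN Missing genotype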
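To see where these four values can come from, the sketch below decodes the 2-bit genotype codes of the PLINK 1 binary .bed format (00 homozygous first allele, 01 missing, 10 heterozygous, 11 homozygous second allele, four genotypes per byte, low-order bits first) into the 0/1/2/NaN encoding above. It is an illustration under those format assumptions, not pandas_plink's actual reader, and the function names are hypothetical:

```python
import math

# PLINK 1 .bed 2-bit codes mapped onto the 0/1/2/NaN values listed above.
CODE_TO_GENOTYPE = {
    0b00: 0.0,       # homozygous for the first allele
    0b01: math.nan,  # missing genotype
    0b10: 1.0,       # heterozygous
    0b11: 2.0,       # homozygous for the second allele
}


def decode_bed_byte(byte, n=4):
    """Decode the first n genotypes packed in one .bed byte, low-order bits first."""
    # n < 4 covers the padded last byte of a variant's record.
    return [CODE_TO_GENOTYPE[(byte >> (2 * k)) & 0b11] for k in range(n)]


def second_allele_frequency(genotypes):
    """Fraction of observed alleles that are the second allele, skipping missing calls."""
    observed = [g for g in genotypes if not math.isnan(g)]
    if not observed:
        return math.nan
    # Each value counts copies of the second allele out of two.
    return sum(observed) / (2 * len(observed))
```

Decoding the byte 0b11100100 yields [0.0, nan, 1.0, 2.0], whose second-allele frequency over the three observed calls is 3 / 6 = 0.5.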
G is a Dask array rather than a regular NumPy array.
This allows lazy loading of large datasets that would not fit into memory.
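The idea behind lazy loading is that data is read and processed block by block, only when a result is requested. The pure-Python sketch below mimics that pattern with generators as an analogy; it is not how Dask is implemented, and fake_genotype_rows is a hypothetical stand-in for reading variants from disk on demand:

```python
import math
from typing import Iterable, Iterator, List


def fake_genotype_rows() -> Iterator[List[float]]:
    """Hypothetical lazy source: each row is produced only when requested."""
    yield [0.0, 1.0, 2.0]
    yield [2.0, math.nan, 2.0]
    yield [1.0, 1.0, 0.0]


def iter_chunks(rows: Iterable[List[float]], chunk_size: int) -> Iterator[List[List[float]]]:
    """Group rows into chunks so at most chunk_size rows are held at once."""
    chunk: List[List[float]] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def nanmean_per_row(rows: Iterable[List[float]], chunk_size: int = 2) -> List[float]:
    """Mean of each row ignoring NaN, computed one chunk at a time."""
    means: List[float] = []
    for chunk in iter_chunks(rows, chunk_size):
        for row in chunk:
            observed = [x for x in row if not math.isnan(x)]
            means.append(sum(observed) / len(observed) if observed else math.nan)
    return means
```

With the real Dask array, the equivalent step is to build a NaN-aware reduction such as dask.array.nanmean on G and then call .compute(), which triggers reading only the blocks that the reduction needs.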