pandas-plink

pandas_plink.write_plink1_bin

pandas_plink.write_plink1_bin(G, bed, bim=None, fam=None, major='variant', verbose=True)

Write a data array to PLINK 1 binary files.

A PLINK 1 binary file set consists of three files:

  • BED: containing the genotype.

  • BIM: containing variant information.

  • FAM: containing sample information.
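By PLINK convention the three files of a set share a prefix and differ only in their extension. The sketch below derives the sibling paths from a BED path with pathlib. Note that `plink1_fileset` is a hypothetical helper, not part of pandas_plink, and it is an assumption that the library infers missing paths in exactly this way.

```python
from pathlib import Path


def plink1_fileset(bed):
    """Return the (bed, bim, fam) paths that share the stem of `bed`.

    Hypothetical helper for illustration; not part of pandas_plink.
    """
    bed = Path(bed)
    return bed, bed.with_suffix(".bim"), bed.with_suffix(".fam")


bed, bim, fam = plink1_fileset("sample.bed")
print(bim)  # sample.bim
print(fam)  # sample.fam
```

Passing explicit paths avoids relying on any inference at all.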

The user must provide the genotype (dosage) as an xarray.DataArray matrix with data type numpy.float32 or numpy.float64. That matrix must have two named dimensions: sample and variant. The only allowed genotype values are 0, 1, 2, and math.nan.
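The requirements above can be checked before writing. The following is a minimal sketch that uses only NumPy; `genotype_problems` is a hypothetical helper, not part of pandas_plink, and it only reports departures from the rules stated here without claiming how the library itself reacts to them.

```python
import numpy as np

ALLOWED = (0.0, 1.0, 2.0)  # NaN is also allowed and is handled separately


def genotype_problems(values, dims):
    """List the ways (values, dims) departs from the documented requirements."""
    values = np.asarray(values)
    problems = []
    if sorted(dims) != ["sample", "variant"]:
        problems.append(f"dims are {tuple(dims)}, expected 'sample' and 'variant'")
    if values.dtype not in (np.float32, np.float64):
        # Stop here: the NaN-aware value check below assumes a float array.
        problems.append(f"dtype is {values.dtype}, expected float32 or float64")
        return problems
    present = values[~np.isnan(values)]  # drop missing calls
    unexpected = np.setdiff1d(present, ALLOWED)
    if unexpected.size:
        problems.append(f"unexpected values: {unexpected.tolist()}")
    return problems


ok = np.array([[1.0, 2.0, np.nan], [0.0, 0.0, 1.0]])
bad = np.array([[3.0, 2.0, 2.0], [0.0, 0.0, 1.0]])
print(genotype_problems(ok, ("sample", "variant")))   # []
print(genotype_problems(bad, ("sample", "variant")))  # ['unexpected values: [3.0]']
```

For an xarray.DataArray G, the same check would be called as genotype_problems(G.values, G.dims).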

Examples

The following example writes a small genotype matrix to sample.bed, leaving the BIM and FAM paths to be inferred, and then reads it back.

>>> from xarray import DataArray
>>> from pandas_plink import read_plink1_bin, write_plink1_bin
>>>
>>> G = DataArray(
...     [[1.0, 2.0, 2.0], [0.0, 0.0, 1.0]],
...     dims=["sample", "variant"],
...     coords = dict(
...         sample  = ["boffy", "jolly"],
...         fid     = ("sample", ["humin"] * 2 ),
...
...         variant = ["not", "sure", "what"],
...         snp     = ("variant", ["rs1", "rs2", "rs3"]),
...         chrom   = ("variant", ["1", "1", "2"]),
...         a0      = ("variant", ['A', 'T', 'G']),
...         a1      = ("variant", ['C', 'A', 'T']),
...     )
... )
>>>
>>> print(G)
<xarray.DataArray (sample: 2, variant: 3)>
array([[1.00, 2.00, 2.00],
       [0.00, 0.00, 1.00]])
Coordinates:
  * sample   (sample) <U5 'boffy' 'jolly'
    fid      (sample) <U5 'humin' 'humin'
  * variant  (variant) <U4 'not' 'sure' 'what'
    snp      (variant) <U3 'rs1' 'rs2' 'rs3'
    chrom    (variant) <U1 '1' '1' '2'
    a0       (variant) <U1 'A' 'T' 'G'
    a1       (variant) <U1 'C' 'A' 'T'
>>> write_plink1_bin(G, "sample.bed", verbose=False)
>>>
>>> G = read_plink1_bin("sample.bed", verbose=False)
>>> print(G)
<xarray.DataArray 'genotype' (sample: 2, variant: 3)>
dask.array<transpose, shape=(2, 3), dtype=float32, chunksize=(2, 3), chunktype=numpy.ndarray>
Coordinates: (12/14)
  * sample   (sample) object 'boffy' 'jolly'
  * variant  (variant) <U8 'variant0' 'variant1' 'variant2'
    fid      (sample) object 'humin' 'humin'
    iid      (sample) object 'boffy' 'jolly'
    father   (sample) object '?' '?'
    mother   (sample) object '?' '?'
    ...       ...
    chrom    (variant) object '1' '1' '2'
    snp      (variant) object 'rs1' 'rs2' 'rs3'
    cm       (variant) float64 0.0 0.0 0.0
    pos      (variant) int32 0 0 0
    a0       (variant) object 'A' 'T' 'G'
    a1       (variant) object 'C' 'A' 'T'

The following example reads two BED files and two BIM files corresponding to chromosomes 11 and 12, and reads a single FAM file whose filename is inferred from the BED filenames. It then writes the combined matrix to disk as a single file set and reads it back.

>>> from os.path import join
>>> from pandas_plink import read_plink1_bin, write_plink1_bin
>>> from pandas_plink import get_data_folder
>>>
>>> G = read_plink1_bin(join(get_data_folder(), "chr*.bed"), verbose=False)
>>> write_plink1_bin(G, "all.bed", verbose=False)
>>> G = read_plink1_bin("all.bed", verbose=False)
>>> print(G)
<xarray.DataArray 'genotype' (sample: 14, variant: 1252)>
dask.array<transpose, shape=(14, 1252), dtype=float32, chunksize=(14, 1024), chunktype=numpy.ndarray>
Coordinates: (12/14)
  * sample   (sample) object 'B001' 'B002' 'B003' ... 'B012' 'B013' 'B014'
  * variant  (variant) <U11 'variant0' 'variant1' ... 'variant1251'
    fid      (sample) object 'B001' 'B002' 'B003' ... 'B012' 'B013' 'B014'
    iid      (sample) object 'B001' 'B002' 'B003' ... 'B012' 'B013' 'B014'
    father   (sample) object '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    mother   (sample) object '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    ...       ...
    chrom    (variant) object '11' '11' '11' '11' '11' ... '12' '12' '12' '12'
    snp      (variant) object '316849996' '316874359' ... '373081507'
    cm       (variant) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    pos      (variant) int32 157439 181802 248969 ... 27163741 27205125 27367844
    a0       (variant) object 'C' 'G' 'G' 'C' 'C' 'T' ... 'A' 'G' 'A' 'T' 'G'
    a1       (variant) object 'T' 'C' 'C' 'T' 'T' 'A' ... 'G' 'A' 'T' 'C' 'A'
Parameters
  • G (DataArray) – Genotype matrix with metainformation about samples and variants.

  • bed (Union[str, Path]) – Path to a BED file.

  • bim (Union[str, Path, None]) – Path to a BIM file. Defaults to None, in which case the function tries to infer it.

  • fam (Union[str, Path, None]) – Path to a FAM file. Defaults to None, in which case the function tries to infer it.

  • major (str) – Matrix layout in the BED file: either "sample" or "variant" (recommended and default).

  • verbose (bool) – True for progress information; False otherwise.
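To make the major parameter concrete: in the PLINK 1 BED format, the file starts with three header bytes (the third records the layout), followed by the genotypes at 2 bits each, with every row of the major dimension padded to a whole byte. The sketch below computes the expected file size for either layout under those format rules. Here `bed_size` is a hypothetical helper, not part of pandas_plink.

```python
from math import ceil


def bed_size(n_samples, n_variants, major="variant"):
    """Expected BED size in bytes: 3 header bytes plus 2 bits per genotype,
    with each row of the major dimension padded to a whole byte."""
    if major == "variant":
        rows, cols = n_variants, n_samples
    elif major == "sample":
        rows, cols = n_samples, n_variants
    else:
        raise ValueError('major must be "sample" or "variant"')
    return 3 + rows * ceil(cols / 4)


# Dimensions of the chromosome 11 and 12 example: 14 samples, 1252 variants.
print(bed_size(14, 1252, "variant"))  # 5011
print(bed_size(14, 1252, "sample"))   # 4385
```

The two layouts store the same genotypes; they differ only in which dimension is contiguous on disk and in how much padding each row needs.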

© Copyright 2018, Danilo Horta. Revision b580df7f.
