pandas-plink

pandas_plink.read_grm

pandas_plink.read_grm(filepath, id_filepath=None, n_snps_filepath=None)

Read GCTA realized relationship matrix files.

A GRM file set consists of two or three files: (i) one containing the covariance matrix; (ii) one containing the sample IDs; and (iii) optionally one containing the number of non-missing SNPs.

The function supports plain text, binary, and compressed files. The usual file extensions for those types are .grm, .grm.bin, and .grm.gz, respectively.
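The extension-to-type mapping above can be sketched as a small helper. This is only an illustration: `guess_grm_format` is a hypothetical name, not part of pandas-plink, and it says nothing about how read_grm itself detects the format.

```python
def guess_grm_format(filepath: str) -> str:
    """Classify a GRM matrix file by its extension (hypothetical helper)."""
    name = filepath.lower()
    # Match on the full suffix listed in the documentation above.
    if name.endswith(".grm.bin"):
        return "binary"
    if name.endswith(".grm.gz"):
        return "compressed"
    if name.endswith(".grm"):
        return "plain text"
    raise ValueError(f"unrecognized GRM extension: {filepath!r}")


print(guess_grm_format("plink.grm.bin"))
print(guess_grm_format("plink2.grm"))
```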

Example

>>> from os.path import join
>>> from pandas_plink import read_grm
>>> from pandas_plink import get_data_folder
>>> filepath = join(get_data_folder(), "grm-list", "plink2.grm")
>>> id_filepath = join(get_data_folder(), "grm-list", "plink2.grm.id")
>>> (K, n_snps) = read_grm(filepath, id_filepath)
>>> print(K)
<xarray.DataArray (sample_0: 10, sample_1: 10)>
array([[ 0.89,  0.23, -0.19, -0.01, -0.14,  0.29,  0.27, -0.23, -0.10,
        -0.21],
       [ 0.23,  1.08, -0.45,  0.19, -0.19,  0.17,  0.41, -0.01, -0.13,
        -0.13],
       [-0.19, -0.45,  1.18, -0.04, -0.15, -0.20, -0.31, -0.04,  0.30,
        -0.01],
       [-0.01,  0.19, -0.04,  0.90, -0.07,  0.01,  0.06, -0.19, -0.09,
         0.17],
       [-0.14, -0.19, -0.15, -0.07,  1.18,  0.09, -0.03,  0.10,  0.22,
         0.17],
       [ 0.29,  0.17, -0.20,  0.01,  0.09,  0.96,  0.07, -0.04, -0.09,
        -0.23],
       [ 0.27,  0.41, -0.31,  0.06, -0.03,  0.07,  0.71, -0.10, -0.09,
        -0.06],
       [-0.23, -0.01, -0.04, -0.19,  0.10, -0.04, -0.10,  1.42, -0.30,
        -0.07],
       [-0.10, -0.13,  0.30, -0.09,  0.22, -0.09, -0.09, -0.30,  0.91,
        -0.02],
       [-0.21, -0.13, -0.01,  0.17,  0.17, -0.23, -0.06, -0.07, -0.02,
         0.91]])
Coordinates:
  * sample_0  (sample_0) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
  * sample_1  (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    fid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    iid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
>>> print(n_snps)
[50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
 50 50 50 50 50 50 50]
>>> filepath = join(get_data_folder(), "grm-bin", "plink.grm.bin")
>>> id_filepath = join(get_data_folder(), "grm-bin", "plink.grm.id")
>>> n_snps_filepath = join(get_data_folder(), "grm-bin", "plink.grm.N.bin")
>>> (K, n_snps) = read_grm(filepath, id_filepath, n_snps_filepath)
>>> print(K)
<xarray.DataArray (sample_0: 10, sample_1: 10)>
array([[ 0.79,  0.10, -0.19, -0.07, -0.27,  0.15,  0.20, -0.27, -0.18,
        -0.26],
       [ 0.10,  1.07, -0.45,  0.04, -0.34, -0.01,  0.23, -0.15, -0.25,
        -0.25],
       [-0.19, -0.45,  1.39, -0.07, -0.24, -0.23, -0.35, -0.09,  0.28,
        -0.05],
       [-0.07,  0.04, -0.07,  0.82, -0.16, -0.13, -0.05, -0.30, -0.17,
         0.08],
       [-0.27, -0.34, -0.24, -0.16,  1.08, -0.10, -0.12, -0.03,  0.11,
         0.06],
       [ 0.15, -0.01, -0.23, -0.13, -0.10,  0.94, -0.05, -0.11, -0.16,
        -0.31],
       [ 0.20,  0.23, -0.35, -0.05, -0.12, -0.05,  0.59, -0.18, -0.14,
        -0.13],
       [-0.27, -0.15, -0.09, -0.30, -0.03, -0.11, -0.18,  1.49, -0.32,
        -0.05],
       [-0.18, -0.25,  0.28, -0.17,  0.11, -0.16, -0.14, -0.32,  0.89,
        -0.06],
       [-0.26, -0.25, -0.05,  0.08,  0.06, -0.31, -0.13, -0.05, -0.06,
         0.95]])
Coordinates:
  * sample_0  (sample_0) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
  * sample_1  (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    fid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    iid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
>>> print(n_snps)
[50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50 50
 50 50 50 50 50 50 50]

Parameters

  • filepath (str) – Path to the matrix file.

  • id_filepath (str, optional) – Path to the file containing family and individual IDs. Defaults to None, in which case the function attempts to infer it.

  • n_snps_filepath (str, optional) – Path to the file containing the number of non-missing SNPs. Defaults to None, in which case the function attempts to infer it.
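
The file names in the examples above suggest a shared prefix: plink.grm.bin sits next to plink.grm.id and plink.grm.N.bin. A minimal sketch of guessing the companion paths from that pattern follows. The helper `guess_companion_paths` is hypothetical, and the rule that only the binary layout keeps SNP counts in a separate file is an assumption; the library's actual inference logic may differ.

```python
from typing import Optional, Tuple


def guess_companion_paths(filepath: str) -> Tuple[str, Optional[str]]:
    """Guess the ID and SNP-count paths next to a GRM matrix file (hypothetical)."""
    for suffix in (".grm.bin", ".grm.gz", ".grm"):
        if filepath.endswith(suffix):
            prefix = filepath[: -len(suffix)]
            break
    else:
        raise ValueError(f"unrecognized GRM extension: {filepath!r}")

    id_filepath = prefix + ".grm.id"
    # Assumption: only the binary layout stores SNP counts in a separate file.
    n_snps_filepath = prefix + ".grm.N.bin" if suffix == ".grm.bin" else None
    return id_filepath, n_snps_filepath


print(guess_companion_paths("plink.grm.bin"))
```

Passing id_filepath and n_snps_filepath explicitly, as the second example above does, avoids relying on any such guess.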

Returns

  • grm (xarray.DataArray) – Realized relationship matrix.

  • n_snps (numpy.ndarray) – Number of non-missing SNPs.
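
In the examples above, n_snps holds 55 values for 10 samples, which matches 10 × 11 / 2, the number of entries in a lower triangle including the diagonal. The sketch below expands such a vector into a full symmetric matrix. It assumes the values are ordered row by row, i.e. (0,0), (1,0), (1,1), (2,0), and so on; that ordering is an assumption here, not something this page states. `tril_to_symmetric` is a hypothetical helper written with plain lists so that it runs without dependencies, and it accepts any sequence, including a NumPy array.

```python
def tril_to_symmetric(values, n):
    """Expand n*(n+1)/2 lower-triangle values into an n-by-n symmetric matrix."""
    values = list(values)
    expected = n * (n + 1) // 2
    if len(values) != expected:
        raise ValueError(f"expected {expected} values for n={n}, got {len(values)}")

    matrix = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        # Assumed ordering: row i contributes entries (i, 0) .. (i, i).
        for j in range(i + 1):
            matrix[i][j] = matrix[j][i] = values[k]
            k += 1
    return matrix


# Same shape as the documented output: 55 counts for 10 samples.
counts = tril_to_symmetric([50] * 55, 10)
print(len(counts), len(counts[0]))
```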


© Copyright 2018, Danilo Horta. Revision b580df7f.