snps
tools for reading, writing, merging, and remapping SNPs

snps strives to be an easy-to-use and accessible open-source library for working with
genotype data.
Features
Input / Output
Read raw data (genotype) files from a variety of direct-to-consumer (DTC) DNA testing sources with a SNPs object
Read and write VCF files (e.g., convert 23andMe to VCF)
Merge raw data files from different DNA tests, identifying discrepant SNPs in the process
Read data in a variety of formats (e.g., files, bytes, compressed with gzip or zip)
Handle several variations of file types, validated via openSNP parsing analysis
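As a rough illustration of what accepting "files, bytes, compressed with gzip or zip" can involve, the sketch below detects the compression of a bytes payload from its leading magic numbers. It is standard-library only; the helper name `decompress` is hypothetical and this is not how snps itself is implemented.

```python
import gzip
import io
import zipfile


def decompress(data: bytes) -> bytes:
    """Return raw file content, transparently handling gzip and zip input."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        return gzip.decompress(data)
    if data[:4] == b"PK\x03\x04":  # zip local file header
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the archive holds a single raw data file.
            return archive.read(archive.namelist()[0])
    return data  # treat anything else as uncompressed


raw = b"rs3094315\t1\t752566\tAA\n"
print(decompress(gzip.compress(raw)) == raw)
```

With snps installed, the resulting bytes could then be passed to `SNPs(...)`, which the documentation says accepts either a path or a bytes object.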
Build / Assembly Detection and Remapping
Detect the build / assembly of SNPs (supports builds 36, 37, and 38)
Remap SNPs between builds / assemblies
Data Cleaning
Perform quality control (QC) / filter low quality SNPs based on chip clusters
Fix several common issues when loading SNPs
Sort SNPs based on chromosome and position
Deduplicate RSIDs
Deduplicate alleles in the non-PAR regions of the X and Y chromosomes for males
Deduplicate alleles on MT
Assign PAR SNPs to the X or Y chromosome
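To make "sort SNPs based on chromosome and position" concrete, here is a minimal sketch using plain tuples. The ordering (numeric chromosomes first, then X, Y, MT) and the name `chrom_sort_key` are assumptions made for illustration, not taken from the library.

```python
def chrom_sort_key(chrom: str) -> tuple:
    """Order numeric chromosomes first, then X, Y, MT (assumed convention)."""
    if chrom.isdigit():
        return (0, int(chrom), "")
    order = {"X": 1, "Y": 2, "MT": 3}
    return (1, order.get(chrom, 4), chrom)


# Each SNP is (rsid, chrom, pos); the rsids here are made up.
snps = [
    ("rs3", "MT", 100),
    ("rs2", "10", 5),
    ("rs1", "2", 50),
    ("rs4", "X", 7),
    ("rs5", "2", 10),
]
sorted_snps = sorted(snps, key=lambda s: (chrom_sort_key(s[1]), s[2]))
print([rsid for rsid, _, _ in sorted_snps])
```

Sorting on a key tuple avoids the lexical trap where chromosome "10" would otherwise sort before "2".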
Analysis
Derive sex from SNPs
Detect deduced genotype / chip array and chip version based on chip clusters
Predict ancestry from SNPs (when installed with ezancestry)
Supported Genotype Files
snps supports VCF files and genotype files from a variety of DNA testing sources.
Additionally, snps can read a variety of "generic" CSV and TSV files.
Dependencies
snps requires Python 3.7.1+ and the following Python packages:
Installation
snps is available on the Python Package Index. Install snps (and its required
Python dependencies) via pip:
$ pip install snps
For ancestry prediction capability, snps can be installed with ezancestry:
$ pip install snps[ezancestry]
Examples
Download Example Data
First, let's set up logging to get some helpful output:
>>> import logging, sys
>>> logger = logging.getLogger()
>>> logger.setLevel(logging.INFO)
>>> logger.addHandler(logging.StreamHandler(sys.stdout))
Now we're ready to download some example data from openSNP:
>>> from snps.resources import Resources
>>> r = Resources()
>>> paths = r.download_example_datasets()
Downloading resources/662.23andme.340.txt.gz
Downloading resources/662.ftdna-illumina.341.csv.gz
Load Raw Data
Load a 23andMe raw data file:
>>> from snps import SNPs
>>> s = SNPs("resources/662.23andme.340.txt.gz")
>>> s.source
'23andMe'
>>> s.count
991786
The SNPs class accepts a path to a file or a bytes object. A Reader class attempts to
infer the data source and load the SNPs. The loaded SNPs are normalized and
available via a pandas.DataFrame:
>>> df = s.snps
>>> df.columns.values
array(['chrom', 'pos', 'genotype'], dtype=object)
>>> df.index.name
'rsid'
>>> df.chrom.dtype.name
'object'
>>> df.pos.dtype.name
'uint32'
>>> df.genotype.dtype.name
'object'
>>> len(df)
991786
snps also attempts to detect the build / assembly of the data:
>>> s.build
37
>>> s.build_detected
True
>>> s.assembly
'GRCh37'
Merge Raw Data Files
The dataset consists of raw data files from two different DNA testing sources - let's combine
these files. Specifically, we'll update the SNPs object with SNPs from a
Family Tree DNA file.
>>> merge_results = s.merge([SNPs("resources/662.ftdna-illumina.341.csv.gz")])
Merging SNPs('662.ftdna-illumina.341.csv.gz')
SNPs('662.ftdna-illumina.341.csv.gz') has Build 36; remapping to Build 37
Downloading resources/NCBI36_GRCh37.tar.gz
27 SNP positions were discrepant; keeping original positions
151 SNP genotypes were discrepant; marking those as null
>>> s.source
'23andMe, FTDNA'
>>> s.count
1006960
>>> s.build
37
>>> s.build_detected
True
If the SNPs being merged have a build that differs from the destination build, the SNPs to merge
will be remapped automatically. After this example merge, the build is still detected, since the
build was detected for all SNPs objects that were merged.
As the data gets added, it's compared to the existing data, and SNP position and genotype
discrepancies are identified. (The discrepancy thresholds can be tuned via parameters.) These
discrepant SNPs are available for inspection after the merge via properties of the SNPs object.
>>> len(s.discrepant_merge_genotypes)
151
Additionally, any non-called / null genotypes will be updated during the merge, if the file being merged has a called genotype for the SNP.
Moreover, merge takes a chrom parameter - this enables merging of only SNPs associated
with the specified chromosome (e.g., "Y" or "MT").
Finally, merge returns a list of dict, where each dict has information corresponding
to the results of each merge (e.g., SNPs in common).
>>> sorted(list(merge_results[0].keys()))
['common_rsids', 'discrepant_genotype_rsids', 'discrepant_position_rsids', 'merged']
>>> merge_results[0]["merged"]
True
>>> len(merge_results[0]["common_rsids"])
692918
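The merge rules described above (keep the original position on a position discrepancy, null a discrepant genotype, fill a null genotype from the added file) can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions: `merge_snps` is a hypothetical helper, genotypes are compared by exact string equality, and the discrepancy thresholds are omitted.

```python
def merge_snps(existing: dict, added: dict) -> dict:
    """Merge `added` into `existing`; both map rsid -> (chrom, pos, genotype)."""
    common = [rsid for rsid in added if rsid in existing]
    discrepant_positions, discrepant_genotypes = [], []
    for rsid, (chrom, pos, genotype) in added.items():
        if rsid not in existing:
            existing[rsid] = (chrom, pos, genotype)  # new SNP: just add it
            continue
        e_chrom, e_pos, e_genotype = existing[rsid]
        if (e_chrom, e_pos) != (chrom, pos):
            discrepant_positions.append(rsid)  # keep the original position
        if e_genotype is None:
            e_genotype = genotype  # fill a null genotype from the added data
        elif genotype is not None and genotype != e_genotype:
            discrepant_genotypes.append(rsid)
            e_genotype = None  # mark a discrepant genotype as null
        existing[rsid] = (e_chrom, e_pos, e_genotype)
    return {
        "common_rsids": common,
        "discrepant_position_rsids": discrepant_positions,
        "discrepant_genotype_rsids": discrepant_genotypes,
    }


# Made-up rsids and positions, for illustration only.
existing = {"rs1": ("1", 100, "AA"), "rs2": ("1", 200, None), "rs3": ("2", 300, "CT")}
added = {"rs1": ("1", 101, "AA"), "rs2": ("1", 200, "GG"),
         "rs3": ("2", 300, "CC"), "rs4": ("3", 400, "TT")}
result = merge_snps(existing, added)
print(result["discrepant_genotype_rsids"])
```

The returned keys mirror the names shown in `merge_results[0]` so the sketch is easy to map back to the real output.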
Remap SNPs
Now, let's remap the merged SNPs to change the assembly / build:
>>> s.snps.loc["rs3094315"].pos
752566
>>> chromosomes_remapped, chromosomes_not_remapped = s.remap(38)
Downloading resources/GRCh37_GRCh38.tar.gz
>>> s.build
38
>>> s.assembly
'GRCh38'
>>> s.snps.loc["rs3094315"].pos
817186
SNPs can be remapped between Build 36 (NCBI36), Build 37 (GRCh37), and Build 38 (GRCh38).
Save SNPs
OK, so far we've merged the SNPs from two files (ensuring the same build in the process and identifying discrepancies along the way). Then, we remapped the SNPs to Build 38. Now, let's save the merged and remapped dataset consisting of 1M+ SNPs to a tab-separated values (TSV) file:
>>> saved_snps = s.to_tsv("out.txt")
Saving output/out.txt
>>> print(saved_snps)
output/out.txt
Moreover, let's get the reference sequences for this assembly and save the SNPs as a VCF file:
>>> saved_snps = s.to_vcf("out.vcf")
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.1.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.2.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.3.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.4.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.5.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.6.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.7.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.8.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.9.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.10.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.11.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.12.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.13.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.14.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.15.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.16.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.17.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.18.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.19.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.20.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.22.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.X.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.Y.fa.gz
Downloading resources/fasta/GRCh38/Homo_sapiens.GRCh38.dna.chromosome.MT.fa.gz
Saving output/out.vcf
1 SNP positions were found to be discrepant when saving VCF
When saving a VCF, if any SNPs have positions outside of the reference sequence, they are marked
as discrepant and are available via a property of the SNPs object.
All output files are saved to the output directory.
Documentation
Documentation is available here.
Acknowledgements
Thanks to Mike Agostino, Padma Reddy, Kevin Arvai, openSNP, Open Humans, and Sano Genetics.
Output Files
The various output files produced by snps are detailed below. Output files are saved in the
output directory, which is defined at the instantiation of a SNPs object.
Save SNPs
SNPs can be saved with SNPs.save. By default, one tab-separated .txt or .vcf file (vcf=True)
is output when SNPs are saved. If comma is specified as the separator (sep=","), the default
extension is .csv.
The content of non-VCF files (after comment lines, which start with #) is as follows:
Column | Description
---|---
rsid | SNP ID
chromosome | Chromosome of SNP
position | Position of SNP
genotype | Genotype of SNP
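A file with this layout can be read back with the standard library alone. The sketch below assumes the first non-comment line is a header row naming the four columns; that is an assumption for illustration, as is the helper name `read_snps_tsv`.

```python
import csv


def read_snps_tsv(text: str) -> dict:
    """Parse tab-separated SNP output into rsid -> (chromosome, position, genotype)."""
    # Skip comment lines, which start with '#'.
    lines = [line for line in text.splitlines() if not line.startswith("#")]
    # Assumption: the first remaining line is a header naming the columns.
    reader = csv.DictReader(lines, delimiter="\t")
    return {
        row["rsid"]: (row["chromosome"], int(row["position"]), row["genotype"])
        for row in reader
    }


sample = (
    "# example comment line\n"
    "rsid\tchromosome\tposition\tgenotype\n"
    "rs3094315\t1\t817186\tAA\n"
)
print(read_snps_tsv(sample))
```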
When filename is not specified, default filenames are used as described below.
SNPs.save
<source>_<assembly>.txt / <source>_<assembly>.csv
Where source is the detected source(s) of SNPs data and assembly is the assembly of the
SNPs being saved.
Installation
snps is available on the Python Package Index. Install snps (and its required
Python dependencies) via pip:
$ pip install snps
Installation and Usage on a Raspberry Pi
The instructions below provide the steps to install snps on a Raspberry Pi (tested with
"Raspberry Pi OS (32-bit) Lite", release date 2020-08-20). For more details about Python on
the Raspberry Pi, see here.
Note
Text after a prompt (e.g., $) is the command to type at the command line. The
instructions assume a fresh install of Raspberry Pi OS and that after logging in as
the pi user, the current working directory is /home/pi.
Install pip for Python 3:
pi@raspberrypi:~ $ sudo apt install python3-pip
Press "y" followed by "enter" to continue. This enables us to install packages from the Python Package Index.
Install the venv module:
pi@raspberrypi:~ $ sudo apt install python3-venv
Press "y" followed by "enter" to continue. This enables us to create a virtual environment to isolate the snps installation from other system Python packages.
Install libatlas-base-dev:
pi@raspberrypi:~ $ sudo apt install libatlas-base-dev
Press "y" followed by "enter" to continue. This is required for NumPy, a dependency of snps.
Create a directory for snps and change working directory:
pi@raspberrypi:~ $ mkdir snps
pi@raspberrypi:~ $ cd snps
Create a virtual environment for snps:
pi@raspberrypi:~/snps $ python3 -m venv .venv
The virtual environment is located at /home/pi/snps/.venv.
Activate the virtual environment:
pi@raspberrypi:~/snps $ source .venv/bin/activate
Now when you invoke Python or pip, the virtual environment's version will be used (as indicated by the (.venv) before the prompt). This can be verified as follows:
(.venv) pi@raspberrypi:~/snps $ which python
/home/pi/snps/.venv/bin/python
Install snps:
(.venv) pi@raspberrypi:~/snps $ pip install snps
Start Python:
(.venv) pi@raspberrypi:~/snps $ python
Python 3.7.3 (default, Jul 25 2020, 13:03:44)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Use snps; examples shown in the README should now work.
At completion of usage, the virtual environment can be deactivated:
(.venv) pi@raspberrypi:~/snps $ deactivate
pi@raspberrypi:~/snps $
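The virtual environment check can also be done from inside Python rather than with `which python`. The sketch below relies on the standard behavior of environments created by `venv`, where `sys.prefix` differs from `sys.base_prefix`; the function name is hypothetical.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when running inside an environment created by venv."""
    return sys.prefix != sys.base_prefix


print(sys.executable)            # path of the interpreter in use
print(in_virtual_environment())  # expected to be True after activation
```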
Changelog
The changelog is maintained here: https://github.com/apriha/snps/releases
Contributing
Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.
Bug reports
When reporting a bug please include:
Your operating system name and version.
Any details about your local setup that might be helpful in troubleshooting.
Detailed steps to reproduce the bug.
Documentation improvements
snps could always use more documentation, whether as part of the official snps docs, in
docstrings, or even on the web in blog posts, articles, and such. See below for info on how to
generate documentation.
Feature requests and feedback
The best way to send feedback is to file an issue at https://github.com/apriha/snps/issues.
If you are proposing a feature:
Explain in detail how it would work.
Keep the scope as narrow as possible, to make it easier to implement.
Remember that this is a volunteer-driven project, and that code contributions are welcome :)
Development
To set up snps for local development:
Fork snps (look for the "Fork" button).
Clone your fork locally:
$ git clone git@github.com:your_name_here/snps.git
Create a branch for local development from the develop branch:
$ cd snps
$ git checkout develop
$ git checkout -b name-of-your-bugfix-or-feature develop
Set up a development environment:
$ pip install pipenv
$ pipenv install --dev
When you're done making changes, run all the tests with:
$ pipenv run pytest --cov-report=html --cov=snps tests
Note
Downloads during tests are disabled by default. To enable downloads, set the environment variable DOWNLOADS_ENABLED=true.
Note
If you receive errors when running the tests, you may need to specify the temporary directory with an environment variable, e.g., TMPDIR="/path/to/tmp/dir".
Note
After running the tests, a coverage report can be viewed by opening htmlcov/index.html in a browser.
Check code formatting:
$ pipenv run black --check --diff .
Commit your changes and push your branch to GitHub:
$ git add .
$ git commit -m "Your detailed description of your changes."
$ git push origin name-of-your-bugfix-or-feature
Submit a pull request through the GitHub website.
Pull request guidelines
If you need some code review or feedback while you're developing the code, just make the pull request.
For merging, you should:
Ensure tests pass.
Update documentation when there's new API, functionality, etc.
Add yourself to CONTRIBUTORS.rst if you'd like.
Documentation
After the development environment has been set up, documentation can be generated via the following command:
$ pipenv run sphinx-build -T -E -D language=en docs docs/_build
Then, the documentation can be viewed by opening docs/_build/index.html in a browser.
Contributors
Contributors to snps are listed below.
Core Developers
Name | GitHub
---|---
Andrew Riha |
Will Jones |
Other Contributors
Listed in alphabetical order.
Name | GitHub
---|---
Alan Moffet |
Anatoli Babenia |
Castedo Ellerman |
Gerard Manning |
Julian Runnels |
Kevin Arvai |
Phil Palmer |
Yoan Bouzin |
Code Documentation
SNPs
SNPs reads, writes, merges, and remaps genotype / raw data files.
- class snps.snps.SNPs(file='', only_detect_source=False, assign_par_snps=False, output_dir='output', resources_dir='resources', deduplicate=True, deduplicate_XY_chrom=True, deduplicate_MT_chrom=True, parallelize=False, processes=2, rsids=())
Bases: object
- __init__(file='', only_detect_source=False, assign_par_snps=False, output_dir='output', resources_dir='resources', deduplicate=True, deduplicate_XY_chrom=True, deduplicate_MT_chrom=True, parallelize=False, processes=2, rsids=())
Object used to read, write, and remap genotype / raw data files.
- Parameters
file (str or bytes) – path to file to load or bytes to load
only_detect_source (bool) – only detect the source of the data
assign_par_snps (bool) – assign PAR SNPs to the X and Y chromosomes
output_dir (str) – path to output directory
resources_dir (str) – name / path of resources directory
deduplicate (bool) – deduplicate RSIDs and make SNPs available as SNPs.duplicate
deduplicate_MT_chrom (bool) – deduplicate alleles on MT; see SNPs.heterozygous_MT
deduplicate_XY_chrom (bool or str) – deduplicate alleles in the non-PAR regions of X and Y for males; see SNPs.discrepant_XY. If a str, this is the sex determination method to use: X, Y, or XY
parallelize (bool) – utilize multiprocessing to speed up calculations
processes (int) – processes to launch if multiprocessing
rsids (tuple, optional) – rsids to extract if loading a VCF file
- property assembly
Assembly of SNPs.
- Returns
- Return type
str
- property build
Build of SNPs.
- Returns
- Return type
int
- property build_detected
Status indicating if build of SNPs was detected.
- Returns
- Return type
bool
- property chip
Detected deduced genotype / chip array, if any, per compute_cluster_overlap.
- Returns
detected chip array, else empty str
- Return type
str
- property chip_version
Detected genotype / chip array version, if any, per compute_cluster_overlap.
Notes
Chip array version is only applicable to 23andMe (v3, v4, v5) and AncestryDNA (v1, v2) files.
- Returns
detected chip array version, e.g., "v4", else empty str
- Return type
str
- property chromosomes
Chromosomes of SNPs.
- Returns
list of str chromosomes (e.g., ["1", "2", "3", "MT"]), empty list if no chromosomes
- Return type
list
- property chromosomes_summary
Summary of the chromosomes of SNPs.
- Returns
human-readable listing of chromosomes (e.g., "1-3, MT"), empty str if no chromosomes
- Return type
str
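One way to produce such a human-readable listing is to collapse runs of consecutive numeric chromosomes into ranges and append the non-numeric ones. The following is a minimal sketch of that idea, not the library's own code; `summarize_chromosomes` is a hypothetical name.

```python
def summarize_chromosomes(chromosomes: list) -> str:
    """Collapse consecutive numeric chromosomes into ranges, e.g. '1-3, MT'."""
    numeric = sorted(int(c) for c in chromosomes if c.isdigit())
    other = [c for c in chromosomes if not c.isdigit()]
    parts = []
    start = prev = None
    for n in numeric:
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n  # extend the current run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = n
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts + other)


print(summarize_chromosomes(["1", "2", "3", "MT"]))
```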
- property cluster
Detected chip cluster, if any, per compute_cluster_overlap.
Notes
Refer to compute_cluster_overlap for more details about chip clusters.
- Returns
detected chip cluster, e.g., "c1", else empty str
- Return type
str
- compute_cluster_overlap(cluster_overlap_threshold=0.95)
Compute overlap with chip clusters.
Chip clusters, which are defined in [1], are associated with deduced genotype / chip arrays and DTC companies.
This method also sets the values returned by the cluster, chip, and chip_version properties, based on max overlap, if the specified threshold is satisfied.
- Parameters
cluster_overlap_threshold (float) – threshold for cluster to overlap this SNPs object, and vice versa, to set values returned by the cluster, chip, and chip_version properties
- Returns
pandas.DataFrame with the following columns:
- company_composition
DTC company composition of associated cluster from [1]
- chip_base_deduced
deduced genotype / chip array of associated cluster from [1]
- snps_in_cluster
count of SNPs in cluster
- snps_in_common
count of SNPs in common with cluster (inner merge with cluster)
- overlap_with_cluster
percentage overlap of snps_in_common with cluster
- overlap_with_self
percentage overlap of snps_in_common with this SNPs object
- Return type
pandas.DataFrame
References
[1] Chang Lu, Bastian Greshake Tzovaras, Julian Gough, A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research, Computational and Structural Biotechnology Journal, Volume 19, 2021, Pages 3747-3754, ISSN 2001-0370, https://doi.org/10.1016/j.csbj.2021.06.040.
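The two overlap figures can be illustrated with set arithmetic. In this sketch each SNP is represented as a (chrom, pos) tuple, and a cluster "matches" when both overlaps reach the threshold; the representation, the `>=` comparison, and the name `cluster_overlap` are assumptions for illustration only.

```python
def cluster_overlap(self_snps: set, cluster_snps: set, threshold: float = 0.95) -> dict:
    """Compute mutual overlap between this data set and one chip cluster."""
    common = self_snps & cluster_snps
    overlap_with_cluster = len(common) / len(cluster_snps) if cluster_snps else 0.0
    overlap_with_self = len(common) / len(self_snps) if self_snps else 0.0
    return {
        "snps_in_cluster": len(cluster_snps),
        "snps_in_common": len(common),
        "overlap_with_cluster": overlap_with_cluster,
        "overlap_with_self": overlap_with_self,
        # Assumption: both directions must satisfy the threshold.
        "matches": overlap_with_cluster >= threshold and overlap_with_self >= threshold,
    }


mine = {("1", pos) for pos in range(100)}
cluster_a = {("1", pos) for pos in range(98)}
cluster_b = {("1", pos) for pos in range(50)}
print(cluster_overlap(mine, cluster_a)["matches"])
print(cluster_overlap(mine, cluster_b)["matches"])
```

Requiring the overlap in both directions prevents a small cluster that is merely a subset of the data from being reported as the chip.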
- property count
Count of SNPs.
- Returns
- Return type
int
- detect_build()
Detect build of SNPs.
Use the coordinates of common SNPs to identify the build / assembly of a genotype file that is being loaded.
Notes
rs3094315 : plus strand in 36, 37, and 38
rs11928389 : plus strand in 36, minus strand in 37 and 38
rs2500347 : plus strand in 36 and 37, minus strand in 38
rs964481 : plus strand in 36, 37, and 38
rs2341354 : plus strand in 36, 37, and 38
rs3850290 : plus strand in 36, 37, and 38
rs1329546 : plus strand in 36, 37, and 38
- Returns
detected build of SNPs, else 0
- Return type
int
References
Yates et al. (doi:10.1093/bioinformatics/btu613), http://europepmc.org/search/?query=DOI:10.1093/bioinformatics/btu613
Zerbino et al. (doi:10.1093/nar/gkx1098), https://doi.org/10.1093/nar/gkx1098
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1;29(1):308-11.
Database of Single Nucleotide Polymorphisms (dbSNP). Bethesda (MD): National Center for Biotechnology Information, National Library of Medicine. dbSNP accession: rs3094315, rs11928389, rs2500347, rs964481, rs2341354, rs3850290, and rs1329546 (dbSNP Build ID: 151). Available from: http://www.ncbi.nlm.nih.gov/SNP/
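The idea of identifying a build from the coordinates of common SNPs can be sketched as a lookup. The table below holds only the two positions of rs3094315 that appear in the remapping example (752566 in Build 37, 817186 in Build 38); a fuller table covering Build 36 and the other listed SNPs is not reproduced here, and `KNOWN_POSITIONS` and this standalone `detect_build` are hypothetical names.

```python
# Positions taken from the remapping example; deliberately incomplete.
KNOWN_POSITIONS = {
    "rs3094315": {37: 752566, 38: 817186},
}


def detect_build(snp_positions: dict) -> int:
    """Return the build whose known position matches the loaded data, else 0."""
    for rsid, positions_by_build in KNOWN_POSITIONS.items():
        pos = snp_positions.get(rsid)
        for build, known_pos in positions_by_build.items():
            if pos == known_pos:
                return build
    return 0  # build could not be detected


print(detect_build({"rs3094315": 752566}))
```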
- determine_sex(heterozygous_x_snps_threshold=0.03, y_snps_not_null_threshold=0.3, chrom='X')
Determine sex from SNPs using thresholds.
- Parameters
heterozygous_x_snps_threshold (float) – percentage heterozygous X SNPs; above this threshold, Female is determined
y_snps_not_null_threshold (float) – percentage Y SNPs that are not null; above this threshold, Male is determined
chrom ({"X", "Y"}) – use X or Y chromosome SNPs to determine sex
- Returns
"Male" or "Female" if detected, else empty str
- Return type
str
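The threshold logic can be sketched on a plain list of genotypes for the chosen chromosome. This is an illustrative reading of the parameter descriptions, assuming that a value at or below the threshold yields the opposite sex and that an empty input yields an empty string; it is not the library's implementation.

```python
def determine_sex(genotypes, chrom="X",
                  heterozygous_x_snps_threshold=0.03,
                  y_snps_not_null_threshold=0.3):
    """genotypes: genotypes (str or None) of the SNPs on `chrom`."""
    if not genotypes:
        return ""  # nothing to base a determination on
    if chrom == "X":
        heterozygous = sum(
            1 for g in genotypes if g and len(g) == 2 and g[0] != g[1]
        )
        fraction = heterozygous / len(genotypes)
        return "Female" if fraction > heterozygous_x_snps_threshold else "Male"
    if chrom == "Y":
        called = sum(1 for g in genotypes if g is not None)
        fraction = called / len(genotypes)
        return "Male" if fraction > y_snps_not_null_threshold else "Female"
    return ""


print(determine_sex(["AG"] * 10 + ["AA"] * 90))
print(determine_sex(["A"] * 50 + [None] * 50, chrom="Y"))
```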
- property discrepant_XY
Discrepant XY SNPs.
A discrepant XY SNP is a heterozygous SNP in the non-PAR region of the X or Y chromosome found during deduplication for a detected male genotype.
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
- property discrepant_merge_genotypes
SNPs with discrepant genotypes discovered while merging SNPs.
Notes
Definitions of columns in this dataframe are as follows:
Column | Description
---|---
rsid | SNP ID
chrom | Chromosome of existing SNP
pos | Position of existing SNP
genotype | Genotype of existing SNP
chrom_added | Chromosome of added SNP
pos_added | Position of added SNP
genotype_added | Genotype of added SNP (discrepant with genotype)
- Returns
- Return type
pandas.DataFrame
- property discrepant_merge_positions
SNPs with discrepant positions discovered while merging SNPs.
Notes
Definitions of columns in this dataframe are as follows:
Column | Description
---|---
rsid | SNP ID
chrom | Chromosome of existing SNP
pos | Position of existing SNP
genotype | Genotype of existing SNP
chrom_added | Chromosome of added SNP
pos_added | Position of added SNP (discrepant with pos)
genotype_added | Genotype of added SNP
- Returns
- Return type
pandas.DataFrame
- property discrepant_merge_positions_genotypes
SNPs with discrepant positions and / or genotypes discovered while merging SNPs.
Notes
Definitions of columns in this dataframe are as follows:
Column | Description
---|---
rsid | SNP ID
chrom | Chromosome of existing SNP
pos | Position of existing SNP
genotype | Genotype of existing SNP
chrom_added | Chromosome of added SNP
pos_added | Position of added SNP (possibly discrepant with pos)
genotype_added | Genotype of added SNP (possibly discrepant with genotype)
- Returns
- Return type
pandas.DataFrame
- property discrepant_vcf_position
SNPs with discrepant positions discovered while saving VCF.
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
- property duplicate
Duplicate SNPs.
A duplicate SNP has the same RSID as another SNP. The first occurrence of the RSID is not considered a duplicate SNP.
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
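The "first occurrence is kept" rule can be sketched without pandas. Rows here are (rsid, chrom, pos, genotype) tuples with made-up values, and `deduplicate_rsids` is a hypothetical helper rather than part of the library.

```python
def deduplicate_rsids(rows):
    """Split rows into (unique, duplicates), keeping each rsid's first occurrence."""
    seen = set()
    unique, duplicates = [], []
    for row in rows:
        rsid = row[0]
        if rsid in seen:
            duplicates.append(row)  # a later occurrence of a known rsid
        else:
            seen.add(rsid)
            unique.append(row)
    return unique, duplicates


rows = [
    ("rs1", "1", 100, "AA"),
    ("rs2", "1", 200, "CT"),
    ("rs1", "1", 100, "AG"),
]
unique, duplicates = deduplicate_rsids(rows)
print(len(unique), len(duplicates))
```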
- get_count(chrom='')
Count of SNPs.
- Parameters
chrom (str, optional) – chromosome (e.g., "1", "X", "MT")
- Returns
- Return type
int
- static get_par_regions(build)
Get PAR regions for the X and Y chromosomes.
- Parameters
build (int) – build of SNPs
- Returns
PAR regions for the given build
- Return type
pandas.DataFrame
References
Genome Reference Consortium, https://www.ncbi.nlm.nih.gov/grc/human
Yates et al. (doi:10.1093/bioinformatics/btu613), http://europepmc.org/search/?query=DOI:10.1093/bioinformatics/btu613
Zerbino et al. (doi:10.1093/nar/gkx1098), https://doi.org/10.1093/nar/gkx1098
- heterozygous(chrom='')
Get heterozygous SNPs.
- Parameters
chrom (str, optional) – chromosome (e.g., "1", "X", "MT")
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
- property heterozygous_MT
Heterozygous SNPs on the MT chromosome found during deduplication.
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
- homozygous(chrom='')
Get homozygous SNPs.
- Parameters
chrom (str, optional) – chromosome (e.g., "1", "X", "MT")
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
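For a single two-allele genotype string, the heterozygous / homozygous distinction reduces to comparing the two alleles. The predicates below are a minimal sketch under the assumption that null genotypes and single-allele genotypes count as neither; the function names are hypothetical.

```python
def is_heterozygous(genotype) -> bool:
    """True for a called two-allele genotype with differing alleles, e.g. 'AG'."""
    return genotype is not None and len(genotype) == 2 and genotype[0] != genotype[1]


def is_homozygous(genotype) -> bool:
    """True for a called two-allele genotype with identical alleles, e.g. 'AA'."""
    return genotype is not None and len(genotype) == 2 and genotype[0] == genotype[1]


genotypes = ["AA", "AG", None, "T"]
print([g for g in genotypes if is_heterozygous(g)])
print([g for g in genotypes if is_homozygous(g)])
```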
- identify_low_quality_snps()
Identify low quality SNPs based on chip clusters.
Any low quality SNPs are removed from the snps_qc dataframe and are made available as low_quality.
Notes
Chip clusters, which are defined in [1], are associated with low quality SNPs. As such, low quality SNPs will only be identified when this SNPs object corresponds to a cluster per compute_cluster_overlap().
- property low_quality
SNPs identified as low quality, if any, per identify_low_quality_snps().
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
- merge(snps_objects=(), discrepant_positions_threshold=100, discrepant_genotypes_threshold=500, remap=True, chrom='')
Merge other SNPs objects into this SNPs object.
- Parameters
snps_objects (list or tuple of SNPs) – other SNPs objects to merge into this SNPs object
discrepant_positions_threshold (int) – threshold for discrepant SNP positions between existing data and data to be loaded; a large value could indicate mismatched genome assemblies
discrepant_genotypes_threshold (int) – threshold for discrepant genotype data between existing data and data to be loaded; a large value could indicate mismatched individuals
remap (bool) – if necessary, remap other SNPs objects to have the same build as this SNPs object before merging
chrom (str, optional) – chromosome to merge (e.g., "1", "Y", "MT")
- Returns
for each SNPs object to merge, a dict with the following items:
- merged (bool)
whether SNPs object was merged
- common_rsids (pandas.Index)
SNPs in common
- discrepant_position_rsids (pandas.Index)
SNPs with discrepant positions
- discrepant_genotype_rsids (pandas.Index)
SNPs with discrepant genotypes
- Return type
list of dict
References
Fluent Python by Luciano Ramalho (O'Reilly). Copyright 2015 Luciano Ramalho, 978-1-491-94600-8.
- notnull(chrom='')
Get not null genotype SNPs.
- Parameters
chrom (str, optional) – chromosome (e.g., "1", "X", "MT")
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
- property phased
Indicates if genotype is phased.
- Returns
- Return type
bool
- predict_ancestry(output_directory=None, write_predictions=False, models_directory=None, aisnps_directory=None, n_components=None, k=None, thousand_genomes_directory=None, samples_directory=None, algorithm=None, aisnps_set=None)
Predict genetic ancestry for SNPs.
Predictions by ezancestry.
Notes
Populations below are described here.
- Parameters
various (optional) – See the available settings for predict at ezancestry.
- Returns
dict with the following keys:
- population_code (str)
max predicted population for the sample
- population_description (str)
descriptive name of the population
- population_percent (float)
predicted probability for the max predicted population
- superpopulation_code (str)
max predicted super population (continental) for the sample
- superpopulation_description (str)
descriptive name of the super population
- superpopulation_percent (float)
predicted probability for the max predicted super population
- ezancestry_df (pandas.DataFrame)
pandas.DataFrame with the following columns:
- component1, component2, component3
The coordinates of the sample in the dimensionality-reduced component space. Can be used as (x, y, z) coordinates for plotting in a 3D scatter plot.
- predicted_population_population
The max predicted population for the sample.
- ACB, ASW, BEB, CDX, CEU, CHB, CHS, CLM, ESN, FIN, GBR, GIH, GWD, IBS, ITU, JPT, KHV, LWK, MSL, MXL, PEL, PJL, PUR, STU, TSI, YRI
Predicted probabilities for each of the populations. These sum to 1.0.
- predicted_population_superpopulation
The max predicted super population (continental) for the sample.
- AFR, AMR, EAS, EUR, SAS
Predicted probabilities for each of the super populations. These sum to 1.0.
- population_description, superpopulation_name
Descriptive names of the population and super population.
- Return type
dict
- remap(target_assembly, complement_bases=True)
Remap SNP coordinates from one assembly to another.
This method uses the assembly map endpoint of the Ensembl REST API service (via Resources's EnsemblRestClient) to convert SNP coordinates / positions from one assembly to another. After remapping, the coordinates / positions for the SNPs will be that of the target assembly.
If the SNPs are already mapped relative to the target assembly, remapping will not be performed.
- Parameters
target_assembly ({"NCBI36", "GRCh37", "GRCh38", 36, 37, 38}) – assembly to remap to
complement_bases (bool) – complement bases when remapping SNPs to the minus strand
- Returns
chromosomes_remapped (list of str) – chromosomes remapped
chromosomes_not_remapped (list of str) – chromosomes not remapped
Notes
An assembly is also known as a "build." For example:
Assembly NCBI36 = Build 36
Assembly GRCh37 = Build 37
Assembly GRCh38 = Build 38
See https://www.ncbi.nlm.nih.gov/assembly for more information about assemblies and remapping.
References
Ensembl, Assembly Map Endpoint, http://rest.ensembl.org/documentation/info/assembly_map
Yates et al. (doi:10.1093/bioinformatics/btu613), http://europepmc.org/search/?query=DOI:10.1093/bioinformatics/btu613
Zerbino et al. (doi:10.1093/nar/gkx1098), https://doi.org/10.1093/nar/gkx1098
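Two small pieces of the remapping description lend themselves to a sketch: normalizing the accepted `target_assembly` values to a build number, and complementing bases for SNPs that land on the minus strand. Both helpers below are hypothetical illustrations of those ideas and make no calls to the Ensembl service.

```python
ASSEMBLY_TO_BUILD = {"NCBI36": 36, "GRCh37": 37, "GRCh38": 38}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize_build(target_assembly) -> int:
    """Accept an assembly name or a build number and return the build number."""
    if target_assembly in ASSEMBLY_TO_BUILD:
        return ASSEMBLY_TO_BUILD[target_assembly]
    if target_assembly in ASSEMBLY_TO_BUILD.values():
        return target_assembly
    raise ValueError(f"unsupported assembly: {target_assembly!r}")


def complement_genotype(genotype):
    """Complement each base of a genotype; null genotypes stay null."""
    if genotype is None:
        return None
    return genotype.translate(COMPLEMENT)  # non-ACGT characters are left as-is


print(normalize_build("GRCh38"))
print(complement_genotype("AG"))
```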
- property sex
Sex derived from SNPs.
- Returns
"Male" or "Female" if detected, else empty str
- Return type
str
- property snps
Normalized SNPs.
Notes
Throughout snps, the "normalized snps dataframe" is defined as follows:
Column | Description | pandas dtype
---|---|---
rsid * | SNP ID | object (string)
chrom | Chromosome of SNP | object (string)
pos | Position of SNP (relative to build) | uint32
genotype † | Genotype of SNP | object (string)
* Dataframe index
† Genotype can be null, length 1, or length 2. Specifically, genotype is null if not called or unavailable. Otherwise, for autosomal chromosomes, genotype is two alleles. For the X and Y chromosomes, male genotypes are one allele in the non-PAR regions (assuming deduplicate_XY_chrom). For the MT chromosome, genotypes are one allele (assuming deduplicate_MT_chrom).
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
- property snps_qc
Normalized SNPs, after quality control.
Any low quality SNPs, identified per identify_low_quality_snps(), are not included in the result.
- Returns
normalized snps dataframe
- Return type
pandas.DataFrame
- property source
Summary of the SNP data source(s).
- Returns
Data source(s) for this SNPs object, separated by ", ".
- Return type
str
- property summary
Summary of SNPs.
- Returns
summary info if SNPs is valid, else {}
- Return type
dict
- to_csv(filename='', atomic=True, **kwargs)
Output SNPs as comma-separated values.
- Parameters
filename (str or buffer) – filename for file to save or buffer to write to
atomic (bool) – atomically write output to a file on local filesystem
**kwargs – additional parameters to pandas.DataFrame.to_csv
- Returns
path to file in output directory if SNPs were saved, else empty str
- Return type
str
- to_tsv(filename='', atomic=True, **kwargs)
Output SNPs as tab-separated values.
Note that this results in the same default output as save.
- Parameters
filename (str or buffer) – filename for file to save or buffer to write to
atomic (bool) – atomically write output to a file on local filesystem
**kwargs – additional parameters to pandas.DataFrame.to_csv
- Returns
path to file in output directory if SNPs were saved, else empty str
- Return type
str
- to_vcf(filename='', atomic=True, alt_unavailable='.', chrom_prefix='', qc_only=False, qc_filter=False, **kwargs)[source]ο
Output SNPs as Variant Call Format.
- Parameters
filename (str or buffer) β filename for file to save or buffer to write to
atomic (bool) β atomically write output to a file on local filesystem
alt_unavailable (str) β representation of ALT allele when ALT is not able to be determined
chrom_prefix (str) β prefix for chromosomes in VCF CHROM column
qc_only (bool) β output only SNPs that pass quality control
qc_filter (bool) β populate FILTER column based on quality control results
**kwargs β additional parameters to pandas.DataFrame.to_csv
- Returns
path to file in output directory if SNPs were saved, else empty str
- Return type
str
Notes
Parameters qc_only and qc_filter, if true, will identify low quality SNPs per identify_low_quality_snps(), if not done already. Moreover, these parameters have no effect if this SNPs object does not map to a cluster per compute_cluster_overlap().
References
The Variant Call Format (VCF) Version 4.2 Specification, 8 Mar 2019, https://samtools.github.io/hts-specs/VCFv4.2.pdf
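To make alt_unavailable and chrom_prefix concrete, here is a small sketch of how the CHROM and ALT columns of a VCF record could be filled from a reference allele and a genotype. It assumes ALT cannot be determined when only the reference allele was observed; that rule, the helper names `derive_alt` and `vcf_chrom`, and the sample rows are illustrative assumptions, not the snps implementation.

```python
def derive_alt(ref, genotype, alt_unavailable="."):
    """Return a VCF ALT value for one SNP (illustrative only, not snps code)."""
    observed = sorted(set(genotype))
    non_ref = [allele for allele in observed if allele != ref]
    if not non_ref:
        # Only the reference allele was observed, so ALT is unknown here.
        return alt_unavailable
    return ",".join(non_ref)


def vcf_chrom(chrom, chrom_prefix=""):
    """Apply an optional prefix (for example "chr") to a chromosome name."""
    return f"{chrom_prefix}{chrom}"


# Made-up rows: (chrom, pos, rsid, ref, genotype)
rows = [
    ("1", 101, "rs1", "A", "AA"),
    ("1", 202, "rs2", "C", "CT"),
    ("X", 303, "rs3", "G", "TT"),
]
lines = [
    "\t".join([vcf_chrom(c, "chr"), str(pos), rsid, ref, derive_alt(ref, gt)])
    for c, pos, rsid, ref, gt in rows
]
```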
- property unannotated_vcf
Indicates if VCF file is unannotated.
- Returns
- Return type
bool
- property valid
Determine if SNPs is valid.
SNPs is valid when the input file has been successfully parsed.
- Returns
True if SNPs is valid
- Return type
bool
snps.ensembl
Ensembl REST client.
Notes
Modified from https://github.com/Ensembl/ensembl-rest/wiki/Example-Python-Client.
References
Yates et al. (doi:10.1093/bioinformatics/btu613), http://europepmc.org/search/?query=DOI:10.1093/bioinformatics/btu613
Zerbino et al. (doi:10.1093/nar/gkx1098), https://doi.org/10.1093/nar/gkx1098
snps.io
Classes for reading and writing SNPs.
snps.io.reader
Class for reading SNPs.
- class snps.io.reader.Reader(file='', only_detect_source=False, resources=None, rsids=())
Bases: object
Class for reading and parsing raw data / genotype files.
- __init__(file='', only_detect_source=False, resources=None, rsids=())
Initialize a Reader.
- Parameters
file (str or bytes) – path to file to load or bytes to load
only_detect_source (bool) – only detect the source of the data
resources (Resources) – instance of Resources
rsids (tuple, optional) – rsids to extract if loading a VCF file
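Because file may be a path or raw bytes, and the data may be compressed, a reader has to normalize its input before parsing. The sketch below shows one way to do that with the standard library, detecting gzip from its two leading magic bytes. `open_text` is a hypothetical helper, zip archives are not handled, and none of this is taken from the snps source.

```python
import gzip
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_text(file):
    """Return a text stream for a path (str) or raw data (bytes).

    Hypothetical helper for illustration; not part of snps.
    """
    raw = io.BytesIO(file) if isinstance(file, bytes) else open(file, "rb")
    head = raw.read(2)
    raw.seek(0)  # rewind so the parser sees the whole stream
    if head == GZIP_MAGIC:
        return gzip.open(raw, "rt", encoding="utf-8")
    return io.TextIOWrapper(raw, encoding="utf-8")


plain = b"rsid,chromosome,position,genotype\nrs1,1,1,AA\n"
compressed = gzip.compress(plain)

with open_text(plain) as f:
    plain_lines = f.read().splitlines()
with open_text(compressed) as f:
    gz_lines = f.read().splitlines()
```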
- read()
Read and parse a raw data / genotype file.
- Returns
dict with the following items:
- snps (pandas.DataFrame)
dataframe of parsed SNPs
- source (str)
detected source of SNPs
- phased (bool)
flag indicating if SNPs are phased
- Return type
dict
- read_23andme(file, compression, joined=True)
Read and parse 23andMe file.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_ancestry(file, compression)
Read and parse Ancestry.com file.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_circledna(file, compression)
Read and parse CircleDNA file.
Notes
This method attempts to read and parse a whole exome file, optionally compressed with gzip or zip. Some assumptions are made throughout this process:
SNPs that are not annotated with an RSID are skipped
Insertions and deletions are skipped
- Parameters
file (str or bytes) – path to file or bytes to load
- Returns
result of read_helper
- Return type
dict
- read_dnaland(file, compression)
Read and parse DNA.land files.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_ftdna(file, compression)
Read and parse Family Tree DNA (FTDNA) file.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_ftdna_famfinder(file, compression)
Read and parse Family Tree DNA (FTDNA) "famfinder" file.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_generic(file, compression, skip=1)
Read and parse generic CSV or TSV file.
Notes
Assumes columns are "rsid", "chrom" / "chromosome", "pos" / "position", and "genotype"; values are comma separated; unreported genotypes are indicated by "--"; and one header row precedes data. For example:
rsid,chromosome,position,genotype
rs1,1,1,AA
rs2,1,2,CC
rs3,1,3,--
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
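The generic layout described in the notes can be exercised with the standard library alone. The sketch below parses a comma-separated sample with one header row, accepts either spelling of the chromosome and position columns, and maps the unreported-genotype marker (assumed here to be two hyphens) to None. `parse_generic` is a hypothetical function and does not reflect how snps parses these files internally.

```python
import csv
import io

UNREPORTED = "--"  # assumed marker for an unreported genotype

raw = (
    "rsid,chromosome,position,genotype\n"
    "rs1,1,1,AA\n"
    "rs2,1,2,CC\n"
    "rs3,1,3,--\n"
)


def parse_generic(text):
    """Parse generic CSV text into a list of dicts (illustrative only)."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # Accept both the short and long column names.
        chrom = row.get("chrom") or row.get("chromosome")
        pos = row.get("pos") or row.get("position")
        genotype = row["genotype"]
        records.append(
            {
                "rsid": row["rsid"],
                "chrom": chrom,
                "pos": int(pos),
                "genotype": None if genotype == UNREPORTED else genotype,
            }
        )
    return records


records = parse_generic(raw)
```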
- read_genes_for_good(file, compression)
Read and parse Genes For Good file.
https://genesforgood.sph.umich.edu/readme/readme1.2.txt
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_gsa(data_or_filename, compresion, comments)
Read and parse Illumina Global Screening Array files.
- Parameters
data_or_filename (str or bytes) – either the filename to read from or the bytes data itself
- Returns
result of read_helper
- Return type
dict
- read_helper(source, parser)
Generic method to help read files.
- Parameters
source (str) – name of data source
parser (func) – parsing function, which returns a tuple with the following items:
- 0 (pandas.DataFrame)
dataframe of parsed SNPs (empty if only detecting source)
- 1 (bool), optional
flag indicating if SNPs are phased
- 2 (int), optional
detected build of SNPs
- Returns
dict with the following items:
- snps (pandas.DataFrame)
dataframe of parsed SNPs
- source (str)
detected source of SNPs
- phased (bool)
flag indicating if SNPs are phased
- build (int)
detected build of SNPs
- Return type
dict
References
Fluent Python by Luciano Ramalho (O'Reilly). Copyright 2015 Luciano Ramalho, 978-1-491-94600-8.
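The parser contract above (a tuple whose first item is required and whose second and third items are optional) can be sketched as follows. The defaults of False for phased and 0 for build, the name `read_helper_sketch`, and the use of plain lists instead of a pandas.DataFrame are assumptions made to keep the example self-contained.

```python
def read_helper_sketch(source, parser):
    """Call parser and normalize its 1- to 3-item tuple into a result dict.

    Illustrative only; defaults below are assumptions, not documented values.
    """
    phased = False  # assumed default when the parser does not report phasing
    build = 0  # assumed default meaning "build not detected"
    snps, *extra = parser()
    if len(extra) >= 1:
        phased = extra[0]
    if len(extra) >= 2:
        build = extra[1]
    return {"snps": snps, "source": source, "phased": phased, "build": build}


def parse_minimal():
    # Returns only the parsed SNPs (item 0).
    return ([("rs1", "1", 1, "AA")],)


def parse_full():
    # Returns SNPs, the phased flag, and the detected build.
    return ([("rs1", "1", 1, "A|A")], True, 37)


minimal = read_helper_sketch("generic", parse_minimal)
full = read_helper_sketch("vcf", parse_full)
```

Passing the parser as a function keeps the source-specific logic in each read_* method while the shape of the result stays uniform.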
- read_livingdna(file, compression)
Read and parse LivingDNA file.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_mapmygenome(file, compression, header)
Read and parse Mapmygenome file.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_myheritage(file, compression)
Read and parse MyHeritage file.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_snps_csv(file, comments, compression)
Read and parse CSV file generated by snps.
https://pypi.org/project/snps/
- Parameters
file (str or buffer) – path to file or buffer to read
comments (str) – comments at beginning of file
- Returns
result of read_helper
- Return type
dict
- read_tellmegen(file, compression)
Read and parse tellmeGen files.
- Parameters
file (str) – path to file
- Returns
result of read_helper
- Return type
dict
- read_vcf(file, compression, provider, rsids=())
Read and parse VCF file.
Notes
This method attempts to read and parse a VCF file or buffer, optionally compressed with gzip. Some assumptions are made throughout this process:
SNPs that are not annotated with an RSID are skipped
If the VCF contains multiple samples, only the first sample is used to lookup the genotype
Insertions and deletions are skipped
If a sample allele is not specified, the genotype is reported as NaN
If a sample allele refers to a REF or ALT allele that is not specified, the genotype is reported as NaN
- Parameters
file (str or bytes) – path to file or bytes to load
rsids (tuple, optional) – rsids to extract if loading a VCF file
- Returns
result of read_helper
- Return type
dict
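The assumptions listed in the notes can be expressed as a small function over a single VCF data line. This is a sketch of those rules only, using tab-separated columns in the standard order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT, samples). `genotype_from_vcf_line` and the sample lines are hypothetical, and testing for an "rs" prefix is an assumed way of recognizing an RSID.

```python
import math


def genotype_from_vcf_line(line):
    """Return (rsid, genotype) for one VCF data line, or None if it is skipped.

    Illustrative only; not the snps parser.
    """
    fields = line.rstrip("\n").split("\t")
    rsid, ref, alt = fields[2], fields[3], fields[4]
    if not rsid.startswith("rs"):
        return None  # not annotated with an RSID
    alleles = [ref] + alt.split(",")
    if any(len(allele) != 1 for allele in alleles):
        return None  # insertion or deletion
    keys = fields[8].split(":")
    first_sample = fields[9].split(":")  # any later samples are ignored
    calls = first_sample[keys.index("GT")].replace("|", "/").split("/")
    genotype = ""
    for call in calls:
        if call == ".":
            return rsid, math.nan  # sample allele not specified
        index = int(call)
        if index >= len(alleles) or alleles[index] == ".":
            return rsid, math.nan  # refers to an unspecified REF or ALT allele
        genotype += alleles[index]
    return rsid, genotype


lines = [
    "1\t100\trs1\tA\tG\t.\t.\t.\tGT\t0/1\t1/1",  # two samples, first one used
    "1\t200\t.\tA\tG\t.\t.\t.\tGT\t0/1",  # no RSID
    "1\t300\trs3\tA\tAT\t.\t.\t.\tGT\t0/1",  # insertion
    "1\t400\trs4\tC\tT\t.\t.\t.\tGT\t./.",  # allele not specified
    "1\t500\trs5\tC\t.\t.\t.\t.\tGT\t0/1",  # ALT not specified
]
results = [genotype_from_vcf_line(line) for line in lines]
```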
snps.io.writer
Class for writing SNPs.
- class snps.io.writer.Writer(snps=None, filename='', vcf=False, atomic=True, vcf_alt_unavailable='.', vcf_chrom_prefix='', vcf_qc_only=False, vcf_qc_filter=False, **kwargs)
Bases: object
Class for writing SNPs to files.
- __init__(snps=None, filename='', vcf=False, atomic=True, vcf_alt_unavailable='.', vcf_chrom_prefix='', vcf_qc_only=False, vcf_qc_filter=False, **kwargs)
Initialize a Writer.
- Parameters
snps (SNPs) – SNPs to save to file or write to buffer
filename (str or buffer) – filename for file to save or buffer to write to
vcf (bool) – flag to save file as VCF
atomic (bool) – atomically write output to a file on local filesystem
vcf_alt_unavailable (str) – representation of VCF ALT allele when ALT is not able to be determined
vcf_chrom_prefix (str) – prefix for chromosomes in VCF CHROM column
vcf_qc_only (bool) – for VCF, output only SNPs that pass quality control
vcf_qc_filter (bool) – for VCF, populate VCF FILTER column based on quality control results
**kwargs – additional parameters to pandas.DataFrame.to_csv
snps.resources
Class for downloading and loading required external resources.
References
International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature. 2001 Feb 15;409(6822):860-921. http://dx.doi.org/10.1038/35057062
hg19 (GRCh37): Hiram Clawson, Brooke Rhead, Pauline Fujita, Ann Zweig, Katrina Learned, Donna Karolchik and Robert Kuhn, https://genome.ucsc.edu/cgi-bin/hgGateway?db=hg19
Yates et al. (doi:10.1093/bioinformatics/btu613), http://europepmc.org/search/?query=DOI:10.1093/bioinformatics/btu613
Zerbino et al. (doi:10.1093/nar/gkx1098), https://doi.org/10.1093/nar/gkx1098
- class snps.resources.ReferenceSequence(ID='', url='', path='', assembly='', species='', taxonomy='')
Bases: object
Object used to represent and interact with a reference sequence.
- property ID
Get reference sequence chromosome.
- Returns
- Return type
str
- __init__(ID='', url='', path='', assembly='', species='', taxonomy='')
Initialize a ReferenceSequence object.
- Parameters
ID (str) – reference sequence chromosome
url (str) – url to Ensembl reference sequence
path (str) – path to local reference sequence
assembly (str) – reference sequence assembly (e.g., "GRCh37")
species (str) – reference sequence species
taxonomy (str) – reference sequence taxonomy
References
The Variant Call Format (VCF) Version 4.2 Specification, 8 Mar 2019, https://samtools.github.io/hts-specs/VCFv4.2.pdf
- property assembly
Get reference sequence assembly.
- Returns
- Return type
str
- property build
Get reference sequence build.
- Returns
e.g., "B37"
- Return type
str
- property chrom
Get reference sequence chromosome.
- Returns
- Return type
str
- property end
Get reference sequence end position (1-based).
- Returns
- Return type
int
- property length
Get reference sequence length.
- Returns
- Return type
int
- property md5
Get reference sequence MD5 hash.
- Returns
- Return type
str
- property path
Get path to local reference sequence.
- Returns
- Return type
str
- property sequence
Get reference sequence.
- Returns
- Return type
np.array(dtype=np.uint8)
- property species
Get reference sequence species.
- Returns
- Return type
str
- property start
Get reference sequence start position (1-based).
- Returns
- Return type
int
- property taxonomy
Get reference sequence taxonomy.
- Returns
- Return type
str
- property url
Get URL to Ensembl reference sequence.
- Returns
- Return type
str
- class snps.resources.Resources(*args, **kwargs)
Bases: object
Object used to manage resources required by snps.
- __init__(resources_dir='resources')
Initialize a Resources object.
- Parameters
resources_dir (str) – name / path of resources directory
- download_example_datasets()
Download example datasets from openSNP.
Per openSNP, "the data is donated into the public domain using CC0 1.0."
- Returns
paths – paths to example datasets
- Return type
list of str or empty str
References
Greshake B, Bayer PE, Rausch H, Reda J (2014), "openSNP-A Crowdsourced Web Resource for Personal Genomics," PLOS ONE, 9(3): e89204, https://doi.org/10.1371/journal.pone.0089204
- get_all_reference_sequences(**kwargs)
Get Homo sapiens reference sequences for Builds 36, 37, and 38 from Ensembl.
Notes
This function can download over 2.5GB of data.
- Returns
dict of ReferenceSequence, else {}
- Return type
dict
- get_all_resources()
Get / download all resources used throughout snps.
Notes
This function does not download reference sequences and the openSNP datadump, due to their large sizes.
- Returns
dict of resources
- Return type
dict
- get_assembly_mapping_data(source_assembly, target_assembly)
Get assembly mapping data.
- Parameters
source_assembly ({"NCBI36", "GRCh37", "GRCh38"}) – assembly to remap from
target_assembly ({"NCBI36", "GRCh37", "GRCh38"}) – assembly to remap to
- Returns
dict of json assembly mapping data if loading was successful, else {}
- Return type
dict
- get_chip_clusters()
Get resource for identifying deduced genotype / chip array based on chip clusters.
- Returns
- Return type
pandas.DataFrame
References
Chang Lu, Bastian Greshake Tzovaras, Julian Gough, A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research, Computational and Structural Biotechnology Journal, Volume 19, 2021, Pages 3747-3754, ISSN 2001-0370, https://doi.org/10.1016/j.csbj.2021.06.040.
- get_dbsnp_151_37_reverse()
Get and load RSIDs that are on the reference reverse (-) strand in dbSNP 151 and lower.
- Returns
- Return type
pandas.DataFrame
References
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001 Jan 1; 29(1):308-11.
Database of Single Nucleotide Polymorphisms (dbSNP). Bethesda (MD): National Center for Biotechnology Information, National Library of Medicine. (dbSNP Build ID: 151). Available from: http://www.ncbi.nlm.nih.gov/SNP/
- get_gsa_chrpos()
Get and load GSA chromosome position map.
https://support.illumina.com/downloads/infinium-global-screening-array-v2-0-product-files.html
- Returns
- Return type
pandas.DataFrame
- get_gsa_resources()
Get resources for reading Global Screening Array files.
https://support.illumina.com/downloads/infinium-global-screening-array-v2-0-product-files.html
- Returns
- Return type
dict
- get_gsa_rsid()
Get and load GSA RSID map.
https://support.illumina.com/downloads/infinium-global-screening-array-v2-0-product-files.html
- Returns
- Return type
pandas.DataFrame
- get_low_quality_snps()
Get listing of low quality SNPs for quality control based on chip clusters.
- Returns
- Return type
pandas.DataFrame
References
Chang Lu, Bastian Greshake Tzovaras, Julian Gough, A survey of direct-to-consumer genotype data, and quality control tool (GenomePrep) for research, Computational and Structural Biotechnology Journal, Volume 19, 2021, Pages 3747-3754, ISSN 2001-0370, https://doi.org/10.1016/j.csbj.2021.06.040.
- get_opensnp_datadump_filenames()
Get filenames internal to the openSNP datadump zip.
Per openSNP, "the data is donated into the public domain using CC0 1.0."
Notes
This function can download over 27GB of data. If the download is not successful, try using a different tool like wget or curl to download the file and move it to the resources directory (see _get_path_opensnp_datadump).
- Returns
filenames – filenames internal to the openSNP datadump
- Return type
list of str
References
Greshake B, Bayer PE, Rausch H, Reda J (2014), "openSNP-A Crowdsourced Web Resource for Personal Genomics," PLOS ONE, 9(3): e89204, https://doi.org/10.1371/journal.pone.0089204
- get_reference_sequences(assembly='GRCh37', chroms=('1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', 'X', 'Y', 'MT'))
Get Homo sapiens reference sequences for chroms of assembly.
Notes
This function can download over 800MB of data for each assembly.
- Parameters
assembly ({"NCBI36", "GRCh37", "GRCh38"}) – reference sequence assembly
chroms (list of str) – reference sequence chromosomes
- Returns
dict of ReferenceSequence, else {}
- Return type
dict
- load_opensnp_datadump_file(filename)
Load the specified file from the openSNP datadump.
Per openSNP, "the data is donated into the public domain using CC0 1.0."
- Parameters
filename (str) – filename internal to the openSNP datadump
- Returns
content of specified file internal to the openSNP datadump
- Return type
bytes
References
Greshake B, Bayer PE, Rausch H, Reda J (2014), "openSNP-A Crowdsourced Web Resource for Personal Genomics," PLOS ONE, 9(3): e89204, https://doi.org/10.1371/journal.pone.0089204
snps.utils
Utility classes and functions.
- snps.utils.clean_str(s)
Clean a string so that it can be used as a Python variable name.
- Parameters
s (str) – string to clean
- Returns
string that can be used as a Python variable name
- Return type
str
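One common way to turn an arbitrary string into a usable variable name is a single regular-expression substitution: replace every non-word character with an underscore and insert an underscore before a leading digit. The sketch below shows that approach under the hypothetical name `clean_str_sketch`; it does not guard against Python keywords or empty strings, and it is not claimed to match the library's implementation.

```python
import re


def clean_str_sketch(s):
    """Make s usable as a Python variable name (illustrative only).

    Non-word characters become "_", and a leading digit gets a "_" prefix.
    """
    return re.sub(r"\W|^(?=\d)", "_", s)


from_source = clean_str_sketch("23andMe v5")
from_filename = clean_str_sketch("my-file.name")
```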
- snps.utils.create_dir(path)
Create directory specified by path if it doesn't already exist.
- Parameters
path (str) – path to directory
- Returns
True if path exists
- Return type
bool
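The create-if-missing behavior described here maps directly onto `os.makedirs` with `exist_ok=True`, which also makes the call safe to repeat. The sketch below uses the hypothetical name `create_dir_sketch` and a temporary directory so that it leaves nothing behind.

```python
import os
import tempfile


def create_dir_sketch(path):
    """Create path (including parents) if needed; return True if it exists.

    Illustrative only; not the snps implementation.
    """
    os.makedirs(path, exist_ok=True)
    return os.path.isdir(path)


with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "output", "vcf")
    first = create_dir_sketch(nested)  # creates both directory levels
    second = create_dir_sketch(nested)  # already exists, still succeeds
```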
- snps.utils.gzip_file(src, dest)
Gzip a file.
- Parameters
src (str) – path to file to gzip
dest (str) – path to output gzip file
- Returns
path to gzipped file
- Return type
str
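Compressing a file to a separate destination can be sketched with the standard gzip and shutil modules, streaming the bytes so that large genotype files never need to fit in memory. `gzip_file_sketch` is a hypothetical stand-in, and unlike the library function it makes no claim about atomic writes.

```python
import gzip
import os
import shutil
import tempfile


def gzip_file_sketch(src, dest):
    """Gzip src into dest and return dest (illustrative only)."""
    with open(src, "rb") as f_in, gzip.open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)  # stream in chunks
    return dest


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "snps.csv")
    with open(src, "w", newline="") as f:
        f.write("rsid,chromosome,position,genotype\n")
    gz_path = gzip_file_sketch(src, src + ".gz")
    gz_name = os.path.basename(gz_path)
    with gzip.open(gz_path, "rt") as f:
        round_trip = f.read()
```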
- snps.utils.save_df_as_csv(df, path, filename, comment='', prepend_info=True, atomic=True, **kwargs)
Save dataframe to a CSV file.
- Parameters
df (pandas.DataFrame) – dataframe to save
path (str) – path to directory where to save CSV file
filename (str or buffer) – filename for file to save or buffer to write to
comment (str) – header comment(s); one or more lines starting with "#"
prepend_info (bool) – prepend file generation information as comments
atomic (bool) – atomically write output to a file on local filesystem
**kwargs – additional parameters to pandas.DataFrame.to_csv
- Returns
path to saved file or buffer (empty str if error)
- Return type
str or buffer
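The comment parameter implies a file layout in which "#" lines precede the CSV header. The sketch below writes that layout to an in-memory buffer with the standard csv module and then reads the data back by skipping comment lines. `write_csv_with_comments`, the comment text, and the sample row are hypothetical; the real function operates on a pandas.DataFrame.

```python
import csv
import io


def write_csv_with_comments(rows, header, buffer, comment=""):
    """Write optional "#" comment lines, then CSV rows, to buffer.

    Illustrative only; not the snps implementation.
    """
    if comment:
        # Make sure the comment block ends with a newline before the header.
        buffer.write(comment if comment.endswith("\n") else comment + "\n")
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer


buf = io.StringIO()
write_csv_with_comments(
    rows=[("rs1", "1", 1, "AA")],
    header=["rsid", "chromosome", "position", "genotype"],
    buffer=buf,
    comment="# example comment",
)
text = buf.getvalue()
data_lines = [line for line in text.splitlines() if not line.startswith("#")]
```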