General usage
General structure
The goal of the pipeline is to enrich cases obtained from Face2Gene with additional data. Each case contains information on genomic variants, phenotypic features, and scores inferred from those features and a provided photograph. The pipeline adds an alternative interpretation of the phenotypic features from Phenomizer and converts variant HGVS strings into VCF files.
This process requires the following steps:
- Downloading of Face2Gene cases
- Parsing of JSON-format cases into an internal representation
- Adding Phenomizer data
- Creating VCF files from HGVS information
- Checking whether data quality requirements are met*
* For performance and parser implementation reasons, an initial check is already done at the JSON level.
Downloading
Data is acquired from an AWS S3 bucket, which is updated periodically with new cases and recalculations of old cases.
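One way to mirror such a bucket locally is the AWS CLI's `aws s3 sync` command. The minimal sketch below only builds the command line and does not run it. The bucket name, prefix, and target directory are placeholders, and whether the pipeline itself uses the AWS CLI for this step is an assumption.

```python
from typing import List


def build_sync_command(bucket: str, prefix: str, target_dir: str) -> List[str]:
    """Build an `aws s3 sync` command that mirrors a bucket prefix locally."""
    # Normalise the prefix so the S3 URI has no leading or trailing slash.
    source = f"s3://{bucket}/{prefix.strip('/')}"
    return ["aws", "s3", "sync", source, target_dir]


# Placeholder values; substitute the bucket and paths from your configuration.
command = build_sync_command("example-bucket", "cases/", "data/raw")
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Because `sync` only transfers new or changed objects, repeated runs pick up both new cases and recalculated old ones.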
JSON Parsing
The raw JSON files provided by Face2Gene are processed into an internal case representation, which bundles information from various JSON fields into a simpler structure.
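To illustrate what bundling several JSON fields into a simpler structure can look like, the following minimal sketch parses a raw case into a small dataclass. The field names (`case_id`, `features`, `genomic_entries`, `hgvs`), the `SimpleCase` class, and the example values are hypothetical; the real Face2Gene JSON layout and the pipeline's case class may differ.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleCase:
    """Hypothetical, simplified case representation (not the pipeline's class)."""
    case_id: str
    features: List[str] = field(default_factory=list)
    hgvs: List[str] = field(default_factory=list)


def parse_case(raw_text: str) -> SimpleCase:
    """Bundle selected fields of a raw JSON case into a SimpleCase."""
    raw = json.loads(raw_text)
    # Collect HGVS strings from all genomic entries; skip entries without one.
    hgvs = [
        entry["hgvs"]
        for entry in raw.get("genomic_entries", [])
        if entry.get("hgvs")
    ]
    return SimpleCase(
        case_id=str(raw["case_id"]),
        features=list(raw.get("features", [])),
        hgvs=hgvs,
    )


# Made-up example input; the HGVS string is a placeholder, not a real variant.
example = (
    '{"case_id": 42, "features": ["HP:0001250"], '
    '"genomic_entries": [{"hgvs": "NM_000000.0:c.1A>G"}, {}]}'
)
case = parse_case(example)
```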
Phenomization
Phenomizer information is added to cases by calling the phenomizer object.
VCF Generation
VCF files are generated by calling the vcf function of the case. Jannovar can be
run as an individual process or in server mode by calling
run_jannovar.sh first.
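To make the target format concrete, the sketch below writes a minimal sites-only VCF from variants that have already been resolved to genomic coordinates. It does not perform the HGVS-to-genomic conversion, which the pipeline delegates to Jannovar. The `write_vcf` helper, the tuple layout, and the example variants are assumptions for illustration only.

```python
import io
from typing import Iterable, TextIO, Tuple

# (chromosome, 1-based position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def write_vcf(variants: Iterable[Variant], handle: TextIO) -> None:
    """Write variants as a minimal sites-only VCF 4.2 file."""
    handle.write("##fileformat=VCFv4.2\n")
    handle.write("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n")
    # Sort by chromosome name (as text) and position for deterministic output.
    for chrom, pos, ref, alt in sorted(variants):
        # "." marks a missing value for ID, QUAL, FILTER and INFO.
        handle.write(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.\n")


# Made-up variants, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_vcf([("2", 200, "C", "T"), ("1", 100, "A", "G")], buffer)
vcf_text = buffer.getvalue()
```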
Quality Check functions
Whether cases meet certain criteria can be checked with the check function, which is available on both the json object and the case object.
The quality check results are saved to a JSON-format log, found by default at
quality_check.log
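The following minimal sketch shows how per-case check results could be collected and serialized as a JSON log. The two criteria, the issue codes, and the dictionary layout are hypothetical and are not the pipeline's actual checks or log schema.

```python
import json
from typing import Dict, List


def check_case(case: Dict) -> Dict:
    """Return a pass/fail verdict plus the list of failed criteria."""
    issues: List[str] = []
    # Hypothetical criteria: a case needs features and at least one HGVS string.
    if not case.get("features"):
        issues.append("NO_FEATURES")
    if not case.get("hgvs"):
        issues.append("NO_HGVS")
    return {"passed": not issues, "issues": issues}


def build_quality_log(cases: Dict[str, Dict]) -> str:
    """Serialize per-case check results as JSON text, keyed by case id."""
    results = {case_id: check_case(case) for case_id, case in cases.items()}
    return json.dumps(results, indent=2, sort_keys=True)


# Made-up cases: the first passes both checks, the second fails both.
log_text = build_quality_log({
    "1": {"features": ["HP:0001250"], "hgvs": ["NM_000000.0:c.1A>G"]},
    "2": {"features": [], "hgvs": []},
})
```

Recording the failed criteria per case, rather than a single boolean, keeps the log useful for deciding which correction to supply afterwards.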
Error correction overrides
There are several ways to provide corrections for erroneous input data.
Complete sections in the raw json file can be replaced by placing corrected
files in the correction folder. (See the config.ini for the location of the
correction directory.) These have to follow the folder hierarchy of the base
json structure with a separation into cases and genomics_entries.
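A file-level override of this kind can be sketched as a lookup that prefers the corrected copy when one exists at the same relative path. The `load_with_override` helper, the directory names `raw` and `corrections`, and the file names are hypothetical; only the split into `cases` and `genomics_entries` is taken from the description above.

```python
import json
import tempfile
from pathlib import Path


def load_with_override(base_dir: Path, correction_dir: Path, relative: str) -> dict:
    """Load a JSON file, preferring a corrected copy at the same relative path."""
    corrected = correction_dir / relative
    source = corrected if corrected.is_file() else base_dir / relative
    with source.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demonstration in a temporary directory with made-up file contents.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "raw"
    fixes = Path(tmp) / "corrections"
    for root, payload in ((base, {"gene": "OLD"}), (fixes, {"gene": "NEW"})):
        (root / "genomics_entries").mkdir(parents=True)
        (root / "genomics_entries" / "1.json").write_text(
            json.dumps(payload), encoding="utf-8"
        )
    (base / "cases").mkdir()
    (base / "cases" / "7.json").write_text(
        json.dumps({"case_id": 7}), encoding="utf-8"
    )
    # A corrected copy exists for the genomics entry, but not for the case.
    overridden = load_with_override(base, fixes, "genomics_entries/1.json")
    untouched = load_with_override(base, fixes, "cases/7.json")
```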
Alternatively, gene information and HGVS strings can be inserted into an error
dictionary, found by default in hgvs_errors.json. This file can be
versioned, with version enforcement configurable in lib/constants.py.
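As a rough sketch of what version enforcement for such a file might look like, the example below rejects an error dictionary whose version is older than a required minimum. The `version` and `data` keys, the `REQUIRED_VERSION` constant, and the minimum-version rule are all assumptions; the actual layout of hgvs_errors.json and the setting in lib/constants.py may differ.

```python
import json

# Hypothetical constant; the pipeline configures enforcement in lib/constants.py.
REQUIRED_VERSION = 2


def load_error_dict(text: str, required_version: int = REQUIRED_VERSION) -> dict:
    """Parse an error dictionary and enforce a minimum version."""
    content = json.loads(text)
    # Treat a missing version field as version 0, i.e. always outdated.
    version = content.get("version", 0)
    if version < required_version:
        raise ValueError(
            f"error dictionary version {version} is older than "
            f"required version {required_version}"
        )
    return content.get("data", {})


# Made-up content: one genomic entry id mapped to a placeholder HGVS string.
corrections = load_error_dict(
    '{"version": 2, "data": {"123": ["NM_000000.0:c.1A>G"]}}'
)
```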