Upload files from a DRAGEN run¶
Please note that this script is currently under development.
The cgpclient can be used to upload files (e.g. FASTQs) from NHS Sequencing Centres to the CGP.
We have included an example script in the scripts directory for uploading FASTQ files generated after demultiplexing with the DRAGEN software.
Adapting to each GLH
This is an example script and may need to be modified by Sequencing Centres to make it compatible with the data being uploaded.
Genomics England will verify, as much as possible, that uploaded files and associated resources are compatible with NGIS, but Sequencing Centres are responsible for ensuring input files and parameters supplied are correct.
If you have any questions, please contact the Genomics England Service Desk.
Example data flow¶
Uploading FASTQ files after demultiplexing with DRAGEN using the upload_dragen_run script.
1. Configure CGP Client¶
You will first need to configure your cgpclient. The following is the basic config required:
ods_code: XXXXXXX # your ODS code, this will be used to associate all resources with your organisation
verbose: true # print verbose output to the console; for even more detail use --debug or debug: true
dry_run: true # test the upload without uploading any data (also available as the --dry_run command line argument); omit or set to false to upload the data
override_api_base_url: true # needed when testing in non-live environments
api_host: XXXXXX # will be shared
api_key: XXXXXXX # will be shared
For full details on configuration options of the cgpclient, see configuration.
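As a rough illustration, the basic config above could be generated from Python rather than written by hand. This is only a sketch: `render_config` is a hypothetical helper, not part of the cgpclient, and it emits only the keys shown in the example. It assumes the values are plain strings that need no YAML quoting.

```python
def render_config(ods_code, api_host, api_key, dry_run=True, verbose=True,
                  override_api_base_url=True):
    """Return the text of a minimal cgpclient config file.

    Hypothetical helper: only the keys from the example config are written,
    and values are assumed to be plain scalars that need no YAML quoting.
    """
    def yaml_bool(value):
        return "true" if value else "false"

    lines = [
        f"ods_code: {ods_code}",
        f"verbose: {yaml_bool(verbose)}",
        f"dry_run: {yaml_bool(dry_run)}",
    ]
    if override_api_base_url:
        # Only needed when testing in non-live environments
        lines.append("override_api_base_url: true")
    lines.append(f"api_host: {api_host}")
    lines.append(f"api_key: {api_key}")
    return "\n".join(lines) + "\n"


# Placeholder values; the real host and key will be shared with you
print(render_config("XXXXXXX", "XXXXXX", "XXXXXXX"))
```

The returned text can be saved to ~/.cgpclient/config.yaml, which is the location read by default. Keeping `dry_run=True` until a test run looks correct avoids uploading data by accident.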
2. Demultiplex the Sequencing Run¶
Use the DRAGEN software (version >=4.*.*) to demultiplex the entire sequencing run. This will:
- Generate the FASTQ files.
- Create a file named fastq_list.csv.
- Create a RunInfo.xml metadata file.
All these will be stored in a run folder, and we suggest you use the run folder name as the --run_id to uniquely identify the sequencing run.
Refer to the official DRAGEN documentation on the "FASTQ CSV File Format" for details on the fastq_list.csv file.
The following is a basic example:
RGID,RGSM,RGLB,Lane,Read1File,Read2File
GACTGAGTAG.CACTATCAAC.1,my_sample_id,UnknownLibrary,1,my_sample_id_S1_L001_R1_001.fastq.ora,my_sample_id_S1_L001_R2_001.fastq.ora
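Before uploading, it can help to see which sample identifiers (RGSM values) appear in the file and which FASTQ files belong to each. The following is a minimal sketch using only the Python standard library. `files_by_sample` is a hypothetical helper, not part of the cgpclient, and it assumes the column names shown in the example above.

```python
import csv
import io
from collections import defaultdict


def files_by_sample(fastq_list_text):
    """Map each RGSM value to the FASTQ files named in its rows."""
    samples = defaultdict(list)
    reader = csv.DictReader(io.StringIO(fastq_list_text))
    for row in reader:
        samples[row["RGSM"]].append(row["Read1File"])
        # Assumption: an empty or missing Read2File cell means no second read file
        if row.get("Read2File"):
            samples[row["RGSM"]].append(row["Read2File"])
    return dict(samples)


example = """\
RGID,RGSM,RGLB,Lane,Read1File,Read2File
GACTGAGTAG.CACTATCAAC.1,my_sample_id,UnknownLibrary,1,my_sample_id_S1_L001_R1_001.fastq.ora,my_sample_id_S1_L001_R2_001.fastq.ora
"""

for sample_id, files in files_by_sample(example).items():
    print(sample_id, files)
```

For a real run, read the text from the fastq_list.csv in the run folder instead of the inline example.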
3. Upload FASTQ Files¶
Info
You will need the NGIS referral and participant IDs to run the script, so that the files are associated with the correct referral.
It is anticipated that Sequencing Centres will have been sent these IDs when the DNA was sent to them by the GLH ordering the test (this may be the same GLH as the Sequencing Centre).
Use the upload_dragen_run script with the following command:
cgpclient/scripts/upload_dragen_run \
--run_id {DRAGEN run ID} \
--run_info_file {path to DRAGEN RunInfo.xml file} \
--sample_id {someid} \
--fastq_list {path to fastq list csv file from Dragen} \
--participant_id {NGIS participant ID} \
--referral_id {NGIS referral ID} \
--config_file {path to cgpclient config file}
The --run_info_file argument is optional. The --config_file argument can also be omitted if you keep your config in ~/.cgpclient/config.yaml, as this file is read by default.
- Replace {someid} with the value of RGSM from the fastq_list.csv file for the sample you want to upload. If not supplied, this script will use the first RGSM value found.
- Repeat this command for each unique sample (as listed in the RGSM column) that has files to be uploaded.
- For a DRAGEN run, the {DRAGEN run ID} should be the run folder name, e.g. 240627_M03456_0001_AHCYL3XY. You can also optionally attach the DRAGEN RunInfo.xml file to the upload using the --run_info_file argument, in which case the file will be uploaded to the CGP and associated with the sample and run like the FASTQs.
The script will go through each row in the fastq_list.csv file, upload only the files for the specified {someid}, and ignore all the others.
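Because the command has to be repeated for each sample, some centres may prefer to assemble it programmatically. The sketch below builds the argument list for one invocation, using only the flags shown above and taking the run ID from the run folder name. `build_upload_command` is a hypothetical helper, and the paths and IDs in the example are placeholders rather than real values.

```python
import shlex
from pathlib import Path


def build_upload_command(run_folder, fastq_list, sample_id, participant_id,
                         referral_id, run_info_file=None, config_file=None):
    """Build the argument list for one upload_dragen_run invocation."""
    command = [
        "cgpclient/scripts/upload_dragen_run",
        "--run_id", Path(run_folder).name,  # run folder name doubles as the run ID
        "--sample_id", sample_id,
        "--fastq_list", str(fastq_list),
        "--participant_id", participant_id,
        "--referral_id", referral_id,
    ]
    # Both of these arguments are optional, so add them only when supplied
    if run_info_file is not None:
        command += ["--run_info_file", str(run_info_file)]
    if config_file is not None:
        command += ["--config_file", str(config_file)]
    return command


command = build_upload_command(
    run_folder="/data/runs/240627_M03456_0001_AHCYL3XY",  # placeholder path
    fastq_list="/data/runs/240627_M03456_0001_AHCYL3XY/fastq_list.csv",
    sample_id="my_sample_id",
    participant_id="PARTICIPANT_ID",  # placeholder, use the NGIS participant ID
    referral_id="REFERRAL_ID",        # placeholder, use the NGIS referral ID
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.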
4. Upload Process and Resource Creation¶
Once executed:
- All Read 1 and Read 2 files (gz or ora compressed) for the specified sample will be uploaded to the CGP Object Store.
- Associated HL7 FHIR and GA4GH DRS resources will be created in the Clinical Data Store.
- For ora compressed files, the appropriate ora reference is determined based on the specified DRAGEN version.
At the time of writing, there is a single ora reference for humans associated with DRAGEN >= v4, which we use by default for handling ora compressed files.
See the DRAGEN documentation for more information.
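Since only gz and ora compressed files are expected, a simple pre-flight check on the file names listed in fastq_list.csv can catch surprises before an upload starts. This sketch classifies files purely by suffix, which is an assumption: how the upload script itself detects compression is not described here, and both function names are hypothetical.

```python
def compression_of(filename):
    """Return 'gz' or 'ora' based on the file suffix, or None if neither."""
    name = filename.lower()
    if name.endswith(".gz"):
        return "gz"
    if name.endswith(".ora"):
        return "ora"
    return None


def unexpected_files(filenames):
    """Return the files that are neither gz nor ora compressed (by suffix)."""
    return [name for name in filenames if compression_of(name) is None]


files = [
    "my_sample_id_S1_L001_R1_001.fastq.ora",
    "my_sample_id_S1_L001_R2_001.fastq.ora",
]
print(unexpected_files(files))
```

An empty list means every file carries one of the two expected suffixes.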
5. Upload Results¶
- Large files may take time to upload; log messages will be shown on the terminal while the upload is in progress.
- Successful uploads will return confirmation messages.
- Errors will be reported with relevant details.
6. Post-Upload Association¶
After upload:
- FASTQ files will be linked to the corresponding NGIS participant and referral.
- The NGIS pipeline will proceed once all required data has been verified.
7. Check uploaded files¶
Use the list_files script to list the files associated with a participant and referral:
./cgpclient/scripts/list_files \
--participant_id {NGIS participant ID} \
--referral_id {NGIS referral ID} \
--config_file {path to cgpclient config file}
The --config_file argument can be omitted if you keep your config in ~/.cgpclient/config.yaml, as this file is read by default.
Troubleshooting¶
How do I check if a file already exists for my referral / patient?¶
TBC
How do I remove a file uploaded by mistake?¶
TBC
How do I re-upload files for a patient?¶
Either:
From the same sequencing run I have already uploaded¶
Re-run the upload_dragen_run script using the same parameters as were used initially. The newly uploaded data will... tbc
From a new sequencing run¶
a) using the same DNA sample
TBC
b) using a new DNA sample
TBC