Uploading files

Please note that this script is currently under development.

The cgpclient can be used to upload files (e.g. FASTQs) from NHS Sequencing Centres to the CGP.

We have included an example script in the scripts directory for uploading FASTQ files generated after demultiplexing with the Dragen software.

Adapting to each GLH

This is an example script and may need to be modified by Sequencing Centres to make it compatible with the data being uploaded.

Genomics England will verify, as far as possible, that uploaded files and associated resources are compatible with NGIS, but Sequencing Centres are responsible for ensuring that the input files and parameters they supply are correct.

If you have any questions, please contact the Genomics England Service Desk.

Example data flow

This example covers uploading FASTQ files after demultiplexing with Dragen, using the upload_dragen_fastq_list.py script.

1. Configure CGP Client

You will first need to configure your cgpclient. The following is the basic config required:

debug: true
output_dir: /tmp/output
api_host: XXXXXX # will be shared
override_api_base_url: true # needed when testing in non-live environments
api_key: XXXXXXX # will be shared
dry_run: true # use to test the upload without uploading
ods_code: XXXXXXX # your ODS code

For full details on configuration options of the cgpclient, see configuration.
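Before a first run it can help to confirm that none of the shared values are still placeholders. The following is a minimal sketch of such a check, not part of cgpclient: it assumes the config has already been loaded into a Python dict (for example with a YAML parser, if the file is YAML), and the function name `unfilled_settings` is hypothetical.

```python
def unfilled_settings(config):
    """Return the keys whose values still look like XXXX placeholders."""
    return sorted(
        key
        for key, value in config.items()
        if isinstance(value, str) and value and set(value) == {"X"}
    )


# The basic config shown above, expressed as a dict for illustration.
example_config = {
    "debug": True,
    "output_dir": "/tmp/output",
    "api_host": "XXXXXX",
    "override_api_base_url": True,
    "api_key": "XXXXXXX",
    "dry_run": True,
    "ods_code": "XXXXXXX",
}

print(unfilled_settings(example_config))
# ['api_host', 'api_key', 'ods_code']
```

An empty list means every placeholder has been replaced. It does not mean the values themselves are valid.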

2. Demultiplex the Sequencing Run

Use the Dragen software (version: >=4.*.*) to demultiplex the entire sequencing run. This will:

  • Generate the FASTQ files.
  • Create a file named fastq_list.csv.

Refer to the official Dragen documentation on the "FASTQ CSV File Format" for details on the fastq_list.csv file.
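To see which samples a fastq_list.csv contains, the file can be read with Python's standard csv module. This is a minimal sketch, assuming the usual Dragen column headers (RGID, RGSM, RGLB, Lane, Read1File, Read2File). The function name `unique_samples` and the example rows are made up for illustration and are not part of cgpclient.

```python
import csv
import io


def unique_samples(fastq_list_text):
    """Return the RGSM values in order of first appearance."""
    reader = csv.DictReader(io.StringIO(fastq_list_text))
    seen = []
    for row in reader:
        sample = row["RGSM"]
        if sample not in seen:
            seen.append(sample)
    return seen


# Made-up file content, for illustration only.
EXAMPLE = """RGID,RGSM,RGLB,Lane,Read1File,Read2File
AACC.1,sample_a,UnknownLibrary,1,/data/sample_a_L001_R1.fastq.gz,/data/sample_a_L001_R2.fastq.gz
AACC.2,sample_a,UnknownLibrary,2,/data/sample_a_L002_R1.fastq.gz,/data/sample_a_L002_R2.fastq.gz
GGTT.1,sample_b,UnknownLibrary,1,/data/sample_b_L001_R1.fastq.ora,/data/sample_b_L001_R2.fastq.ora
"""

print(unique_samples(EXAMPLE))
# ['sample_a', 'sample_b']
```

For a real file, pass the result of `open(path, newline="").read()` instead of the example string.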

3. Upload FASTQ Files

Use the upload_dragen_fastq_list.py script with the following command:

python cgpclient/scripts/upload_dragen_fastq_list.py \
  --fastq_list_sample_id {someid} \
  --fastq_list {path to fastq list csv file from Dragen} \
  --ngis_participant_id {NGIS participant ID} \
  --ngis_referral_id {NGIS referral ID} \
  --config_file {path to cgpclient config file}

  • Replace {someid} with the RGSM value from the fastq_list.csv file for the sample you want to upload. If it is not supplied, the script uses the first RGSM value found.
  • Repeat this command for each unique sample (as listed in the RGSM column) that has files to be uploaded.

The script goes through each row in the fastq_list.csv file, uploads only the files for {someid}, and ignores all other rows.
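Repeating the command for every sample can be scripted. The sketch below builds one command per sample using only the flags shown above. The `SAMPLES` mapping, the placeholder IDs, the `config.yaml` file name, and the function name `build_upload_command` are assumptions for illustration; how each Sequencing Centre maps RGSM values to NGIS IDs will differ.

```python
import shlex

SCRIPT = "cgpclient/scripts/upload_dragen_fastq_list.py"


def build_upload_command(sample_id, fastq_list, participant_id, referral_id, config_file):
    """Build the argument list for uploading one sample's FASTQ files."""
    return [
        "python", SCRIPT,
        "--fastq_list_sample_id", sample_id,
        "--fastq_list", fastq_list,
        "--ngis_participant_id", participant_id,
        "--ngis_referral_id", referral_id,
        "--config_file", config_file,
    ]


# Hypothetical mapping from RGSM value to (participant ID, referral ID).
# These are placeholders, not real NGIS identifiers.
SAMPLES = {
    "sample_a": ("participant_1", "referral_1"),
    "sample_b": ("participant_2", "referral_2"),
}

commands = [
    build_upload_command(sample_id, "fastq_list.csv", participant, referral, "config.yaml")
    for sample_id, (participant, referral) in SAMPLES.items()
]

for command in commands:
    # Printed for review; subprocess.run(command, check=True) would execute it.
    print(shlex.join(command))
```

Printing the commands first, with dry_run enabled in the config, gives a chance to check the IDs before anything is uploaded.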

4. Upload Process and Resource Creation

Once executed:

  • All Read 1 and Read 2 files (gz or ora compressed) for the specified sample will be uploaded to the CGP Object Store.
  • Associated HL7 FHIR and GA4GH DRS resources will be created in the Clinical Data Store.
  • For ora compressed files, the appropriate ora reference is determined based on the specified Dragen version.

At the time of writing, there is a single human ora reference associated with Dragen >= v4, which is used by default when handling ora compressed files.

See the Dragen documentation for more information.
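To preview which files would be selected for a sample and how each is compressed, something like the following can be used. It is a rough sketch and not the upload script's actual logic: it assumes the Dragen Read1File and Read2File column headers and that the compression type can be read from the file suffix (.gz or .ora). The function names and example rows are hypothetical.

```python
import csv
import io
from pathlib import PurePosixPath


def files_for_sample(fastq_list_text, sample_id):
    """Collect the Read 1 and Read 2 paths from rows whose RGSM matches."""
    reader = csv.DictReader(io.StringIO(fastq_list_text))
    files = []
    for row in reader:
        if row["RGSM"] != sample_id:
            continue  # rows for other samples are ignored
        for column in ("Read1File", "Read2File"):
            path = row.get(column)
            if path:
                files.append(path)
    return files


def compression_of(path):
    """Classify a FASTQ path as 'gz' or 'ora' from its final suffix."""
    suffix = PurePosixPath(path).suffix
    if suffix == ".gz":
        return "gz"
    if suffix == ".ora":
        return "ora"
    raise ValueError(f"unrecognised compression for {path!r}")


# Made-up file content, for illustration only.
EXAMPLE = """RGID,RGSM,RGLB,Lane,Read1File,Read2File
AACC.1,sample_a,UnknownLibrary,1,/data/sample_a_L001_R1.fastq.gz,/data/sample_a_L001_R2.fastq.gz
GGTT.1,sample_b,UnknownLibrary,1,/data/sample_b_L001_R1.fastq.ora,/data/sample_b_L001_R2.fastq.ora
"""

for path in files_for_sample(EXAMPLE, "sample_b"):
    print(compression_of(path), path)
```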

5. Upload Results

  • Large files may take time to upload; log messages will be shown on the terminal.
  • Successful uploads will return confirmation messages.
  • Errors will be reported with relevant details.

6. Post-Upload Association

After upload:

  • FASTQ files will be linked to the corresponding NGIS participant and referral.
  • The NGIS pipeline will proceed once all required data has been verified.