What is the difference between single-genome and multi-genome FASTA formats?

When creating a submission, the first step is to declare how your genomic sequence data is formatted. This choice matters because it tells the AGARI platform how to link each metadata record to its corresponding sequence. The platform supports two primary formats.

1. Single-Genome FASTA Files

This format uses one or more separate FASTA files, each containing the sequence for a single genome.

When to Use It

This approach is recommended for organisms with larger genomes, such as bacteria or parasites. Pathogens such as Vibrio cholerae (cholera), Klebsiella pneumoniae, and the Plasmodium parasites that cause malaria typically use this format.

How It Works

To link the metadata to the correct sequence file, your metadata TSV file must include a column named sequenceFileName. The value in this column for each row must exactly match the name of the corresponding FASTA file you are uploading.

Example

If you upload two FASTA files named cholera_sample_A.fasta and cholera_sample_B.fasta, your metadata file would look like this:

isolateId   sampleId   ...   sequenceFileName
ISO-001     SAMP-001   ...   cholera_sample_A.fasta
ISO-002     SAMP-002   ...   cholera_sample_B.fasta
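
Before uploading, the matching rule for sequenceFileName can be checked locally. The following is a minimal Python sketch, not part of the AGARI platform: the function name find_unmatched_file_names is hypothetical, the metadata is assumed to be tab-separated with a header row, and the comparison is case-sensitive on the assumption that "exactly match" means character-for-character.

```python
import csv
import io


def find_unmatched_file_names(metadata_tsv_text, uploaded_file_names):
    """Compare sequenceFileName values against the uploaded FASTA file names.

    Returns (missing, unused):
      missing - names listed in the metadata with no uploaded file
      unused  - uploaded files that no metadata row refers to
    Hypothetical helper; the platform's own validation may differ.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv_text), delimiter="\t")
    if "sequenceFileName" not in (reader.fieldnames or []):
        raise ValueError("metadata has no sequenceFileName column")
    referenced = [row["sequenceFileName"] for row in reader]
    uploaded = set(uploaded_file_names)
    # Assumption: the match is exact and case-sensitive.
    missing = [name for name in referenced if name not in uploaded]
    unused = sorted(uploaded - set(referenced))
    return missing, unused


# Illustrative data based on the example above (the "..." columns are omitted).
single_genome_metadata = (
    "isolateId\tsampleId\tsequenceFileName\n"
    "ISO-001\tSAMP-001\tcholera_sample_A.fasta\n"
    "ISO-002\tSAMP-002\tcholera_sample_B.fasta\n"
)
missing, unused = find_unmatched_file_names(
    single_genome_metadata,
    ["cholera_sample_A.fasta", "cholera_sample_B.fasta"],
)
print("missing:", missing, "unused:", unused)
```

Both lists are empty when every row points at an uploaded file and every uploaded file is referenced. A difference as small as a capital letter or a .fa versus .fasta extension shows up in both lists.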

2. Multi-Genome FASTA File

This format uses a single FASTA file that contains the sequences for many different isolates.

When to Use It

This is common for organisms with smaller genomes, such as viruses. Pathogens such as SARS-CoV-2 (COVID-19) and the mpox virus often use this format.

How It Works

To link the metadata, the header line for each sequence record in your FASTA file must start with a > symbol followed immediately by an identifier that exactly matches the value in the isolateId column of your metadata TSV file. Any additional information in the FASTA header after the ID will be ignored.

Example

If you have two isolates, SARSCOV2-001 and SARSCOV2-002, your files would look like this:

Metadata File (e.g., covid_samples.tsv):

isolateId      sampleId   ...
SARSCOV2-001   SAMP-A     ...
SARSCOV2-002   SAMP-B     ...

Sequence File (e.g., sequences.fasta):

>SARSCOV2-001 some other info...
ATGC...GATTACA
>SARSCOV2-002 some other info...
ATGC...GATTACA
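
The header-matching rule can also be checked locally before upload. The following is a minimal Python sketch under stated assumptions, not the platform's own parser: the helper names fasta_ids and compare_ids are hypothetical, the identifier is assumed to end at the first whitespace character in the header (as in the example above), and the sequences are shortened placeholders.

```python
import csv
import io
import re


def fasta_ids(fasta_text):
    """Return the identifier from each FASTA header line.

    Assumption: the ID is the text directly after '>' up to the first
    whitespace character. A header such as '> ID' yields an empty ID,
    because the identifier must follow '>' immediately.
    """
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            ids.append(re.split(r"\s", line[1:], maxsplit=1)[0])
    return ids


def compare_ids(metadata_tsv_text, fasta_text):
    """Return (without_sequence, without_metadata) as sorted lists of IDs."""
    reader = csv.DictReader(io.StringIO(metadata_tsv_text), delimiter="\t")
    metadata_ids = {row["isolateId"] for row in reader}
    sequence_ids = set(fasta_ids(fasta_text))
    return (
        sorted(metadata_ids - sequence_ids),
        sorted(sequence_ids - metadata_ids),
    )


# Illustrative data based on the example above; sequences are placeholders.
multi_genome_metadata = (
    "isolateId\tsampleId\n"
    "SARSCOV2-001\tSAMP-A\n"
    "SARSCOV2-002\tSAMP-B\n"
)
multi_genome_fasta = (
    ">SARSCOV2-001 some other info...\n"
    "ATGCGATTACA\n"
    ">SARSCOV2-002 some other info...\n"
    "ATGCGATTACA\n"
)
without_sequence, without_metadata = compare_ids(
    multi_genome_metadata, multi_genome_fasta
)
print("no sequence:", without_sequence, "no metadata:", without_metadata)
```

Both lists are empty when every isolateId has a matching header and every header has a matching metadata row. Header lines must begin at the start of the line, so an indented ">" is not recognized as a header by this sketch.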

© 2025 AGARI. All rights reserved.