Current Draft
This draft is now deprecated due to changes in the structure of the MInAS project to better align with the MIxS infrastructure. Please see the latest drafts on the other pages in this section.
The current version is v0.0.2.
This can be considered a pre-alpha version, and has been neither reviewed nor approved by the wider palaeogenomics community or by the Genomic Standards Consortium.
A commentable early draft of the MInAS checklist written by members of the SPAAM community can be found here; otherwise, a simplified current version of only the MInAS-related columns is rendered below.
For the current release of the base MIxS checklists, please see the GenSC website.
Structured comment name | Item (rdfs:label) | Definition | Expected value | Value syntax | Example | Section | minas | Preferred unit | Occurrence | MIXS ID | Modification Suggestion | Requires Further Discussion | Reason for further discussion |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
samp_name | sample name | A local identifier or name for the material sample used for extracting nucleic acids, and subsequent sequencing. It can refer either to the original material collected or to any derived sub-samples. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. INSDC requires every sample name from a single Submitter to be unique. Use of a globally unique identifier for the field source_mat_id is recommended in addition to sample_name. | text | {text} | ISDsoil1 | investigation | M | nan | 1 | MIXS:0001107 | Definition update: clarify it's a DNA lab code (museum ID should go in source_mat_id) | N | nan |
samp_taxon_id | taxonomy ID of DNA sample | NCBI taxon id of the sample. May be a single taxon or mixed taxa sample. Use 'synthetic metagenome' for mock community/positive controls, or 'blank sample' for negative controls. | Taxonomy ID | {text} [NCBI:txid] | Gut Metagenome [NCBI:txid749906] | investigation | M | nan | 1 | MIXS:0001320 | nan | N | nan |
project_name | project name | Name of the project within which the sequencing was organized | nan | {text} | Forest soil metagenome | investigation | M | nan | 1 | MIXS:0000092 | nan | N | nan |
lat_lon | geographic location (latitude and longitude) | The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in WGS84 system | decimal degrees, limit to 8 decimal points | {float} {float} | 50.586825 6.408977 | environment | C | nan | 1 | MIXS:0000009 | None, but cases where imprecision may be preferred (e.g. to prevent looting of the site) should be discussed with GSC to find a good solution, i.e. is there a minimum level of precision required? | Y | What level of imprecision is allowed by GSC? |
depth | depth | The vertical distance below local surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples. | measurement value | {float} {unit} | 10 meter | environment | E | meter | 1 | MIXS:0000018 | Environment-dependent, i.e. minas-environmental | N | nan |
elev | elevation | Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit. | measurement value | {float} {unit} | 100 meter | environment | X | nan | 1 | MIXS:0000093 | nan | N | nan |
temp | temperature | Temperature of the sample at the time of sampling. | measurement value | {float} {unit} | 25 degree Celsius | environment | X | degree Celsius | 1 | MIXS:0000113 | Definition update: For ancient samples, this can be temperature of marine sediments, burial environment, cave atmosphere | N | nan |
geo_loc_name | geographic location (country and/or sea,region) | The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (http://purl.bioontology.org/ontology/GAZ) | country or sea name (INSDC or GAZ): region(GAZ), specific location name | {term}: {term}, {text} | USA: Maryland, Bethesda | environment | M | nan | 1 | MIXS:0000010 | Definition update: in cases of ancient locations, use the name of the present-day country that the location is based in. | N | nan |
collection_date | collection date | The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008; Except: 2008-01; 2008 all are ISO8601 compliant | date and time | {timestamp} | 2018-05-11T10:00:00+01:00; 2018-05-11 | environment | C | nan | 1 | MIXS:0000011 | Definition update: For ancient samples, the date of the drilling or subsampling from the main specimen, the sub-sample of which is used for DNA extraction. | N | nan |
neg_cont_type | negative control type | The substance or equipment used as a negative control in an investigation | enumeration or text | [distilled water|phosphate buffer|empty collection device|empty collection tube|DNA-free PCR mix|sterile swab |sterile syringe] | nan | investigation | C | nan | 1 | MIXS:0001321 | nan | N | nan |
env_broad_scale | broad-scale environmental context | Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO's biome class: http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS | The major environment type(s) where the sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes. | {termLabel} {[termID]} | oceanic epipelagic zone biome [ENVO:01000033] for annotating a water sample from the photic zone in middle of the Atlantic Ocean | environment | M | nan | 1 | MIXS:0000012 | Definition update: ENVO terms should be reported. | Y | If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date) |
env_local_scale | local environmental context | Report the entity or entities which are in the sample or specimen's local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS. | Environmental entities having causal influences upon the entity at time of sampling. | {termLabel} {[termID]} | litter layer [ENVO:01000338]; Annotating a pooled sample taken from various vegetation layers in a forest consider: canopy [ENVO:00000047]|herb and fern layer [ENVO:01000337]|litter layer [ENVO:01000338]|understory [ENVO:01000335]|shrub layer [ENVO:01000336]. | environment | M | nan | 1 | MIXS:0000013 | Definition update: ENVO terms should be reported | Y | If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date) |
env_medium | environmental medium | Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS. Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top). | The material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483]. | {termLabel} {[termID]} | soil [ENVO:00001998]; Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond consider: pond water [ENVO:00002228]|air [ENVO:00002005] | environment | M | nan | 1 | MIXS:0000014 | Definition update: ENVO terms should be reported | Y | If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date) |
subspecf_gen_lin | subspecific genetic lineage | Information about the genetic distinctness of the sequenced organism below the subspecies level, e.g., serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123. | Genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype, variety, cultivar. | {rank name}:{text} | serovar:Newport | nucleic acid sequence source | X | nan | 1 | MIXS:0000020 | nan | N | nan |
ploidy | ploidy | The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated gene and regions of the genomes (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO | PATO | {termLabel} {[termID]} | allopolyploidy [PATO:0001379] | nucleic acid sequence source | X | nan | 1 | MIXS:0000021 | nan | N | nan |
num_replicons | number of replicons | Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote | for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments | {integer} | 2 | nucleic acid sequence source | X | nan | 1 | MIXS:0000022 | nan | N | nan |
extrachrom_elements | extrachromosomal elements | Do plasmids exist of significant phenotypic consequence (e.g. ones that determine virulence or antibiotic resistance). Megaplasmids? Other plasmids (Borrelia has 15+ plasmids) | number of extrachromosomal elements | {integer} | 5 | nucleic acid sequence source | X | nan | 1 | MIXS:0000023 | nan | N | nan |
ref_biomaterial | reference for biomaterial | Primary publication if isolated before genome publication; otherwise, primary genome report. | PMID, DOI or URL | {PMID}|{DOI}|{URL} | doi:10.1016/j.syapm.2018.01.009 | nucleic acid sequence source | X | nan | 1 | MIXS:0000025 | nan | N | nan |
source_mat_id | source material identifiers | A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14', referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2). | for cultures of microorganisms: identifiers for two culture collections; for other material a unique arbitrary identifier | {text} | MPI012345 | nucleic acid sequence source | M | nan | m | MIXS:0000026 | Definition update: For ancient samples, this is for example where archaeological/museum collection ID goes. May be duplicate with samp_name. | N | nan |
specific_host | host scientific name | Report the host's taxonomic name and/or NCBI taxonomy ID. | host scientific name, taxonomy ID | {text}|{NCBI taxid} | Homo sapiens and/or 9606 | nucleic acid sequence source | C | nan | 1 | MIXS:0000029 | Condition: for host(-associated) samples only | N | nan |
host_disease_stat | host disease status | List of diseases with which the host has been diagnosed; can include multiple diagnoses. The value of the field depends on host; for humans the terms should be chosen from the DO (Human Disease Ontology) at https://www.disease-ontology.org, non-human host diseases are free text | disease name or Disease Ontology term | {termLabel} {[termID]}|{text} | rabies [DOID:11260] | nucleic acid sequence source | X | nan | m | MIXS:0000031 | nan | N | nan |
samp_collec_method | sample collection method | The method employed for collecting the sample. | PMID, DOI, URL, or text | {PMID}|{DOI}|{URL}|{text} | swabbing | nucleic acid sequence source | M | nan | 1 | MIXS:0001225 | Have 'novel' as a selection option. | Y | [JAFY] Doesn't 'free text' count for this |
samp_mat_process | sample material processing | A brief description of any processing applied to the sample during or after retrieving the sample from environment, or a link to the relevant protocol(s) performed. | text | {text} | filtering of seawater, storing samples in ethanol | nucleic acid sequence source | ??? | nan | 1 | MIXS:0000016 | nan | N | nan |
size_frac | size fraction selected | Filtering pore size used in sample preparation | filter size value range | {float}-{float} {unit} | 0-0.22 micrometer | nucleic acid sequence source | ??? | nan | 1 | MIXS:0000017 | nan | N | nan |
samp_size | amount or size of sample collected | The total amount or size (volume (ml), mass (g) or area (m2) ) of sample collected. | measurement value | {float} {unit} | 5 liter | nucleic acid sequence source | ??? | milliliter, gram, milligram, liter | 1 | MIXS:0000001 | nan | N | nan |
samp_vol_we_dna_ext | sample volume or weight for DNA extraction | Volume (ml) or mass (g) of total collected sample processed for DNA extraction. Note: total sample collected should be entered under the term Sample Size (MIXS:0000001). | measurement value | {float} {unit} | 1500 milliliter | nucleic acid sequence source | M | milliliter, gram, milligram, square centimeter | 1 | MIXS:0000111 | nan | N | nan |
virus_enrich_appr | virus enrichment approach | List of approaches used to enrich the sample for viruses, if any | enumeration | [filtration|ultrafiltration|centrifugation|ultracentrifugation|PEG Precipitation|FeCl Precipitation|CsCl density gradient|DNAse|RNAse|targeted sequence capture|other|none] | filtration + FeCl Precipitation + ultracentrifugation + DNAse | nucleic acid sequence source | ??? | nan | 1 | MIXS:0000036 | nan | N | nan |
nucl_acid_ext | nucleic acid extraction | A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample | PMID, DOI or URL | {PMID}|{DOI}|{URL} | https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf | sequencing | M | nan | 1 | MIXS:0000037 | Have 'novel' as a selection option. | Y | [JAFY] Wouldn't you cite your own paper if it's novel? I think it makes more sense to keep it as C as this is the same for all other MIXS checklists |
nucl_acid_amp | nucleic acid amplification | A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids | PMID, DOI or URL | {PMID}|{DOI}|{URL} | https://phylogenomics.me/protocols/16s-pcr-protocol/ | sequencing | ??? | nan | 1 | MIXS:0000038 | Have 'novel' as a selection option. | Y | [JAFY] Wouldn't you cite your own paper if it's novel? I think it makes more sense to keep it as C as this is the same for all other MIXS checklists |
lib_reads_seqd | library reads sequenced | Total number of clones sequenced from the library | number of reads sequenced | {integer} | 20 | sequencing | ??? | nan | 1 | MIXS:0000040 | Definition update: Total number of reads sequenced from the library | N | nan |
lib_layout | library layout | Specify whether to expect single, paired, or other configuration of reads | enumeration | [paired|single|vector|other] | paired | sequencing | M | nan | 1 | MIXS:0000041 | nan | N | nan |
lib_screen | library screening strategy | Specific enrichment or screening methods applied before and/or after creating libraries | screening strategy name | {text} | enriched, screened, normalized | sequencing | ??? | nan | 1 | MIXS:0000043 | Definition update: Suggest splitting screening and enrichment, as these mean different things (at least in aDNA) | N | nan |
target_gene | target gene | Targeted gene or locus name for marker gene studies | gene name | {text} | 16S rRNA, 18S rRNA, nif, amoA, rpo | sequencing | X | nan | 1 | MIXS:0000044 | nan | N | nan |
target_subfragment | target subfragment | Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA | gene fragment name | {text} | V6, V9, ITS | sequencing | X | nan | 1 | MIXS:0000045 | nan | N | nan |
pcr_primers | pcr primers | PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters | FWD: forward primer sequence;REV:reverse primer sequence | FWD:{dna};REV:{dna} | FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT | sequencing | X | nan | 1 | MIXS:0000046 | Definition update: Add in 5' to 3' orientation. | N | nan |
mid | multiplex identifiers | Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters | multiplex identifier sequence | {dna} | GTGAATAT | sequencing | ??? | nan | 1 | MIXS:0000047 | Definition update: Specify if single or dual tagged/indexed (following similar structure as adapters?) | N | nan |
adapters | adapters | Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters | adapter A and B sequence | {dna};{dna} | AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT | sequencing | ??? | nan | 1 | MIXS:0000048 | nan | N | nan |
pcr_cond | pcr conditions | Description of reaction conditions and components of PCR in the form of 'initial denaturation:94degC_1.5min; annealing=...' | initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles | initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles | initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35 | sequencing | ??? | nan | 1 | MIXS:0000049 | Definition update: Targeted PCR or library PCR?; add a DOI requirement | N | nan |
seq_meth | sequencing method | Sequencing machine used. Where possible the term should be taken from the OBI list of DNA sequencers (http://purl.obolibrary.org/obo/OBI_0400103). | Text or OBI | {termLabel} {[termID]}|{text} | 454 Genome Sequencer FLX [OBI:0000702] | sequencing | M | nan | 1 | MIXS:0000050 | nan | N | nan |
chimera_check | chimera check software | Tool(s) used for chimera checking, including version number and parameters, to discover and remove chimeric sequences. A chimeric sequence is comprised of two or more phylogenetically distinct parent sequences. | name and version of software, parameters used | {software};{version};{parameters} | uchime;v4.1;default parameters | sequencing | C | nan | 1 | MIXS:0000052 | nan | N | nan |
tax_ident | taxonomic identity marker | The phylogenetic marker(s) used to assign an organism name to the SAG or MAG | enumeration | [16S rRNA gene|multi-marker approach|other] | other: rpoB gene | sequencing | C | nan | 1 | MIXS:0000053 | nan | N | nan |
assembly_qual | assembly quality | The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft: Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft: Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated | enumeration | [Finished genome|High-quality draft genome|Medium-quality draft genome|Low-quality draft genome|Genome fragment(s)] | High-quality draft genome | sequencing | C | nan | 1 | MIXS:0000056 | Condition: Conditional on assembly performed - which type of assembly applies here? | N | nan |
assembly_name | assembly name | Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community | name and version of assembly | {text} {text} | HuRef, JCVI_ISG_i3_1.0 | sequencing | C | nan | 1 | MIXS:0000057 | Condition: Conditional on assembly performed - which type of assembly applies here? | N | nan |
assembly_software | assembly software | Tool(s) used for assembly, including version number and parameters | name and version of software, parameters used | {software};{version};{parameters} | metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise | sequencing | C | nan | 1 | MIXS:0000058 | Condition: Conditional on assembly performed - which type of assembly applies here? | N | nan |
annot | annotation | Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter | name of tool or pipeline used, or annotation source description | {text} | prokka | sequencing | X | nan | 1 | MIXS:0000059 | nan | N | nan |
number_contig | number of contigs | Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG | value | {integer} | 40 | sequencing | X | nan | 1 | MIXS:0000060 | nan | N | nan |
feat_pred | feature prediction | Method used to predict UViGs features such as ORFs, integration site, etc. | names and versions of software(s), parameters used | {software};{version};{parameters} | Prodigal;2.6.3;default parameters | sequencing | X | nan | 1 | MIXS:0000061 | nan | N | nan |
ref_db | reference database(s) | List of database(s) used for ORF annotation, along with version number and reference to website or publication | names, versions, and references of databases | {database};{version};{reference} | pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975 | sequencing | X | nan | 1 | MIXS:0000062 | nan | N | nan |
sim_search_meth | similarity search method | Tool used to compare ORFs with database, along with version and cutoffs used | names and versions of software(s), parameters used | {software};{version};{parameters} | HMMER3;3.1b2;hmmsearch, cutoff of 50 on score | sequencing | X | nan | 1 | MIXS:0000063 | nan | N | nan |
tax_class | taxonomic classification | Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes | classification method, database name, and other parameters | {text} | vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters) | sequencing | X | nan | 1 | MIXS:0000064 | nan | N | nan |
16s_recover | 16S recovered | Can a 16S gene be recovered from the submitted SAG or MAG? | boolean | {boolean} | yes | sequencing | X | nan | 1 | MIXS:0000065 | nan | N | nan |
16s_recover_software | 16S recovery software | Tools used for 16S rRNA gene extraction | names and versions of software(s), parameters used | {software};{version};{parameters} | rambl;v2;default parameters | sequencing | X | nan | 1 | MIXS:0000066 | nan | N | nan |
trnas | number of standard tRNAs extracted | The total number of tRNAs identified from the SAG or MAG | value from 0-21 | {integer} | 18 | sequencing | X | nan | 1 | MIXS:0000067 | nan | N | nan |
trna_ext_software | tRNA extraction software | Tools used for tRNA identification | names and versions of software(s), parameters used | {software};{version};{parameters} | infernal;v2;default parameters | sequencing | X | nan | 1 | MIXS:0000068 | nan | N | nan |
compl_score | completeness score | Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. High Quality Draft: >90%, Medium Quality Draft: >50%, and Low Quality Draft: < 50% should have the indicated completeness scores | quality;percent completeness | [high|med|low];{percentage} | med;60% | sequencing | C | nan | 1 | MIXS:0000069 | Condition: Conditional on assembly performed | N | nan |
compl_software | completeness software | Tools used for completion estimate, i.e. checkm, anvi'o, busco | names and versions of software(s) used | {software};{version} | checkm | sequencing | C | nan | 1 | MIXS:0000070 | Condition: Conditional on assembly performed | N | nan |
compl_appr | completeness approach | The approach used to determine the completeness of a given genomic assembly, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome | text | [marker gene|reference based|other] | other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83) | sequencing | C | nan | 1 | MIXS:0000071 | Condition: Conditional on assembly performed | N | nan |
contam_score | contamination score | The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. The following scores are acceptable for; High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases | value | {float} percentage | 1% | sequencing | C | nan | 1 | MIXS:0000072 | Condition: Conditional on contamination estimation | N | nan |
contam_screen_input | contamination screening input | The type of sequence data used as input | enumeration | [reads| contigs] | contigs | sequencing | X | nan | 1 | MIXS:0000005 | nan | N | nan |
contam_screen_param | contamination screening parameters | Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, i.e. kmer and coverage, or reference database and kmer | enumeration;value or name | [ref db|kmer|coverage|combination];{text|integer} | kmer | sequencing | X | nan | 1 | MIXS:0000073 | nan | N | nan |
decontam_software | decontamination software | Tool(s) used in contamination screening | enumeration | [checkm/refinem|anvi'o|prodege|bbtools:decontaminate.sh|acdc|combination] | anvi'o | sequencing | X | nan | 1 | MIXS:0000074 | nan | N | nan |
sort_tech | sorting technology | Method used to sort/isolate cells or particles of interest | enumeration | [flow cytometric cell sorting|microfluidics|lazer-tweezing|optical manipulation|micromanipulation|other] | optical manipulation | sequencing | C | nan | 1 | MIXS:0000075 | nan | N | nan |
single_cell_lysis_appr | single cell or viral particle lysis approach | Method used to free DNA from interior of the cell(s) or particle(s) | enumeration | [chemical|enzymatic|physical|combination] | enzymatic | sequencing | C | nan | 1 | MIXS:0000076 | Use a non-single cell version?; DOI/PMID for custom protocols | Y | [JAFY] Isn't this mutually exclusive |
single_cell_lysis_prot | single cell or viral particle lysis kit protocol | Name of the kit or standard protocol used for cell(s) or particle(s) lysis | kit, protocol name | {text} | ambion single cell lysis kit | sequencing | C | nan | 1 | MIXS:0000054 | Use a non-single cell version? | Y | [JAFY] Isn't this mutually exclusive |
wga_amp_appr | WGA amplification approach | Method used to amplify genomic DNA in preparation for sequencing | enumeration | [pcr based|mda based] | mda based | sequencing | C | nan | 1 | MIXS:0000055 | Use a non-single cell version?; DOI/PMID for custom protocols | Y | [JAFY] Isn't this mutually exclusive |
wga_amp_kit | WGA amplification kit | Kit used to amplify genomic DNA in preparation for sequencing | kit name | {text} | qiagen repli-g | sequencing | C | nan | 1 | MIXS:0000006 | Use a non-single cell version? | Y | [JAFY] Isn't this mutually exclusive |
bin_param | binning parameters | The parameters that have been applied during the extraction of genomes from metagenomic datasets | enumeration | [homology search|kmer|coverage|codon usage|combination] | coverage and kmer | sequencing | C | nan | 1 | MIXS:0000077 | nan | N | nan |
bin_software | binning software | Tool(s) used for the extraction of genomes from metagenomic datasets, where possible include a product ID (PID) of the tool(s) used. | names and versions of software(s) used | {software};{version}{PID} | MetaCluster-TA (RRID:SCR_004599), MaxBin (biotools:maxbin) | sequencing | C | nan | 1 | MIXS:0000078 | nan | N | nan |
reassembly_bin | reassembly post binning | Has an assembly been performed on a genome bin extracted from a metagenomic assembly? | boolean | {boolean} | no | sequencing | C | nan | 1 | MIXS:0000079 | nan | N | nan |
mag_cov_software | MAG coverage software | Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets | enumeration | [bwa|bbmap|bowtie|other] | bbmap | sequencing | C | nan | 1 | MIXS:0000080 | nan | N | nan |
vir_ident_software | viral identification software | Tool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs used | software name, version and relevant parameters | {software};{version};{parameters} | VirSorter; 1.0.4; Virome database, category 2 | sequencing | C | nan | 1 | MIXS:0000081 | nan | N | nan |
pred_genome_type | predicted genome type | Type of genome predicted for the UViG | enumeration | [DNA|dsDNA|ssDNA|RNA|dsRNA|ssRNA|ssRNA (+)|ssRNA (-)|mixed|uncharacterized] | dsDNA | sequencing | C | nan | 1 | MIXS:0000082 | nan | N | nan |
pred_genome_struc | predicted genome structure | Expected structure of the viral genome | enumeration | [segmented|non-segmented|undetermined] | non-segmented | sequencing | C | nan | 1 | MIXS:0000083 | Condition: Ancient virus genome only | N | nan |
detec_type | detection type | Type of UViG detection | enumeration | [independent sequence (UViG)|provirus (UpViG)] | independent sequence (UViG) | sequencing | C | nan | 1 | MIXS:0000084 | Condition: Ancient virus genome only | N | nan |
otu_class_appr | OTU classification approach | Cutoffs and approach used when clustering "species-level" OTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside OTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis | cutoffs and method used | {ANI cutoff};{AF cutoff};{clustering method} | 95% ANI;85% AF; greedy incremental clustering | sequencing | C | nan | 1 | MIXS:0000085 | Condition: Ancient virus genome only | N | nan |
otu_seq_comp_appr | OTU sequence comparison approach | Tool and thresholds used to compare sequences when computing "species-level" OTUs | software name, version and relevant parameters | {software};{version};{parameters} | blastn;2.6.0+;e-value cutoff: 0.001 | sequencing | C | nan | 1 | MIXS:0000086 | Condition: Ancient virus genome only | N | nan |
otu_db | OTU database | Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in "species-level" OTUs, if any | database and version | {database};{version} | NCBI Viral RefSeq;83 | sequencing | C | nan | 1 | MIXS:0000087 | Condition: Ancient virus genome only | N | nan |
host_pred_appr | host prediction approach | Tool or approach used for host prediction | enumeration | [provirus|host sequence similarity|CRISPR spacer match|kmer similarity|co-occurrence|combination|other] | CRISPR spacer match | sequencing | C | nan | 1 | MIXS:0000088 | Condition: Ancient virus genome only | N | nan |
host_pred_est_acc | host prediction estimated accuracy | For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature | false discovery rate | {text} | CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048) | sequencing | C | nan | 1 | MIXS:0000089 | Condition: Ancient virus genome only | N | nan |
associated resource | relevant electronic resources | A related resource that is referenced, cited, or otherwise associated to the sequence. | reference to resource | {PMID}|{DOI}|{URL} | http://www.earthmicrobiome.org/ | sequencing | C | nan | m | MIXS:0000091 | nan | N | nan |
sop | relevant standard operating procedures | Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences | reference to SOP | {PMID}|{DOI}|{URL} | http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/ | sequencing | C | nan | m | MIXS:0000090 | nan | N | nan |
cultural_era | nan | The cultural era approximating the period in which the individual lived, from https://chronontology.dainst.org/ or PeriodO. Specify when no value for 'sample_age' is present, or in addition to it. | ChronOntology or PeriodO term; text | {termLabel} {[termID]}|{text} | Copper Age [ChronOntology: NW6hofAScJSE] | investigation | C | nan | m | MIXS:XXXXXXX | Question: switch to free-text? Conditional to sample_age | Y | nan |
dna_extraction_date | nan | The date when the nucleic acids were extracted from the sample material. In case no exact time is available, the date can be right truncated, i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008. Except for 2008-01 and 2008, all are ISO 8601 compliant | date | {timestamp} | 2015 | nucleic acid sequence source | M | year | 1 | MIXS:XXXXXXX | nan | N | nan |
recovery_date | nan | Date of excavation or retrieval from burial or depositional context, if known | date | {timestamp} | 1930 | investigation | X | date | 1 | MIXS:XXXXXXX | nan | N | nan |
sample_age | nan | The approximate date at which the individual lived and died, or at which the sample was exposed to the surface. Typically inferred from archaeological material, or biological material associated with a sediment, with radiocarbon dating and other chronometric methods. Should be the midpoint of the calibrated radiocarbon age. | value from BP (1950) | {integer} | 123440 | investigation | C | cal BP | m | MIXS:XXXXXXX | nan | N | nan |
sample_age_inference_methods | nan | The method used to infer the sample age. Method (14C, OSL, etc.) and associated information (lab code, etc.). An enumerated list with age-depth model, correlative (relative dating), annual lamination, pollen records, diatom records, physical observation, etc. | enumeration | ??? | C14 | investigation | C | nan | m | MIXS:XXXXXXX | nan | Y | [JAFY/MS] How to define, very heterogeneous, maybe split: method/lab code |
site_type | nan | The type of site from which the sediment cores were taken. E.g. ocean, marine, freshwater, brackish, ice caves, caves. | text | {text} | cave | environment | E | nan | 1 | MIXS:XXXXXXX | nan | Y | [JAFY] can this be expanded to non-sediment-like sites? Open air burials? Is there an overlap here with env_*_scale? Otherwise I think this would go to the environmental_packages |
damage_treatment | nan | Indication of whether characteristic ancient DNA damage has been removed in a laboratory | enumeration | [none|partial-udg|full-udg|enriched|other] | none | sequencing | M | nan | 1 | MIXS:XXXXXXX | nan | N | [JAFY] what if people upload BAMs of merged multiple libraries |
experimental_procedures | nan | Provide a DOI referring to the paper where the procedure is explained in more detail | PMID, DOI or URL | {PMID}|{DOI}|{URL} | nan | sequencing | X | nan | 1 | MIXS:XXXXXXX | nan | Y | [JAFY] What procedures does this refer to? Each paper will have many protocols to cite for extraction, library construction, reconditioning, etc. Duplicate of SOP? |
lib_concentration | nan | Concentration of library in copies per µl, as inferred by qPCR. | integer | {integer} | 123000000 | sequencing | X | copies/µl | 1 | MIXS:XXXXXXX | nan | N | nan |
lib_index_polymerase | nan | The name of the polymerase enzyme used to index DNA libraries | text | {text} | Agilent PfuTurbo Cx HotStart | sequencing | C | nan | 1 | MIXS:XXXXXXX | nan | Y | [JAFY] Condition on what? And would including the SKU be useful too (as it's a more stable code) |
lib_preparation_protocol | nan | Citation(s) for the DNA library preparation protocol | text | {text} | Meyer and Kircher, 2010 | sequencing | C | nan | 1 | MIXS:XXXXXXX | nan | Y | [JAFY] Condition on what? |
lib_reamplification_polymerase | nan | The name of the polymerase enzyme used for reamplifying DNA libraries | text | {text} | KAPA HiFi HotStart Uracil+ | sequencing | C | nan | 1 | MIXS:XXXXXXX | nan | Y | [JAFY] Condition on what? |
lib_type | library type | The type of library created: amplicon-based or non-amplicon-based. An amplicon-based library results in rather short DNA fragments, while non-amplicon-based refers to a non-targeted approach. | enumeration | [shotgun|amplicon] | shotgun | sequencing | M | nan | 1 | MIXS:XXXXXXX | nan | Y | [JAFY] I'm not sure about the definition of amplicon here, it could be confusing - 'natural' aDNA is normally short, maybe need to rather change the definition to specify using primers targeting specific regions of the genome instead |
neg_cont_status | negative control status | Specify whether the sample is a negative control or not. | negative control status | {boolean} | nan | investigation | C | nan | 1 | MIXS:XXXXXXX | nan | Y | [JAFY] Wouldn't this be mandatory if it's boolean? A sample is either a negative control or not ... Would this also potentially not apply to all other tables? [AFG/BM] The problem with negative controls is that they come from different batches, so they become hard to control. (M). Issue: in paleometagenomes, fewer negative controls are used as there is more reliance on damage patterns, and this is not the case for sedaDNA, where negative controls should be mandatory. |
num_capture_reamp_cycles | nan | Number of amplification cycles after capture enrichment | number of amplification cycles | {integer} | 10 | sequencing | C | nan | 1 | MIXS:XXXXXXX | Condition: if performed | N | nan |
num_reamp_cycles | number of reamplification cycles | Number of amplification cycles after library indexing PCR | number of amplification cycles | {integer} | 8 | sequencing | C | nan | 1 | MIXS:XXXXXXX | Condition: if performed | N | nan |
preservational_treatment | preservation treatment | Description of any treatment applied to samples for the purpose of maximising collection preservation that may influence downstream DNA recovery or library construction, such as storage fluid or reconstructive glue | text | {text} | stored in formalin | nucleic acid sequence source | C | nan | 1 | MIXS:XXXXXXX | Condition: if present | N | nan |
sample_alt_lab_ids | alternative sample IDs | Any alternate sample IDs used by the research group publishing the paper or by other groups, if known. | text | {text} | ABC_24 | investigation | X | nan | 1 | MIXS:XXXXXXX | nan | N | nan |
samp_decontam_pretreat | sample decontamination pretreatment | Method(s) employed for surface decontamination of samples to remove external modern DNA; treatment used on the samples. Depends on the sample type. More relevant for bones than environmental samples. E.g. buffers, EDTA, etc. | PMID, DOI or URL | {PMID}|{DOI}|{URL}|{text} | EDTA wash, 10.17504/protocols.io.bidyka7w | sequencing | X | nan | m | MIXS:XXXXXXX | nan | N | nan |
prev_pubs | previous publications | Any publications that report data from the same body/skeleton/individual | text | {DOI}|{URL}|{PMID} | 10.1126/science.7761839 | investigation | X | nan | m | MIXS:XXXXXXX | nan | Y | [AG/PH] Just DNA data, or also contextual/archaeological publications? |
collection_context_name | context name where location was collected | Name of where sample originated and is typically stored. Typically will be 'owning institution' | text | {text} | Natural History London | environment | X | nan | 1 | MIXS:XXXXXXX | nan | Y | [AG/PH] e.g. Museum, community, field (want to avoid assumption that institutions instead of communities control samples) |
ethical_authority | ethical authority | Name of the authority or institution that awarded sampling and analysis (e.g. human remains) and/or export permission (e.g. animal remains) | text | {text} | Federal Foreign Office (Germany) | investigation | C | nan | m | MIXS:XXXXXXX | nan | Y | [JAFY] What is the condition on? |
ethical_date | date of ethical approval | Date of award of ethical/export permission. The date can be right truncated, i.e. all of these are valid times: 2008-01-23; 2008-01; 2008. Except for 2008-01 and 2008, all are ISO 8601 compliant | date | {timestamp} | 2018-05-11T10:00:00+01:00; 2018-05-11 | investigation | C | nan | m | MIXS:XXXXXXX | nan | Y | [JAFY] What is the condition on? |
ethical_id | ethical permit/approval ID | The permissions code or ID provided by the authority associated with approval of the analysis of this particular sample | text | {text} | DE-123-JK | investigation | C | nan | m | MIXS:XXXXXXX | nan | Y | [JAFY] What is the condition on? |
storage_conditions | conditions of sample storage | General conditions in which the sample was stored in long-term collection storage that may have influenced DNA recovery or library construction. For example, specify temperature, humidity, presence of microbial overgrowth, etc. | text | {text} | Climate-controlled | environment | X | nan | 1 | MIXS:XXXXXXX | nan | N | nan |
Environmental package | Structured comment name | Package item | Definition | Expected value | Value syntax | Example | Requirement | Preferred unit | Occurrence | MIXS ID | Modification Suggestion | Requires Further Discussion | Reason for further discussion | Completed |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
host-associated | environment | burial_context | Description of the burial context from which sampled individuals were recovered | enumeration | [primary inhumation|secondary inhumation|multiple inhumation|commingled assemblage|disarticulated remains|information not available] | multiple inhumation | X | nan | 1 | MIXS:XXXXXXX | N | Y | Is there some form of ontology for such a thing? | nan |
host-associated | environment | host_biological_sex | Biological sex of the host individual | enumeration | [male|female|other|unknown] | female | X | nan | 1 | MIXS:XXXXXXX | N | N | nan | nan |
human-oral | environment | samp_sample_site | Specific location in oral cavity | enumeration | {termLabel} {[termID]} | maxilla left 3rd molar [UBERON:0002535] | X | nan | m | MIXS:XXXXXXX | N | N | nan | nan |
sediment | environment | sediment_type | The sediment type. E.g. silt, clay, organic, porous. | text | {text} | cave | X | nan | 1 | MIXS:XXXXXXX | N | Y | [PH/AI] Very important for molecular scientists to be able to distinguish between the different sediment types; perhaps a blog post or workshop for the sedaDNA society from a sedimentologist | nan |
sediment | environment | sedimentation_rate | The sedimentation rate calculated from age-depth models. | value | {integer} | ??? | X | nan | ??? | MIXS:XXXXXXX | N | Y | [???] but with NA as it's not always available. | nan |
sediment | sampling | coring_system | Specify whether the sampling was with an open or closed system. An open system is where no corers are used and the samples are exposed, e.g. in caves. A closed system is where a corer is used and the sampling procedure is "controlled". | enumeration | [open|closed] | closed | X | nan | 1 | MIXS:XXXXXXX | N | N | nan | nan |
sediment | sampling | depths | The depths from which the sediments were taken. This depends on the source of sediments. E.g. water depths. | value | {integer} | ??? | X | cm | 1 | MIXS:XXXXXXX | N | N | [???] Should influence microbial activity (M) But include NA in the options as not applicable for all samples. | nan |
sediment | sampling | stratigraphic_horizon | The depths at which the samples were taken from the core. E.g. for a one meter long core, the sample corresponds to a depth of 50 cm; for caves, to the surface. (M) | value | {integer} | ??? | X | ??? | 1 | MIXS:XXXXXXX | N | Y | [JAFY] the description here seems to correspond to depths, I think this needs to be revised | nan |
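
As an illustration of how the "Value syntax" column in the tables above could be checked in practice, the following is a minimal Python sketch, not part of the MInAS or MIxS specifications. It validates three of the enumerations listed above (`damage_treatment`, `lib_type`, `coring_system`) and the right-truncated `{timestamp}` shapes given under `dna_extraction_date`. The function names, the `TIMESTAMP_FIELDS` set, and the regular expression are assumptions made for this example; the pattern checks only the shape of a timestamp, not whether the date exists in the calendar.

```python
import re

# Enumerations copied from the "Value syntax" column of the tables above.
ENUMERATIONS = {
    "damage_treatment": ["none", "partial-udg", "full-udg", "enriched", "other"],
    "lib_type": ["shotgun", "amplicon"],
    "coring_system": ["open", "closed"],
}

# Fields whose value syntax is {timestamp} (hypothetical grouping for this sketch).
TIMESTAMP_FIELDS = {"dna_extraction_date", "recovery_date", "ethical_date"}

# Right-truncated timestamps: YYYY, YYYY-MM, YYYY-MM-DD, optionally followed
# by Thh:mm:ss and a UTC offset. Assumption: shape check only.
TIMESTAMP = re.compile(
    r"\d{4}(-\d{2}(-\d{2}(T\d{2}:\d{2}:\d{2}([+-]\d{2}:\d{2}|Z)?)?)?)?"
)


def check_enumeration(field, value):
    """Return True if value is one of the allowed terms for field."""
    return value in ENUMERATIONS[field]


def check_timestamp(value):
    """Return True if value has one of the right-truncated timestamp shapes."""
    return TIMESTAMP.fullmatch(value) is not None


def validate_record(record):
    """Return a list of (field, value) pairs that fail their check.

    Fields that are neither enumerations nor timestamps are not checked.
    """
    problems = []
    for field, value in record.items():
        if field in ENUMERATIONS and not check_enumeration(field, value):
            problems.append((field, value))
        elif field in TIMESTAMP_FIELDS and not check_timestamp(value):
            problems.append((field, value))
    return problems


if __name__ == "__main__":
    record = {
        "damage_treatment": "half-udg",   # not in the enumeration
        "lib_type": "shotgun",
        "dna_extraction_date": "2015",
        "recovery_date": "23/01/1930",    # not a right-truncated timestamp
    }
    for field, value in validate_record(record):
        print(f"invalid value for {field}: {value!r}")
```

Keeping the allowed terms in a plain dictionary means the checks can be updated by editing data rather than code as the draft enumerations change.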