Current Draft

🛑 This draft is now deprecated due to changes in the structure of the MInAS project to better align with the MIxS infrastructure. Please see the latest drafts on the other pages in this section.

The current version is v0.0.2.

This can be considered a pre-alpha version. It has been neither reviewed nor approved by the wider palaeogenomics community or by the Genomic Standards Consortium.

A commentable early draft of the MInAS checklist, written by members of the SPAAM community, can be found here. Otherwise, a simplified current version containing only the MInAS-related columns is rendered below.

For the current release of the base MIxS checklists, please see the GenSC website.

MIxS Environmental Packages

Structured comment name	Item (rdfs:label)	Definition	Expected value	Value syntax	Example	Section	minas	Preferred unit	Occurrence	MIXS ID	Modification Suggestion	Requires Further Discussion	Reason for further discussion
samp_name	sample name	A local identifier or name for the material sample used for extracting nucleic acids, and subsequent sequencing. It can refer either to the original material collected or to any derived sub-samples. It can have any format, but we suggest that you make it concise, unique and consistent within your lab, and as informative as possible. INSDC requires every sample name from a single Submitter to be unique. Use of a globally unique identifier for the field source_mat_id is recommended in addition to samp_name.	text	{text}	ISDsoil1	investigation	M	nan	1	MIXS:0001107	Definition update: clarify it's a DNA lab code (museum ID should go in source_mat_id)	N	nan
samp_taxon_id	taxonomy ID of DNA sample	NCBI taxon id of the sample. May be a single taxon or mixed taxa sample. Use 'synthetic metagenome' for mock community/positive controls, or 'blank sample' for negative controls.	Taxonomy ID	{text} [NCBI:txid]	Gut Metagenome [NCBI:txid749906]	investigation	M	nan	1	MIXS:0001320	nan	N	nan
project_name	project name	Name of the project within which the sequencing was organized	nan	{text}	Forest soil metagenome	investigation	M	nan	1	MIXS:0000092	nan	N	nan
lat_lon	geographic location (latitude and longitude)	The geographical origin of the sample as defined by latitude and longitude. The values should be reported in decimal degrees and in the WGS84 system	decimal degrees, limit to 8 decimal points	{float} {float}	50.586825 6.408977	environment	C	nan	1	MIXS:0000009	None, but cases where imprecision may be preferred (e.g. to prevent looting of the site) should be discussed with the GSC to find a good solution, i.e. is there a minimum level of precision required?	Y	What level of imprecision is allowed by GSC?
depth	depth	The vertical distance below local surface, e.g. for sediment or soil samples depth is measured from sediment or soil surface, respectively. Depth can be reported as an interval for subsurface samples.	measurement value	{float} {unit}	10 meter	environment	E	meter	1	MIXS:0000018	Environment-dependent, i.e. minas-environmental	N	nan
elev	elevation	Elevation of the sampling site is its height above a fixed reference point, most commonly the mean sea level. Elevation is mainly used when referring to points on the earth's surface, while altitude is used for points above the surface, such as an aircraft in flight or a spacecraft in orbit.	measurement value	{float} {unit}	100 meter	environment	X	nan	1	MIXS:0000093	nan	N	nan
temp	temperature	Temperature of the sample at the time of sampling.	measurement value	{float} {unit}	25 degree Celsius	environment	X	degree Celsius	1	MIXS:0000113	Definition update: For ancient samples, this can be temperature of marine sediments, burial environment, cave atmosphere	N	nan
geo_loc_name	geographic location (country and/or sea,region)	The geographical origin of the sample as defined by the country or sea name followed by specific region name. Country or sea names should be chosen from the INSDC country list (http://insdc.org/country.html), or the GAZ ontology (http://purl.bioontology.org/ontology/GAZ)	country or sea name (INSDC or GAZ): region(GAZ), specific location name	{term}: {term}, {text}	USA: Maryland, Bethesda	environment	M	nan	1	MIXS:0000010	Definition update: in cases of ancient locations, use the name of the present-day country in which the location lies.	N	nan
collection_date	collection date	The time of sampling, either as an instance (single point in time) or interval. In case no exact time is available, the date/time can be right truncated i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008; Except: 2008-01; 2008 all are ISO8601 compliant	date and time	{timestamp}	2018-05-11T10:00:00+01:00; 2018-05-11	environment	C	nan	1	MIXS:0000011	Definition update: For ancient samples, the date of the drilling or subsampling from the main specimen, the sub-sample of which is used for DNA extraction.	N	nan
neg_cont_type	negative control type	The substance or equipment used as a negative control in an investigation	enumeration or text	[distilled water\|phosphate buffer\|empty collection device\|empty collection tube\|DNA-free PCR mix\|sterile swab \|sterile syringe]	nan	investigation	C	nan	1	MIXS:0001321	nan	N	nan
env_broad_scale	broad-scale environmental context	Report the major environmental system the sample or specimen came from. The system(s) identified should have a coarse spatial grain, to provide the general environmental context of where the sampling was done (e.g. in the desert or a rainforest). We recommend using subclasses of EnvO’s biome class: http://purl.obolibrary.org/obo/ENVO_00000428. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS	The major environment type(s) where the sample was collected. Recommend subclasses of biome [ENVO:00000428]. Multiple terms can be separated by one or more pipes.	{termLabel} {[termID]}	oceanic epipelagic zone biome [ENVO:01000033] for annotating a water sample from the photic zone in middle of the Atlantic Ocean	environment	M	nan	1	MIXS:0000012	Definition update: ENVO terms should be reported.	Y	If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date)
env_local_scale	local environmental context	Report the entity or entities which are in the sample or specimen’s local vicinity and which you believe have significant causal influences on your sample or specimen. We recommend using EnvO terms which are of smaller spatial grain than your entry for env_broad_scale. Terms, such as anatomical sites, from other OBO Library ontologies which interoperate with EnvO (e.g. UBERON) are accepted in this field. EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS.	Environmental entities having causal influences upon the entity at time of sampling.	{termLabel} {[termID]}	litter layer [ENVO:01000338]; Annotating a pooled sample taken from various vegetation layers in a forest consider: canopy [ENVO:00000047]\|herb and fern layer [ENVO:01000337]\|litter layer [ENVO:01000338]\|understory [ENVO:01000335]\|shrub layer [ENVO:01000336].	environment	M	nan	1	MIXS:0000013	Definition update: ENVO terms should be reported	Y	If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date)
env_medium	environmental medium	Report the environmental material(s) immediately surrounding the sample or specimen at the time of sampling. We recommend using subclasses of 'environmental material' (http://purl.obolibrary.org/obo/ENVO_00010483). EnvO documentation about how to use the field: https://github.com/EnvironmentOntology/envo/wiki/Using-ENVO-with-MIxS . Terms from other OBO ontologies are permissible as long as they reference mass/volume nouns (e.g. air, water, blood) and not discrete, countable entities (e.g. a tree, a leaf, a table top).	The material displaced by the entity at time of sampling. Recommend subclasses of environmental material [ENVO:00010483].	{termLabel} {[termID]}	soil [ENVO:00001998]; Annotating a fish swimming in the upper 100 m of the Atlantic Ocean, consider: ocean water [ENVO:00002151]. Example: Annotating a duck on a pond consider: pond water [ENVO:00002228]\|air [ENVO:00002005]	environment	M	nan	1	MIXS:0000014	Definition update: ENVO terms should be reported	Y	If proposed as mandatory, what should be done with museum accessions that have limited provenance (e.g. country and date)
subspecf_gen_lin	subspecific genetic lineage	Information about the genetic distinctness of the sequenced organism below the subspecies level, e.g., serovar, serotype, biotype, ecotype, or any relevant genetic typing schemes like Group I plasmid. Subspecies should not be recorded in this term, but in the NCBI taxonomy. Supply both the lineage name and the lineage rank separated by a colon, e.g., biovar:abc123.	Genetic lineage below lowest rank of NCBI taxonomy, which is subspecies, e.g. serovar, biotype, ecotype, variety, cultivar.	{rank name}:{text}	serovar:Newport	nucleic acid sequence source	X	nan	1	MIXS:0000020	nan	N	nan
ploidy	ploidy	The ploidy level of the genome (e.g. allopolyploid, haploid, diploid, triploid, tetraploid). It has implications for the downstream study of duplicated gene and regions of the genomes (and perhaps for difficulties in assembly). For terms, please select terms listed under class ploidy (PATO:0001374) of Phenotypic Quality Ontology (PATO), and for a browser of PATO (v 2018-03-27) please refer to http://purl.bioontology.org/ontology/PATO	PATO	{termLabel} {[termID]}	allopolyploidy [PATO:0001379]	nucleic acid sequence source	X	nan	1	MIXS:0000021	nan	N	nan
num_replicons	number of replicons	Reports the number of replicons in a nuclear genome of eukaryotes, in the genome of a bacterium or archaea or the number of segments in a segmented virus. Always applied to the haploid chromosome count of a eukaryote	for eukaryotes and bacteria: chromosomes (haploid count); for viruses: segments	{integer}	2	nucleic acid sequence source	X	nan	1	MIXS:0000022	nan	N	nan
extrachrom_elements	extrachromosomal elements	Do plasmids exist of significant phenotypic consequence (e.g. ones that determine virulence or antibiotic resistance). Megaplasmids? Other plasmids (borrelia has 15+ plasmids)	number of extrachromosomal elements	{integer}	5	nucleic acid sequence source	X	nan	1	MIXS:0000023	nan	N	nan
ref_biomaterial	reference for biomaterial	Primary publication if isolated before genome publication; otherwise, primary genome report.	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	doi:10.1016/j.syapm.2018.01.009	nucleic acid sequence source	X	nan	1	MIXS:0000025	nan	N	nan
source_mat_id	source material identifiers	A unique identifier assigned to a material sample (as defined by http://rs.tdwg.org/dwc/terms/materialSampleID, and as opposed to a particular digital record of a material sample) used for extracting nucleic acids, and subsequent sequencing. The identifier can refer either to the original material collected or to any derived sub-samples. The INSDC qualifiers /specimen_voucher, /bio_material, or /culture_collection may or may not share the same value as the source_mat_id field. For instance, the /specimen_voucher qualifier and source_mat_id may both contain 'UAM:Herps:14' , referring to both the specimen voucher and sampled tissue with the same identifier. However, the /culture_collection qualifier may refer to a value from an initial culture (e.g. ATCC:11775) while source_mat_id would refer to an identifier from some derived culture from which the nucleic acids were extracted (e.g. xatc123 or ark:/2154/R2).	for cultures of microorganisms: identifiers for two culture collections; for other material a unique arbitrary identifier	{text}	MPI012345	nucleic acid sequence source	M	nan	m	MIXS:0000026	Definition update: For ancient samples, this is, for example, where the archaeological/museum collection ID goes. May duplicate samp_name.	N	nan
specific_host	host scientific name	Report the host's taxonomic name and/or NCBI taxonomy ID.	host scientific name, taxonomy ID	{text}\|{NCBI taxid}	Homo sapiens and/or 9606	nucleic acid sequence source	C	nan	1	MIXS:0000029	Condition: for host(-associated) samples only	N	nan
host_disease_stat	host disease status	List of diseases with which the host has been diagnosed; can include multiple diagnoses. The value of the field depends on host; for humans the terms should be chosen from the DO (Human Disease Ontology) at https://www.disease-ontology.org, non-human host diseases are free text	disease name or Disease Ontology term	{termLabel} {[termID]}\|{text}	rabies [DOID:11260]	nucleic acid sequence source	X	nan	m	MIXS:0000031	nan	N	nan
samp_collec_method	sample collection method	The method employed for collecting the sample.	PMID, DOI, URL, or text	{PMID}\|{DOI}\|{URL}\|{text}	swabbing	nucleic acid sequence source	M	nan	1	MIXS:0001225	Have 'novel' as a selection option.	Y	[JAFY] Doesn't 'free text' count for this?
samp_mat_process	sample material processing	A brief description of any processing applied to the sample during or after retrieving the sample from environment, or a link to the relevant protocol(s) performed.	text	{text}	filtering of seawater, storing samples in ethanol	nucleic acid sequence source	???	nan	1	MIXS:0000016	nan	N	nan
size_frac	size fraction selected	Filtering pore size used in sample preparation	filter size value range	{float}-{float} {unit}	0-0.22 micrometer	nucleic acid sequence source	???	nan	1	MIXS:0000017	nan	N	nan
samp_size	amount or size of sample collected	The total amount or size (volume (ml), mass (g) or area (m2)) of sample collected.	measurement value	{float} {unit}	5 liter	nucleic acid sequence source	???	milliliter, gram, milligram, liter	1	MIXS:0000001	nan	N	nan
samp_vol_we_dna_ext	sample volume or weight for DNA extraction	Volume (ml) or mass (g) of total collected sample processed for DNA extraction. Note: total sample collected should be entered under the term Sample Size (MIXS:0000001).	measurement value	{float} {unit}	1500 milliliter	nucleic acid sequence source	M	milliliter, gram, milligram, square centimeter	1	MIXS:0000111	nan	N	nan
virus_enrich_appr	virus enrichment approach	List of approaches used to enrich the sample for viruses, if any	enumeration	[filtration\|ultrafiltration\|centrifugation\|ultracentrifugation\|PEG Precipitation\|FeCl Precipitation\|CsCl density gradient\|DNAse\|RNAse\|targeted sequence capture\|other\|none]	filtration + FeCl Precipitation + ultracentrifugation + DNAse	nucleic acid sequence source	???	nan	1	MIXS:0000036	nan	N	nan
nucl_acid_ext	nucleic acid extraction	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the material separation to recover the nucleic acid fraction from a sample	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	https://mobio.com/media/wysiwyg/pdfs/protocols/12888.pdf	sequencing	M	nan	1	MIXS:0000037	Have 'novel' as a selection option.	Y	[JAFY] Wouldn't you cite your own paper if it's novel? I think it makes more sense to keep it as C as this is the same for all other MIXS checklists
nucl_acid_amp	nucleic acid amplification	A link to a literature reference, electronic resource or a standard operating procedure (SOP), that describes the enzymatic amplification (PCR, TMA, NASBA) of specific nucleic acids	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	https://phylogenomics.me/protocols/16s-pcr-protocol/	sequencing	???	nan	1	MIXS:0000038	Have 'novel' as a selection option.	Y	[JAFY] Wouldn't you cite your own paper if it's novel? I think it makes more sense to keep it as C as this is the same for all other MIXS checklists
lib_reads_seqd	library reads sequenced	Total number of clones sequenced from the library	number of reads sequenced	{integer}	20	sequencing	???	nan	1	MIXS:0000040	Definition update: Total number of reads sequenced from the library	N	nan
lib_layout	library layout	Specify whether to expect single, paired, or other configuration of reads	enumeration	[paired\|single\|vector\|other]	paired	sequencing	M	nan	1	MIXS:0000041	nan	N	nan
lib_screen	library screening strategy	Specific enrichment or screening methods applied before and/or after creating libraries	screening strategy name	{text}	enriched, screened, normalized	sequencing	???	nan	1	MIXS:0000043	Definition update: Suggest splitting screening and enrichment, as these mean different things (at least in aDNA)	N	nan
target_gene	target gene	Targeted gene or locus name for marker gene studies	gene name	{text}	16S rRNA, 18S rRNA, nif, amoA, rpo	sequencing	X	nan	1	MIXS:0000044	nan	N	nan
target_subfragment	target subfragment	Name of subfragment of a gene or locus. Important to e.g. identify special regions on marker genes like V6 on 16S rRNA	gene fragment name	{text}	V6, V9, ITS	sequencing	X	nan	1	MIXS:0000045	nan	N	nan
pcr_primers	pcr primers	PCR primers that were used to amplify the sequence of the targeted gene, locus or subfragment. This field should contain all the primers used for a single PCR reaction if multiple forward or reverse primers are present in a single PCR reaction. The primer sequence should be reported in uppercase letters	FWD: forward primer sequence;REV:reverse primer sequence	FWD:{dna};REV:{dna}	FWD:GTGCCAGCMGCCGCGGTAA;REV:GGACTACHVGGGTWTCTAAT	sequencing	X	nan	1	MIXS:0000046	Definition update: Add in 5' to 3' orientation.	N	nan
mid	multiplex identifiers	Molecular barcodes, called Multiplex Identifiers (MIDs), that are used to specifically tag unique samples in a sequencing run. Sequence should be reported in uppercase letters	multiplex identifier sequence	{dna}	GTGAATAT	sequencing	???	nan	1	MIXS:0000047	Definition update: Specify if single or dual tagged/indexed (following similar structure as adapters?)	N	nan
adapters	adapters	Adapters provide priming sequences for both amplification and sequencing of the sample-library fragments. Both adapters should be reported; in uppercase letters	adapter A and B sequence	{dna};{dna}	AATGATACGGCGACCACCGAGATCTACACGCT;CAAGCAGAAGACGGCATACGAGAT	sequencing	???	nan	1	MIXS:0000048	nan	N	nan
pcr_cond	pcr conditions	Description of reaction conditions and components of PCR in the form of 'initial denaturation:94degC_1.5min; annealing=...'	initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles	initial denaturation:degrees_minutes;annealing:degrees_minutes;elongation:degrees_minutes;final elongation:degrees_minutes;total cycles	initial denaturation:94_3;annealing:50_1;elongation:72_1.5;final elongation:72_10;35	sequencing	???	nan	1	MIXS:0000049	Definition update: Targeted PCR or library PCR?; add a DOI requirement	N	nan
seq_meth	sequencing method	Sequencing machine used. Where possible the term should be taken from the OBI list of DNA sequencers (http://purl.obolibrary.org/obo/OBI_0400103).	Text or OBI	{termLabel} {[termID]}\|{text}	454 Genome Sequencer FLX [OBI:0000702]	sequencing	M	nan	1	MIXS:0000050	nan	N	nan
chimera_check	chimera check software	Tool(s) used for chimera checking, including version number and parameters, to discover and remove chimeric sequences. A chimeric sequence is comprised of two or more phylogenetically distinct parent sequences.	name and version of software, parameters used	{software};{version};{parameters}	uchime;v4.1;default parameters	sequencing	C	nan	1	MIXS:0000052	nan	N	nan
tax_ident	taxonomic identity marker	The phylogenetic marker(s) used to assign an organism name to the SAG or MAG	enumeration	[16S rRNA gene\|multi-marker approach\|other]	other: rpoB gene	sequencing	C	nan	1	MIXS:0000053	nan	N	nan
assembly_qual	assembly quality	The assembly quality category is based on sets of criteria outlined for each assembly quality category. For MISAG/MIMAG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities with a consensus error rate equivalent to Q50 or better. High Quality Draft:Multiple fragments where gaps span repetitive regions. Presence of the 23S, 16S and 5S rRNA genes and at least 18 tRNAs. Medium Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Low Quality Draft:Many fragments with little to no review of assembly other than reporting of standard assembly statistics. Assembly statistics include, but are not limited to total assembly size, number of contigs, contig N50/L50, and maximum contig length. For MIUVIG; Finished: Single, validated, contiguous sequence per replicon without gaps or ambiguities, with extensive manual review and editing to annotate putative gene functions and transcriptional units. High-quality draft genome: One or multiple fragments, totaling ≥ 90% of the expected genome or replicon sequence or predicted complete. Genome fragment(s): One or multiple fragments, totalling < 90% of the expected genome or replicon sequence, or for which no genome size could be estimated	enumeration	[Finished genome\|High-quality draft genome\|Medium-quality draft genome\|Low-quality draft genome\|Genome fragment(s)]	High-quality draft genome	sequencing	C	nan	1	MIXS:0000056	Condition: Conditional on assembly performed - which type of assembly applies here?	N	nan
assembly_name	assembly name	Name/version of the assembly provided by the submitter that is used in the genome browsers and in the community	name and version of assembly	{text} {text}	HuRef, JCVI_ISG_i3_1.0	sequencing	C	nan	1	MIXS:0000057	Condition: Conditional on assembly performed - which type of assembly applies here?	N	nan
assembly_software	assembly software	Tool(s) used for assembly, including version number and parameters	name and version of software, parameters used	{software};{version};{parameters}	metaSPAdes;3.11.0;kmer set 21,33,55,77,99,121, default parameters otherwise	sequencing	C	nan	1	MIXS:0000058	Condition: Conditional on assembly performed - which type of assembly applies here?	N	nan
annot	annotation	Tool used for annotation, or for cases where annotation was provided by a community jamboree or model organism database rather than by a specific submitter	name of tool or pipeline used, or annotation source description	{text}	prokka	sequencing	X	nan	1	MIXS:0000059	nan	N	nan
number_contig	number of contigs	Total number of contigs in the cleaned/submitted assembly that makes up a given genome, SAG, MAG, or UViG	value	{integer}	40	sequencing	X	nan	1	MIXS:0000060	nan	N	nan
feat_pred	feature prediction	Method used to predict UViGs features such as ORFs, integration site, etc.	names and versions of software(s), parameters used	{software};{version};{parameters}	Prodigal;2.6.3;default parameters	sequencing	X	nan	1	MIXS:0000061	nan	N	nan
ref_db	reference database(s)	List of database(s) used for ORF annotation, along with version number and reference to website or publication	names, versions, and references of databases	{database};{version};{reference}	pVOGs;5;http://dmk-brain.ecn.uiowa.edu/pVOGs/ Grazziotin et al. 2017 doi:10.1093/nar/gkw975	sequencing	X	nan	1	MIXS:0000062	nan	N	nan
sim_search_meth	similarity search method	Tool used to compare ORFs with database, along with version and cutoffs used	names and versions of software(s), parameters used	{software};{version};{parameters}	HMMER3;3.1b2;hmmsearch, cutoff of 50 on score	sequencing	X	nan	1	MIXS:0000063	nan	N	nan
tax_class	taxonomic classification	Method used for taxonomic classification, along with reference database used, classification rank, and thresholds used to classify new genomes	classification method, database name, and other parameters	{text}	vConTACT vContact2 (references from NCBI RefSeq v83, genus rank classification, default parameters)	sequencing	X	nan	1	MIXS:0000064	nan	N	nan
16s_recover	16S recovered	Can a 16S gene be recovered from the submitted SAG or MAG?	boolean	{boolean}	yes	sequencing	X	nan	1	MIXS:0000065	nan	N	nan
16s_recover_software	16S recovery software	Tools used for 16S rRNA gene extraction	names and versions of software(s), parameters used	{software};{version};{parameters}	rambl;v2;default parameters	sequencing	X	nan	1	MIXS:0000066	nan	N	nan
trnas	number of standard tRNAs extracted	The total number of tRNAs identified from the SAG or MAG	value from 0-21	{integer}	18	sequencing	X	nan	1	MIXS:0000067	nan	N	nan
trna_ext_software	tRNA extraction software	Tools used for tRNA identification	names and versions of software(s), parameters used	{software};{version};{parameters}	infernal;v2;default parameters	sequencing	X	nan	1	MIXS:0000068	nan	N	nan
compl_score	completeness score	Completeness score is typically based on either the fraction of markers found as compared to a database or the percent of a genome found as compared to a closely related reference genome. High Quality Draft: >90%, Medium Quality Draft: >50%, and Low Quality Draft: < 50% should have the indicated completeness scores	quality;percent completeness	[high\|med\|low];{percentage}	med;60%	sequencing	C	nan	1	MIXS:0000069	Condition: Conditional on assembly performed	N	nan
compl_software	completeness software	Tools used for completion estimate, e.g. checkm, anvi'o, busco	names and versions of software(s) used	{software};{version}	checkm	sequencing	C	nan	1	MIXS:0000070	Condition: Conditional on assembly performed	N	nan
compl_appr	completeness approach	The approach used to determine the completeness of a given genomic assembly, which would typically make use of a set of conserved marker genes or a closely related reference genome. For UViG completeness, include reference genome or group used, and contig feature suggesting a complete genome	text	[marker gene\|reference based\|other]	other: UViG length compared to the average length of reference genomes from the P22virus genus (NCBI RefSeq v83)	sequencing	C	nan	1	MIXS:0000071	Condition: Conditional on assembly performed	N	nan
contam_score	contamination score	The contamination score is based on the fraction of single-copy genes that are observed more than once in a query genome. The following scores are acceptable for; High Quality Draft: < 5%, Medium Quality Draft: < 10%, Low Quality Draft: < 10%. Contamination must be below 5% for a SAG or MAG to be deposited into any of the public databases	value	{float} percentage	1%	sequencing	C	nan	1	MIXS:0000072	Condition: Conditional on contamination estimation	N	nan
contam_screen_input	contamination screening input	The type of sequence data used as input	enumeration	[reads\| contigs]	contigs	sequencing	X	nan	1	MIXS:0000005	nan	N	nan
contam_screen_param	contamination screening parameters	Specific parameters used in the decontamination software, such as reference database, coverage, and kmers. Combinations of these parameters may also be used, e.g. kmer and coverage, or reference database and kmer	enumeration;value or name	[ref db\|kmer\|coverage\|combination];{text\|integer}	kmer	sequencing	X	nan	1	MIXS:0000073	nan	N	nan
decontam_software	decontamination software	Tool(s) used in contamination screening	enumeration	[checkm/refinem\|anvi'o\|prodege\|bbtools:decontaminate.sh\|acdc\|combination]	anvi'o	sequencing	X	nan	1	MIXS:0000074	nan	N	nan
sort_tech	sorting technology	Method used to sort/isolate cells or particles of interest	enumeration	[flow cytometric cell sorting\|microfluidics\|lazer-tweezing\|optical manipulation\|micromanipulation\|other]	optical manipulation	sequencing	C	nan	1	MIXS:0000075	nan	N	nan
single_cell_lysis_appr	single cell or viral particle lysis approach	Method used to free DNA from interior of the cell(s) or particle(s)	enumeration	[chemical\|enzymatic\|physical\|combination]	enzymatic	sequencing	C	nan	1	MIXS:0000076	Use a non-single cell version?; DOI/PMID for custom protocols	Y	[JAFY] Isn't this mutually exclusive?
single_cell_lysis_prot	single cell or viral particle lysis kit protocol	Name of the kit or standard protocol used for cell(s) or particle(s) lysis	kit, protocol name	{text}	ambion single cell lysis kit	sequencing	C	nan	1	MIXS:0000054	Use a non-single cell version?	Y	[JAFY] Isn't this mutually exclusive?
wga_amp_appr	WGA amplification approach	Method used to amplify genomic DNA in preparation for sequencing	enumeration	[pcr based\|mda based]	mda based	sequencing	C	nan	1	MIXS:0000055	Use a non-single cell version?; DOI/PMID for custom protocols	Y	[JAFY] Isn't this mutually exclusive?
wga_amp_kit	WGA amplification kit	Kit used to amplify genomic DNA in preparation for sequencing	kit name	{text}	qiagen repli-g	sequencing	C	nan	1	MIXS:0000006	Use a non-single cell version?	Y	[JAFY] Isn't this mutually exclusive?
bin_param	binning parameters	The parameters that have been applied during the extraction of genomes from metagenomic datasets	enumeration	[homology search\|kmer\|coverage\|codon usage\|combination]	coverage and kmer	sequencing	C	nan	1	MIXS:0000077	nan	N	nan
bin_software	binning software	Tool(s) used for the extraction of genomes from metagenomic datasets, where possible include a product ID (PID) of the tool(s) used.	names and versions of software(s) used	{software};{version}{PID}	MetaCluster-TA (RRID:SCR_004599), MaxBin (biotools:maxbin)	sequencing	C	nan	1	MIXS:0000078	nan	N	nan
reassembly_bin	reassembly post binning	Has an assembly been performed on a genome bin extracted from a metagenomic assembly?	boolean	{boolean}	no	sequencing	C	nan	1	MIXS:0000079	nan	N	nan
mag_cov_software	MAG coverage software	Tool(s) used to determine the genome coverage if coverage is used as a binning parameter in the extraction of genomes from metagenomic datasets	enumeration	[bwa\|bbmap\|bowtie\|other]	bbmap	sequencing	C	nan	1	MIXS:0000080	nan	N	nan
vir_ident_software	viral identification software	Tool(s) used for the identification of UViG as a viral genome, software or protocol name including version number, parameters, and cutoffs used	software name, version and relevant parameters	{software};{version};{parameters}	VirSorter; 1.0.4; Virome database, category 2	sequencing	C	nan	1	MIXS:0000081	nan	N	nan
pred_genome_type	predicted genome type	Type of genome predicted for the UViG	enumeration	[DNA\|dsDNA\|ssDNA\|RNA\|dsRNA\|ssRNA\|ssRNA (+)\|ssRNA (-)\|mixed\|uncharacterized]	dsDNA	sequencing	C	nan	1	MIXS:0000082	nan	N	nan
pred_genome_struc	predicted genome structure	Expected structure of the viral genome	enumeration	[segmented\|non-segmented\|undetermined]	non-segmented	sequencing	C	nan	1	MIXS:0000083	Condition: Ancient virus genome only	N	nan
detec_type	detection type	Type of UViG detection	enumeration	[independent sequence (UViG)\|provirus (UpViG)]	independent sequence (UViG)	sequencing	C	nan	1	MIXS:0000084	Condition: Ancient virus genome only	N	nan
otu_class_appr	OTU classification approach	Cutoffs and approach used when clustering “species-level” OTUs. Note that results from standard 95% ANI / 85% AF clustering should be provided alongside OTUs defined from another set of thresholds, even if the latter are the ones primarily used during the analysis	cutoffs and method used	{ANI cutoff};{AF cutoff};{clustering method}	95% ANI;85% AF; greedy incremental clustering	sequencing	C	nan	1	MIXS:0000085	Condition: Ancient virus genome only	N	nan
otu_seq_comp_appr	OTU sequence comparison approach	Tool and thresholds used to compare sequences when computing "species-level" OTUs	software name, version and relevant parameters	{software};{version};{parameters}	blastn;2.6.0+;e-value cutoff: 0.001	sequencing	C	nan	1	MIXS:0000086	Condition: Ancient virus genome only	N	nan
otu_db	OTU database	Reference database (i.e. sequences not generated as part of the current study) used to cluster new genomes in "species-level" OTUs, if any	database and version	{database};{version}	NCBI Viral RefSeq;83	sequencing	C	nan	1	MIXS:0000087	Condition: Ancient virus genome only	N	nan
host_pred_appr	host prediction approach	Tool or approach used for host prediction	enumeration	[provirus\|host sequence similarity\|CRISPR spacer match\|kmer similarity\|co-occurrence\|combination\|other]	CRISPR spacer match	sequencing	C	nan	1	MIXS:0000088	Condition: Ancient virus genome only	N	nan
host_pred_est_acc	host prediction estimated accuracy	For each tool or approach used for host prediction, estimated false discovery rates should be included, either computed de novo or from the literature	false discovery rate	{text}	CRISPR spacer match: 0 or 1 mismatches, estimated 8% FDR at the host genus rank (Edwards et al. 2016 doi:10.1093/femsre/fuv048)	sequencing	C	nan	1	MIXS:0000089	Condition: Ancient virus genome only	N	nan
associated resource	relevant electronic resources	A related resource that is referenced, cited, or otherwise associated to the sequence.	reference to resource	{PMID} \| {DOI} \| {URL}	http://www.earthmicrobiome.org/	sequencing	C	nan	m	MIXS:0000091	nan	N	nan
sop	relevant standard operating procedures	Standard operating procedures used in assembly and/or annotation of genomes, metagenomes or environmental sequences	reference to SOP	{PMID}\|{DOI}\|{URL}	http://press.igsb.anl.gov/earthmicrobiome/protocols-and-standards/its/	sequencing	C	nan	m	MIXS:0000090	nan	N	nan
cultural_era	nan	The cultural era approximating the period in which the individual lived, taken from ChronOntology (https://chronontology.dainst.org/) or PeriodO. Specify when no value for 'sample_age' is present, or in addition to it.	ChronOntology or PeriodO term; text	{termLabel} {[termID]}\|{text}	Copper Age [ChronOntology: NW6hofAScJSE]	investigation	C	nan	m	MIXS:XXXXXXX	Question: switch to free-text? Conditional to sample_age	Y	nan
dna_extraction_date	nan	The date when the nucleic acids were extracted from the sample material. In case no exact time is available, the date can be right truncated, i.e. all of these are valid times: 2008-01-23T19:23:10+00:00; 2008-01-23T19:23:10; 2008-01-23; 2008-01; 2008. Except for 2008-01 and 2008, all are ISO8601 compliant	date	{timestamp}	2015	nucleic acid sequence source	M	year	1	MIXS:XXXXXXX	nan	N	nan
recovery_date	nan	Date of excavation or retrieval from burial or depositional context, if known	date	{timestamp}	1930	investigation	X	date	1	MIXS:XXXXXXX	nan	N	nan
sample_age	nan	The approximate date that the individual was living and then died, or that the sample was exposed to the surface. Typically inferred from archaeological material, or biological material associated with a sediment, with radiocarbon dating and other chronometric methods. Should be the midpoint of the calibrated radiocarbon age.	value from BP (1950)	{integer}	123440	investigation	C	cal BP	m	MIXS:XXXXXXX	nan	N	nan
sample_age_inference_methods	nan	The method used to infer the sample age. Method (14C, OSL, etc.) and associated information (lab code, etc.). An enumerated list with age-depth models, correlative (relative) dating, annual lamination, pollen records, diatom records, physical observation, etc.	enumeration	???	C14	investigation	C	nan	m	MIXS:XXXXXXX	nan	Y	[JAFY/MS] How to define, very heterogeneous, maybe split: method/lab code
site_type	nan	The type of site from which the sediment cores were taken. E.g. ocean, marine, freshwater, brackish, ice caves, caves.	text	{text}	cave	environment	E	nan	1	MIXS:XXXXXXX	nan	Y	[JAFY] can this be expanded to non-sediment-like sites? Open air burials? Is there an overlap here with env_*_scale? Otherwise I think this would go to the environmental_packages
damage_treatment	nan	Indication of whether characteristic ancient DNA damage has been removed in a laboratory	enumeration	[none\|partial-udg\|full-udg\|enriched\|other]	none	sequencing	M	nan	1	MIXS:XXXXXXX	nan	N	[JAFY] what if people upload BAMs of merged multiple libraries
experimental_procedures	nan	Provide a DOI to refer to the paper where the procedure is explained in more detail	PMID, DOI or URL	{PMID}\|{DOI}\|{URL}	nan	sequencing	X	nan	1	MIXS:XXXXXXX	nan	Y	[JAFY] What procedures does this refer to? Each paper will have many protocols to cite for extraction, library construction, reconditioning, etc. Duplicate of SOP?
lib_concentration	nan	Concentration of library in copies per µl, as inferred by qPCR.	integer	{integer}	123000000	sequencing	X	copies/µl	1	MIXS:XXXXXXX	nan	N	nan
lib_index_polymerase	nan	The name of the polymerase enzyme used to index DNA libraries	text	{text}	Agilent PfuTurbo Cx HotStart	sequencing	C	nan	1	MIXS:XXXXXXX	nan	Y	[JAFY] Condition on what? And would including the SKU be useful too (as it's a more stable code)
lib_preparation_protocol	nan	Citation(s) for the DNA library preparation protocol	text	{text}	Meyer and Kircher, 2010	sequencing	C	nan	1	MIXS:XXXXXXX	nan	Y	[JAFY] Condition on what?
lib_reamplification_polymerase	nan	The name of polymerase enzyme used for reamplifying DNA libraries	text	{text}	KAPA HiFi HotStart Uracil+	sequencing	C	nan	1	MIXS:XXXXXXX	nan	Y	[JAFY] Condition on what?
lib_type	library type	The type of library created: amplicon-based or non-amplicon-based. An amplicon-based library results in rather short DNA fragments, while non-amplicon-based refers to a non-targeted approach.	enumeration	[shotgun\|amplicon]	shotgun	sequencing	M	nan	1	MIXS:XXXXXXX	nan	Y	[JAFY] I'm not sure about the definition of amplicon here, it could be confusing - 'natural' aDNA is normally short, maybe need to rather change the definition to specify using primers targeting specific regions of the genome instead
neg_cont_status	negative control status	Specify whether the sample is a negative control or not.	negative control status	{boolean}	nan	investigation	C	nan	1	MIXS:XXXXXXX	nan	Y	[JAFY] Wouldn't this be mandatory if it's boolean? A sample is either a negative control or not ... Would this also potentially not apply to all other tables? [AFG/BM] The problem with negative controls is that they come from different batches, so they become hard to control. (M). Issue: in paleometagenome studies fewer negative controls are used as there is more reliance on damage patterns; this is not the case for sedaDNA, where negative controls should be mandatory.
num_capture_reamp_cycles	nan	Number of amplification cycles after capture enrichment	number of amplification cycles	{integer}	10	sequencing	C	nan	1	MIXS:XXXXXXX	Condition: if performed	N	nan
num_reamp_cycles	number of reamplification cycles	Number of amplification cycles after library indexing PCR	number of amplification cycles	{integer}	8	sequencing	C	nan	1	MIXS:XXXXXXX	Condition: if performed	N	nan
preservational_treatment	preservation treatment	Description of any treatment applied to samples for the purpose of maximising collection preservation that may influence downstream DNA recovery or library construction, such as storage fluid or reconstructive glue	text	{text}	stored in formalin	nucleic acid sequence source	C	nan	1	MIXS:XXXXXXX	Condition: if present	N	nan
sample_alt_lab_ids	alternative sample IDs	Any alternate sample IDs used by the research group publishing the paper or by other groups, if known.	text	{text}	ABC_24	investigation	X	nan	1	MIXS:XXXXXXX	nan	N	nan
samp_decontam_pretreat	sample decontamination pretreatment	Method(s) employed to decontaminate sample surfaces of external modern DNA, i.e. the treatment used on the samples. Depends on the sample type. More relevant for bones than environmental samples. E.g. buffers, EDTA, etc.	PMID, DOI, URL or text	{PMID}\|{DOI}\|{URL}\|{text}	EDTA wash, 10.17504/protocols.io.bidyka7w	sequencing	X	nan	m	MIXS:XXXXXXX	nan	N	nan
prev_pubs	previous publications	Any publications that report data from the same body/skeleton/individual	text	{DOI}\|{URL}\|{PMID}	10.1126/science.7761839	investigation	X	nan	m	MIXS:XXXXXXX	nan	Y	[AG/PH] Just DNA data, or also contextual/archaeological publications?
collection_context_name	context name where sample was collected	Name of where the sample originated and is typically stored. Typically this will be the 'owning institution'	text	{text}	Natural History London	environment	X	nan	1	MIXS:XXXXXXX	nan	Y	[AG/PH] e.g. Museum, community, field (want to avoid assumption that institutions instead of communities control samples)
ethical_authority	ethical authority	Name of the authority or institution that awarded sampling and analysis (e.g. human remains) and/or export permission (e.g. animal remains)	text	{text}	Federal Foreign Office (Germany)	investigation	C	nan	m	MIXS:XXXXXXX	nan	Y	[JAFY] What is the condition on?
ethical_date	date of ethical approval	Date of award of ethical/export permission. The date can be right truncated, i.e. all of these are valid times: 2008-01-23; 2008-01; 2008. Except for 2008-01 and 2008, all are ISO8601 compliant	date	{timestamp}	2018-05-11T10:00:00+01:00; 2018-05-11	investigation	C	nan	m	MIXS:XXXXXXX	nan	Y	[JAFY] What is the condition on?
ethical_id	ethical permit/approval ID	The permissions code or ID provided by the authority associated with approval of the analysis of this particular sample	text	{text}	DE-123-JK	investigation	C	nan	m	MIXS:XXXXXXX	nan	Y	[JAFY] What is the condition on?
storage_conditions	conditions of sample storage	General conditions in which the sample was stored in long-term collection storage that may have influenced DNA recovery or library construction. For example, specify temperature, humidity, presence of microbial overgrowth, etc.	text	{text}	Climate-controlled	environment	X	nan	1	MIXS:XXXXXXX	nan	N	nan
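
The value syntaxes in the table above lend themselves to simple automated checks. The following is a minimal Python sketch, not part of the MInAS draft itself, that checks the right-truncated timestamp forms listed under dna_extraction_date and the enumerations given for damage_treatment and lib_type. The names TIMESTAMP_RE, ENUMERATIONS, DATE_FIELDS and validate_field are hypothetical and introduced only for illustration; the regular expression checks the shape of a value, not calendar validity.

```python
import re

# Right-truncated timestamp forms listed in the dna_extraction_date definition:
# YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DDThh:mm:ss, optionally with a UTC offset.
TIMESTAMP_RE = re.compile(
    r"^\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d:[0-5]\d"
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d)?"
    r")?)?)?$"
)

# Enumerations copied from the "Value syntax" column of the table.
ENUMERATIONS = {
    "damage_treatment": {"none", "partial-udg", "full-udg", "enriched", "other"},
    "lib_type": {"shotgun", "amplicon"},
}

# Fields whose value syntax is {timestamp} (illustrative subset).
DATE_FIELDS = {"dna_extraction_date", "recovery_date", "ethical_date"}


def validate_field(name: str, value: str) -> bool:
    """Return True if `value` fits the value syntax assumed for field `name`."""
    if name in ENUMERATIONS:
        return value in ENUMERATIONS[name]
    if name in DATE_FIELDS:
        return TIMESTAMP_RE.match(value) is not None
    raise KeyError(f"no rule defined for field {name!r}")
```

For example, validate_field("damage_treatment", "partial-udg") is accepted under these assumptions, while validate_field("dna_extraction_date", "23/01/2008") is rejected because it is not one of the listed forms.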

Environmental package	Structured comment name	Package item	Definition	Expected value	Value syntax	Example	Requirement	Preferred unit	Occurrence	MIXS ID	Modification Suggestion	Requires Further Discussion	Reason for further discussion	Completed
host-associated	environment	burial_context	Description of the burial context from which sampled individuals were recovered	enumeration	[primary inhumation\|secondary inhumation\|multiple inhumation\|commingled assemblage\|disarticulated remains\|information not available]	multiple inhumation	X	nan	1	MIXS:XXXXXXX	N	Y	Is there some form of ontology for such a thing?	nan
host-associated	environment	host_biological_sex	Biological sex of the host individual	enumeration	[male\|female\|other\|unknown]	female	X	nan	1	MIXS:XXXXXXX	N	N	nan	nan
human-oral	environment	samp_sample_site	Specific location in oral cavity	enumeration	{termLabel} {[termID]}	maxilla left 3rd molar [UBERON:0002535]	X	nan	m	MIXS:XXXXXXX	N	N	nan	nan
sediment	environment	sediment_type	The sediment type. E.g. silt, clay, organic, porous.	text	{text}	cave	X	nan	1	MIXS:XXXXXXX	N	Y	[PH/AI] Very important for molecular scientists to be able to distinguish between the different sediment types; perhaps a blog post or workshop for the sedaDNA society from a sedimentologist	nan
sediment	environment	sedimentation_rate	The sedimentation rate calculated from age-depth models.	value	{integer}	???	X	nan	???	MIXS:XXXXXXX	N	Y	[???] but with NA as it's not always available.	nan
sediment	sampling	coring_system	Specify whether the sampling was with open or closed system. An open system is where no corers are used and the samples are exposed, e.g. in caves. A closed system is where a corer is used and the sampling procedure is ‘controlled’.	enumeration	[open\|closed]	closed	X	nan	1	MIXS:XXXXXXX	N	N	nan	nan
sediment	sampling	depths	The depths from which the sediments were taken. This depends on the source of the sediments. E.g. water depths.	value	{integer}	???	X	cm	1	MIXS:XXXXXXX	N	N	[???] Should influence microbial activity (M) But include NA in the options as not applicable for all samples.	nan
sediment	sampling	stratigraphic_horizon	The depth at which the samples were taken from the core. E.g. in a one-metre-long core, the sample corresponds to a depth of 50 cm; in caves, the sample is taken from the surface. (M)	value	{integer}	???	X	???	1	MIXS:XXXXXXX	N	Y	[JAFY] the description here seems to correspond to depths, I think this needs to be revised	nan
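
Several items in the tables above use the {termLabel} {[termID]} value syntax, for example samp_sample_site. As a rough illustration only, the Python sketch below splits such a value into its label, ontology prefix and identifier. The function name parse_term and the exact pattern are assumptions made for this example; the draft does not define a formal grammar for this syntax, and the pattern tolerates an optional space after the colon because one appears in the cultural_era example.

```python
import re
from typing import Optional, Tuple

# Assumed shape: free-text label, then "[PREFIX:ID]" at the end of the value.
TERM_RE = re.compile(
    r"^(?P<label>.+?)\s*\[(?P<prefix>[A-Za-z]+):\s*(?P<id>[^\]\s]+)\]$"
)


def parse_term(value: str) -> Optional[Tuple[str, str, str]]:
    """Split '{termLabel} {[termID]}' into (label, prefix, id), or return None."""
    match = TERM_RE.match(value.strip())
    if match is None:
        return None
    return match.group("label"), match.group("prefix"), match.group("id")
```

Applied to the samp_sample_site example, parse_term("maxilla left 3rd molar [UBERON:0002535]") yields the label, the prefix UBERON and the identifier 0002535, whereas a plain free-text value such as "cave" yields None and would need to be handled as {text}.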