Welcome to GTTN-DR’s documentation!

Introduction

This module is a collaboration between the University of Connecticut’s Plant Computational Genomics Lab and the Global Timber Tracking Network. It allows secure and convenient submission of multiple tree descriptors through a web-based interface.

This module is a modified extension of the Tripal Plant PopGen Submit pipeline (TPPS), which can be found at http://tpps.rtfd.io

This module is currently in demo form: the data is not submitted to the TreeGenes database, and an additional “results” page displays a short text summary of the data when the user clicks “Submit”.

The first page of the module prompts the user for information about each species they are uploading data about, as well as a file with location information and unique identifiers for each tree.

The second page of the module asks for information about the sampling and analysis dates of each species, as well as any phenotypic or genotypic data files the users have.

Unlike TPPS, this form can only be accessed by members of the ‘gttn’ or ‘administrator’ groups on the TreeGenes site.

Overview

GTTN-TPPS is a data collection tool developed with the goal of collecting high-quality reference data for the purposes of timber tracking and identification. The module collects 4 different types of data:

_images/overview_diagram.png

Genotype and Phenotype Data

The core of the data that GTTN-TPPS collects is the Genotype and Phenotype data. Genotype data might take the form of SNPs, genotyping assays, SSRs, etc. Phenotype data might take the form of DART data, wood anatomy, etc. DART data could include a series of files with peak data for various isotopes found in a DART scan. Wood anatomy data could include images of microscope slides, along with the specific anatomical features found within each slide. Several organizations have expressed interest in providing reference data to GTTN, including:

Georeferenced Accessions

When Genotype and Phenotype Data are submitted through GTTN-TPPS, they will be accompanied by georeferenced accessions of the trees that were sampled to obtain the data. Georeferenced accessions will usually be submitted as an Excel spreadsheet mapping each tree identifier to a latitude/longitude coordinate. They will be integrated with the Genotype and Phenotype Data in the REF Database.

Method-Specific Metadata

In addition to Georeferenced Genotype and Phenotype Data, each GTTN-TPPS submission will also include metadata specific to the analysis method used to obtain the data. For example, if the submission includes DART data, part of the metadata GTTN-TPPS collects might be the settings of the DART machine used to obtain the data. If the submission includes Genotyping by Sequencing (GBS) data, the metadata might include the type of GBS: ddRAD, RAD, NextRAD, etc. For more details, see the Metadata Document put together at the March 2019 GTTN workshop in Koli, Finland.

Data Access Options

The fourth type of data collected in a GTTN-TPPS submission is the Data Access and Authorization Options. Here users select which organizations within the GTTN network are allowed to see the submitted data, whether the data will be published to TreeGenes, etc.

Features

GTTN-TPPS has many features that make data collection easier for administrators. Here are a few notable ones:

Data Types and Standards

  • Support for genotype and phenotype data and metadata
  • Support for ontology standards, including the Minimum Information About a Plant Phenotyping Experiment (MIAPPE)
  • Support for standard genotyping file formats, such as .VCF
  • Automatically submits data according to the Tripal CHADO database schema

Data Accessibility

  • Data is standardized and stored in the local database so that other tools, for example, CartograTree, can easily collect and analyze it
  • Access restricted to users in the gttn user group
  • The studies can be queried or downloaded (flatfiles) through the Tripal interface
  • Display both complete and incomplete submissions on ‘GTTN-TPPS Submissions’ user profile tab

User Friendliness

  • Map thumbnails for quick visual validation
  • Auto-complete appropriate fields based on information from the user profile
  • Load data from NCBI based on a provided BioProject accession number
  • Automatically parse file contents for submission to the CHADO schema
  • Save user progress on incomplete submissions
  • Form flexibility to ensure only the minimum necessary information is being required, but users may provide additional information if they choose

Administrative Features

  • Administrator panel to manually approve completed submissions
  • Configuration page to specify file upload locations, TPPS Admin email, etc.

User Information

Data Collection Pipeline

This section contains details on how to get access and use the GTTN-TPPS Data Collection Pipeline.

Creating an Account

Before we can start submitting data through the form, we must create an account so that GTTN-TPPS knows who is submitting data and which organization that data is coming from. To create an account, navigate to gttn.treegenesdb.org/user/register.

You will be asked to provide your full name, your email address, and indicate which organizations you are a part of:

_images/register.png

After you have provided all of the required information, an admin will need to approve your account and the primary contacts of each organization you indicated will need to verify that you are part of that organization. Once your account has been approved and verified, you should receive an email notification and you will be able to set a password and log in.

Once you are logged in, you will have access to a variety of new data, depending on the organizations you claimed membership in. Data that is public to GTTN organizations, data that has been shared with organizations you are a member of, and data that has been shared with your user roles will become available.

User Roles

There are three default user roles that are important to understanding data access, and there are also some special roles that give users special permissions. We will discuss all of the roles currently available on gttn.treegenesdb.org here:

Default user roles:

  • Anonymous: This role is automatically assigned to anyone who is not yet logged in to the GTTN site. This will restrict data access to collections that have been marked as available to the public.
  • Authenticated: This role is assigned to anyone who is logged in to the GTTN site. Having this role allows the user to see all of the same information as the “Anonymous” role, as well as collections that have been marked as available to the entire GTTN network and collections that have been marked as available to organizations the user is a part of.
  • Administrator: This role is only given to site administrators, usually limited to those who work on developing the code of the site. This role allows users to browse all data on the site, regardless of membership in an organization.

Custom user roles:

These roles are for users who serve a special purpose in the GTTN community, and usually result in access to additional data:

  • Law Enforcement: This role describes users who are verified law enforcement officers. Having this role allows the user to see all of the same information as the “Authenticated” role, as well as collections that have been marked as available only to law enforcement users.

Please note that the access that these roles provide is subject to change, and additional custom user roles are likely to be added in the future!
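The role rules above can be summarized as a single visibility check. The sketch below is a hypothetical model written for illustration, not GTTN-TPPS code: the `User` and `Collection` classes, the role strings (`"authenticated"`, `"administrator"`, `"law enforcement"`), and the visibility flags are all assumptions. It also assumes that submitters always keep access to their own data, which matches the rule that a submission shared with no organizations stays visible to its submitter.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class User:
    # Hypothetical model: an anonymous visitor has no name, roles, or organizations.
    name: Optional[str] = None
    roles: Set[str] = field(default_factory=set)
    organizations: Set[str] = field(default_factory=set)


@dataclass
class Collection:
    # Hypothetical visibility flags; not the actual GTTN-TPPS storage format.
    owner: str
    public: bool = False
    gttn_network: bool = False
    organizations: Set[str] = field(default_factory=set)
    law_enforcement: bool = False


def can_view(user: User, collection: Collection) -> bool:
    """Return True if `user` may see `collection` under the roles described above."""
    if "administrator" in user.roles:
        return True  # administrators can browse all data
    if user.name is not None and user.name == collection.owner:
        return True  # submitters keep access to their own data
    if collection.public:
        return True  # public data is visible even to anonymous users
    if "authenticated" not in user.roles:
        return False  # anonymous users see public collections only
    if collection.gttn_network or collection.organizations & user.organizations:
        return True  # shared with the whole network or one of the user's organizations
    # Law-enforcement-only collections require the custom role.
    return collection.law_enforcement and "law enforcement" in user.roles


member = User("ana", {"authenticated"}, {"Org A"})
shared = Collection(owner="bob", organizations={"Org A"})
print(can_view(User(), shared), can_view(member, shared))  # False True
```

Ordering the checks from broadest to narrowest keeps each rule on one line, which makes it easy to adjust as roles change.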

Landing Page and Submission Type

Now that you have created an account and have successfully logged in, you can start submitting data through the data collection pipeline! Navigate to gttn.treegenesdb.org/gttn-tpps in your browser and you will see the landing page. If this is your first time submitting data then your only option will be to create a new GTTN-TPPS Submission, but if you have saved incomplete submissions, you will be able to choose to load one of those submissions:

_images/landing.png

Click “Continue to GTTN-TPPS” and you will see the Submission Type page. This page collects some metadata about the data submission and the higher level project funding, where applicable:

_images/submission_type.png

Required fields are marked with an asterisk. The Submission Name under Project Basic Information is the name of the data collection you are submitting; for example, if you were submitting a DART analysis, you might name your submission something like “DART Analysis 1”. The Project Name under the Project Background section is the name of the higher-level project or the NSF grant number. This field is not required, but it can make your data easier to find in the future.

If you are a member of more than one organization, you will need to indicate which organization you are submitting this data for. This is important for keeping track of the sample inventory of each organization in the GTTN group.

GTTN needs to know whether the trees and samples you are submitting already exist in the GTTN database or are brand new. This information is collected through the “Submission Type” field.

You will then need to select which organizations are allowed to see the data in the “Data Permissions” field. If you opt not to select any organizations, then the data you provide through this submission will be visible only to you.

Species and Data Type Information

This page will allow you to specify the species associated with your data and the data types you will be submitting:

_images/species_information.png

You can add as many species to your submission as you would like. The species fields will autocomplete your entries to species that are already present in the database. If your species is not in the database, that’s ok! When you complete your submission a new species will be added.

You will then need to select all of the data types you will be submitting. Your options are Sample, DART, Isotope, Genetic, and Anatomy data. Your selection here will determine which fields you see later in the form.

Location Information

This page will collect location information for the trees you are using in this submission, as well as sample information if you indicated you were providing sample data in the previous page.

_images/location_information.png
Tree Accession Information

This section will require one or more tree accession files:

_images/tree_accession.png

The simplest tree accession file requires only a tree identifier column and location columns, such as latitude/longitude or country/state. However, if your accession file contains trees from multiple species, you will also need a column indicating which species each tree belongs to. If you do not have exact locations for your trees and instead have population groups, you can indicate a population group column, and you will be prompted to provide the location of each population group below the file field.

If you have properly filled out the column types for tree identifier and location, you will be able to view your data in a thumbnail map by clicking “Click here to view trees on map!”. This is useful for verifying that the locations you have uploaded are being interpreted correctly by GTTN-TPPS.
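Before uploading, it can help to check an accession file for the problems that would show up on the map. The sketch below is not how GTTN-TPPS parses files internally; it assumes the spreadsheet has been exported to CSV, and the column names (`Tree_ID`, `Latitude`, `Longitude`) and the `load_accessions` helper are hypothetical. It checks for duplicate tree identifiers and impossible coordinates, and supports the population-group layout by looking up each group's location.

```python
import csv
import io


def load_accessions(text, id_col="Tree_ID", lat_col="Latitude", lon_col="Longitude",
                    species_col=None, pop_col=None, pop_locations=None):
    """Return {tree_id: (lat, lon, species)} for a CSV tree accession file.

    Column names are hypothetical defaults. When `pop_col` is given, each
    tree takes the (lat, lon) of its population group from `pop_locations`
    instead of reading per-tree coordinate columns.
    """
    trees = {}
    rows = csv.DictReader(io.StringIO(text))
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        tree_id = row[id_col].strip()
        if tree_id in trees:
            raise ValueError(f"line {line_no}: duplicate tree identifier {tree_id!r}")
        if pop_col is not None:
            lat, lon = pop_locations[row[pop_col].strip()]
        else:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        # Reject coordinates that cannot exist on the map.
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            raise ValueError(f"line {line_no}: coordinates out of range for {tree_id!r}")
        species = row[species_col].strip() if species_col else None
        trees[tree_id] = (lat, lon, species)
    return trees


print(load_accessions("Tree_ID,Latitude,Longitude\nT1,41.8,-72.25\n"))
# {'T1': (41.8, -72.25, None)}
```

A range check will not catch every swapped latitude/longitude pair, so the map thumbnail is still the better final check.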

Sample Information

This section requires a sample file:

_images/sample_information.png

This section requires a variety of information about each sample you are submitting. The sample file you provide must contain the following columns:

  • Internal Sample ID or Xylarium ID
  • Sample Source - Which tree or other sample does this sample come from?
  • Sample Dimensions - What are the LxWxH dimensions of the sample?
  • Remaining Volume of Sample - How much of the sample is left?

You will then need to provide the following information, either in a file column or by filling out the fields below the file upload field:

  • Collection Date - The date the sample was collected
  • Sample Collector - The person who collected the sample
  • Sample Tissue - The type of tissue the sample is. This is usually bark, heartwood, leaf, etc.
  • Sampling Method - The method of collecting the sample. This is either increment core, punch, disc, or cube.
  • Analyzed - Whether or not the sample has already been analyzed.
  • Storage Location - The location where the sample is being stored. This is how GTTN keeps track of the inventory of each organization and the location of each sample.

Finally, you will need to indicate whether the samples can be shared with other organizations.
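The sample rules above combine required columns with attributes that can come either from the file or from the form. The sketch below illustrates one way to resolve them; it is not the GTTN-TPPS implementation. The header names, the `read_samples` helper, and the choice that a non-empty file column takes precedence over the form value are all assumptions. The sampling-method check uses the four methods listed above.

```python
import csv
import io

# Hypothetical header names for the four columns every sample file must have.
REQUIRED_COLUMNS = ["Sample ID", "Sample Source", "Dimensions", "Remaining Volume"]
# Attributes that can come from a per-sample column or one value typed into the form.
COLUMN_OR_FORM = ["Collection Date", "Collector", "Tissue",
                  "Sampling Method", "Analyzed", "Storage Location"]
SAMPLING_METHODS = {"increment core", "punch", "disc", "cube"}


def read_samples(text, form_values):
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample file is missing columns: {missing}")
    samples = []
    for row in reader:
        sample = {c: (row.get(c) or "").strip() for c in REQUIRED_COLUMNS}
        for attr in COLUMN_OR_FORM:
            value = (row.get(attr) or "").strip()
            if value:
                sample[attr] = value  # assumed: a file column overrides the form field
            elif attr in form_values:
                sample[attr] = form_values[attr]
            else:
                raise ValueError(f"{sample['Sample ID']}: no value for {attr!r}")
        if sample["Sampling Method"].lower() not in SAMPLING_METHODS:
            raise ValueError(f"{sample['Sample ID']}: unknown sampling method")
        samples.append(sample)
    return samples
```

For example, a file with a `Tissue` column and a form value of `bark` would keep the per-sample tissue from the file and use the form values for every attribute the file leaves out.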

Reference and Analysis Data

This page contains reference and analysis data. The contents of the form depend on the selections made on the Species and Data Type Information page.

Direct Analysis in Real Time (DART)

This section collects information about a DART analysis:

_images/dart_information.png

To upload DART information, you will need a top-level DART data file, and a compressed DART Raw Data file.

Top-Level DART Data File

The top-level DART file will contain an entry for each sample analyzed in the DART submission, and each entry must include the following information:

  • Sample Internal ID or Xylarium ID
  • Analysis Lab Name - The lab that performed the DART analysis.
  • Analysis Lab Spectra ID
  • Spectra Gatherer
  • Type of DART TOFMS
  • Parameter Settings - The parameter settings on the DART machine used for analysis.
  • Calibration Type

Here is an example top-level DART data file:

_images/example_dart_top.png
Compressed DART Raw Data File

The compressed DART Raw Data file should be a .zip, .tar, or .gz file containing a compressed folder of plain text files. Each text file should contain the DART spectra for one sample. Here is an example of the uncompressed folder:

_images/example_dart_folder.png

Note that each file name is of the format <sample ID>.txt. This is important for identifying which file is associated with which sample. The file itself should be a collection of weights and peaks, which is the raw DART data:

_images/example_dart_raw.png

The file should be in the format:

<Title>
<DART Configuration>

<weight>\t<peak>
[<weight>\t<peak>...]
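The file naming rule and layout above are enough to read a raw archive and confirm that every sample has a spectrum. The sketch below is illustrative, not GTTN-TPPS code. It assumes a .zip archive with UTF-8 text files, and it assumes the configuration header ends at the first blank line after the title. The helper names are hypothetical. Tar archives would need Python's `tarfile` module instead of `zipfile`.

```python
import io
import zipfile
from pathlib import PurePosixPath


def parse_dart_spectrum(text):
    """Split one raw DART file into (title, configuration, [(weight, peak), ...])."""
    lines = text.splitlines()
    try:
        # Assumed: the first blank line after the title ends the configuration header.
        blank = next(i for i in range(1, len(lines)) if not lines[i].strip())
    except StopIteration:
        raise ValueError("no blank line after the DART configuration") from None
    title = lines[0].strip()
    config = "\n".join(lines[1:blank])
    peaks = []
    for line in lines[blank + 1:]:
        if line.strip():
            weight, peak = line.strip().split("\t")
            peaks.append((float(weight), float(peak)))
    return title, config, peaks


def read_dart_archive(data, expected_ids):
    """Map each sample ID to its parsed spectrum, using the <sample ID>.txt names."""
    spectra = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            path = PurePosixPath(name)
            if name.endswith("/") or path.suffix != ".txt":
                continue  # skip folder entries and anything that is not a spectrum
            spectra[path.stem] = parse_dart_spectrum(archive.read(name).decode("utf-8"))
    missing = set(expected_ids) - set(spectra)
    if missing:
        raise ValueError(f"no raw spectrum for samples: {sorted(missing)}")
    return spectra
```

Passing the sample IDs from the top-level DART data file as `expected_ids` shows which samples lack a raw spectrum before the upload.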
Isotope

This section collects information about an Isotope analysis:

_images/isotope_information.png

You will be required to indicate which isotopes you used, which isotope standard you used for each isotope, and the type of each isotope (whole wood or cellulose). You will then be required to provide an isotope data file, which contains an entry for each analyzed sample. Each entry must contain a column with the sample ID, and a column with the measurement for each isotope used. Here is a simple example of an isotope data file:

_images/example_isotope.png
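The isotope data file is a simple table: one row per sample, a sample ID column, and one numeric column per isotope. The sketch below shows a pre-upload check under those rules. It assumes a CSV export. The `read_isotope_table` helper and the column names `Sample ID`, `d13C`, and `d18O` are hypothetical.

```python
import csv
import io


def read_isotope_table(text, isotopes, id_col="Sample ID"):
    """Return {sample_id: {isotope: value}} from a CSV isotope data file.

    `isotopes` lists the isotope column names; the names used in the
    example below are hypothetical.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in [id_col, *isotopes] if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"isotope file is missing columns: {missing}")
    table = {}
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = (row.get(id_col) or "").strip()
        try:
            table[sample_id] = {iso: float(row[iso]) for iso in isotopes}
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: non-numeric value for {sample_id!r}") from None
    return table


print(read_isotope_table("Sample ID,d13C,d18O\nS1,-25.1,28.4\n", ["d13C", "d18O"]))
# {'S1': {'d13C': -25.1, 'd18O': 28.4}}
```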
Genetics

This section collects information about a Genetic analysis:

_images/genetic_information.png

For all genetic information, you will need to provide the DNA Quality Score.

The contents of the form in this section vary greatly depending on the type of genetic analysis and the type of genetic markers used.

SNPs

For SNP data, you will first need to identify the source of your SNPs, either GBS, Reference Genome, Transcriptome, or Genotype Assay.

If you selected GBS as your source, you will need to provide the following:

  • GBS Type (ddRAD, RAD, NextRAD, etc.)
  • GBS Sequencing Instrument name
  • GBS Intermediate reference file: either select a reference file from the list of existing reference files on the GTTN Server, or upload your own reference file.
  • GBS Alignment file
  • VCF File

If you selected Assay as your source, you will need to provide the following:

  • Assay Source (MassArray, Illumina, Thermo)
  • Assay Design File
  • Assay Genotype Table
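The SNP fields that are required depend on the SNP source. A lookup table captures this well. The sketch below covers only the GBS and Assay lists above, plus the DNA Quality Score required for all genetic data. All field names are hypothetical. Reference Genome and Transcriptome sources are left out because their required fields are not listed here. The reference file is a tuple because either an existing reference or an uploaded one satisfies it.

```python
def _present(value):
    # Treat 0 as a real answer (e.g. a quality score); only None and "" are missing.
    return value is not None and value != ""


# Hypothetical field names mirroring the GBS and Assay lists above.
# A tuple means "any one of these satisfies the requirement".
REQUIRED_BY_SOURCE = {
    "GBS": ["gbs_type", "sequencing_instrument",
            ("existing_reference", "uploaded_reference"),
            "alignment_file", "vcf_file"],
    "Assay": ["assay_source", "assay_design_file", "assay_genotype_table"],
}


def missing_snp_fields(source, values):
    """List what is still needed for SNP data from the given source."""
    # Every genetic submission needs a DNA quality score.
    required = ["dna_quality_score", *REQUIRED_BY_SOURCE.get(source, [])]
    missing = []
    for requirement in required:
        options = requirement if isinstance(requirement, tuple) else (requirement,)
        if not any(_present(values.get(option)) for option in options):
            missing.append(" or ".join(options))
    return missing


print(missing_snp_fields("Assay", {"dna_quality_score": 7, "assay_source": "Illumina"}))
# ['assay_design_file', 'assay_genotype_table']
```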
SSRs/cpSSRs

For microsatellite data, you will need to provide the name of the Sequencing Instrument you used for your analysis, as well as the ploidy of the organism you are analyzing and an SSR spreadsheet containing all of the raw SSR data.

Wood Anatomy

This section collects information about a Wood Anatomy analysis:

_images/anatomy_information_1.png

We collect metadata for each species based on the IAWA standards: Nomenclature, General, Vessels, Tracheids and fibres, Axial parenchyma, Rays, Storied structures, Mineral inclusions, Physical and chemical tests.

_images/anatomy_information_2.png

You can upload any number of microscope slide images and provide brief descriptions for each.

Submission Review

Finally, once all of the reference and analysis data has been provided, you will be shown a brief summary of the data you are submitting before you mark your submission as complete. This page will contain information provided by the user including the submission name, collection reason, etc. and will allow the user to view previews of the files they have provided to the form. After the user has verified that all of the information is correct, they can click “Submit” and the submission will be sent to administrators for the approval process.

Reference and Sample Data Search Form

To browse and search reference data that has been uploaded through GTTN-TPPS, we use the reference and sample data search form. The form can be found at gttn.treegenesdb.org/reference.

While the form is still a work in progress, the prototype currently available should be able to give an idea of what might be available in the future.

Currently, there are two data types to search by: Reference Data Submissions and Sample Data. We plan to expand this search form so that users can search for additional data types, filter by different criteria, and access detail views that include the full list of data for elements returned by the search.

Reference Data Submissions

To browse all reference data submissions, select “Reference Data Submission” from the first drop-down menu, then click “Search”. If the text field is left blank, the search results will include every available data submission that has been approved through GTTN-TPPS:

_images/data_submissions.png

Here we can see some brief data about the submissions that matched our search.

We can currently filter submissions by project name, species, data type, or submitting organization. For example, if we enter something like “DART Test” into the text field and click “Search”, the form will return only the submissions with the phrase “DART Test” in the project name:

_images/filter_submissions.png

You can also view details about submissions (as long as you have adequate permissions) from this page by clicking the accession number or the project name. You will then be able to browse the fine details of the submission such as information about individual samples and trees, as well as raw data downloads:

_images/submission_details.png

Sample Data

Similar to the Reference Data Submissions, you can browse all samples by selecting “Sample” from the first drop-down menu, then clicking “Search”:

_images/data_sample.png

We can currently filter by Sample ID and species. To search by species, select “Species” from the second drop-down menu, enter something like “Entan” into the text field, and click “Search”. The form will return only the samples from a species whose name contains “Entan”:

_images/filter_samples.png
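Both searches described above behave like a substring filter on one field, where a blank search returns everything. The sketch below models that behavior over in-memory records. It is not the search form's implementation. The `Submission` record, its field names, the accession values, and the case-insensitive matching are assumptions made for illustration. The same `search` function works on any record type that has the named attribute, including sample records.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    # Hypothetical record; the real search reads submissions from the database.
    accession: str
    project_name: str
    species: List[str]
    data_types: List[str]
    organization: str


def search(records, text="", field="project_name"):
    """Return records whose `field` contains `text`, ignoring case.

    A blank search returns every record, matching the browse-all behavior.
    """
    needle = text.strip().lower()
    if not needle:
        return list(records)
    matches = []
    for record in records:
        value = getattr(record, field)
        candidates = value if isinstance(value, list) else [value]
        if any(needle in str(item).lower() for item in candidates):
            matches.append(record)
    return matches


subs = [Submission("SUB1", "DART Test 1", ["Entandrophragma utile"], ["DART"], "Org A"),
        Submission("SUB2", "Isotope run", ["Quercus robur"], ["Isotope"], "Org B")]
print([s.accession for s in search(subs, "DART Test")])  # ['SUB1']
print([s.accession for s in search(subs, "Entan", field="species")])  # ['SUB1']
```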

Administrator Information