Federated Research Data Repository (FRDR)
We are pleased to announce that we have set up an FRDR collection for the cluster, in which all cluster-associated datasets will appear.
To deposit a dataset into the cluster’s FRDR collection, refer to the Deposit and Download section.
Storage and Backup
From FRDR’s About page:
Data submitted to FRDR is housed on Compute Canada managed infrastructure at the University of Victoria, BC or at the University of Waterloo, ON. Research data submitted to FRDR does not leave Canada. The metadata related to datasets is housed in a database at the University of Victoria. Most of that metadata is shared with Globus, running on Amazon Web Services in the USA, to be indexed and made available for discovering datasets. Certain metadata fields, for example, submitter contact information, are not shared with Globus.
Larger uploads are made using Globus. While Globus is hosted in the USA on AWS, only the public metadata is stored there; the datasets themselves are transferred securely between the endpoints so the data does not leave Canada.
There is a theoretical limit of 4 TB per dataset, which comes from the Archivematica data preservation system. FRDR has not commented further on whether it will impose its own upload size limit.
However, FRDR has limited resources, so it may impose restrictions during curation if unreasonably large datasets are uploaded.
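Before depositing, it can help to check how large a dataset folder actually is. The following is a minimal sketch, not an FRDR tool: the function names are hypothetical, and the limit is assumed to be the 4 TB figure above interpreted as 4 × 10^12 bytes.

```python
import os

# Assumption: the 4 TB figure from the text, interpreted as decimal terabytes.
ASSUMED_LIMIT_BYTES = 4 * 10**12


def dataset_size_bytes(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links so linked files are not counted.
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_limit(root, limit=ASSUMED_LIMIT_BYTES):
    """Return True if the dataset under root is no larger than limit bytes."""
    return dataset_size_bytes(root) <= limit
```

For example, `fits_limit("path/to/dataset")` returns `False` for a folder larger than the assumed limit, which would be a good moment to contact the curators before uploading.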
Only authorised curators can make changes to submitted data. If any changes are needed, a request must be made via email.
Curators perform tasks such as:
- Checking the metadata record for completeness
- Linking DOIs for related publications
- Validating data, for instance flagging unexplained null values in tabular data or corrupt files
- Checking documentation
- Checking for copyright and ethical violations
Curation typically takes up to 48 hours. After that, DOI registration takes 15 minutes, and it takes a further 2 hours for the dataset to appear for download in the FRDR portal.
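Since curators flag unexplained null values in tabular data, depositors may want to look for empty cells themselves before submitting. The sketch below is an illustration only and does not reproduce FRDR's curation checks: `find_empty_cells` is a hypothetical helper, and it assumes a UTF-8 CSV file with a header row.

```python
import csv


def find_empty_cells(csv_path):
    """Return (row_number, column_name) pairs for empty or missing cells.

    Row numbers count records, with the header row as row 1.
    """
    empty = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        for row_number, row in enumerate(reader, start=2):
            for column, value in row.items():
                if column is None:
                    # Extra cells beyond the header are not checked here.
                    continue
                # A short row yields None; a blank cell yields "".
                if value is None or value.strip() == "":
                    empty.append((row_number, column))
    return empty
```

Each reported cell can then either be filled in or explained in the dataset's documentation, for example in a README that states how missing values are encoded.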
Data Sharing and Collaboration
FRDR includes search functionality for its own datasets and for datasets that it harvests from other sources, such as the Scholars Portal Dataverse. Users can search for and deposit datasets through the web interface or an API. Note that while depositing data requires authorization, anybody can search for datasets. FRDR is format agnostic and allows users to manage the dataset file hierarchy. It also supports embargoes and issues DOIs for all deposited datasets. Other features include data integrity checks using checksums, curation, and upload authentication. Through Globus, FRDR also enables secure transfer of large datasets, with asynchronous and resumable transfers that are managed automatically.
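To illustrate what a checksum-based integrity check involves, here is a minimal sketch that computes a file's checksum locally. The text does not say which algorithm FRDR or Globus uses, so SHA-256 is an assumption, and `file_checksum` is a hypothetical helper name.

```python
import hashlib


def file_checksum(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Computing the checksum of a file before upload and again after download, and comparing the two values, confirms that the copy is byte-for-byte identical.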
Deposit and Download
Globus Connect Personal
To download or deposit a dataset, you must first install Globus Connect Personal and set up your computer as an Endpoint.
- Log into FRDR using one of these accounts: ORCID, Compute Canada, Globus ID, Google. Detailed log-in instructions can be found here.
- Click on Endpoints from the toolbar on the left-hand side of the page, then click on Create new endpoint at the top right corner.
- Select “Globus Connect Personal”.
- Follow the three steps on the page.
The endpoint you created will show up on the “Endpoints” page. You are now able to download datasets from FRDR and, if you are already an approved depositor, to submit datasets.
Any lab in the cluster can deposit datasets into the “UBC Brain Circuits” special storage group in FRDR. PIs can become depositors by sending an email to Jeffrey LeDue with the email address or account that they used to log into FRDR. It can be one of the following (copied from FRDR, Getting Authorization To Submit):
You will receive an email from email@example.com with the subject “You are invited to join FRDR Depositors - Déposants DFDR”. Once you accept the invitation, “FRDR UBC Brain Circuits Depositors” should appear when you click on Groups in the toolbar. You are now eligible to deposit datasets into the cluster’s storage group.
FRDR provides information and instructions on:
- Before Depositing
- Depositing (includes Using Globus to Upload Dataset)
- After Depositing
To submit a dataset, log into FRDR and click on “Deposit Dataset” in the toolbar at the top of the page. Click on “Submit a New Dataset” in the “New Submission” box. This will take you to a page titled Submit: Select Storage Group.
Make sure you select “UBC Brain Circuits” under Special Storage Groups!
To download a dataset from FRDR:
- Navigate to the page of the desired dataset. Click on Download Dataset, which is located near the bottom. This takes you to the File Manager page; if you are not already logged into FRDR, you will first be prompted to log in.
- You should see two columns: the left one contains the dataset under a Collection with a name similar to “FRDR-Prod-2”. Select the files you want to download, or click select all in the blue toolbar if you want to download the entire dataset.
- Click Transfer or Sync to, located between the two columns.
- Click on -select a collection-. This is where the dataset will be downloaded.
- Choose the Endpoint corresponding to the computer you want to download the dataset to. If you don’t already have an Endpoint set up on the computer you are using, click on Install Globus Connect Personal and follow the instructions.
- Once you’ve selected the endpoint, the File Manager page reappears. Make sure you review the “Transfer & Sync Options”. To start the download, click on the blue Start button underneath the left column, the one with an arrow pointing towards the column corresponding to the Endpoint you’ve chosen.
- A green banner should appear with the message: “Transfer request submitted successfully” followed by the task ID. You can track the transfer’s progress in the Activity page.
Once the transfer is complete, you will receive an email from firstname.lastname@example.org with the subject “SUCCEEDED” followed by the task ID.