ARC Data Hub: Bring your ARCs to the cloud

The ARC Data Hub concept applies the software development principles of Continuous Integration (CI) and Continuous Deployment (CD) to the research data management (RDM) framework provided by ARCs, making ARCs first-class citizens in the cloud. ARCs can be continuously validated, built, and deployed much like software. Any collaborative cloud platform that offers CI/CD, such as GitLab, GitHub, or Bitbucket, can serve as the basis for an ARC Data Hub by running the tasks described in the following sections.

ARC Data Hub leverages CI/CD capabilities to build, deploy, and validate ARCs

# Continuous Deployment

CD can continuously deploy ARC artifacts, such as metadata export formats or computational results, to another environment.

ARC Data Hubs use CD to build the ARC-RO-Crate metadata for each commit and deploy it to a central package registry. This keeps both representations of the ARC in sync and accessible, providing both a user-centric and a machine-readable view of the ARC.

ARC Data Hubs use CD to build and deploy the ARC-RO-Crate metadata
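
The build-and-deploy step can be illustrated with a minimal sketch. It is not the ARC Data Hub implementation: the `MetadataPackage` class, the `build_package` and `deploy` functions, the per-commit key layout, and the use of a plain dictionary in place of a package registry are all assumptions made for illustration. Only the file name `ro-crate-metadata.json` is taken from the RO-Crate convention.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataPackage:
    """A metadata export built for one commit (hypothetical structure)."""
    arc_name: str
    commit_sha: str
    payload: bytes
    checksum: str


def build_package(arc_name: str, commit_sha: str, metadata: dict) -> MetadataPackage:
    """Serialize the metadata deterministically and record a checksum."""
    payload = json.dumps(metadata, sort_keys=True, indent=2).encode("utf-8")
    checksum = hashlib.sha256(payload).hexdigest()
    return MetadataPackage(arc_name, commit_sha, payload, checksum)


def deploy(package: MetadataPackage, registry: dict) -> str:
    """Store the package under a per-commit key.

    A dict stands in for the central package registry (assumption).
    """
    key = f"{package.arc_name}/{package.commit_sha}/ro-crate-metadata.json"
    registry[key] = package.payload
    return key


# Example: one commit of a hypothetical ARC named "my-arc".
registry = {}
metadata = {"@context": "https://w3id.org/ro/crate/1.1/context", "@graph": []}
package = build_package("my-arc", "3f2a9c1", metadata)
key = deploy(package, registry)
```

Keying each package by commit means every state of the ARC has exactly one matching metadata export, which is one simple way to keep the two representations in sync.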

# Continuous Integration

Incremental changes to an ARC can trigger integrations. ARC Data Hubs can run a user-selected set of validation packages on each commit or pull request to verify that the ARC remains valid for the chosen targets after the change.
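
As a rough sketch of this idea, a validation package can be modeled as a function that inspects an ARC and returns a list of problems. Everything below is hypothetical: the package names `basic` and `publication`, the required fields, and the dictionary used to represent ARC metadata are illustrative assumptions, not the actual validation packages or their interface.

```python
from typing import Callable, Dict, List

# A validation package maps ARC metadata to a list of error messages.
ValidationPackage = Callable[[dict], List[str]]


def require_fields(*fields: str) -> ValidationPackage:
    """Build a package that reports every missing or empty field."""
    def check(arc_metadata: dict) -> List[str]:
        return [
            f"missing required field: {name}"
            for name in fields
            if not arc_metadata.get(name)
        ]
    return check


# Hypothetical packages; a real hub would load user-selected ones.
PACKAGES: Dict[str, ValidationPackage] = {
    "basic": require_fields("title"),
    "publication": require_fields("title", "description", "contacts"),
}


def validate(arc_metadata: dict, selected: List[str]) -> Dict[str, List[str]]:
    """Run the selected packages and collect their errors per package."""
    return {name: PACKAGES[name](arc_metadata) for name in selected}


def is_valid(report: Dict[str, List[str]]) -> bool:
    """An ARC is valid for the selected targets if no package reports errors."""
    return all(not errors for errors in report.values())


# Example: run on the state of the ARC after a commit.
report = validate({"title": "Drought stress study"}, ["basic", "publication"])
```

In a CI job, the result of `is_valid(report)` would typically decide whether the pipeline for that commit or pull request passes.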

Furthermore, the validation package output can be used to continuously inform the user about the current state of the ARC, for example by creating badges on the ARC page, much like the widely known build and test badges in software development.
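
One possible way to turn validation output into a badge is sketched below. The `badge_fields` and `render_badge` functions, the color choices, and the fixed-size SVG layout are assumptions for illustration; the source does not specify how ARC Data Hubs generate their badges.

```python
from xml.sax.saxutils import escape


def badge_fields(package_name: str, passed: int, total: int) -> dict:
    """Summarize a validation run as label, message, and color."""
    if total > 0 and passed == total:
        message, color = "passed", "green"
    else:
        message, color = f"{passed}/{total} passed", "red"
    return {"label": package_name, "message": message, "color": color}


def render_badge(fields: dict) -> str:
    """Render a minimal SVG badge; text is escaped for safe embedding."""
    label = escape(fields["label"])
    message = escape(fields["message"])
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="20">'
        f'<rect width="200" height="20" fill="{fields["color"]}"/>'
        f'<text x="6" y="14" fill="white" font-size="11">{label}: {message}</text>'
        "</svg>"
    )


# Example: a hypothetical "publication" package with 3 of 5 checks passing.
svg = render_badge(badge_fields("publication", 3, 5))
```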

# Continuous Quality Control

Continuous Quality Control (CQC) is a combination of CI and CD that integrates external services depending on the result of ARC validation. Successful validation can trigger downstream applications, either automatically or manually via CQC Hooks.
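
The gating behavior of CQC Hooks can be sketched as follows. This is an illustration under stated assumptions, not the PLANTdataHUB implementation: the `CqcHook` class, the `run_hooks` function, and the hook names are hypothetical. The sketch only shows that downstream actions run after successful validation, either automatically or when requested manually.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class CqcHook:
    """A downstream action tied to validation results (hypothetical)."""
    name: str
    action: Callable[[str], str]
    automatic: bool  # True: runs on every success; False: only on request


def run_hooks(
    arc_id: str,
    validation_passed: bool,
    hooks: List[CqcHook],
    requested: FrozenSet[str] = frozenset(),
) -> List[str]:
    """Trigger hooks only when validation succeeded."""
    if not validation_passed:
        return []
    return [
        hook.action(arc_id)
        for hook in hooks
        if hook.automatic or hook.name in requested
    ]


# Example hooks; the actions just return a description of what they would do.
hooks = [
    CqcHook("notify", lambda arc: f"notified maintainers of {arc}", automatic=True),
    CqcHook("publish", lambda arc: f"submitted {arc} for publication", automatic=False),
]

automatic_only = run_hooks("my-arc", True, hooks)
with_manual = run_hooks("my-arc", True, hooks, requested=frozenset({"publish"}))
blocked = run_hooks("my-arc", False, hooks, requested=frozenset({"publish"}))
```

Separating automatic from manually requested hooks keeps consequential steps, such as a publication request, under the user's control while routine steps run on every successful validation.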

The PLANTdataHUB serves as a reference implementation of an ARC Data Hub, centrally hosted by the NFDI DataPLANT for the plant research community. Beyond its core functionality as an ARC Data Hub, it incorporates CQC within the data publication pipeline, ensuring that all required metadata for publication is complete and accurate.

CQC also supports submissions to various endpoint repositories, provided the corresponding validation package and downstream submission application are available. This flexible system ensures that ARC submissions meet the necessary standards for different repositories, enabling seamless integration and data sharing across platforms.

CQC can be used to submit relevant parts of an ARC to endpoint repositories