A key concern for research undertaken using Australian patient records is personal data privacy. In line with the Department of Health’s Data Access and Release Policy, there is a commitment to unlock the greatest possible public value through better research use of datasets. This means that the department may make patient health records available to researchers, provided they are in an appropriately de-identified and confidentialised form.

The challenge for Agile Digital, with partners at Gulanga Group and Vault Systems, was how the Department of Health could realise benefits to the community without risking data breaches or the re-identification of de-identified patient data.

The Mission

The following project goals were agreed with the Department of Health, to be proven through design and implementation:

  • deliver security-certified cloud deployment and secure web access for authorised parties to a federated collection of datasets meeting the My Health Record specification.
  • ensure metadata, networks, management systems and security operations will be maintained in Australia by staff vetted by the Australian Government to appropriate security clearance.
  • provide an elastic storage capacity for effectively unlimited dataset storage & search indexing.
  • demonstrate how research collaborators may share & analyse datasets in the Hub without downloading raw datasets.
  • demonstrate how every upload, view and analysis access to datasets will be recorded (notarised) in perpetuity.

The Approach

The key insight was to avoid distributing bulk data to health researchers. Rather, we can provide a powerful data science lab environment to bring researchers to the data. By automatically notarising all data access and data science experiments on public blockchains, we ensure an indelible trail in support of audit and scientific publishing. The engineering approach was to:

  • secure the entire solution in ASD-certified Government cloud infrastructure (by Vault Systems).
  • deploy Hadoop to provide scalable & reliable mass data storage in support of parallel processing of rich data research queries.
  • create a “data science lab” using an open source feature engineering platform.
  • record provenance for each data science experiment. Each derivation of a feature records which data feature(s) it is based upon, how the feature was defined and when it was created.
  • ensure each data experiment (feature) is stored individually so that HDFS security may be used to further restrict access to the data.
  • ensure that all data experiments, data features and data accesses are notarised on indelible public blockchains.
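
To make the provenance and notarisation steps above more concrete, here is a minimal sketch in Python. It is not the project's implementation: the record layout, the field names, the example feature and the choice of SHA-256 over canonical JSON are all assumptions made for illustration. The idea it shows is that only a fixed-length digest of a provenance record would need to be published to a public ledger, so the record itself (and the underlying data) never leaves the secure environment. The step of actually submitting the digest to a blockchain is deliberately left out.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class FeatureProvenance:
    """Provenance for one derived feature (hypothetical field names)."""
    name: str
    based_on: Tuple[str, ...]  # feature(s) this one was derived from
    definition: str            # how the feature was defined
    created_at: str            # when it was created (ISO 8601, UTC)


def provenance_digest(record: FeatureProvenance) -> str:
    """Return a SHA-256 hex digest of a canonical JSON form of the record.

    Sorting keys and fixing separators means the same record always
    serialises to the same bytes, and therefore to the same digest.
    """
    canonical = json.dumps(asdict(record), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches_notarised(record: FeatureProvenance, notarised_digest: str) -> bool:
    """Check a record against a digest that was notarised earlier."""
    return provenance_digest(record) == notarised_digest


# Illustrative values only; not taken from any real dataset.
record = FeatureProvenance(
    name="age_band",
    based_on=("date_of_birth",),
    definition="floor(age_in_years / 10) * 10",
    created_at="2018-01-01T00:00:00+00:00",
)
digest = provenance_digest(record)  # this value is what would be notarised

# Any later change to the record no longer matches the notarised digest.
tampered = replace(record, definition="floor(age_in_years / 5) * 5")
print(matches_notarised(record, digest))
print(matches_notarised(tampered, digest))
```

An auditor holding the notarised digest can recompute it from the stored record at any time; a mismatch indicates that the record was altered after notarisation.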