Open Algorithms (OPAL): Key Concepts
The following are the key concepts and principles underlying the open algorithms paradigm:
- Moving the algorithm to the data: Instead of pulling raw data into a centralized location for processing, the algorithms should be sent to the data repositories and executed there.
- Raw data must never leave its repository: Raw data must never be exported from its repository and must always remain under the control of its owner or of the owner of the data repository.
- Vetted algorithms: Algorithms must be vetted to be “safe” from bias, discrimination, privacy violations and other unintended consequences. The data owner (data provider) must ensure that the algorithms it authors or publishes have been thoroughly analyzed for safety and privacy preservation (i.e., fairness, accountability and transparency in machine learning).
- Provide only safe answers: When executing an algorithm on a data set, the data repository must always provide responses that are deemed “safe” from a privacy perspective. Responses must not release or leak personally identifiable information (PII) without the consent of the user (subject). In practice, this may mean that a data repository returns only aggregate answers.
- Trust Networks (Data Federation): In a group-based information-sharing configuration, referred to as a Data Sharing Federation, algorithms must be vetted collectively by the members of the trust network. The operational aspects of the federation should be governed by a legal trust framework for data federation.
- Consent for algorithm execution: Data repositories that hold subject data should obtain consent from the subject when the subject’s data is to be included in a given algorithm execution.
- Decentralized Data: By leaving raw data in its repository, the OPAL paradigm points towards a decentralized architecture for data stores.
- Personal Data Stores: Decentralized data architectures should also incorporate personal data stores (PDS) as legitimate data repository endpoints.
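To show how several of these principles interact at a single repository, here is a minimal Python sketch. It is not an OPAL reference implementation. It assumes a hypothetical `DataRepository` that keeps its records private, runs only algorithms previously registered as vetted, includes only subjects who consented to that specific algorithm, and returns a single aggregate value instead of raw rows. The names `Record`, `DataRepository`, `register_vetted`, `execute` and `min_group_size` are hypothetical. The rule that an aggregate over fewer than 5 subjects is withheld as "unsafe" is an illustrative assumption; the concepts above do not prescribe a specific threshold.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List, Set

# Hypothetical record type: one row of subject data that never leaves the repository.
@dataclass
class Record:
    subject_id: str
    region: str
    monthly_spend: float

# In this sketch, an "algorithm" maps the included records to one aggregate number.
Algorithm = Callable[[List[Record]], float]

@dataclass
class DataRepository:
    records: List[Record]
    consent: Dict[str, Set[str]]                  # subject_id -> algorithm names the subject consented to
    vetted: Dict[str, Algorithm] = field(default_factory=dict)
    min_group_size: int = 5                       # assumed threshold for a "safe" aggregate

    def register_vetted(self, name: str, algorithm: Algorithm) -> None:
        # Only algorithms that passed review (by the owner or the federation) are registered.
        self.vetted[name] = algorithm

    def execute(self, name: str) -> dict:
        # The algorithm comes to the data: callers name an algorithm and never receive records.
        if name not in self.vetted:
            raise PermissionError(f"algorithm {name!r} has not been vetted")
        # Include only subjects who consented to this particular algorithm.
        included = [r for r in self.records if name in self.consent.get(r.subject_id, set())]
        if len(included) < self.min_group_size:
            # Too few subjects: the aggregate could expose individuals, so withhold it.
            return {"algorithm": name, "status": "suppressed", "n": None, "value": None}
        value = self.vetted[name](included)
        return {"algorithm": name, "status": "ok", "n": len(included), "value": value}


if __name__ == "__main__":
    records = [Record(f"s{i}", "north", 100.0 + i) for i in range(6)]
    consent = {f"s{i}": {"avg_spend"} for i in range(6)}
    repo = DataRepository(records, consent)
    repo.register_vetted("avg_spend", lambda rs: mean(r.monthly_spend for r in rs))
    print(repo.execute("avg_spend"))
```

In this design the vetting registry and the consent records are separate. A federation could therefore share one registry of vetted algorithms across many repositories, while each repository, including a personal data store, keeps its own records and consent decisions.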