Towards Principles of Modeling the Registry Dynamics

In PARTHENOS, we envision a new research infrastructure paradigm, which acknowledges the necessity of complete monitoring of provenance of knowledge of research data.
The proposed innovation lies in new forms of reasoning over provenance data that can support scientific truth maintenance. The infrastructure starts from the recognition that data is one component of a dynamic interaction among researchers about evolving and consolidating knowledge, and that data cannot be divorced or abstracted from the researchers and their processes.

[pullquote]In PARTHENOS, we envision a new research infrastructure paradigm, which acknowledges the necessity of complete monitoring of provenance of knowledge of research data. [/pullquote]Clearly, we care about keeping datasets in a consistent state. To do so, the research infrastructure has to be informed by the institutional environments about updates to the resources. The research infrastructure also has to be able to contact the institutional environments for reasons that concern the quality of the datasets, their legal privileges, or whether the datasets can be cited.

Two processes are of crucial importance for the PARTHENOS infrastructure: keeping datasets and updating/curating them. These processes influence the construction of the infrastructure’s registry that, abstractly, maintains the who and which in the infrastructure, i.e., who keeps and curates which datasets. Specifically, understanding what information these two processes need guides the selection of the attributes to include in our registry.

In particular, for the keeping process, we need to maintain the state of keeping for a dataset, which consists of the following pieces of information: i) when the keeper starts holding a copy of the dataset on one of their machines (beginning of existence), ii) until when the keeper holds the copy of the dataset on that machine (end of existence, if this information is available), iii) the last confirmation time, i.e., the time at which the keeper last confirmed holding the copy of the dataset on that machine, and iv) the methods and policies of access provision. For the curation process, we need to maintain: i) when the curator starts curating the dataset (beginning of existence), ii) when the dataset stops having a curator (end of existence, if this information is available), iii) the last confirmation time, i.e., the time at which the curator last confirmed curating the dataset, and iv) the curating policies. Note that a dataset without a keeper is a lost dataset, while a dataset without a curator is an orphan.
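The attributes listed above can be sketched as a small record type. This is only an illustration under stated assumptions: the names `ServiceState`, `active_on` and `dataset_status`, the field names, and the use of calendar dates are hypothetical and are not part of the PARTHENOS registry. The sketch also shows how the lost and orphan conditions could be derived from the recorded states.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ServiceState:
    """State of one keeping or curation engagement (hypothetical names)."""
    actor: str                  # the keeper or the curator
    dataset_id: str             # the dataset the engagement refers to
    begin: date                 # beginning of existence
    last_confirmed: date        # last confirmation time
    policies: str               # access-provision or curating policies
    end: Optional[date] = None  # end of existence, if known

    def active_on(self, day: date) -> bool:
        """True if the engagement has begun and has not ended by `day`."""
        return self.begin <= day and (self.end is None or day < self.end)


def dataset_status(dataset_id: str,
                   keepings: List[ServiceState],
                   curations: List[ServiceState],
                   day: date) -> str:
    """Classify a dataset as 'lost', 'orphan' or 'maintained' on `day`."""
    has_keeper = any(k.dataset_id == dataset_id and k.active_on(day)
                     for k in keepings)
    has_curator = any(c.dataset_id == dataset_id and c.active_on(day)
                      for c in curations)
    if not has_keeper:
        return "lost"      # nobody holds a copy
    if not has_curator:
        return "orphan"    # a copy exists, but nobody curates it
    return "maintained"
```

In this sketch a dataset with neither keeper nor curator is reported as lost, on the assumption that the loss of all copies is the more severe condition; the text above does not prescribe an ordering.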

We see both keeping and curating as services.

We propose to study all situations of maintenance and responsibility as services. This way, a dataset will not have direct properties such as has-keeper or has-curator. Instead, we can define the services of keeping and curating as subclasses of the class service, which are related to the datasets via relationships such as is about, pertains, or is subject of. Our motivation is simply that hosting or curating something is itself a service. In turn, a volatile dataset is specified with respect to the curation service that defines it.
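A minimal sketch of this modeling choice follows. The class and field names (`Dataset`, `Service`, `KeepingService`, `CurationService`, `is_about`, `providers_of`) are hypothetical and chosen only for illustration; the point is that the dataset carries no keeper or curator attribute, and the answer to "who keeps or curates this dataset" is obtained by querying the services that are about it.

```python
from dataclasses import dataclass
from typing import List, Type


@dataclass(frozen=True)
class Dataset:
    """A dataset has no has-keeper or has-curator property of its own."""
    identifier: str


@dataclass(frozen=True)
class Service:
    """Generic service, linked to a dataset through an 'is about' relation."""
    provider: str
    is_about: Dataset


class KeepingService(Service):
    """The service of holding a copy of a dataset."""


class CurationService(Service):
    """The service of updating/curating a dataset."""


def providers_of(services: List[Service],
                 dataset: Dataset,
                 kind: Type[Service]) -> List[str]:
    """Return the providers of services of the given kind about `dataset`."""
    return [s.provider for s in services
            if isinstance(s, kind) and s.is_about == dataset]
```

With this shape, adding a further kind of responsibility only requires a new subclass of `Service`; the dataset description itself stays unchanged.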

[pullquote align=”right”]We define a service as the continued, declared willingness and ability of an actor to execute on demand by a client certain activities of specific benefit to the client.[/pullquote]

We define a service as the continued, declared willingness and ability of an actor to execute on demand by a client certain activities of specific benefit to the client. The identity of a service therefore depends on the individual actor, the type of activity and/or the type of product of such an activity. An instance of a service begins to exist with the actor's ability and willingness to provide it, and ends when either one ends permanently. The ability may be interrupted temporarily, for example when the actor is on vacation or a machine is under repair, without the service as such having ended. The service includes all auxiliary abilities of the same actor to execute the respective activities, but not services provided by third parties in the course of the actor's service provisioning. We further distinguish between services according to the types of objects handled and/or the kind of tools employed, such as providing data computing, data content, or network access, and services not determined by IT, such as the handling of physical objects.
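The difference between a temporary interruption and the permanent end of a service can be made concrete with a small sketch. All names here (`ServiceInstance`, `exists_on`, `available_on`, `interruptions`) are hypothetical, and representing interruptions as closed date intervals is an assumption made for the example only.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class ServiceInstance:
    """One service instance, identified by its actor and activity type."""
    actor: str
    activity_type: str
    begin: date                  # ability and willingness start here
    end: Optional[date] = None   # set only when either one ends permanently
    # Temporary interruptions (e.g. vacation, machine under repair),
    # each given as an inclusive (first_day, last_day) pair.
    interruptions: List[Tuple[date, date]] = field(default_factory=list)

    def exists_on(self, day: date) -> bool:
        """The service exists from `begin` until its permanent end."""
        return self.begin <= day and (self.end is None or day < self.end)

    def available_on(self, day: date) -> bool:
        """The service exists and is not temporarily interrupted on `day`."""
        interrupted = any(first <= day <= last
                          for first, last in self.interruptions)
        return self.exists_on(day) and not interrupted
```

During an interruption, `exists_on` still returns `True` while `available_on` returns `False`, which mirrors the statement that an interruption does not end the service.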