Tasks

The dqa.tasks.tasks.Task class is a core component of the REMORA framework. By deriving subclasses from it, a wide range of operations on data can be implemented. Various operations implemented this way are already included in the framework (see Tasks). Custom extensions can be added by creating new subclasses.

The Task class implements modify_*() methods at multiple levels, as described in DQA internal data format. By default, modify_dataset_dict() on the highest level calls modify_dataset() for each Dataset, which in turn delegates to the next lower level in the same way.

Depending on the type of operation to be implemented, it is usually most convenient to override the modify_*() method of one specific level. For example, an elementary operation such as a logarithm (dqa.tasks.transformations.Log) only works on a single data row and is most conveniently implemented by overriding the modify_data_row() method. In contrast, the class dqa.tasks.data_structure.JoinMachines joins the data from multiple machines into one (within a dataset) and is therefore implemented by overriding modify_dataset().
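The delegation and the two override levels described above can be illustrated with a minimal, self-contained sketch. It does not use the real dqa classes: SketchTask, SketchLog and SketchJoinMachines are hypothetical stand-ins, the data is assumed to be plain nested dictionaries (dataset name, machine name, data row name, list of values), and the intermediate modify_machine() method is an assumption about the level between datasets and data rows.

```python
import math


class SketchTask:
    """Toy stand-in for the Task base class (not the real dqa code)."""

    def modify_dataset_dict(self, dataset_dict):
        # Highest level: delegate to modify_dataset() for each dataset.
        return {k: self.modify_dataset(v) for k, v in dataset_dict.items()}

    def modify_dataset(self, dataset):
        # Assumed intermediate level: one entry per machine.
        return {k: self.modify_machine(v) for k, v in dataset.items()}

    def modify_machine(self, machine):
        # Lowest delegation step: one call per data row.
        return {k: self.modify_data_row(v) for k, v in machine.items()}

    def modify_data_row(self, data_row):
        return data_row  # default: leave the row unchanged


class SketchLog(SketchTask):
    """Row-level operation: only modify_data_row() is overridden."""

    def modify_data_row(self, data_row):
        return [math.log(x) for x in data_row]


class SketchJoinMachines(SketchTask):
    """Dataset-level operation: it needs all machines of a dataset at once."""

    def modify_dataset(self, dataset):
        joined = {}
        for machine in dataset.values():
            for row_name, row in machine.items():
                joined.setdefault(row_name, []).extend(row)
        # "joined" is a hypothetical name for the combined machine.
        return {"joined": joined}


data = {"run1": {"m1": {"temp": [1.0, math.e]}, "m2": {"temp": [1.0]}}}
logged = SketchLog().modify_dataset_dict(data)
joined = SketchJoinMachines().modify_dataset_dict(data)
```

In this sketch, SketchLog never touches the dataset or machine structure because the inherited methods walk down to each row for it, whereas SketchJoinMachines has to intercept one level higher since a single row does not know about the other machines.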

By default, a task is applied to every dataset, every machine, etc. However, the constructor parameters of the Task class can restrict the parts of the data it is applied to. Specifically, the input_dataset parameter accepts a single string or a list of strings and restricts the Task to the datasets with these names. Its default is None, meaning that the Task is applied to all datasets. By specifying output_dataset (usually a list of the same length as input_dataset), the output of the Task can be written to a different dataset; by default, it is written back to the input dataset. Analogous parameters exist for the other levels:

  • input_machine and output_machine specify the Machine names to use as input/output.

  • input_name and output_name specify the names of the input/output data rows.
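How such input/output parameters might behave can be sketched as follows for the dataset level. This is an illustration under assumptions, not the real Task constructor: SketchSelectiveTask is a hypothetical class, the data is assumed to be nested dictionaries, and it is assumed that datasets not selected are passed through unchanged and that an input dataset is kept when the output is written under a different name.

```python
class SketchSelectiveTask:
    """Toy illustration of input_dataset/output_dataset (not the real dqa code)."""

    def __init__(self, input_dataset=None, output_dataset=None):
        # Accept a single string or a list of strings.
        if isinstance(input_dataset, str):
            input_dataset = [input_dataset]
        if isinstance(output_dataset, str):
            output_dataset = [output_dataset]
        self.input_dataset = input_dataset
        # Assumption: without output_dataset, results overwrite their inputs.
        self.output_dataset = output_dataset or input_dataset

    def modify_dataset_dict(self, dataset_dict):
        result = dict(dataset_dict)  # unselected datasets pass through
        if self.input_dataset is None:
            pairs = [(name, name) for name in dataset_dict]
        else:
            pairs = zip(self.input_dataset, self.output_dataset)
        for source, target in pairs:
            result[target] = self.modify_dataset(dataset_dict[source])
        return result

    def modify_dataset(self, dataset):
        # Placeholder operation: double every value in every data row.
        return {
            machine: {row: [2 * x for x in values] for row, values in rows.items()}
            for machine, rows in dataset.items()
        }


data = {"raw": {"m1": {"temp": [1, 2]}}, "ref": {"m1": {"temp": [5]}}}
task = SketchSelectiveTask(input_dataset="raw", output_dataset="scaled")
out = task.modify_dataset_dict(data)
```

Here only "raw" is processed, its result is stored under the new name "scaled", and "ref" is left untouched. The same pattern would apply one level down for machine names and data row names.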