The dqa.tasks.tasks.Task class is a core ingredient of the REMORA framework. By deriving subclasses
from it, a wide range of operations on data can be implemented.
Various operations implemented in this form are already included in the framework (see Tasks).
Custom extensions can easily be added by creating new subclasses.
The Task class implements modify_*() methods on multiple levels, as described in DQA internal data format.
modify_dataset_dict() on the highest level calls modify_dataset() for each Dataset, and this
continues in an analogous fashion on the lower levels.
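The cascade described above can be sketched as follows. This is a minimal illustration, not the dqa implementation: the class name SketchTask, the intermediate method modify_machine(), and the representation of the data as nested dictionaries are all assumptions made for this example.

```python
class SketchTask:
    """Hypothetical stand-in for a Task-like base class (not the dqa API)."""

    def modify_dataset_dict(self, dataset_dict):
        # Highest level: delegate to modify_dataset() for each dataset.
        return {name: self.modify_dataset(ds) for name, ds in dataset_dict.items()}

    def modify_dataset(self, dataset):
        # Assumed next level: delegate to modify_machine() for each machine.
        return {name: self.modify_machine(m) for name, m in dataset.items()}

    def modify_machine(self, machine):
        # Assumed level above the data rows: delegate to modify_data_row().
        return {name: self.modify_data_row(row) for name, row in machine.items()}

    def modify_data_row(self, data_row):
        # Lowest level: the default leaves the data row unchanged.
        return data_row


# Nested dicts: dataset name -> machine name -> data row name -> values.
data = {"ds": {"m1": {"temp": [1.0, 2.0]}}}
result = SketchTask().modify_dataset_dict(data)
```

Because every level only delegates to the level below it, a subclass that overrides a single method still benefits from the iteration done by all levels above.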
Depending on the type of operation to be implemented, overriding the modify_*() method of one specific level
may be most convenient. For example, an elementary operation such as a logarithm
(dqa.tasks.transformations.Log) only works on a data row and is most conveniently implemented by
overriding the modify_data_row() method. On the other hand, the class
dqa.tasks.data_structure.JoinMachines joins the data from multiple machines into one (within a dataset) and
is therefore implemented by overriding a modify_*() method on a higher level.
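The choice of level can be illustrated with two toy subclasses. The sketch below does not use the real dqa classes: SketchTask, SketchLog, and SketchJoinMachines are hypothetical names, the data is assumed to be nested dictionaries holding lists of numbers, and the choice of modify_dataset() as the level overridden for joining machines is an assumption made for illustration.

```python
import math


class SketchTask:
    """Hypothetical base class: each level delegates to the one below it."""

    def modify_dataset_dict(self, dataset_dict):
        return {k: self.modify_dataset(v) for k, v in dataset_dict.items()}

    def modify_dataset(self, dataset):
        return {k: self.modify_machine(v) for k, v in dataset.items()}

    def modify_machine(self, machine):
        return {k: self.modify_data_row(v) for k, v in machine.items()}

    def modify_data_row(self, data_row):
        return data_row


class SketchLog(SketchTask):
    """Element-wise operation: only the data-row level is overridden."""

    def modify_data_row(self, data_row):
        return [math.log(x) for x in data_row]


class SketchJoinMachines(SketchTask):
    """Needs all machines of a dataset at once, so a higher level is overridden."""

    def modify_dataset(self, dataset):
        joined = {}
        for machine in dataset.values():
            for row_name, row in machine.items():
                # Concatenate rows with the same name across machines.
                joined.setdefault(row_name, []).extend(row)
        return {"joined": joined}


data = {"ds": {"m1": {"temp": [1.0, math.e]}, "m2": {"temp": [1.0]}}}
logged = SketchLog().modify_dataset_dict(data)
joined = SketchJoinMachines().modify_dataset_dict(data)
```

SketchLog never sees more than one data row at a time, whereas SketchJoinMachines receives all machines of a dataset in a single call, which is what makes the join possible.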
By default, a task is applied to every dataset, every machine, etc. However, the constructor parameters of the Task
class can restrict the parts of the datasets it is applied to. Specifically, the input_dataset parameter
can be given as a single string or a list of strings and specifies that the Task is applied only to these
datasets. By default, it is
Null, indicating that the Task is applied to all datasets. By specifying
output_dataset (usually a list with the same length as
input_dataset), the output of the Task can also be
written to a different dataset. By default, it is written to the same one. Analogously, there are also such
parameters for the other levels:
input_machine and output_machine specify the Machine names to use as input/output.
input_name and output_name specify the names of the input/output data rows.
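How such input/output parameters might restrict and redirect a task can be sketched at the dataset level. This is an illustration under stated assumptions, not the dqa constructor: SketchTask and Double are hypothetical classes, Python's None stands in for the default value described above, each dataset is simplified to a plain list, and the length check is an assumed behavior.

```python
class SketchTask:
    """Hypothetical stand-in; only the dataset-level selection is sketched."""

    def __init__(self, input_dataset=None, output_dataset=None):
        # Accept a single string or a list of strings; None selects all datasets.
        if isinstance(input_dataset, str):
            input_dataset = [input_dataset]
        if isinstance(output_dataset, str):
            output_dataset = [output_dataset]
        self.input_dataset = input_dataset
        self.output_dataset = output_dataset

    def modify_dataset_dict(self, dataset_dict):
        names = list(dataset_dict) if self.input_dataset is None else self.input_dataset
        # Without output_dataset, results are written back to the input datasets.
        targets = names if self.output_dataset is None else self.output_dataset
        if len(names) != len(targets):
            raise ValueError("input and output dataset lists differ in length")
        result = dict(dataset_dict)  # datasets that are not selected pass through
        for source, target in zip(names, targets):
            result[target] = self.modify_dataset(dataset_dict[source])
        return result

    def modify_dataset(self, dataset):
        return dataset


class Double(SketchTask):
    """Toy task that doubles every value of a dataset."""

    def modify_dataset(self, dataset):
        return [2 * x for x in dataset]


data = {"train": [1, 2], "test": [3]}
in_place = Double(input_dataset="train").modify_dataset_dict(data)
copied = Double(input_dataset=["train"], output_dataset=["train_x2"]).modify_dataset_dict(data)
```

In the first call only "train" is modified and overwritten; in the second call "train" is left untouched and the result is stored under the new name "train_x2". The same pattern would apply analogously to the machine and data-row levels.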