DataSampling

Data Sampling

Data Sampling provides a possibility to sample data in DPL workflows based on certain conditions ( 5% randomly, when a payload is greater than 4234 bytes, etc.). The job of passing the right data is done by a data processor called Dispatcher. A desired data stream is specified in form of Data Sampling Policies, configured by JSON structures (example below) or by using dedicated interface methods (for advanced use).

{
  "id": "policy_example1",              # name of the policy
  "active": "false",                    # activation flag
  "machines": [                         # list of machines where the policy should be run (now ignored)
    "aido2flp1",
    "aido2flp2"
  ],                                    # list of data that should be sampled, the format is:
                                        # binding1:origin1/description1/subSpec1[;binding2:...]
  "query": "clusters:TPC/CLUSTERS/0;tracks:TPC/TRACKS/0",
                                        # optional list of outputspecs for sampled data, matching the query
                                        # if not present or specified, the default format is used
  "outputs": "sampled_clusters:DS/CLUSTERS/0;sampled_tracks:DS/TRACKS/0", 
  "samplingConditions": [               # list of sampling conditions
    {
      "condition": "random",            # condition type
      "fraction": "0.1",                # condition-dependent parameter: fraction of data to sample
      "seed": "2112"                    # condition-dependent parameter: seed of PRNG
    }
  ],
  "blocking": "false"                   # should the dispatcher block the main data flow? (now ignored)
}

Usage

One can use Data Sampling either by merging the standalone Data Sampling workflow with other DPL workflows:

o2-workflow-abc | o2-datasampling-standalone --config json://path/to/config.json | o2-workflow-xyz

...or by incorporating the code below into a DPL workflow which needs sampling:

#include "DataSampling/DataSampling.h"
using namespace o2::framework;
using namespace o2::utilities;
void customize(std::vector<CompletionPolicy>& policies)
{
  DataSampling::CustomizeInfrastructure(policies);
}

void customize(std::vector<ChannelConfigurationPolicy>& policies)
{
  DataSampling::CustomizeInfrastructure(policies);
}

#include "Framework/runDataProcessing.h"

std::vector<DataProcessorSpec> defineDataProcessing(ConfigContext &ctx)
{

 WorkflowSpec workflow;
// <declaration of other DPL processors>

 DataSampling::GenerateInfrastructure(workflow, "json:///absolute/path/to/config/file.json");

 return workflow;
}

Sampled data can be subscribed to by adding InputSpecs provided by std::vector<InputSpec> DataSampling::InputSpecsForPolicy(const std::string& policiesSource, const std::string& policyName) to a chosen data processor. Then, they can be accessed by the bindings specified in the configuration file. Dispatcher adds a DataSamplingHeader to the header stack, which contains statistics like total number of evaluated/accepted messages for a given Policy or the sampling time since epoch. If no sampling policies are specified, Dispatcher will not be spawned.

The o2-datasampling-pod-and-root workflow can serve as a usage example.

Data Sampling Conditions

The following sampling conditions are available. When more than one is used, a positive decision is taken when all the conditions are fulfilled.

DataSamplingConditionRandom - pseudo-randomly accepts specified fraction of incoming messages.

{
  "condition": "random",
  "fraction": "0.1",
  "seed": "22222"
}

DataSamplingConditionNConsecutive - approves n consecutive samples in defined cycle. It assumes that timesliceID always increments by one.

{
  "condition": "nConsecutive",
  "samplesNumber": "3",
  "cycleSize": "100"
}

DataSamplingConditionPayloadSize - approves messages having payload size within specified boundaries.

{
  "condition": "payloadSize",
  "lowerLimit": "300",
  "upperLimit": "500"
}

DataSamplingConditionCustom - loads a custom condition, which should inherit from DataSamplingCondition, from a specified library.

{
  "condition": "custom",
  "moduleName": "QcExample",
  "className": "o2::quality_control_modules::example::ExampleCondition",
  "customParam": "value"
}

Name		Name	Last commit message	Last commit date
parent directory ..
etc		etc
include/DataSampling		include/DataSampling
scripts		scripts
src		src
test		test
CMakeLists.txt		CMakeLists.txt
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

Data Sampling

Usage

Data Sampling Conditions

FilesExpand file tree

DataSampling

Directory actions

More options

Directory actions

More options

Latest commit

History

DataSampling

Folders and files

parent directory

README.md

Data Sampling

Usage

Data Sampling Conditions