radiometry.dataset module

Datasets: units containing data and metadata.

The dataset is one key concept of the ASpecD framework, and hence of the radiometry package derived from it. A dataset consists of the data as well as the corresponding metadata. Storing metadata in a structured way is a prerequisite for a semantic understanding within the routines. Furthermore, a history of every processing, analysis, and annotation step is recorded, aiming at a maximum of reproducibility. This is part of how the ASpecD framework, and therefore the radiometry package, tries to support good scientific practice.

Therefore, each processing and analysis step of data should always be performed using the respective methods of a dataset, at least as long as it can be performed on a single dataset.

Datasets

Generally, there are two types of datasets: those containing experimental data and those containing calculated data.

However, the radiometry package has to deal with many different kinds of datasets, owing to the very general nature of the measurement program (eve) used at the PTB in Berlin and the quite different kinds of measurements. Hence, the hierarchy of datasets deviates from the typical scenario in ASpecD.

For the time being, there is one dataset representing all information that can possibly be contained in a measurement file (eve HDF5 file):

  • EveDataset

Dataset factory

Particularly in case of recipe-driven data analysis (cf. aspecd.tasks), datasets need to be retrieved automatically using nothing more than a source string that can be, e.g., a path or LOI. This is where the DatasetFactory comes in. It is a factory in the sense of the factory pattern described by the “Gang of Four” in their seminal work, “Design Patterns” (Gamma et al., 1995):

  • DatasetFactory
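The idea of retrieving a dataset from nothing but a source string can be illustrated with a minimal, self-contained sketch of the factory pattern. All names below (ToyDataset, ToyDatasetFactory, the get_dataset() signature, and the dispatch on the file suffix) are assumptions made for illustration and are not the radiometry or ASpecD API.

```python
from dataclasses import dataclass, field
from pathlib import PurePath


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a dataset: an ID, a kind, and some data."""

    id: str = ""
    kind: str = "generic"
    data: list = field(default_factory=list)


class ToyDatasetFactory:
    """Hypothetical factory returning a dataset for a source string."""

    # Assumed mapping from file suffix to the kind of dataset to create.
    kinds = {".h5": "eve", ".adf": "adf"}

    def get_dataset(self, source=""):
        if not source:
            raise ValueError("A source string is required")
        suffix = PurePath(source).suffix
        kind = self.kinds.get(suffix, "generic")
        return ToyDataset(id=source, kind=kind)


factory = ToyDatasetFactory()
dataset = factory.get_dataset(source="scans/scan-0001.h5")
print(dataset.kind)
```

The caller never names a concrete dataset class. This is what makes such a factory convenient for recipe-driven analysis, where only the source string is known.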

Module documentation

class radiometry.dataset.EveDataset

Bases: Dataset

Representation of the data and metadata contained in an eve HDF5 file.

The idea behind this class is to represent all possible data and metadata contained in an eve HDF5 file, regardless of the kind of measurement actually performed. Therefore, dedicated other dataset classes need to be developed that are purpose-built for more specific kinds of measurements.

Note

This class should become the central interface between eve HDF5 files and processing and analysis routines, together with the corresponding importer. Hence, these two classes need to reflect any further update of the eve HDF5 format. In accord with the open-closed principle, the EveDataset should only be extended; existing structures should not be changed, so as not to impair backward compatibility.

While representing all possible data and metadata, this class is not a mere reimplementation of the eve HDF5 file structure, but an abstraction taking into account the concepts of the ASpecD framework.

For a convenient overview of the structure of this dataset, see the dataset structure.

metadata

Hierarchical structure containing all relevant metadata

Type:

radiometry.metadata.EveDatasetMetadata

add_reference(dataset=None)

Add a reference to another dataset to the list of references.

A reference is always an object of type aspecd.dataset.DatasetReference that will be automatically created from the dataset provided.

Parameters:

dataset (aspecd.dataset.Dataset) – dataset for which a reference should be added to the list of references

Raises:

aspecd.exceptions.MissingDatasetError – Raised if no dataset was provided

analyse(analysis_step=None)

Apply analysis to dataset.

Every analysis step is an object of type aspecd.analysis.SingleAnalysisStep and is passed as an argument to analyse().

The information necessary to reproduce an analysis is stored in the analyses attribute as an object of class aspecd.dataset.AnalysisHistoryRecord. This record also contains a (deep) copy of the complete history of the dataset stored in history.

Parameters:

analysis_step (aspecd.analysis.SingleAnalysisStep) – analysis step to apply to the dataset

Returns:

analysis_step – analysis step applied to the dataset

Return type:

aspecd.analysis.SingleAnalysisStep
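Why the record stores a deep copy of the history, rather than a reference to it, can be seen in a small self-contained sketch. The ToyDataset class and the dictionary used as a record are hypothetical stand-ins, not the ASpecD classes named above.

```python
import copy


class ToyDataset:
    """Hypothetical dataset with a processing history and analysis records."""

    def __init__(self):
        self.history = []   # names of the processing steps applied so far
        self.analyses = []  # one record per analysis step

    def process(self, step_name):
        self.history.append(step_name)

    def analyse(self, analysis_name):
        # Store a deep copy, so that later processing steps
        # do not alter the history saved with this analysis.
        record = {
            "analysis": analysis_name,
            "history": copy.deepcopy(self.history),
        }
        self.analyses.append(record)
        return record


dataset = ToyDataset()
dataset.process("baseline correction")
record = dataset.analyse("peak finding")
dataset.process("normalisation")
```

After the second processing step, the history stored in the record still lists only the step that had been applied when the analysis was performed.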

analyze(analysis_step=None)

Apply analysis to dataset.

Same method as analyse(), for those preferring American English (AE) over British English (BE).

annotate(annotation_=None)

Add annotation to dataset.

Parameters:

annotation_ (aspecd.annotation.DatasetAnnotation) – annotation to add to the dataset

append_history_record(history_record)

Append history record to dataset history.

This method should never be called manually, but only from within classes of the ASpecD framework, at least as long as you are not interested in Orwellian History.

Parameters:

history_record (aspecd.history.HistoryRecord) – History record (of a processing step) to be appended.

Changed in version 0.2: Converted into a public method, due to needs of aspecd.processing.MultiProcessingStep

delete_analysis(index=None)

Remove analysis step record from dataset.

Parameters:

index (int) – Number of analysis in analyses to delete

delete_annotation(index=None)

Remove annotation record from dataset.

Parameters:

index (int) – Number of annotation in annotations to delete

delete_representation(index=None)

Remove representation record from dataset.

Parameters:

index (int) – Number of representation in representations to delete

export_to(exporter=None)

Export data and metadata.

This requires initialising an aspecd.io.DatasetExporter object first that is provided as an argument for this method.

Note

The same operation can be performed by calling the export_from() method of an aspecd.io.DatasetExporter object taking an aspecd.dataset.Dataset object as argument.

However, as usually the dataset is already at hand, first creating an instance of a respective exporter and then calling export_to() of the dataset is the preferred way.

Parameters:

exporter (aspecd.io.DatasetExporter) – Exporter writing data and metadata to specific output format

from_dict(dict_=None)

Set properties from dictionary.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

Note

In conjunction with the to_dict() method, this method allows serialising and deserialising dataset objects, i.e. all kinds of storage to the persistence layer.

Parameters:

dict_ (dict) – Dictionary containing properties to set
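The round trip between from_dict() and to_dict() can be sketched with a small stand-alone class. ToySample and its attributes are hypothetical. The sketch only mirrors the documented behaviour (unknown keys are ignored, public attributes are exported, empty values can be removed) and makes no claim about how ASpecD implements it.

```python
class ToySample:
    """Hypothetical class with two public attributes and one private one."""

    def __init__(self):
        self.name = ""
        self.comment = ""
        self._internal = 42

    def to_dict(self, remove_empty=False):
        # Collect the public attributes in their order of definition.
        result = {
            key: value
            for key, value in vars(self).items()
            if not key.startswith("_")
        }
        if remove_empty:
            result = {k: v for k, v in result.items() if v not in ("", None)}
        return result

    def from_dict(self, dict_=None):
        # Keys that are not public attributes of the object are ignored.
        for key, value in (dict_ or {}).items():
            if hasattr(self, key) and not key.startswith("_"):
                setattr(self, key, value)


sample = ToySample()
sample.from_dict({"name": "diode", "unknown": "ignored"})

# Serialise and deserialise into a fresh object.
restored = ToySample()
restored.from_dict(sample.to_dict())
```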

import_from(importer=None)

Import data and metadata contained in importer object.

This requires initialising an aspecd.io.DatasetImporter object first that is provided as an argument for this method.

Note

The same operation can be performed by calling the import_into() method of an aspecd.io.DatasetImporter object taking an aspecd.dataset.Dataset object as argument.

However, as usually one wants to continue working with a dataset, first creating an instance of a dataset and a respective importer and then calling import_from() of the dataset is the preferred way.

Parameters:

importer (aspecd.io.DatasetImporter) – Importer containing data and metadata read from some source
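The interplay between import_from() of the dataset and import_into() of the importer can be sketched as follows. ToyImporter and ToyDataset are hypothetical stand-ins with faked data, and the delegation shown is an assumption about the general pattern, not a copy of the ASpecD implementation.

```python
class ToyImporter:
    """Hypothetical importer 'reading' data from a source string."""

    def __init__(self, source=""):
        self.source = source

    def import_into(self, dataset):
        # A real importer would read a file here; this one fakes the data.
        dataset.data = [1.0, 2.0, 3.0]
        dataset.metadata["source"] = self.source


class ToyDataset:
    """Hypothetical dataset delegating the actual import to the importer."""

    def __init__(self):
        self.data = []
        self.metadata = {}

    def import_from(self, importer=None):
        if importer is None:
            raise ValueError("No importer provided")
        importer.import_into(self)


dataset = ToyDataset()
dataset.import_from(ToyImporter(source="scan-0001.h5"))
```

Both entry points end up in the same code path. Starting from the dataset simply leaves the object one wants to continue working with at hand.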

load(filename=None)

Load dataset object from persistence layer.

The dataset will be loaded from a file conforming to the ASpecD dataset format (adf). For details, see the aspecd.io.AdfExporter class.

property package_name

Return package name.

The name of the package the dataset is implemented in is a crucial detail for writing the history. The value is set automatically and is read-only.

plot(plotter=None)

Perform plot with data of current dataset.

Every plotter is an object of type aspecd.plotting.Plotter and is passed as an argument to plot().

The information necessary to reproduce a plot is stored in the representations attribute as an object of class aspecd.dataset.PlotHistoryRecord. This record also contains a (deep) copy of the complete history of the dataset stored in history. Besides being a necessary prerequisite to reproduce a plot, this allows plots requiring different, incompatible preprocessing steps to be recreated automatically in arbitrary order.

Parameters:

plotter (aspecd.plotting.Plotter) – plot to perform with data of current dataset

Returns:

plotter – plot performed on the current dataset

Return type:

aspecd.plotting.Plotter

Raises:

aspecd.exceptions.MissingPlotterError – Raised when trying to plot without plotter

process(processing_step=None)

Apply processing step to dataset.

Every processing step is an object of type aspecd.processing.SingleProcessingStep and is passed as an argument to process().

Calling this function ensures that the history record is added to the dataset and that a few basic checks are performed. One of these checks is for a leading history, meaning that the _history_pointer is not set to the current tip of the history of the dataset. In this case, an error is raised.

Note

If processing_step is not undoable, i.e. cannot be reverted, all previous plots stored in the list of representations will be removed, as these plots cannot be reproduced due to a change in _origdata.

Parameters:

processing_step (aspecd.processing.SingleProcessingStep) – processing step to apply to the dataset

Returns:

processing_step – processing step applied to the dataset

Return type:

aspecd.processing.SingleProcessingStep

Raises:

aspecd.exceptions.ProcessingWithLeadingHistoryError – Raised when trying to process with leading history

redo()

Reapply previously undone processing step.

Raises:

aspecd.exceptions.RedoAlreadyAtLatestChangeError – Raised when trying to redo although the history is already at the latest change

remove_reference(dataset_id=None)

Remove a reference to another dataset from the list of references.

A reference is always an object of type aspecd.dataset.DatasetReference that was automatically created from the respective dataset when adding the reference.

Parameters:

dataset_id (str) – ID of the dataset the reference should be removed for

Raises:

aspecd.exceptions.MissingDatasetError – Raised if no dataset ID was provided

save(filename=None)

Save dataset to persistence layer.

The dataset will be saved in ASpecD dataset format (adf). For details, see the aspecd.io.AdfExporter class.

strip_history()

Remove leading history, if any.

If a dataset has a leading history, i.e., its history pointer does not point to the last entry of the history, and you want to perform a processing step on this very dataset, you need first to strip its history, as otherwise, a ProcessingWithLeadingHistoryError will be raised.
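How the history pointer, a leading history, and strip_history() relate to process(), undo(), and redo() can be sketched with a toy model. Everything below is hypothetical: processing steps are plain functions, the exception is a stand-in, and the attribute names merely follow those used on this page.

```python
class LeadingHistoryError(Exception):
    """Hypothetical stand-in for ProcessingWithLeadingHistoryError."""


class ToyDataset:
    """Hypothetical dataset whose processing steps are plain functions."""

    def __init__(self, data):
        self._origdata = list(data)
        self.data = list(data)
        self.history = []
        self._history_pointer = -1  # index of the last applied step

    def process(self, step):
        if self._history_pointer < len(self.history) - 1:
            raise LeadingHistoryError("Strip the history first")
        self.data = step(self.data)
        self.history.append(step)
        self._history_pointer += 1

    def _replay(self):
        # Start from the original data, reapply all steps up to the pointer.
        self.data = list(self._origdata)
        for step in self.history[: self._history_pointer + 1]:
            self.data = step(self.data)

    def undo(self):
        if self._history_pointer < 0:
            raise IndexError("Nothing to undo")
        self._history_pointer -= 1
        self._replay()

    def redo(self):
        if self._history_pointer >= len(self.history) - 1:
            raise IndexError("Already at latest change")
        self._history_pointer += 1
        self._replay()

    def strip_history(self):
        # Discard every step beyond the history pointer.
        del self.history[self._history_pointer + 1:]


def double(values):
    return [2 * value for value in values]


def add_one(values):
    return [value + 1 for value in values]


dataset = ToyDataset([1, 2])
dataset.process(double)   # data: [2, 4]
dataset.process(add_one)  # data: [3, 5]
dataset.undo()            # data: [2, 4], add_one is now leading history
dataset.strip_history()   # add_one is discarded
dataset.process(double)   # data: [4, 8]
```

Without the call to strip_history(), the last process() call in this sketch would raise the error, because the pointer would not be at the tip of the history.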

tabulate(table=None)

Create table from data of current dataset.

Every table is an object of type aspecd.table.Table and is passed as an argument to tabulate().

The information necessary to reproduce a table is stored in the representations attribute as object of class aspecd.dataset.TableHistoryRecord.

Parameters:

table (aspecd.table.Table) – table created from the data of the current dataset

Returns:

table – table created from the data of the current dataset

Return type:

aspecd.table.Table

Raises:

TypeError – Raised when trying to tabulate without table

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters:

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns:

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type:

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.

undo()

Revert last processing step.

The history pointer is decremented, and starting from _origdata, all processing steps up to this point in the history are reapplied to the data.

Raises:

class radiometry.dataset.DeviceData

Bases: DeviceData

Data of a device recorded in addition to the primary data of a dataset.

metadata

Metadata of the device used to record the additional data

Type:

radiometry.metadata.Device

property axes

Get or set axes.

If you set axes, they will be checked for consistency with the data. Therefore, first set the data and only afterwards the axes, with values corresponding to the dimensions of the data.

Raises:

property data

Get or set (numeric) data.

Note

If you set data whose dimensions differ from those of the data previously stored in the dataset, the axes values will be set to an array with indices corresponding to the size of the respective data dimension. You will most probably assign proper axis values afterwards. On the other hand, all other information stored in the axis object will be retained, namely quantity, unit, and label.
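The coupling between data and axes described for these two properties can be sketched for the one-dimensional case. ToyAxis, ToyData, and set_axis_values() are hypothetical and use plain lists instead of arrays. The sketch mirrors the documented behaviour only.

```python
class ToyAxis:
    """Hypothetical axis: values plus descriptive information."""

    def __init__(self, quantity="", unit=""):
        self.values = []
        self.quantity = quantity
        self.unit = unit


class ToyData:
    """Hypothetical one-dimensional data container with a single axis."""

    def __init__(self):
        self._data = []
        self.axis = ToyAxis()

    @property
    def data(self):
        return self._data

    @data.setter
    def data(self, values):
        old_length = len(self._data)
        self._data = list(values)
        if len(self._data) != old_length:
            # Size changed: fall back to plain indices as axis values,
            # but keep quantity and unit of the axis untouched.
            self.axis.values = list(range(len(self._data)))

    def set_axis_values(self, values):
        # Axis values need to match the size of the data.
        if len(values) != len(self._data):
            raise ValueError("Axis values do not match the data size")
        self.axis.values = list(values)


device_data = ToyData()
device_data.axis.quantity = "position"
device_data.axis.unit = "mm"
device_data.data = [0.1, 0.4, 0.9]               # set the data first
device_data.set_axis_values([10.0, 10.5, 11.0])  # then the axis values
device_data.data = [0.2, 0.3]                    # size changes, values reset
```

After the last assignment, the axis values are plain indices again, while quantity and unit are retained.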

from_dict(dict_=None)

Set properties from dictionary, e.g., from serialised dataset.

Only parameters in the dictionary that are valid properties of the class are set accordingly.

The list of axes is handled appropriately.

Parameters:

dict_ (dict) – Dictionary containing properties to set

to_dict(remove_empty=False)

Create dictionary containing public attributes of an object.

Parameters:

remove_empty (bool) –

Whether to remove keys with empty values

Default: False

Returns:

public_attributes – Ordered dictionary containing the public attributes of the object

The order of attribute definition is preserved

Return type:

collections.OrderedDict

Changed in version 0.6: New parameter remove_empty

Changed in version 0.9: Settings for properties to exclude and include are not traversed

Changed in version 0.9.1: Dictionaries get copied before traversing, as otherwise, the special variables __dict__ and __odict__ are modified, which may result in strange behaviour.