CPSC2021Dataset

class torch_ecg.databases.datasets.CPSC2021Dataset(config: CFG, task: str, training: bool = True, lazy: bool = True, **reader_kwargs: Any)

Bases: ReprMixin, Dataset

Data generator for feeding data from the CPSC2021 database into PyTorch models.

Strategies for generating data and labels:

  1. ECGs are preprocessed and stored in one folder.

  2. The preprocessed ECGs are sliced with overlap to generate data and labels for the different tasks:

    • the data files store fixed-length segments of the preprocessed ECGs,

    • the annotation files contain “qrs_mask” and “af_mask”.
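Step 2 can be illustrated with a minimal NumPy sketch, assuming a fixed segment length and a fixed overlap, both in samples. The helper slice_with_overlap is hypothetical and not part of torch_ecg; the actual segment length, overlap, and handling of the signal tail are governed by the dataset config and may differ. The same start indices would be applied to the sample-level masks so that labels stay aligned with their segments.

```python
import numpy as np


def slice_with_overlap(sig: np.ndarray, seg_len: int, overlap: int) -> np.ndarray:
    """Slice a (n_lead, n_total) signal into overlapping fixed-length segments.

    Hypothetical helper. Returns an array of shape (n_seg, n_lead, seg_len);
    trailing samples that do not fill a whole segment are dropped.
    """
    if not 0 <= overlap < seg_len:
        raise ValueError("overlap must satisfy 0 <= overlap < seg_len")
    step = seg_len - overlap  # hop between consecutive segment starts
    n_total = sig.shape[-1]
    starts = range(0, n_total - seg_len + 1, step)
    segs = [sig[:, s : s + seg_len] for s in starts]
    if not segs:
        # Signal shorter than one segment: return an empty batch.
        return np.empty((0, sig.shape[0], seg_len), dtype=sig.dtype)
    return np.stack(segs)
```

For example, a 2-lead recording of 20000 samples sliced with seg_len=6000 and overlap=2000 yields segments starting at samples 0, 4000, 8000 and 12000, i.e. an array of shape (4, 2, 6000).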

The value returned by __getitem__() is a tuple whose contents depend on the task:

  1. “qrs_detection”: (data, qrs_mask, None)

  2. “rr_lstm”: (rr_seq, rr_af_mask, rr_weight_mask)

  3. “main”: (data, af_mask, weight_mask)

where

  • data shape: (n_lead, n_sample)

  • qrs_mask shape: (n_sample, 1)

  • af_mask shape: (n_sample, 1)

  • weight_mask shape: (n_sample, 1)

  • rr_seq shape: (n_rr, 1)

  • rr_af_mask shape: (n_rr, 1)

  • rr_weight_mask shape: (n_rr, 1)

Typical values of n_sample and n_rr are 6000 and 30, respectively.

n_lead is typically 2, which is the number of leads in the ECG signal of the CPSC2021 database.
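To sanity-check samples against the shapes listed above, a small validator like the following could be used. It is only a sketch: sample_matches and EXPECTED_SHAPES are hypothetical names, and the defaults n_lead=2, n_sample=6000 and n_rr=30 are just the typical values mentioned above, so pass the values from your own config.

```python
from typing import Any, Dict, Optional, Sequence, Tuple

import numpy as np

# Expected shapes per task, transcribed from the documented list.
# Strings are symbolic dimensions; None means the item itself must be None.
EXPECTED_SHAPES: Dict[str, Tuple[Optional[Tuple[Any, ...]], ...]] = {
    "qrs_detection": (("n_lead", "n_sample"), ("n_sample", 1), None),
    "rr_lstm": (("n_rr", 1), ("n_rr", 1), ("n_rr", 1)),
    "main": (("n_lead", "n_sample"), ("n_sample", 1), ("n_sample", 1)),
}


def sample_matches(
    task: str,
    sample: Sequence[Any],
    n_lead: int = 2,
    n_sample: int = 6000,
    n_rr: int = 30,
) -> bool:
    """Return True if every item of `sample` has the shape documented for `task`."""
    dims = {"n_lead": n_lead, "n_sample": n_sample, "n_rr": n_rr}
    expected = EXPECTED_SHAPES[task]  # KeyError for an unknown task
    if len(sample) != len(expected):
        return False
    for item, spec in zip(sample, expected):
        if spec is None:
            # e.g. the third element returned for "qrs_detection"
            if item is not None:
                return False
            continue
        # Resolve symbolic dimensions to concrete integers.
        want = tuple(dims[d] if isinstance(d, str) else d for d in spec)
        if item is None or np.shape(item) != want:
            return False
    return True
```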

Parameters:
  • config (dict) –

    Configurations for the dataset; see CPSC2021TrainCfg. A simple example:

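    >>> from copy import deepcopy
    >>> from torch_ecg.databases.datasets import CPSC2021Dataset, CPSC2021TrainCfg
    >>> config = deepcopy(CPSC2021TrainCfg)
    >>> config.db_dir = "some/path/to/db"
    >>> dataset = CPSC2021Dataset(config, task="main", training=True, lazy=False)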

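  • task (str) – The task to generate data for, one of “qrs_detection”, “rr_lstm”, or “main”; it determines the values returned by __getitem__().

  • training (bool, default True) – If True, the training set is loaded; otherwise the validation (test) set is loaded.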

  • lazy (bool, default True) – If True, the data is not loaded immediately but on demand.

  • **reader_kwargs (dict, optional) – Keyword arguments for the database reader class.

extra_repr_keys() → List[str]

Extra keys for __repr__() and __str__().

load_preprocessed_data(rec: str) → ndarray

Load the preprocessed data of the record.

Parameters:

rec (str) – Name of the record.

Returns:

The precomputed, preprocessed ECG data of the record.

Return type:

numpy.ndarray

persistence(force_recompute: bool = False, verbose: int = 0) → None

Save the preprocessed data to disk.

Parameters:
  • force_recompute (bool, default False) – Whether to recompute the preprocessed data even if it has already been computed.

  • verbose (int, default 0) – Verbosity level for printing progress.

Return type:

None
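The force_recompute flag follows a common compute-or-reuse caching pattern. The sketch below shows that pattern with one .npy file per record; the function names persist_record and load_preprocessed, the file layout, and the preprocess callable are all hypothetical assumptions, not torch_ecg's actual storage format, which may differ.

```python
from pathlib import Path
from typing import Callable, Union

import numpy as np


def persist_record(
    rec: str,
    out_dir: Union[str, Path],
    preprocess: Callable[[str], np.ndarray],
    force_recompute: bool = False,
    verbose: int = 0,
) -> Path:
    """Compute and save the preprocessed data of `rec`, reusing any saved copy."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{rec}.npy"  # hypothetical layout: one file per record
    if path.exists() and not force_recompute:
        # Cached result found and recomputation not forced: skip the work.
        if verbose >= 1:
            print(f"{rec}: reusing {path}")
        return path
    np.save(path, preprocess(rec))
    if verbose >= 1:
        print(f"{rec}: saved to {path}")
    return path


def load_preprocessed(rec: str, out_dir: Union[str, Path]) -> np.ndarray:
    """Load the array previously written by persist_record()."""
    return np.load(Path(out_dir) / f"{rec}.npy")
```

Calling persist_record twice for the same record runs preprocess only once; passing force_recompute=True overwrites the saved copy.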

plot_seg(seg: str, ticks_granularity: int = 0) → None

Plot the segment.

Parameters:
  • seg (str) – Name of the segment, following a pattern like “S_1_1_0000193”.

  • ticks_granularity (int, default 0) – Granularity of the axis ticks; higher values give more ticks: 0 (no ticks), 1 (major ticks), 2 (major and minor ticks).

Return type:

None
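One way to realise the three ticks_granularity levels with Matplotlib tick locators is sketched below. apply_ticks_granularity is a hypothetical helper and not necessarily how plot_seg draws its ticks.

```python
from matplotlib.axes import Axes
from matplotlib.ticker import AutoLocator, AutoMinorLocator, NullLocator


def apply_ticks_granularity(ax: Axes, ticks_granularity: int) -> None:
    """0: no ticks, 1: major ticks only, 2: major and minor ticks."""
    if ticks_granularity not in (0, 1, 2):
        raise ValueError("ticks_granularity must be 0, 1 or 2")
    for axis in (ax.xaxis, ax.yaxis):
        if ticks_granularity == 0:
            # Remove all ticks from this axis.
            axis.set_major_locator(NullLocator())
            axis.set_minor_locator(NullLocator())
        else:
            axis.set_major_locator(AutoLocator())
            # Minor ticks are only added at the finest granularity.
            axis.set_minor_locator(
                AutoMinorLocator() if ticks_granularity == 2 else NullLocator()
            )
```

Usage: after plotting a segment on ax, call apply_ticks_granularity(ax, 2) to get major and minor ticks on both axes, which is convenient for reading intervals off an ECG trace.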

reset_task(task: str, lazy: bool = True) → None

Reset the task of the data generator.

Parameters:
  • task (str) – The new task, one of “qrs_detection”, “rr_lstm”, or “main”.

  • lazy (bool, default True) – If True, the data is not loaded immediately but on demand.

Return type:

None
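The task-dependent return values of __getitem__() and the effect of switching tasks can be mimicked with a toy class that holds precomputed arrays in memory. ToyTaskDataset is hypothetical: it ignores lazy loading and, unlike the real class, does not reload or re-slice any data when the task changes.

```python
from typing import Any, Dict, List, Optional, Tuple

import numpy as np


class ToyTaskDataset:
    """Hypothetical stand-in mimicking the task-dependent __getitem__ contract."""

    TASKS = ("qrs_detection", "rr_lstm", "main")

    def __init__(self, records: List[Dict[str, np.ndarray]], task: str) -> None:
        self._records = records
        self.reset_task(task)

    def reset_task(self, task: str) -> None:
        # Only validates and switches the task; no data is reloaded here.
        if task not in self.TASKS:
            raise ValueError(f"unknown task {task!r}; expected one of {self.TASKS}")
        self.task = task

    def __len__(self) -> int:
        return len(self._records)

    def __getitem__(self, index: int) -> Tuple[Any, Any, Optional[Any]]:
        r = self._records[index]
        if self.task == "qrs_detection":
            return r["data"], r["qrs_mask"], None
        if self.task == "rr_lstm":
            return r["rr_seq"], r["rr_af_mask"], r["rr_weight_mask"]
        return r["data"], r["af_mask"], r["weight_mask"]  # task == "main"
```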