CACHET_CADB

class torch_ecg.databases.CACHET_CADB(db_dir: str | bytes | PathLike | None = None, working_dir: str | bytes | PathLike | None = None, verbose: int = 1, **kwargs: Any)

Bases: _DataBase

CACHET-CADB: A Contextualized Ambulatory Electrocardiography Arrhythmia Dataset

ABOUT

  1. The database has 259 days of contextualized ECG recordings from 24 patients and 1,602 manually annotated 10 s heart-rhythm samples.

  2. The length of the ECG records in CACHET-CADB varies from 24 h to 3 weeks.

  3. The patient’s ambulatory context information (activities, movement acceleration, body position, etc.) is extracted for every 10 s interval cumulatively.

  4. Nearly 11% of the ECG data in the database is noisy.

  5. The database [1] and its short-format version [2] can be downloaded from their respective webpages; see also the GitHub repository [3].
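Since both the rhythm annotations and the context information are defined on 10 s intervals, it can help to convert an interval index into sample indices. The following is a minimal sketch, not part of the library: `window_bounds` is a hypothetical helper, and the default sampling frequency of 1024 Hz is an assumption for illustration only (read the actual value from the record's metadata).

```python
def window_bounds(k, fs=1024, window_s=10):
    """Return (sampfrom, sampto) for the k-th window (0-based, end-exclusive).

    `fs` is an assumed sampling frequency in Hz; replace it with the
    value reported for the record you are working with.
    """
    if k < 0:
        raise ValueError("window index must be non-negative")
    n_samples = int(round(fs * window_s))  # samples per window
    return k * n_samples, (k + 1) * n_samples


# Sample range of the first 10 s window at the assumed 1024 Hz.
start, stop = window_bounds(0)
```

The resulting pair can be passed as `sampfrom` and `sampto` to the loading methods documented below.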

Usage

  1. ECG arrhythmia detection

  2. Self-Supervised Learning

References

Citation

10.3389/fcvm.2022.893090 10.11583/DTU.14547264 10.11583/DTU.14547330

Parameters:
  • db_dir (path-like, optional) – Storage path of the database. If not specified, data will be fetched from the DTU website.

  • working_dir (path-like, optional) – Working directory, to store intermediate files and log files.

  • verbose (int, default 1) – Level of logging verbosity.

  • kwargs (dict, optional) – Auxiliary keyword arguments.

property all_subjects: List[str]

List of all subject IDs.

property database_info: DataBaseInfo

The DataBaseInfo object of the database.

property df_metadata: DataFrame

The table of metadata of the records.

download(files: str | Sequence[str] | None) → None

Download the database from the DTU website.

Parameters:

files (str or Sequence[str], optional) – Files to download; any subset of “CACHET-CADB.zip” and “cachet-cadb_short_format_without_context.hdf5.zip”. If None, all files are downloaded.

get_absolute_path(rec: str | int, extension: str = 'signal-ecg') → Path

Get the absolute path of the signal folder of the record.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • extension (str, default "signal-ecg") – Extension of the file, can be one of “header”, “annotation”, “signal”, “annotation-context”, “signal-ecg”, “signal-acc”, “signal-angularrate”, “signal-hr_live”, “signal-hrvrmssd_live”, etc.

Returns:

Absolute path of the file.

Return type:

pathlib.Path

get_record_metadata(rec: str | int) → Dict[str, str]

Get metadata of the record.

Parameters:

rec (str or int) – Record name or index of the record in all_records, or “short_format” (-1) to load data from the short format file.

Returns:

metadata – Metadata of the record

Return type:

dict

get_subject_id(rec: str | int) → str

Attach a unique subject ID for the record.

Parameters:

rec (str or int) – Record name or index of the record in all_records.

Returns:

sid – Subject ID attached to the record.

Return type:

str

get_subject_info(rec_or_sid: str | int, items: List[str] | None = None) → Dict[str, str]

Read auxiliary information of a subject (a record) stored in the header files.

Parameters:
  • rec_or_sid (str or int) – Record name, or index of the record in all_records, or the subject ID.

  • items (List[str], optional) – Items of the subject’s information (e.g. sex, age, etc.).

Returns:

subject_info – Information about the subject, including “age”, “gender”, “height”, “weight”.

Return type:

dict

load_ann(rec: str | int, ann_format: str = 'pd') → DataFrame | ndarray | Dict[int | str, ndarray]

Load annotation from the metadata file.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • ann_format (str, default "pd") – Format of the annotation, currently only “pd” is supported.

Returns:

ann – The annotation of the record.

Return type:

pandas.DataFrame or numpy.ndarray or dict

load_context_ann(rec: str | int, sheet_name: str | None = None) → DataFrame | Dict[str, DataFrame]

Load context annotation.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • sheet_name (str, optional) – Sheet name of the context annotation file, can be one of “movisens DataAnalyzer Parameter”, “movisens DataAnalyzer Results”. If None, all sheets will be loaded.

Returns:

context_ann – Context annotations of the record.

Return type:

pandas.DataFrame or dict

load_context_data(rec: str | int, context_name: str, sampfrom: int | None = None, sampto: int | None = None, channels: str | int | List[str] | List[int] | None = None, units: str | None = None, fs: Real | None = None) → ndarray | DataFrame

Load context data (e.g. accelerometer, heart rate, etc.).

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • context_name (str) – Context name, can be one of “acc”, “angularrate”, “hr_live”, “hrvrmssd_live”, “movementacceleration_live”, “press”, “marker”.

  • sampfrom (int, optional) – Start index of the data to be loaded.

  • sampto (int, optional) – End index of the data to be loaded.

  • channels (str or int or List[str] or List[int], optional) – Channels (names or indices) to be loaded. If None, all channels will be loaded.

  • units (str, optional) – Units of the output signal, currently can only be “default”; None for digital data, without digital-to-physical conversion.

  • fs (numbers.Real, optional) – Sampling frequency of the output signal. If not None, the loaded data will be resampled to this frequency, otherwise, the original sampling frequency will be used.

Returns:

context_data – Context data in the “channel_first” format.

Return type:

numpy.ndarray or pandas.DataFrame

Note

If the record does not have the specified context data, an empty array or DataFrame is returned.
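Because a missing context signal yields an empty result rather than an exception, callers may want to check for emptiness before further processing. A minimal sketch, assuming only the documented return types; `has_context` is a hypothetical helper, not a library function:

```python
import numpy as np


def has_context(data):
    """Return True if the loaded context data holds at least one value."""
    # pandas.DataFrame exposes an `.empty` attribute; numpy.ndarray does not,
    # so arrays fall through to the `.size` check below.
    empty = getattr(data, "empty", None)
    if empty is not None:
        return not empty
    return np.asarray(data).size > 0
```

A typical use would be to skip a record when `has_context(...)` is False for the context signal of interest.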

load_data(rec: str | int, sampfrom: int | None = None, sampto: int | None = None, data_format: str = 'channel_first', units: str | None = 'mV', fs: Real | None = None, return_fs: bool = False) → ndarray | Tuple[ndarray, Real]

Load physical (converted from digital) ECG data, or load digital signal directly.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records, or “short_format” (-1) to load data from the short format file.

  • sampfrom (int, optional) – Start index of the data to be loaded.

  • sampto (int, optional) – End index of the data to be loaded.

  • data_format (str, default "channel_first") – Format of the ECG data, “channel_last” (alias “lead_last”), or “channel_first” (alias “lead_first”), or “flat” (alias “plain”).

  • units (str or None, default "mV") – Units of the output signal, can also be “μV” (aliases “uV”, “muV”); None for digital data, without digital-to-physical conversion.

  • fs (numbers.Real, optional) – Sampling frequency of the output signal. If not None, the loaded data will be resampled to this frequency, otherwise, the original sampling frequency will be used.

  • return_fs (bool, default False) – Whether to return the sampling frequency of the output signal.

Returns:

  • data (numpy.ndarray) – The loaded ECG data.

  • data_fs (numbers.Real, optional) – Sampling frequency of the output signal. Returned if return_fs is True.
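The data_format and units options can be illustrated on a plain NumPy array. The sketch below is not the library's implementation: `to_data_format` and `mv_to_uv` are hypothetical helpers, and the input is assumed to be a single-lead signal of shape (n_leads, n_samples) expressed in mV.

```python
import numpy as np


def to_data_format(data, data_format="channel_first"):
    """Rearrange an array of shape (n_leads, n_samples) into the named layout."""
    fmt = data_format.lower()
    if fmt in ("channel_first", "lead_first"):
        return data  # (n_leads, n_samples)
    if fmt in ("channel_last", "lead_last"):
        return data.T  # (n_samples, n_leads)
    if fmt in ("flat", "plain"):
        return data.flatten()  # (n_leads * n_samples,), 1-D for a single lead
    raise ValueError(f"unknown data_format: {data_format!r}")


def mv_to_uv(data_mv):
    """Convert millivolts to microvolts."""
    return data_mv * 1000.0


# Synthetic single-lead signal, used only to show the resulting shapes.
signal = np.zeros((1, 100))
shapes = {fmt: to_data_format(signal, fmt).shape
          for fmt in ("channel_first", "channel_last", "flat")}
```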

plot(rec: str | int, **kwargs: Any) → None

Not implemented.

property subject_records: Dict[str, List[str]]

Dict of subject IDs and their corresponding records.
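A mapping from subject IDs to records is useful for building train/test splits in which no subject appears on both sides, which avoids leakage in arrhythmia detection experiments. A minimal sketch, assuming a dict shaped like the one returned by this property; `split_by_subject`, the toy subject IDs, and the record names are hypothetical:

```python
import random


def split_by_subject(subject_records, test_ratio=0.2, seed=42):
    """Split records into train/test lists without sharing subjects."""
    subjects = sorted(subject_records)
    random.Random(seed).shuffle(subjects)  # deterministic for a given seed
    n_test = max(1, int(round(len(subjects) * test_ratio)))
    test_subjects = subjects[:n_test]
    train_subjects = subjects[n_test:]
    train = [r for s in train_subjects for r in subject_records[s]]
    test = [r for s in test_subjects for r in subject_records[s]]
    return train, test


# Toy mapping standing in for the real property value.
toy = {"P1": ["a", "b"], "P2": ["c"], "P3": ["d", "e"], "P4": ["f"], "P5": ["g"]}
train_records, test_records = split_by_subject(toy)
```

Splitting at the subject level rather than the record level is a design choice: records from the same patient tend to be highly correlated, so a record-level split would likely overestimate performance.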

property url: Dict[str, str]

URL(s) for downloading the database.