CACHET_CADB

Bases: _DataBase

CACHET-CADB: A Contextualized Ambulatory Electrocardiography Arrhythmia Dataset

ABOUT

The database has 259 days of contextualized ECG recordings from 24 patients and 1,602 manually annotated 10 s heart-rhythm samples.
The length of the ECG records in the CACHET-CADB varies from 24 h to 3 weeks.
The patient’s ambulatory context information (activities, movement acceleration, body position, etc.) is extracted for every 10 s interval cumulatively.
Nearly 11% of the ECG data in the database is noisy.
The database [1] and the short-format database [2] can be downloaded from their respective webpages; see also the GitHub repository [3].
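To give a sense of the scale implied by these figures, the arithmetic can be sketched as follows. It uses only the numbers quoted above. Treating "nearly 11%" as exactly 0.11, and assuming the 259 days are tiled by non-overlapping 10 s intervals, are simplifications made for illustration.

```python
# Back-of-the-envelope figures derived from the dataset description.
SECONDS_PER_DAY = 24 * 60 * 60
WINDOW_SECONDS = 10        # context is extracted per 10 s interval

total_days = 259           # quoted total recording duration
noisy_fraction = 0.11      # "nearly 11%", rounded (assumption)
annotated_samples = 1602   # manually annotated 10 s rhythm samples

# Number of non-overlapping 10 s intervals (assumes gap-free tiling).
windows_per_day = SECONDS_PER_DAY // WINDOW_SECONDS
total_windows = total_days * windows_per_day

# Approximate amount of noisy data, and total annotated duration.
noisy_days = total_days * noisy_fraction
annotated_hours = annotated_samples * WINDOW_SECONDS / 3600

print(windows_per_day)            # 8640
print(total_windows)              # 2237760
print(round(noisy_days, 1))       # 28.5
print(round(annotated_hours, 2))  # 4.45
```

The manually annotated samples therefore cover only a few hours of the full recording time, which is consistent with the self-supervised learning use case listed under Usage.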

Usage

ECG arrhythmia detection
Self-Supervised Learning

References

Citation

10.3389/fcvm.2022.893090 10.11583/DTU.14547264 10.11583/DTU.14547330

Parameters:

db_dir (path-like, optional) – Storage path of the database. If not specified, data will be fetched from Physionet.
working_dir (path-like, optional) – Working directory, to store intermediate files and log files.
verbose (int, default 1) – Level of logging verbosity.
kwargs (dict, optional) – Auxiliary keyword arguments.

property all_subjects: List[str]: List of all subject IDs.

property database_info: DataBaseInfo: The DataBaseInfo object of the database.

property df_metadata: DataFrame: The table of metadata of the records.

download(files: str | Sequence[str] | None) → None

Download the database from the DTU website.

Parameters:

files (str or Sequence[str], optional) – Files to download; can be any subset of “CACHET-CADB.zip” and “cachet-cadb_short_format_without_context.hdf5.zip”. If None, all files will be downloaded.
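The accepted shapes of the files argument (a single name, a sequence of names, or None) can be illustrated with a small normalization sketch. The helper normalize_files is hypothetical and not part of the class; only the two file names are taken from the description above.

```python
from typing import List, Sequence, Union

# File names quoted in the parameter description.
AVAILABLE_FILES = (
    "CACHET-CADB.zip",
    "cachet-cadb_short_format_without_context.hdf5.zip",
)


def normalize_files(files: Union[str, Sequence[str], None]) -> List[str]:
    """Hypothetical helper: turn the `files` argument into a validated list."""
    if files is None:
        # None means "download everything".
        return list(AVAILABLE_FILES)
    if isinstance(files, str):
        # A single file name is wrapped into a list.
        files = [files]
    unknown = [f for f in files if f not in AVAILABLE_FILES]
    if unknown:
        raise ValueError(f"Unknown file(s): {unknown}")
    return list(files)


print(normalize_files(None))
print(normalize_files("CACHET-CADB.zip"))
```

Checking for str before iterating matters here, because a string is itself a sequence of characters and would otherwise be treated as many one-letter file names.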

get_absolute_path(rec: str | int, extension: str = 'signal-ecg') → Path

Get the absolute path of the signal folder of the record.

Parameters:

rec (str or int) – Record name or index of the record in all_records.
extension (str, default "signal-ecg") – Extension of the file, can be one of “header”, “annotation”, “signal”, “annotation-context”, “signal-ecg”, “signal-acc”, “signal-angularrate”, “signal-hr_live”, “signal-hrvrmssd_live”, etc.

Returns:

Absolute path of the file.

Return type:

pathlib.Path

get_record_metadata(rec: str | int) → Dict[str, str]

Get metadata of the record.

Parameters:

rec (str or int) – Record name or index of the record in all_records, or “short_format” (-1) to load data from the short format file.

Returns:

metadata – Metadata of the record.

Return type:

dict

get_subject_id(rec: str | int) → str

Attach a unique subject ID to the record.

Parameters:

rec (str or int) – Record name or index of the record in all_records.

Returns:

sid – Subject ID attached to the record.

Return type:

str

get_subject_info(rec_or_sid: str | int, items: List[str] | None = None) → Dict[str, str]

Read auxiliary information of a subject (a record) stored in the header files.

Parameters:

rec_or_sid (str or int) – Record name, or index of the record in all_records, or the subject ID.
items (List[str], optional) – Items of the subject’s information (e.g. sex, age, etc.).

Returns:

subject_info – Information about the subject, including “age”, “gender”, “height”, “weight”.

Return type:

dict

load_ann(rec: str | int, ann_format: str = 'pd') → DataFrame | ndarray | Dict[int | str, ndarray]

Load annotation from the metadata file.

Parameters:

rec (str or int) – Record name or index of the record in all_records.
ann_format (str, default "pd") – Format of the annotation, currently only “pd” is supported.

Returns:

ann – The annotation of the record.

Return type:

pandas.DataFrame or numpy.ndarray or dict

load_context_ann(rec: str | int, sheet_name: str | None = None) → DataFrame | Dict[str, DataFrame]

Load context annotation.

Parameters:

rec (str or int) – Record name or index of the record in all_records.
sheet_name (str, optional) – Sheet name of the context annotation file, can be one of “movisens DataAnalyzer Parameter”, “movisens DataAnalyzer Results”. If None, all sheets will be loaded.

Returns:

context_ann – Context annotations of the record.

Return type:

pandas.DataFrame or dict

Load context data (e.g. accelerometer, heart rate, etc.).

Parameters:

rec (str or int) – Record name or index of the record in all_records.
context_name (str) – Context name, can be one of “acc”, “angularrate”, “hr_live”, “hrvrmssd_live”, “movementacceleration_live”, “press”, “marker”.
sampfrom (int, optional) – Start index of the data to be loaded.
sampto (int, optional) – End index of the data to be loaded.
channels (str or int or List[str] or List[int], optional) – Channels (names or indices) to be loaded. If is None, all channels will be loaded.
units (str, optional) – Units of the output signal; currently only “default” is supported. If None, digital data is returned without digital-to-physical conversion.
fs (numbers.Real, optional) – Sampling frequency of the output signal. If not None, the loaded data will be resampled to this frequency, otherwise, the original sampling frequency will be used.

Returns:

context_data – Context data in the “channel_first” format.

Return type:

numpy.ndarray or pandas.DataFrame

Note

If the record does not have the specified context data, an empty array or DataFrame will be returned.
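Because sampfrom and sampto are sample indices rather than times, selecting a time window requires the sampling frequency of the signal in question. A minimal sketch of that conversion follows. The helper window_to_samples is hypothetical, the half-open range [sampfrom, sampto) is an assumption borrowed from Python slicing, and the 64 Hz value is purely illustrative rather than taken from the database.

```python
from typing import Tuple


def window_to_samples(start_s: float, duration_s: float, fs: float) -> Tuple[int, int]:
    """Hypothetical helper: map a time window in seconds to sample indices.

    Assumes a half-open index range [sampfrom, sampto), as in Python slicing.
    """
    if start_s < 0 or duration_s <= 0 or fs <= 0:
        raise ValueError("start_s must be >= 0; duration_s and fs must be > 0")
    sampfrom = int(round(start_s * fs))
    sampto = sampfrom + int(round(duration_s * fs))
    return sampfrom, sampto


# A 10 s interval starting 30 s into a signal sampled at an
# illustrative 64 Hz (not a value taken from the database).
print(window_to_samples(30, 10, 64))  # (1920, 2560)
```

Since the context signals and the ECG may be sampled at different rates, the same time window generally maps to different index pairs for each signal.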

Load physical (converted from digital) ECG data, or load digital signal directly.

Parameters:

rec (str or int) – Record name or index of the record in all_records, or “short_format” (-1) to load data from the short format file.
sampfrom (int, optional) – Start index of the data to be loaded.
sampto (int, optional) – End index of the data to be loaded.
data_format (str, default "channel_first") – Format of the ECG data, “channel_last” (alias “lead_last”), or “channel_first” (alias “lead_first”), or “flat” (alias “plain”).
units (str or None, default "mV") – Units of the output signal, can also be “μV” (aliases “uV”, “muV”); None for digital data, without digital-to-physical conversion.
fs (numbers.Real, optional) – Sampling frequency of the output signal. If not None, the loaded data will be resampled to this frequency, otherwise, the original sampling frequency will be used.
return_fs (bool, default False) – Whether to return the sampling frequency of the output signal.

Returns:

data (numpy.ndarray) – The loaded ECG data.
data_fs (numbers.Real, optional) – Sampling frequency of the output signal. Returned if return_fs is True.
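The data_format and units options can be illustrated with plain Python lists standing in for arrays. This is a sketch of the conventions described above, not the library's implementation. The helpers to_format and convert_from_mv are hypothetical, and treating “flat” as a one-dimensional sequence that is only valid for a single channel is an assumption.

```python
from typing import List, Optional

# Aliases listed in the parameter descriptions above.
FORMAT_ALIASES = {"lead_first": "channel_first", "lead_last": "channel_last", "plain": "flat"}
UNIT_ALIASES = {"mv": "mV", "μv": "μV", "µv": "μV", "uv": "μV", "muv": "μV"}
SCALE_FROM_MV = {"mV": 1.0, "μV": 1000.0}


def to_format(channel_first: List[List[float]], data_format: str = "channel_first"):
    """Hypothetical helper: rearrange (n_channels, n_samples) data."""
    fmt = data_format.lower()
    fmt = FORMAT_ALIASES.get(fmt, fmt)
    if fmt == "channel_first":
        return [list(ch) for ch in channel_first]
    if fmt == "channel_last":
        # Transpose to (n_samples, n_channels).
        return [list(col) for col in zip(*channel_first)]
    if fmt == "flat":
        # Assumption: "flat" is a 1-D sequence, valid for one channel only.
        if len(channel_first) != 1:
            raise ValueError("'flat' requires single-channel data")
        return list(channel_first[0])
    raise ValueError(f"Unsupported data_format: {data_format}")


def convert_from_mv(samples_mv: List[float], units: Optional[str] = "mV") -> List[float]:
    """Hypothetical helper: scale millivolt samples to the requested units."""
    if units is None:
        raise ValueError("None requests digital data; no physical scaling applies")
    canonical = UNIT_ALIASES.get(units.lower())
    if canonical is None:
        raise ValueError(f"Unsupported units: {units}")
    return [x * SCALE_FROM_MV[canonical] for x in samples_mv]


ecg = [[0.5, -1.0, 0.25]]                 # one channel, three samples, in mV
print(to_format(ecg, "lead_last"))        # [[0.5], [-1.0], [0.25]]
print(to_format(ecg, "plain"))            # [0.5, -1.0, 0.25]
print(convert_from_mv(ecg[0], "uV"))      # [500.0, -1000.0, 250.0]
```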

plot(rec: str | int, **kwargs: Any) → None: Not implemented.

property subject_records: Dict[str, List[str]]: Dict of subject IDs and their corresponding records.
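A mapping of this shape (subject ID to list of records) is often inverted so that each record can be looked up directly, for example when building subject-wise train/test splits. A minimal sketch follows. The helper record_to_subject is hypothetical, and the subject and record names in the example are made up for illustration.

```python
from typing import Dict, List


def record_to_subject(subject_records: Dict[str, List[str]]) -> Dict[str, str]:
    """Hypothetical helper: invert {subject: [records]} into {record: subject}."""
    mapping: Dict[str, str] = {}
    for sid, records in subject_records.items():
        for rec in records:
            if rec in mapping:
                # A record should belong to exactly one subject.
                raise ValueError(f"Record {rec!r} is listed under more than one subject")
            mapping[rec] = sid
    return mapping


# Made-up subject IDs and record names, for illustration only.
example = {"P1": ["P1_rec_a", "P1_rec_b"], "P2": ["P2_rec_a"]}
print(record_to_subject(example))
```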

property url: Dict[str, str]: URL(s) for downloading the database.