CPSC2021

class torch_ecg.databases.CPSC2021(db_dir: str | bytes | PathLike | None = None, working_dir: str | bytes | PathLike | None = None, verbose: int = 1, **kwargs: Any)

Bases: PhysioNetDataBase

The 4th China Physiological Signal Challenge 2021: Paroxysmal Atrial Fibrillation Events Detection from Dynamic ECG Recordings

ABOUT

  1. source ECG data are recorded from 12-lead Holter or 3-lead wearable ECG monitoring devices

  2. dataset provides variable-length ECG fragments extracted from lead I and lead II of the long-term source ECG data, each sampled at 200 Hz

  3. an AF event must last at least 5 heart beats

  4. training set in the 1st stage consists of 730 records, extracted from the Holter records from 12 AF patients and 42 non-AF patients (usually including other abnormal and normal rhythms); training set in the 2nd stage consists of 706 records from 37 AF patients (18 PAF patients) and 14 non-AF patients

  5. the test set comprises data from the same source as the training set as well as from a DIFFERENT data source, and will NOT be released at any point

  6. annotations are standardized according to PhysioBank Annotations (Ref. [2] or PhysioNetDataBase.helper()), and include the beat annotations (R peak location and beat type), the rhythm annotations (rhythm change flag and rhythm type) and the diagnosis of the global rhythm

  7. the classification of a record is stored in the corresponding .hea file and can be accessed via the comments attribute of the wfdb Record returned by wfdb.rdheader() or wfdb.rdrecord() (or the “comments” entry of the fields dict returned by wfdb.rdsamp()); beat and rhythm annotations can be accessed via the symbol and aux_note attributes of a wfdb Annotation obtained with wfdb.rdann(), and the corresponding signal indices via its sample attribute

  8. challenge task:

    • classification of rhythm types: non-AF rhythm (N), persistent AF rhythm (AFf) and paroxysmal AF rhythm (AFp)

    • localization of the onset and offset of each predicted AF episode

  9. challenge metrics:

    • metrics (Ur, scoring matrix) for classification:

      [Figure: the scoring matrix for the recording-level classification result.]

    • metric (Ue) for detecting onsets and offsets for AF events (episodes): +1 if the detected onset (or offset) is within ±1 beat of the annotated position, and +0.5 if within ±2 beats.

    • final score (U):

      \[U = \dfrac{1}{N} \sum\limits_{i=1}^N \left( Ur_i + \dfrac{Ma_i}{\max\{Mr_i, Ma_i\}} \cdot Ue_i \right)\]

      where \(N\) is the number of records, \(Ur_i\) and \(Ue_i\) are the classification and endpoint scores of record \(i\), \(Ma_i\) is the number of annotated AF episodes, and \(Mr_i\) is the number of predicted AF episodes.

  10. Challenge official website [1]. Webpage of the database on PhysioNet [2].
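As a rough illustration of how the per-record terms combine, the Python sketch below averages Ur_i plus the endpoint score Ue_i weighted by Ma_i/max{Mr_i, Ma_i}, so that predicting more episodes than are annotated shrinks the endpoint credit. It assumes Ur_i and Ue_i are already computed for each record. The function name final_score is hypothetical and not part of torch_ecg, and setting the endpoint term to 0 for records with neither annotated nor predicted episodes is an assumption, not documented scorer behavior.

```python
from typing import Sequence


def final_score(
    ur: Sequence[float],
    ue: Sequence[float],
    ma: Sequence[int],
    mr: Sequence[int],
) -> float:
    """Average the per-record scores over N records (hypothetical helper).

    ur[i]: classification score Ur_i of record i (from the scoring matrix)
    ue[i]: endpoint score Ue_i of record i (summed onset/offset scores)
    ma[i]: number of annotated AF episodes Ma_i
    mr[i]: number of predicted AF episodes Mr_i
    """
    n = len(ur)
    if n == 0 or not (len(ue) == len(ma) == len(mr) == n):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ur_i, ue_i, ma_i, mr_i in zip(ur, ue, ma, mr):
        denom = max(mr_i, ma_i)
        # Over-prediction (Mr_i > Ma_i) shrinks the endpoint credit.
        # Assumption: with no annotated and no predicted episodes the
        # endpoint term is 0, since Ma_i / max{Mr_i, Ma_i} is then 0/0.
        weight = ma_i / denom if denom > 0 else 0.0
        total += ur_i + weight * ue_i
    return total / n
```

For example, a record with Ur_i = 1, one annotated episode, two predicted episodes and Ue_i = 2 contributes 1 + (1/2)(2) = 2 under this sketch.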

Note

  1. if an ECG record is classified as AFf, the reported onset and offset locations should be the first and last sample points of the record. If an ECG record is classified as N, the answer should be an empty list

  2. the classification scoring matrix penalizes false negatives for AFf very heavily, while confusing AFf with AFp is not penalized

  3. the atrial fibrillation and atrial flutter flags (“AFIB” and “AFL”) in the annotations are treated as the same type when scoring

  4. the 3 classes can coexist in ONE subject (not one record). For example, subject 61 has 6 records with label “N”, 1 with label “AFp”, and 2 with label “AFf”

  5. rhythm change annotations (“(AFIB”, “(AFL”, “(N” in the aux_note field or “+” in the symbol field of the annotation files) are placed 0.15 s before the corresponding R peaks for onsets and 0.15 s after them for offsets.

  6. some records were revised where an AF episode, or the pause between adjacent AF episodes, contained fewer than 5 heart beats. The IDs of the revised records are listed in the attached REVISED_RECORDS.
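The notes above can be turned into a small Python sketch: rhythm-change flags (the aux_note entries, positioned by the sample attribute of a wfdb Annotation) are folded into AF intervals, with “(AFIB” and “(AFL” treated as one type (Note 3), and the per-record answer follows Note 1. The helper names are hypothetical, and closing an episode that is still open at the end of the record at its last sample is an assumption rather than documented behavior.

```python
from typing import List, Optional, Sequence

# Note 3: atrial fibrillation and atrial flutter are scored as one type.
AF_FLAGS = {"(AFIB", "(AFL"}


def af_intervals_from_rhythm(
    samples: Sequence[int], aux_notes: Sequence[str], siglen: int
) -> List[List[int]]:
    """Fold rhythm-change flags into [onset, offset] AF intervals.

    `samples` and `aux_notes` play the role of the `sample` and `aux_note`
    attributes of a wfdb Annotation; entries whose aux_note is not a
    rhythm flag (e.g. plain beat annotations) are skipped.
    """
    intervals: List[List[int]] = []
    onset: Optional[int] = None
    for pos, note in zip(samples, aux_notes):
        note = note.replace("\x00", "").strip()  # defensive cleanup
        if not note.startswith("("):
            continue  # not a rhythm-change flag
        if note in AF_FLAGS:
            if onset is None:
                onset = pos  # an AF episode starts
        elif onset is not None:
            intervals.append([onset, pos])  # a non-AF rhythm ends it
            onset = None
    if onset is not None:
        # Assumption: an episode still open at the end runs to the last sample.
        intervals.append([onset, siglen - 1])
    return intervals


def challenge_answer(
    label: str, intervals: List[List[int]], siglen: int
) -> List[List[int]]:
    """Endpoints to report for one record, following Note 1."""
    if label == "N":
        return []  # non-AF record: empty list
    if label == "AFf":
        return [[0, siglen - 1]]  # persistent AF: first and last sample
    return [list(iv) for iv in intervals]  # AFp: the detected episodes
```

Consecutive “(AFIB” and “(AFL” flags keep a single episode open, which mirrors treating both as one type.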

Usage

  1. AF (event, fine) detection

References

Citation

10.13026/ksya-qw89

Parameters:
  • db_dir (path-like, optional) – Storage path of the database. If not specified, data will be fetched from PhysioNet.

  • working_dir (path-like, optional) – Working directory, to store intermediate files and log files.

  • verbose (int, default 1) – Level of logging verbosity.

  • kwargs (dict, optional) – Auxiliary keyword arguments.

property database_info: DataBaseInfo

The DataBaseInfo object of the database.

gen_endpoint_score_mask(rec: str | int, bias: dict = {1: 1, 2: 0.5}, verbose: int | None = None) → Tuple[ndarray, ndarray]

Generate the scoring masks for the onsets and offsets of AF episodes.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • bias (dict, default {1: 1, 2: 0.5}) – Bias for scoring the onsets and offsets of AF episodes. Keys are tolerances (±) in number of R peaks, and values are the corresponding scores.

  • verbose (int, optional) – Verbosity level. If None, self.verbose is used.

Returns:

onset_score_mask, offset_score_mask – 2-tuple of ndarray: the scoring masks for the onset and offset predictions of AF episodes.

Return type:

Tuple[numpy.ndarray, numpy.ndarray]

Note

The onsets in af_intervals are 0.15s ahead of the corresponding R peaks, while the offsets in af_intervals are 0.15s behind the corresponding R peaks.
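A simplified Python sketch of such scoring masks, assuming a sample earns score s when it lies within ±b R peaks of the R peak nearest to the annotated endpoint, for each entry b: s of bias. This approximates the ±1/±2-beat rule of the challenge metric; it is not the torch_ecg implementation, and endpoint_score_masks is a hypothetical name.

```python
from typing import Dict, Optional, Sequence, Tuple

import numpy as np


def endpoint_score_masks(
    rpeaks: Sequence[int],
    af_intervals: Sequence[Sequence[int]],
    siglen: int,
    bias: Optional[Dict[int, float]] = None,
) -> Tuple[np.ndarray, np.ndarray]:
    """Simplified onset/offset scoring masks (hypothetical helper).

    For every (b, s) in `bias`, samples between the R peaks b beats before
    and b beats after the R peak nearest to an annotated endpoint get score
    s; where windows overlap, the highest score is kept.
    """
    if bias is None:
        bias = {1: 1.0, 2: 0.5}
    peaks = np.asarray(rpeaks)
    last = len(peaks) - 1
    onset_mask, offset_mask = np.zeros(siglen), np.zeros(siglen)
    for interval in af_intervals:
        for mask, pos in zip((onset_mask, offset_mask), interval):
            k = int(np.argmin(np.abs(peaks - pos)))  # nearest R peak
            for b, s in bias.items():
                lo = peaks[max(k - b, 0)]
                hi = peaks[min(k + b, last)]
                window = mask[lo : hi + 1]  # a view into the mask
                np.maximum(window, s, out=window)
    return onset_mask, offset_mask
```

Under this sketch, a predicted episode [onset, offset] would earn onset_mask[onset] + offset_mask[offset] toward its endpoint score.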

get_absolute_path(rec: str | int, extension: str | None = None) → Path

Get the absolute path of the record.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • extension (str, optional) – Extension of the file.

Returns:

abs_path – Absolute path of the file.

Return type:

pathlib.Path

get_subject_id(rec: str | int) → str

Get the unique subject ID of the record.

Parameters:

rec (str or int) – Record name or index of the record in all_records.

Returns:

sid – Subject ID corresponding to the record.

Return type:

str
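Because the three classes can coexist within one subject, grouping records by subject keeps all records of a subject on the same side of a train/validation split. The sketch below assumes record names of the form data_<subject>_<segment> (e.g. data_61_3); the helper names are hypothetical, and get_subject_id may use a different scheme, so check the pattern against your local copy of the database.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumption: record names look like "data_<subject>_<segment>", e.g. "data_61_3".
_REC_PATTERN = re.compile(r"^data_(\d+)_(\d+)$")


def subject_id_from_record(rec: str) -> str:
    """Extract the subject part of a record name (hypothetical helper)."""
    match = _REC_PATTERN.match(rec)
    if match is None:
        raise ValueError(f"unexpected record name: {rec!r}")
    return match.group(1)


def group_by_subject(records: Iterable[str]) -> Dict[str, List[str]]:
    """Group records by subject, e.g. for a subject-wise data split."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[subject_id_from_record(rec)].append(rec)
    return dict(groups)
```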

load_af_episodes(rec: str | int, ann: Annotation | None = None, sampfrom: int | None = None, sampto: int | None = None, keep_original: bool = False, fs: Real | None = None, fmt: str = 'intervals') → List[List[int]] | ndarray

Load the episodes of atrial fibrillation, in terms of intervals or mask.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • ann (wfdb.Annotation, optional) – The wfdb Annotation of the record. If None, the corresponding annotation file is read.

  • sampfrom (int, optional) – Start index of the AF episodes to be loaded. Not used when fmt is “c_intervals”.

  • sampto (int, optional) – End index of the AF episodes to be loaded. Not used when fmt is “c_intervals”.

  • keep_original (bool, default False) – If True, indices are kept the same as in the annotation file; otherwise sampfrom is subtracted if specified. Valid only when fmt is not “c_intervals”.

  • fs (numbers.Real, optional) – If not None, positions of the loaded intervals or mask are adjusted to this sampling frequency. Otherwise, the sampling frequency of the record is used.

  • fmt ({"intervals", "mask", "c_intervals"}, optional) – Format of the episodes of atrial fibrillation, by default “intervals”.

Returns:

af_episodes – Episodes of atrial fibrillation, in terms of intervals or mask.

Return type:

list or numpy.ndarray
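A minimal Python sketch of the interval handling described above, assuming intervals are [onset, offset] sample pairs at the record's sampling frequency rec_fs, that sampto is exclusive, and that the “mask” format marks AF samples with 1 over half-open intervals. The function names are hypothetical, and the “c_intervals” format is not modeled.

```python
from typing import List, Optional, Sequence

import numpy as np


def select_intervals(
    intervals: Sequence[Sequence[int]],
    rec_fs: float,
    sampfrom: Optional[int] = None,
    sampto: Optional[int] = None,
    keep_original: bool = False,
    fs: Optional[float] = None,
) -> List[List[int]]:
    """Clip AF intervals to [sampfrom, sampto), shift, and rescale them."""
    start = 0 if sampfrom is None else sampfrom
    end = float("inf") if sampto is None else sampto
    out: List[List[int]] = []
    for onset, offset in intervals:
        onset, offset = max(onset, start), min(offset, end)
        if onset >= offset:
            continue  # the episode lies outside the requested window
        if not keep_original:
            onset, offset = onset - start, offset - start
        if fs is not None and fs != rec_fs:
            ratio = fs / rec_fs  # map positions to the new sampling frequency
            onset, offset = int(round(onset * ratio)), int(round(offset * ratio))
        out.append([int(onset), int(offset)])
    return out


def intervals_to_mask(intervals: Sequence[Sequence[int]], siglen: int) -> np.ndarray:
    """Binary AF mask, treating each interval as half-open [onset, offset)."""
    mask = np.zeros(siglen, dtype=int)
    for onset, offset in intervals:
        mask[onset:offset] = 1
    return mask
```

Clipping happens in original sample coordinates before shifting and rescaling, so keep_original only changes the reference point of the returned indices.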

load_ann(rec: str | int, field: str | None = None, sampfrom: int | None = None, sampto: int | None = None, **kwargs: Any) → dict | ndarray | List[List[int]] | str

Load annotations of the record.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • field ({"rpeaks", "af_episodes", "label", "raw", "wfdb"}, optional) – Field of the annotation. If None, all fields of the annotation are returned as a dict. If “raw” or “wfdb”, the corresponding wfdb Annotation is returned.

  • sampfrom (int, optional) – Start index of the annotation to be loaded.

  • sampto (int, optional) – End index of the annotation to be loaded.

  • kwargs (dict) –

    Keyword arguments for the functions loading rpeaks, af_episodes, and label, respectively, including:

    • fs: int, optional, the resampling frequency

    • fmt: str, the format of af_episodes or of label; for details, see the corresponding functions.

    Used only when field is specified (not None).

Returns:

ann – Annotation of the record.

Return type:

dict or list or numpy.ndarray or str

load_label(rec: str | int, ann: Annotation | None = None, sampfrom: int | None = None, sampto: int | None = None, fmt: str = 'a') → str

Load (classifying) label of the record.

The three classes are:

  • “non atrial fibrillation”,

  • “paroxysmal atrial fibrillation”,

  • “persistent atrial fibrillation”.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • ann (wfdb.Annotation, optional) – Not used, to keep in accordance with other methods.

  • sampfrom (int, optional) – Not used, to keep in accordance with other methods.

  • sampto (int, optional) – Not used, to keep in accordance with other methods.

  • fmt (str, default "a") –

    Format of the label, case-insensitive; one of:

    • “f”, “fullname”: the full name of the label

    • “a”, “abbr”, “abbrevation”: the abbreviation of the label

    • “n”, “num”, “number”: the class number of the label (in accordance with the official class map)

Returns:

label – Classifying label of the record.

Return type:

str
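The fmt handling can be sketched as a case-insensitive lookup. The abbreviations and full names below are the ones listed in this documentation; the numeric class map is a hypothetical placeholder, since the official mapping is not reproduced here, and unlike the documented str return type this sketch returns an int for the numeric format.

```python
from typing import Dict, Union

# Abbreviations and full names as listed in this documentation.
_FULL_NAMES: Dict[str, str] = {
    "N": "non atrial fibrillation",
    "AFp": "paroxysmal atrial fibrillation",
    "AFf": "persistent atrial fibrillation",
}
# Hypothetical numbering for illustration only; the official class map may differ.
HYPOTHETICAL_CLASS_MAP: Dict[str, int] = {"N": 0, "AFf": 1, "AFp": 2}


def format_label(abbr: str, fmt: str = "a") -> Union[str, int]:
    """Render an abbreviated label in the requested format (hypothetical helper)."""
    key = fmt.lower()  # fmt is case-insensitive
    if key in ("f", "fullname"):
        return _FULL_NAMES[abbr]
    if key in ("a", "abbr", "abbrevation"):  # aliases as documented above
        return abbr
    if key in ("n", "num", "number"):
        return HYPOTHETICAL_CLASS_MAP[abbr]
    raise ValueError(f"unknown fmt: {fmt!r}")
```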

load_rpeak_indices(rec: str | int, ann: Annotation | None = None, sampfrom: int | None = None, sampto: int | None = None, keep_original: bool = False, valid_only: bool = True, fs: Real | None = None) → ndarray

Load position (in terms of samples) of rpeaks.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • ann (wfdb.Annotation, optional) – The wfdb Annotation of the record. If None, the corresponding annotation file is read.

  • sampfrom (int, optional) – Start index of the rpeak positions to be loaded.

  • sampto (int, optional) – End index of the rpeak positions to be loaded.

  • keep_original (bool, default False) – If True, indices are kept the same as in the annotation file; otherwise sampfrom is subtracted if specified.

  • valid_only (bool, default True) – If True, only valid rpeaks are returned; otherwise, all indices in the sample field of the annotation are returned. Valid rpeaks are those whose symbol is in WFDB_Beat_Annotations; symbols in WFDB_Non_Beat_Annotations are treated as invalid rpeaks.

  • fs (numbers.Real, optional) – If not None, positions of the loaded rpeaks are adjusted to this sampling frequency.

Returns:

rpeaks – Position (in terms of samples) of rpeaks of the record.

Return type:

numpy.ndarray
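A Python sketch of the filtering steps implied by these parameters, assuming samples and symbols mirror the sample and symbol attributes of a wfdb Annotation recorded at rec_fs, that sampto is exclusive, and using only a subset of the WFDB beat symbols (torch_ecg's WFDB_Beat_Annotations may list more). select_rpeaks is a hypothetical name.

```python
from typing import Optional, Sequence

import numpy as np

# Subset of WFDB beat-annotation symbols, for illustration only; torch_ecg's
# WFDB_Beat_Annotations may list more.
BEAT_SYMBOLS = {"N", "L", "R", "A", "a", "J", "S", "V", "F", "e", "j", "E", "/", "f", "Q"}


def select_rpeaks(
    samples: Sequence[int],
    symbols: Sequence[str],
    rec_fs: float,
    sampfrom: Optional[int] = None,
    sampto: Optional[int] = None,
    keep_original: bool = False,
    valid_only: bool = True,
    fs: Optional[float] = None,
) -> np.ndarray:
    """R-peak positions from annotation samples/symbols (hypothetical helper)."""
    positions = np.asarray(samples, dtype=int)
    keep = np.ones(len(positions), dtype=bool)
    if valid_only:
        # Drop non-beat annotations such as "+" (rhythm change) or "~".
        keep &= np.array([sym in BEAT_SYMBOLS for sym in symbols], dtype=bool)
    start = 0 if sampfrom is None else sampfrom
    keep &= positions >= start
    if sampto is not None:
        keep &= positions < sampto  # sampto treated as exclusive
    rpeaks = positions[keep]
    if not keep_original:
        rpeaks = rpeaks - start  # indices relative to sampfrom
    if fs is not None and fs != rec_fs:
        rpeaks = np.round(rpeaks * fs / rec_fs).astype(int)
    return rpeaks
```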

load_rpeaks(rec: str | int, ann: Annotation | None = None, sampfrom: int | None = None, sampto: int | None = None, keep_original: bool = False, valid_only: bool = True, fs: Real | None = None) → ndarray

Load position (in terms of samples) of rpeaks.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • ann (wfdb.Annotation, optional) – The wfdb Annotation of the record. If None, the corresponding annotation file is read.

  • sampfrom (int, optional) – Start index of the rpeak positions to be loaded.

  • sampto (int, optional) – End index of the rpeak positions to be loaded.

  • keep_original (bool, default False) – If True, indices are kept the same as in the annotation file; otherwise sampfrom is subtracted if specified.

  • valid_only (bool, default True) – If True, only valid rpeaks are returned; otherwise, all indices in the sample field of the annotation are returned. Valid rpeaks are those whose symbol is in WFDB_Beat_Annotations; symbols in WFDB_Non_Beat_Annotations are treated as invalid rpeaks.

  • fs (numbers.Real, optional) – If not None, positions of the loaded rpeaks are adjusted to this sampling frequency.

Returns:

rpeaks – Position (in terms of samples) of rpeaks of the record.

Return type:

numpy.ndarray

plot(rec: str | int, data: ndarray | None = None, ann: Dict[str, ndarray] | None = None, ticks_granularity: int = 0, sampfrom: int | None = None, sampto: int | None = None, leads: str | int | List[str | int] | None = None, waves: Dict[str, Sequence[int]] | None = None, **kwargs) → None

Plot the signals of a record.

Plot the signals of a record or external signals (units in μV), with metadata (labels, episodes of atrial fibrillation, etc.), possibly along with wave delineations.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • data (numpy.ndarray, optional) – (2-lead) ECG signal to plot. Should be of the format “channel_first”, and compatible with leads. If given, data of rec will not be used. This is useful when plotting filtered data.

  • ann (dict, optional) – Annotations for data. Ignored if data is None.

  • ticks_granularity (int, default 0) – Granularity of the axis ticks; the higher, the more ticks: 0 (no ticks), 1 (major ticks), 2 (major and minor ticks).

  • sampfrom (int, optional) – Start index of the data to plot.

  • sampto (int, optional) – End index of the data to plot.

  • leads (str or List[str], optional) – Names of the leads to plot.

  • waves (dict, optional) – Indices of the wave critical points, including “p_onsets”, “p_peaks”, “p_offsets”, “q_onsets”, “q_peaks”, “r_peaks”, “s_peaks”, “s_offsets”, “t_onsets”, “t_peaks”, “t_offsets”

  • kwargs (dict, optional) – Additional keyword arguments to pass to matplotlib.pyplot.plot().

TODO

  1. Slice too long records, and plot separately for each segment.

  2. Plot waves using axvspan().

Note

  1. The matplotlib tick locator has a default MAXTICKS of 1000; unless this limit is raised, at most 40 seconds of signal can be plotted at once.

  2. Raw data usually have very severe baseline drifts, hence the isoelectric line is not plotted.
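The 40-second limit follows from the tick budget, assuming minor ticks every 0.04 s (the width of a small square on standard ECG paper at 25 mm/s); this tick spacing is an assumption about the plotting defaults rather than something stated above:

\[t_{\max} = \text{MAXTICKS} \times \Delta t_{\text{minor}} = 1000 \times 0.04\,\text{s} = 40\,\text{s}\]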

Contributors: Jeethan and WEN Hao

property url_: str

URL of the compressed database file