CPSC2018

class torch_ecg.databases.CPSC2018(db_dir: str | bytes | PathLike | None = None, working_dir: str | bytes | PathLike | None = None, verbose: int = 1, **kwargs: Any)

Bases: CPSCDataBase

The China Physiological Signal Challenge 2018: Automatic identification of the rhythm/morphology abnormalities in 12-lead ECGs.

ABOUT

  1. training set contains 6,877 (female: 3178; male: 3699) 12-lead ECG recordings lasting from 6 s to 60 s.

  2. ECG recordings were sampled at 500 Hz.

  3. the training data can be downloaded using links in [1], but the link in [2] is recommended. File structure will be assumed to follow [2].

  4. the training data are in the channel-first format (leads × samples).

  5. types of abnormal rhythm/morphology + normal in the training set:

    No.  name                                  abbr.  number of records
    0    Normal                                N      918
    1    Atrial fibrillation                   AF     1098
    2    First-degree atrioventricular block   I-AVB  704
    3    Left bundle branch block              LBBB   207
    4    Right bundle branch block             RBBB   1695
    5    Premature atrial contraction          PAC    556
    6    Premature ventricular contraction     PVC    672
    7    ST-segment depression                 STD    825
    8    ST-segment elevation                  STE    202

  6. the leads in the data of every record are ordered as

    ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]
    
  7. meanings of the fields in the .hea files: to be written.

  8. knowledge about the abnormal rhythms: ref. get_disease_knowledge().

  9. Challenge official website [1], see also [2].
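
The layout described in items 1 to 6 can be sketched in plain Python. This is an illustration only, not code from torch_ecg: LEADS, FS, lead_rows and duration_seconds are hypothetical names, and a nested list stands in for the array a record would be loaded into.

```python
# Illustrative sketch; the names below are hypothetical, not part of torch_ecg.
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF", "V1", "V2", "V3", "V4", "V5", "V6"]
FS = 500  # sampling frequency in Hz, as stated in item 2


def lead_rows(names):
    """Map lead names to row indices of a channel-first (leads x samples) signal."""
    return [LEADS.index(name) for name in names]


def duration_seconds(n_samples, fs=FS):
    """Duration in seconds of a recording with n_samples samples per lead."""
    return n_samples / fs


# A fake channel-first signal: 12 rows (one per lead), 3000 samples each.
signal = [[0.0] * 3000 for _ in LEADS]
rows = lead_rows(["II", "V1"])        # row indices of leads II and V1
selected = [signal[r] for r in rows]  # still channel-first: 2 x 3000
print(rows, duration_seconds(len(signal[0])))
```

A 3000-sample row corresponds to the shortest recordings mentioned in item 1, since 3000 samples at 500 Hz last 6 s.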

Note

  1. Ages of records A0608, A1549, A1876, A2299, A5990 are “NaN”.

  2. CINC2020 (ref. [2]) released a total of 3453 previously unused training records of CPSC2018, whose filenames start with “Q”. These filenames are not consecutive; the last record is “Q3581”.

Usage

  1. ECG arrhythmia detection

References

Citation

10.1166/jmihi.2018.2442

Parameters:
  • db_dir (path-like, optional) – Storage path of the database. If not specified, data will be fetched from PhysioNet.

  • working_dir (path-like, optional) – Working directory, to store intermediate files and log files.

  • verbose (int, default 1) – Level of logging verbosity.

  • kwargs (dict, optional) – Auxiliary keyword arguments.

property database_info: DataBaseInfo

The DataBaseInfo object of the database.

download() → None

Download the database from self.url.

get_labels(rec: str | int, ann_format: str = 'n') → List[str]

Load labels (diagnoses or arrhythmias) of a record.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • ann_format (str, default "n") –

    Format of labels, one of the following (case insensitive):

    • “a”, abbreviations

    • “f”, full names

    • “n”, numeric codes

Returns:

labels – The list of labels.

Return type:

List[str]
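
The “a” and “f” formats are related through the class table in ABOUT. A minimal sketch of that relation follows; ABBR_TO_FULLNAME and relabel are hypothetical names, and the full-name spellings are this sketch's own and may differ from the strings the library actually returns. The “n” format is left out because this page does not list the numeric codes.

```python
# Illustrative sketch; ABBR_TO_FULLNAME and relabel are not part of torch_ecg.
ABBR_TO_FULLNAME = {
    "N": "Normal",
    "AF": "Atrial fibrillation",
    "I-AVB": "First-degree atrioventricular block",
    "LBBB": "Left bundle branch block",
    "RBBB": "Right bundle branch block",
    "PAC": "Premature atrial contraction",
    "PVC": "Premature ventricular contraction",
    "STD": "ST-segment depression",
    "STE": "ST-segment elevation",
}


def relabel(abbreviations, ann_format="a"):
    """Convert abbreviations to the format selected by ann_format ("a" or "f")."""
    fmt = ann_format.lower()  # the documented option is case insensitive
    if fmt == "a":
        return list(abbreviations)
    if fmt == "f":
        return [ABBR_TO_FULLNAME[abbr] for abbr in abbreviations]
    # "n" (numeric codes) is deliberately unsupported here: the codes are not on this page.
    raise ValueError(f"unsupported ann_format in this sketch: {ann_format!r}")


print(relabel(["AF", "RBBB"], ann_format="F"))
```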

get_subject_id(rec: int | str) → int

Attach a unique subject ID to the record.

Parameters:

rec (str or int) – Record name or index of the record in all_records.

Returns:

Subject ID associated with the record.

Return type:

int

get_subject_info(rec: int | str, items: List[str] | None = None) → dict

Get subject information (e.g. sex, age, etc.).

Parameters:
  • rec (int or str) – Record name or index of the record in all_records.

  • items (List[str], optional) – Items of the subject information (e.g. sex, age, etc.).

Returns:

subject_info – The subject information.

Return type:

dict
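
Because a few records have no usable age (see the Note near the top of this page), code that consumes the returned dict may want to guard against NaN. A minimal sketch, assuming the dict has an "age" key whose value is a number, a float NaN, or the string "NaN"; both the key name and the helper age_or_none are assumptions that this page does not confirm.

```python
import math


def age_or_none(subject_info):
    """Return the age as a float, or None when it is missing or NaN."""
    age = subject_info.get("age")  # "age" is an assumed key name
    if age is None:
        return None
    try:
        age = float(age)  # float("NaN") parses to nan as well
    except (TypeError, ValueError):
        return None
    return None if math.isnan(age) else age


print(age_or_none({"age": 74, "sex": "Male"}))
print(age_or_none({"age": "NaN", "sex": "Female"}))
```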

load_ann(rec: str | int, ann_format: str = 'n') → List[str]

Load labels (diagnoses or arrhythmias) of a record.

Parameters:
  • rec (str or int) – Record name or index of the record in all_records.

  • ann_format (str, default "n") –

    Format of labels, one of the following (case insensitive):

    • “a”, abbreviations

    • “f”, full names

    • “n”, numeric codes

Returns:

labels – The list of labels.

Return type:

List[str]

load_data(rec: int | str, leads: str | int | Sequence[int | str] | None = None, data_format='channel_first', units: str = 'mV', return_fs: bool = False) → ndarray | Tuple[ndarray, Real]

Load the ECG data of a record.

Parameters:
  • rec (int or str) – Record name or index of the record in all_records.

  • leads (str or int or Sequence[str] or Sequence[int], optional) – The leads to load, None or “all” for all leads.

  • data_format (str, default "channel_first") – Format of the ECG data, “channel_last” or “channel_first” (the original format).

  • units (str, default "mV") – Units of the output signal, can also be “μV” (with an alias “uV”), case insensitive.

  • return_fs (bool, default False) – Whether to return the sampling frequency of the output signal.

Returns:

  • data (numpy.ndarray) – The loaded ECG data.

  • data_fs (numbers.Real, optional) – Sampling frequency of the output signal. Returned if return_fs is True.
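
Conceptually, the data_format and units options amount to a transpose and a rescale. The following plain-Python sketch shows those two conversions, with nested lists standing in for numpy arrays; to_channel_last and mv_to_uv are hypothetical helpers, not torch_ecg functions, and the sketch does not claim to mirror the library's internals.

```python
def to_channel_last(channel_first):
    """Transpose a (leads x samples) signal into (samples x leads)."""
    return [list(sample) for sample in zip(*channel_first)]


def mv_to_uv(signal):
    """Rescale a 2-D signal from millivolts to microvolts (1 mV = 1000 μV)."""
    return [[value * 1000 for value in row] for row in signal]


# Two leads, three samples each, in mV (channel-first).
x = [[0.5, 1.0, -0.25],
     [0.0, 2.0, 1.5]]
y = mv_to_uv(to_channel_last(x))  # three samples, two leads each, in μV
print(len(y), len(y[0]))
```

With the reader itself, the signature above suggests that the same result would be requested by passing data_format="channel_last" and units="uV" to load_data.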

plot(rec: int | str, ticks_granularity: int = 0, leads: str | List[str] | None = None, **kwargs: Any) → None

Plot the ECG data of a record.

Parameters:
  • rec (int or str) – Record name or index of the record in all_records.

  • ticks_granularity (int, default 0) – Granularity of the axis ticks; higher values give more ticks: 0 (no ticks), 1 (major ticks), 2 (major and minor ticks).

  • leads (str or List[str], optional) – The leads to plot.

  • kwargs (dict, optional) – Auxiliary keyword arguments to pass to matplotlib.pyplot.subplots().

property url: List[str]

URL(s) for downloading the database.

Added in version 0.0.4.