PTBXL

class torch_ecg.databases.PTBXL(db_dir: str | bytes | PathLike, working_dir: str | bytes | PathLike | None = None, verbose: int = 1, feature_db_dir: str | bytes | PathLike | None = None, **kwargs: Any)

Bases: PhysioNetDataBase

PTB-XL, a large publicly available electrocardiography dataset

ABOUT

  1. The PTB-XL database [1] is a large database of 21799 clinical 12-lead ECGs, each 10 seconds long, from 18869 patients. The recordings were collected with devices from Schiller AG over nearly seven years, between October 1989 and June 1996.

  2. The raw waveform data of the PTB-XL database was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each recording. These statements were converted into a standardized set of SCP-ECG statements (scp_codes).

  3. The PTB-XL database contains 71 different ECG statements conforming to the SCP-ECG standard, including diagnostic, form, and rhythm statements.

  4. The waveform files of the PTB-XL database are stored in WaveForm DataBase (WFDB) format with 16-bit precision at a resolution of 1 μV/LSB and a sampling frequency of 500 Hz. A downsampled version of the waveform data at a sampling frequency of 100 Hz is also provided.

  5. In the metadata file (ptbxl_database.csv), each record of the PTB-XL database is identified by a unique ecg_id. The corresponding patient is encoded via patient_id. The paths to the original record (500 Hz) and a downsampled version of the record (100 Hz) are stored in filename_hr and filename_lr. The report field contains the diagnostic statements assigned to the record by the cardiologists. The scp_codes field contains the SCP-ECG statements assigned to the record, stored as a dictionary with entries of the form statement: likelihood (where likelihood is set to 0 if unknown).

  6. The PTB-XL database comes with a recommended 10-fold train-test split (stored in the strat_fold field of the metadata file), obtained via stratified sampling while respecting patient assignments, i.e. all records of a particular patient were assigned to the same fold. Records in folds 9 and 10 underwent at least one human evaluation and are therefore of particularly high label quality. It is proposed to use folds 1-8 as the training set, fold 9 as the validation set, and fold 10 as the test set.
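The metadata layout and the proposed fold assignment described above can be sketched with the standard library alone. The sample rows below are made up for illustration, and the sketch assumes that scp_codes is serialized in the CSV as a Python dict literal; the helper names split_by_fold and parse_scp_codes are hypothetical and not part of torch_ecg.

```python
import ast
import csv
import io

# Made-up rows in the layout of ptbxl_database.csv (illustration only).
SAMPLE_CSV = """\
ecg_id,patient_id,scp_codes,strat_fold
1,101,"{'NORM': 100.0, 'SR': 0.0}",3
2,102,"{'NORM': 80.0, 'SR': 0.0}",9
3,103,"{'IMI': 15.0, 'SR': 0.0}",10
"""


def parse_scp_codes(raw: str) -> dict:
    """Turn the scp_codes string into a {statement: likelihood} dict.

    Assumes the field holds a Python dict literal.
    """
    return ast.literal_eval(raw)


def split_by_fold(rows: list) -> dict:
    """Group ecg_ids using the proposed scheme: folds 1-8 / 9 / 10."""
    splits = {"train": [], "val": [], "test": []}
    for row in rows:
        fold = int(row["strat_fold"])
        if fold <= 8:
            name = "train"
        elif fold == 9:
            name = "val"
        else:
            name = "test"
        splits[name].append(row["ecg_id"])
    return splits


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
splits = split_by_fold(rows)
codes = parse_scp_codes(rows[0]["scp_codes"])
```

Because all records of a patient share a fold, splitting on strat_fold alone keeps patients from leaking between the training, validation, and test sets.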

Note

  1. A new comprehensive feature database named PTB-XL+ [2] was created to supplement the PTB-XL database.

Usage

  1. Classification of ECG images

Issues

References

Citation

https://doi.org/10.1038/s41597-023-02153-8
https://doi.org/10.1038/s41597-020-0495-6
https://doi.org/10.13026/nqsf-pc74
https://doi.org/10.13026/6sec-a640

Parameters:
  • db_dir (path-like) – Storage path of the database. If not specified, data will be fetched from PhysioNet.

  • working_dir (path-like, optional) – Working directory, to store intermediate files and log files.

  • verbose (int, default 1) – Level of logging verbosity.

  • feature_db_dir (path-like, optional) – Storage path of the feature database (the PTB-XL+ database), if it is to be included.

  • kwargs (dict, optional) – Auxiliary keyword arguments.

property database_info: DataBaseInfo

The DataBaseInfo object of the database.

load_ann(rec: str | int, with_interpretation: bool = False) -> Dict[str, float | Dict[str, Any]]

Load the annotation (the “scp_codes” field) of a record.

Parameters:
  • rec (str or int) – The record name (ecg_id) or the index of the record.

  • with_interpretation (bool, default False) – Whether to include the interpretation of the statement.

Returns:

ann – The annotation of the record, of the form {statement: likelihood}. If with_interpretation is True, the form is {statement: {"likelihood": likelihood, ...}}, where ... stands for further information about the statement.

Return type:

dict
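Both return forms can be handled by one small helper. The following is a minimal sketch, not part of torch_ecg: the function name statements_above, the threshold value, and the sample dictionaries are hypothetical, and only the "likelihood" key documented above is relied on.

```python
from typing import Any, Dict, List, Union

# Either {statement: likelihood} or {statement: {"likelihood": ..., ...}}.
Annotation = Dict[str, Union[float, Dict[str, Any]]]


def statements_above(ann: Annotation, threshold: float = 50.0) -> List[str]:
    """Return the statements whose likelihood is at least `threshold`."""
    selected = []
    for statement, value in ann.items():
        # The interpreted form nests the likelihood inside a dict.
        likelihood = value["likelihood"] if isinstance(value, dict) else value
        if likelihood >= threshold:
            selected.append(statement)
    return selected


# Made-up annotations in the two documented forms.
plain = {"NORM": 100.0, "SR": 0.0}
detailed = {"NORM": {"likelihood": 100.0}, "SR": {"likelihood": 0.0}}

plain_hits = statements_above(plain)
detailed_hits = statements_above(detailed)
```

Since a likelihood of 0 means "unknown" rather than "absent", a positive threshold like this one silently drops such statements; whether that is desirable depends on the task.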

load_metadata(rec: str | int, items: str | List[str] | None = None) -> Dict[str, str | int | float] | str | int | float

Load the metadata of a record.

Parameters:
  • rec (str or int) – The record name (ecg_id) or the index of the record.

  • items (str or list of str, optional) – The items to load.

Returns:

metadata – The metadata of the record.

Return type:

dict or str or int or float

reset_fs(fs: int) -> None

Reset the default sampling frequency.

Parameters:

fs (int) – The new sampling frequency.
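Taken together, the constructor and methods documented on this page might be used as in the sketch below. It assumes that torch_ecg is installed and that a local copy of PTB-XL exists at the given path; the function name inspect_record is hypothetical, and passing "patient_id" as items assumes that metadata items are named after the columns of ptbxl_database.csv.

```python
def inspect_record(db_dir: str, rec: int = 0):
    """Load the annotation and the patient id of one record.

    `db_dir` must point to a local copy of PTB-XL (caller-supplied path).
    """
    # Imported lazily so that defining this function does not require torch_ecg.
    from torch_ecg.databases import PTBXL

    db = PTBXL(db_dir=db_dir, verbose=1)

    # Switch the default sampling frequency to the 100 Hz version
    # (assumes 100 is an accepted value, since a 100 Hz version is provided).
    db.reset_fs(100)

    # Annotation in the interpreted form {statement: {"likelihood": ..., ...}}.
    ann = db.load_ann(rec, with_interpretation=True)

    # A single metadata item; "patient_id" is assumed to be a valid item name.
    patient_id = db.load_metadata(rec, items="patient_id")
    return ann, patient_id
```

Calling inspect_record("/path/to/ptb-xl") with a real database directory would return the annotation and patient id of the first record; the path here is a placeholder.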