PTBXL
- class torch_ecg.databases.PTBXL(db_dir: str | bytes | PathLike, working_dir: str | bytes | PathLike | None = None, verbose: int = 1, feature_db_dir: str | bytes | PathLike | None = None, **kwargs: Any)
Bases: PhysioNetDataBase
PTB-XL, a large publicly available electrocardiography dataset
ABOUT
The PTB-XL database [1] is a large database of 21799 clinical 12-lead ECGs, each 10 seconds long, from 18869 patients. The recordings were collected with devices from Schiller AG over nearly seven years, between October 1989 and June 1996.
The raw waveform data of the PTB-XL database was annotated by up to two cardiologists, who assigned potentially multiple ECG statements to each recording. These statements were converted into a standardized set of SCP-ECG statements (scp_codes).
The PTB-XL database contains 71 different ECG statements conforming to the SCP-ECG standard, including diagnostic, form, and rhythm statements.
The waveform files of the PTB-XL database are stored in WaveForm DataBase (WFDB) format with 16-bit precision at a resolution of 1 μV/LSB and a sampling frequency of 500 Hz. A downsampled version of the waveform data at a sampling frequency of 100 Hz is also provided.
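As a quick check of these figures, the following sketch computes the expected number of samples per lead and converts raw integer samples to millivolts. It assumes the stated resolution of 1 μV/LSB and a zero baseline; the constant and function names are illustrative and are not part of torch_ecg.

```python
from typing import List, Sequence

# Values taken from the description above.
RECORD_SECONDS = 10
FS_HIGH = 500  # Hz, original sampling frequency
FS_LOW = 100  # Hz, downsampled version
MICROVOLTS_PER_LSB = 1.0  # stated resolution


def samples_per_lead(fs: int, seconds: int = RECORD_SECONDS) -> int:
    """Number of samples per lead for a record of the given duration."""
    return fs * seconds


def digital_to_millivolts(samples: Sequence[int]) -> List[float]:
    """Convert raw integer samples to mV, assuming a zero baseline."""
    return [s * MICROVOLTS_PER_LSB / 1000.0 for s in samples]
```

Under these assumptions, a 500 Hz record holds 5000 samples per lead and a 100 Hz record holds 1000.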
In the metadata file (ptbxl_database.csv), each record of the PTB-XL database is identified by a unique ecg_id. The corresponding patient is encoded via patient_id. The paths to the original record (500 Hz) and a downsampled version of the record (100 Hz) are stored in filename_hr and filename_lr. The report field contains the diagnostic statements assigned to the record by the cardiologists. The scp_codes field contains the SCP-ECG statements assigned to the record, stored as a dictionary with entries of the form statement: likelihood (where likelihood is set to 0 if unknown).
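The metadata layout described above can be read with the standard library alone. The sketch below is a minimal illustration, not how torch_ecg loads the file: it assumes that scp_codes is serialized as a Python dict literal, and the two sample rows (IDs, codes, and paths) are made up and contain only a subset of the columns.

```python
import ast
import csv
import io

# Hypothetical excerpt of ptbxl_database.csv; all values are invented.
SAMPLE_CSV = """ecg_id,patient_id,scp_codes,filename_lr,filename_hr
1,100,"{'NORM': 100.0, 'SR': 0.0}",path/to/lr_1,path/to/hr_1
2,200,"{'IMI': 50.0}",path/to/lr_2,path/to/hr_2
"""


def read_metadata(text):
    """Return {ecg_id: row}, with scp_codes parsed into a dict."""
    records = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: scp_codes is stored as a Python dict literal.
        row["scp_codes"] = ast.literal_eval(row["scp_codes"])
        records[int(row["ecg_id"])] = row
    return records


records = read_metadata(SAMPLE_CSV)
# Statements with likelihood 0 are those whose likelihood is unknown.
unknown = [s for s, p in records[1]["scp_codes"].items() if p == 0]
```

ast.literal_eval is used instead of eval so that only literals are evaluated when parsing the field.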
The PTB-XL database provides a 10-fold train-test split (stored in the strat_fold field of the metadata file), obtained via stratified sampling while respecting patient assignments, i.e. all records of a particular patient were assigned to the same fold. Records in folds 9 and 10 underwent at least one human evaluation and are therefore of particularly high label quality. It is proposed to use folds 1-8 as the training set, fold 9 as the validation set, and fold 10 as the test set.
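The proposed split can be sketched as a small helper that groups record IDs by their strat_fold value. The function name and the sample mapping are hypothetical; only the fold assignment (1-8 training, 9 validation, 10 test) comes from the description above.

```python
def split_by_fold(strat_folds, val_fold=9, test_fold=10):
    """Group ecg_ids into train/val/test lists from an {ecg_id: fold} mapping."""
    split = {"train": [], "val": [], "test": []}
    for ecg_id, fold in sorted(strat_folds.items()):
        if fold == test_fold:
            split["test"].append(ecg_id)
        elif fold == val_fold:
            split["val"].append(ecg_id)
        else:
            # All remaining folds (1-8 by default) form the training set.
            split["train"].append(ecg_id)
    return split


# Made-up fold assignments for four records.
example_split = split_by_fold({1: 3, 2: 9, 3: 10, 4: 1})
```

Because all records of a patient share a fold, splitting on strat_fold keeps each patient within a single subset.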
Note
A new comprehensive feature database named PTB-XL+ [2] was created to supplement the PTB-XL database.
Usage
Classification of ECG images
Issues
References
Citation
https://doi.org/10.1038/s41597-023-02153-8
https://doi.org/10.1038/s41597-020-0495-6
https://doi.org/10.13026/nqsf-pc74
https://doi.org/10.13026/6sec-a640
- Parameters:
db_dir (path-like) – Storage path of the database. If not specified, data will be fetched from PhysioNet.
working_dir (path-like, optional) – Working directory, to store intermediate files and log files.
verbose (int, default 1) – Level of logging verbosity.
feature_db_dir (path-like, optional) – Storage path of the feature database (the PTB-XL+ database). If not specified, the feature database is not included.
kwargs (dict, optional) – Auxiliary keyword arguments.
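A minimal usage sketch based only on the signatures documented in this section: it constructs the reader and loads one annotation in both forms. The wrapper function, db_dir, and rec are placeholders chosen for illustration, and the import is done lazily so that defining the function does not require torch_ecg to be installed.

```python
def load_annotation_example(db_dir, rec):
    """Instantiate PTBXL and load the annotation of one record.

    ``db_dir`` and ``rec`` are placeholders supplied by the caller.
    """
    # Lazy import: torch_ecg is only needed when the function is called.
    from torch_ecg.databases import PTBXL

    db = PTBXL(db_dir=db_dir, verbose=1)
    # Plain form: {statement: likelihood}
    ann = db.load_ann(rec)
    # Detailed form: {statement: {"likelihood": likelihood, ...}}
    ann_detailed = db.load_ann(rec, with_interpretation=True)
    return ann, ann_detailed
```

Calling the function requires a local copy of the database at db_dir and a record identifier that load_ann accepts.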
- property database_info: DataBaseInfo
The DataBaseInfo object of the database.
- load_ann(rec: str | int, with_interpretation: bool = False) → Dict[str, float | Dict[str, Any]]
Load the annotation (the “scp_codes” field) of a record.
- Parameters:
- Returns:
ann – The annotation of the record, of the form {statement: likelihood}. If with_interpretation is True, the form is {statement: {"likelihood": likelihood, ...}}, where ... stands for other information about the statement.
- Return type: