CPSC2021
- class torch_ecg.databases.CPSC2021(db_dir: str | bytes | PathLike | None = None, working_dir: str | bytes | PathLike | None = None, verbose: int = 1, **kwargs: Any)
Bases: PhysioNetDataBase
The 4th China Physiological Signal Challenge 2021: Paroxysmal Atrial Fibrillation Events Detection from Dynamic ECG Recordings
ABOUT
source ECG data are recorded from 12-lead Holter or 3-lead wearable ECG monitoring devices
dataset provides variable-length ECG fragments extracted from lead I and lead II of the long-term source ECG data, each sampled at 200 Hz
an AF event is required to last no less than 5 heartbeats
training set in the 1st stage consists of 730 records, extracted from the Holter records from 12 AF patients and 42 non-AF patients (usually including other abnormal and normal rhythms); training set in the 2nd stage consists of 706 records from 37 AF patients (18 PAF patients) and 14 non-AF patients
test set comprises data from the same source as the training set as well as from a DIFFERENT data source; the test data are NOT to be released at any point
annotations are standardized according to PhysioBank Annotations (Ref. [2] or PhysioNetDataBase.helper()), and include the beat annotations (R peak location and beat type), the rhythm annotations (rhythm change flag and rhythm type) and the diagnosis of the global rhythm
classification of a record is stored in the corresponding .hea file, which can be accessed via the attribute comments of a wfdb Record obtained using wfdb.rdheader() or wfdb.rdrecord(), or via the “comments” entry of the fields dict returned by wfdb.rdsamp(); beat annotations and rhythm annotations can be accessed using the attributes symbol and aux_note of a wfdb Annotation obtained using wfdb.rdann(); the corresponding indices in the signal can be accessed via the attribute sample
challenge task:
classification of rhythm types: non-AF rhythm (N), persistent AF rhythm (AFf) and paroxysmal AF rhythm (AFp)
locating the onset and offset of each predicted AF episode
challenge metrics:
metrics (Ur, scoring matrix) for classification:
metric (Ue) for detecting onsets and offsets for AF events (episodes): +1 if the detected onset (or offset) is within ±1 beat of the annotated position, and +0.5 if within ±2 beats.
final score (U):
\[U = \dfrac{1}{N} \sum\limits_{i=1}^N \left( Ur_i + \dfrac{Ma_i}{\max\{Mr_i, Ma_i\}} \right)\]
where \(N\) is the number of records, \(Ma\) is the number of annotated AF episodes, and \(Mr\) is the number of predicted AF episodes.
Challenge official website [1]. Webpage of the database on PhysioNet [2].
Note
if an ECG record is classified as AFf, the provided onset and offset locations should be the first and last record points. If an ECG record is classified as N, the answer should be an empty list
it can be inferred from the classification scoring matrix that false negatives of AFf are penalized very heavily, while confusing AFf with AFp is not penalized
the atrial fibrillation and atrial flutter flags (“AFIB” and “AFL”) in the annotations are treated as the same type when scoring a method
the 3 classes can coexist in ONE subject (not one record). For example, subject 61 has 6 records with label “N”, 1 with label “AFp”, and 2 with label “AFf”
rhythm change annotations (“(AFIB”, “(AFL”, “(N” in the aux_note field or “+” in the symbol field of the annotation files) are inserted 0.15 s ahead of (for onsets) or behind (for offsets) the corresponding R peaks.
some records were revised where an AF episode, or the pause between adjacent AF episodes, contained fewer than 5 heartbeats. The ID numbers of the revised records are summarized in the attached REVISED_RECORDS.
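As a worked example of the 0.15 s shift described in the note above: at the stated sampling rate of 200 Hz, 0.15 s corresponds to 30 samples. The sketch below is only an illustration; the function name and the R peak indices are made up, and it is not part of torch_ecg.

```python
FS = 200  # sampling frequency in Hz, as stated in the dataset description
SHIFT_SEC = 0.15  # rhythm change annotations sit 0.15 s away from the R peak


def episode_bounds(first_rpeak: int, last_rpeak: int, fs: int = FS) -> tuple:
    """Hypothetical helper: annotated bounds of an AF episode, given the
    sample indices of its first and last R peaks."""
    shift = int(round(SHIFT_SEC * fs))  # 30 samples at 200 Hz
    # the onset is placed ahead of the first R peak, the offset behind the last one
    return first_rpeak - shift, last_rpeak + shift


print(episode_bounds(1000, 5000))  # (970, 5030)
```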
Usage
AF (event, fine) detection
References
Citation
10.13026/ksya-qw89
- Parameters:
db_dir (path-like, optional) – Storage path of the database. If not specified, data will be fetched from PhysioNet.
working_dir (path-like, optional) – Working directory, to store intermediate files and log files.
verbose (int, default 1) – Level of logging verbosity.
kwargs (dict, optional) – Auxiliary keyword arguments.
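A minimal usage sketch, assuming only the method names and parameters documented on this page (load_label, load_rpeaks, load_af_episodes). The helper summarize_record is hypothetical and accepts any object exposing those methods, so it can be tried without the data at hand; the path in the trailing comment is a placeholder.

```python
from typing import Any, Dict, Union


def summarize_record(reader: Any, rec: Union[str, int]) -> Dict[str, Any]:
    """Hypothetical helper: gather the label, the R peak count and the
    AF episodes of one record from a CPSC2021-like reader."""
    return {
        "label": reader.load_label(rec, fmt="a"),  # abbreviation of the class
        "n_rpeaks": len(reader.load_rpeaks(rec)),
        "af_episodes": reader.load_af_episodes(rec, fmt="intervals"),
    }


# With the real reader (requires torch_ecg and a local copy of the data):
#     from torch_ecg.databases import CPSC2021
#     reader = CPSC2021(db_dir="/path/to/cpsc2021")  # placeholder path
#     print(summarize_record(reader, 0))  # 0 = index of the record in all_records
```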
- property database_info: DataBaseInfo
The DataBaseInfo object of the database.
- gen_endpoint_score_mask(rec: str | int, bias: dict = {1: 1, 2: 0.5}, verbose: int | None = None) -> Tuple[ndarray, ndarray]
Generate the scoring mask for the onsets and offsets of AF episodes.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
bias (dict, default {1: 1, 2: 0.5}) – Bias for the scoring of the onsets and offsets of AF episodes. Keys are biases (with ±) in terms of number of R peaks, and values are the corresponding scores.
verbose (int, optional) – Verbosity level. If None, self.verbose will be used.
- Returns:
onset_score_mask, offset_score_mask – 2-tuple of ndarray, which are the scoring masks for the onset and offset predictions of AF episodes.
- Return type:
Tuple[numpy.ndarray]
Note
The onsets in af_intervals are 0.15s ahead of the corresponding R peaks, while the offsets in af_intervals are 0.15s behind the corresponding R peaks.
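To make the role of bias concrete, here is a simplified sketch of how such a scoring mask could be built from R peak positions. It is not the library's implementation: the function name, the nearest-R-peak lookup and the choice to ignore the 0.15 s shift are all assumptions made for brevity.

```python
import numpy as np


def endpoint_score_mask(sig_len, rpeaks, endpoints, bias=None):
    """Hypothetical sketch: samples within ±k beats of an annotated
    endpoint receive the score bias[k]."""
    bias = {1: 1, 2: 0.5} if bias is None else bias
    rpeaks = np.asarray(rpeaks)
    mask = np.zeros(sig_len, dtype=float)
    for pt in endpoints:
        # index of the R peak closest to the annotated endpoint
        idx = int(np.argmin(np.abs(rpeaks - pt)))
        for k, score in bias.items():
            lo = rpeaks[max(idx - k, 0)]
            hi = rpeaks[min(idx + k, len(rpeaks) - 1)]
            # keep the higher score where windows of different widths overlap
            mask[lo : hi + 1] = np.maximum(mask[lo : hi + 1], score)
    return mask


rpeaks = [100, 300, 500, 700, 900]
mask = endpoint_score_mask(1000, rpeaks, endpoints=[470])
print(mask[500], mask[200], mask[50])  # 1.0 0.5 0.0
```

Taking the element-wise maximum makes the result independent of the order in which the entries of bias are visited.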
- get_absolute_path(rec: str | int, extension: str | None = None) -> Path
Get the absolute path of the record.
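- Parameters:
rec (str or int) – Record name or index of the record in all_records.
extension (str, optional) – Extension of the file.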
- Returns:
abs_path – Absolute path of the file.
- Return type:
pathlib.Path
- load_af_episodes(rec: str | int, ann: Annotation | None = None, sampfrom: int | None = None, sampto: int | None = None, keep_original: bool = False, fs: Real | None = None, fmt: str = 'intervals') -> List[List[int]] | ndarray
Load the episodes of atrial fibrillation, in terms of intervals or mask.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
ann (wfdb.Annotation, optional) – The wfdb Annotation of the record. If None, the corresponding annotation file will be read.
sampfrom (int, optional) – Start index of the AF episodes to be loaded. Not used when fmt is “c_intervals”.
sampto (int, optional) – End index of the AF episodes to be loaded. Not used when fmt is “c_intervals”.
keep_original (bool, default False) – If True, indices are kept the same as in the annotation file; otherwise sampfrom is subtracted if specified. Valid only when fmt is not “c_intervals”.
fs (numbers.Real, optional) – If not None, positions of the loaded intervals or mask will be adjusted according to this sampling frequency. Otherwise, the sampling frequency of the record will be used.
fmt ({"intervals", "mask", "c_intervals"}, optional) – Format of the episodes of atrial fibrillation, by default “intervals”.
- Returns:
af_episodes – Episodes of atrial fibrillation, in terms of intervals or mask.
- Return type:
list or numpy.ndarray
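The relation between the “intervals” and “mask” formats can be sketched as follows. This is an illustration only: the function name is hypothetical, and treating each interval as half-open (end index excluded) is an assumption, not something this page specifies.

```python
import numpy as np


def intervals_to_mask(intervals, sig_len):
    """Hypothetical helper: turn [[start, end], ...] into a 0/1 mask
    of length sig_len (assumes half-open intervals)."""
    mask = np.zeros(sig_len, dtype=int)
    for start, end in intervals:
        mask[start:end] = 1  # samples inside an AF episode are marked with 1
    return mask


print(intervals_to_mask([[2, 5], [7, 9]], 10).tolist())
# [0, 0, 1, 1, 1, 0, 0, 1, 1, 0]
```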
- load_ann(rec: str | int, field: str | None = None, sampfrom: int | None = None, sampto: int | None = None, **kwargs: Any) -> dict | ndarray | List[List[int]] | str
Load annotations of the record.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
field ({"rpeaks", "af_episodes", "label", "raw", "wfdb"}, optional) – Field of the annotation. If None, all fields of the annotation will be returned in the form of a dict. If “raw” or “wfdb”, the corresponding wfdb Annotation will be returned.
sampfrom (int, optional) – Start index of the annotation to be loaded.
sampto (int, optional) – End index of the annotation to be loaded.
kwargs (dict) –
Keyword arguments for the functions loading rpeaks, af_episodes, and label respectively, including:
fs: int, optional, the resampling frequency
fmt: str, format of af_episodes, or format of label; for more details, refer to the corresponding functions.
Used only when field is specified (not None).
- Returns:
ann – Annotation of the record.
- Return type:
dict or list or numpy.ndarray or str
- load_label(rec: str | int, ann: Annotation | None = None, sampfrom: int | None = None, sampto: int | None = None, fmt: str = 'a') -> str
Load (classifying) label of the record.
The three classes are:
“non atrial fibrillation”,
“paroxysmal atrial fibrillation”,
“persistent atrial fibrillation”.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
ann (wfdb.Annotation, optional) – Not used, kept for consistency with other methods.
sampfrom (int, optional) – Not used, kept for consistency with other methods.
sampto (int, optional) – Not used, kept for consistency with other methods.
fmt (str, default "a") –
Format of the label, case insensitive, can be one of
“f”, “fullname”: the full name of the label
“a”, “abbr”, “abbrevation”: abbreviation for the label
“n”, “num”, “number”: class number of the label (in accordance with the settings of the official class map)
- Returns:
label – Classifying label of the record.
- Return type:
str
- load_rpeak_indices(rec: str | int, ann: Annotation | None = None, sampfrom: int | None = None, sampto: int | None = None, keep_original: bool = False, valid_only: bool = True, fs: Real | None = None) -> ndarray
Load position (in terms of samples) of rpeaks.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
ann (wfdb.Annotation, optional) – The wfdb Annotation of the record. If None, the corresponding annotation file will be read.
sampfrom (int, optional) – Start index of the rpeak positions to be loaded.
sampto (int, optional) – End index of the rpeak positions to be loaded.
keep_original (bool, default False) – If True, indices are kept the same as in the annotation file; otherwise sampfrom is subtracted if specified.
valid_only (bool, default True) – If True, only valid rpeaks will be returned; otherwise, all indices in the sample field of the annotation will be returned. Valid rpeaks are those with a symbol in WFDB_Beat_Annotations. Symbols in WFDB_Non_Beat_Annotations are considered invalid rpeaks.
fs (numbers.Real, optional) – If not None, positions of the loaded rpeaks will be adjusted according to this sampling frequency.
- Returns:
rpeaks – Position (in terms of samples) of rpeaks of the record.
- Return type:
numpy.ndarray
- load_rpeaks(rec: str | int, ann: Annotation | None = None, sampfrom: int | None = None, sampto: int | None = None, keep_original: bool = False, valid_only: bool = True, fs: Real | None = None) -> ndarray
Load position (in terms of samples) of rpeaks.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
ann (wfdb.Annotation, optional) – The wfdb Annotation of the record. If None, the corresponding annotation file will be read.
sampfrom (int, optional) – Start index of the rpeak positions to be loaded.
sampto (int, optional) – End index of the rpeak positions to be loaded.
keep_original (bool, default False) – If True, indices are kept the same as in the annotation file; otherwise sampfrom is subtracted if specified.
valid_only (bool, default True) – If True, only valid rpeaks will be returned; otherwise, all indices in the sample field of the annotation will be returned. Valid rpeaks are those with a symbol in WFDB_Beat_Annotations. Symbols in WFDB_Non_Beat_Annotations are considered invalid rpeaks.
fs (numbers.Real, optional) – If not None, positions of the loaded rpeaks will be adjusted according to this sampling frequency.
- Returns:
rpeaks – Position (in terms of samples) of rpeaks of the record.
- Return type:
numpy.ndarray
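The interplay of sampfrom, sampto, keep_original and fs described above can be sketched on plain arrays. This is an assumed reading of the parameter descriptions, not the library's code: the function name, the order of the steps and the rounding rule are all hypothetical.

```python
import numpy as np


def adjust_rpeaks(rpeaks, rec_fs, sampfrom=None, sampto=None,
                  keep_original=False, fs=None):
    """Hypothetical helper: window, shift and rescale R peak indices."""
    rpeaks = np.asarray(rpeaks, dtype=int)
    start = 0 if sampfrom is None else sampfrom
    stop = np.inf if sampto is None else sampto
    # keep only the R peaks that fall inside [sampfrom, sampto)
    rpeaks = rpeaks[(rpeaks >= start) & (rpeaks < stop)]
    if not keep_original:
        rpeaks = rpeaks - start  # make indices relative to sampfrom
    if fs is not None and fs != rec_fs:
        # rescale sample indices to the requested sampling frequency
        rpeaks = np.round(rpeaks * fs / rec_fs).astype(int)
    return rpeaks


print(adjust_rpeaks([100, 300, 500, 700], rec_fs=200,
                    sampfrom=200, sampto=600, fs=100).tolist())
# [50, 150]
```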
- plot(rec: str | int, data: ndarray | None = None, ann: Dict[str, ndarray] | None = None, ticks_granularity: int = 0, sampfrom: int | None = None, sampto: int | None = None, leads: str | int | List[str | int] | None = None, waves: Dict[str, Sequence[int]] | None = None, **kwargs) -> None
Plot the signals of a record.
Plots the signals of a record or external signals (units in μV), with metadata (labels, episodes of atrial fibrillation, etc.), possibly along with wave delineations.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
data (numpy.ndarray, optional) – (2-lead) ECG signal to plot. Should be of the format “channel_first”, and compatible with leads. If given, data of rec will not be used. This is useful when plotting filtered data.
ann (dict, optional) – Annotations for data. Ignored if data is None.
ticks_granularity (int, default 0) – Granularity to plot axis ticks, the higher the more ticks: 0 (no ticks), 1 (major ticks), 2 (major + minor ticks).
sampfrom (int, optional) – Start index of the data to plot.
sampto (int, optional) – End index of the data to plot.
leads (str or List[str], optional) – Names of the leads to plot.
waves (dict, optional) – Indices of the wave critical points, including “p_onsets”, “p_peaks”, “p_offsets”, “q_onsets”, “q_peaks”, “r_peaks”, “s_peaks”, “s_offsets”, “t_onsets”, “t_peaks”, “t_offsets”.
kwargs (dict, optional) – Additional keyword arguments to pass to matplotlib.pyplot.plot().
TODO
Slice too long records, and plot separately for each segment.
Plot waves using axvspan().
Note
The Locator of plt has a default MAXTICKS of 1000. Unless this number is modified, at most 40 seconds of signal can be plotted at once.
Raw data usually have very severe baseline drifts, hence the isoelectric line is not plotted.
Contributors: Jeethan, and WEN Hao