CPSC2020
- class torch_ecg.databases.CPSC2020(db_dir: str | bytes | PathLike | None = None, working_dir: str | bytes | PathLike | None = None, verbose: int = 1, **kwargs: Any)
Bases: CPSCDataBase
The 3rd China Physiological Signal Challenge 2020: Searching for Premature Ventricular Contraction (PVC) and Supraventricular Premature Beat (SPB) from Long-term ECGs
ABOUT
training data consists of 10 single-lead ECG recordings collected from arrhythmia patients, each lasting about 24 hours
data and annotations are stored in v5 .mat files
A02, A03, A08 are patients with atrial fibrillation
sampling frequency = 400 Hz
Detailed information:
====  ===  ==========  =========  =========  =========  =============
rec   AF?  Length (h)  # N beats  # V beats  # S beats  # Total beats
====  ===  ==========  =========  =========  =========  =============
A01   No   25.89       109,062    0          24         109,086
A02   Yes  22.83       98,936     4,554      0          103,490
A03   Yes  24.70       137,249    382        0          137,631
A04   No   24.51       77,812     19,024     3,466      100,302
A05   No   23.57       94,614     1          25         94,640
A06   No   24.59       77,621     0          6          77,627
A07   No   23.11       73,325     15,150     3,481      91,956
A08   Yes  25.46       115,518    2,793      0          118,311
A09   No   25.84       88,229     2          1,462      89,693
A10   No   23.64       72,821     169        9,071      82,061
====  ===  ==========  =========  =========  =========  =============
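The class imbalance implied by the table can be made explicit with a short calculation. The following is a minimal sketch that only uses the beat counts listed above; the names BEAT_COUNTS, premature_fraction and overall_fractions are illustrative and are not part of torch_ecg.

```python
# Beat counts copied from the table above: (N, V, S) per record.
BEAT_COUNTS = {
    "A01": (109062, 0, 24),
    "A02": (98936, 4554, 0),
    "A03": (137249, 382, 0),
    "A04": (77812, 19024, 3466),
    "A05": (94614, 1, 25),
    "A06": (77621, 0, 6),
    "A07": (73325, 15150, 3481),
    "A08": (115518, 2793, 0),
    "A09": (88229, 2, 1462),
    "A10": (72821, 169, 9071),
}


def premature_fraction(counts):
    """Return the fraction of beats in one record that are premature (V or S)."""
    n, v, s = counts
    return (v + s) / (n + v + s)


def overall_fractions(table):
    """Return the (V, S) fractions of all beats across every record."""
    total = sum(sum(c) for c in table.values())
    v_total = sum(c[1] for c in table.values())
    s_total = sum(c[2] for c in table.values())
    return v_total / total, s_total / total


for rec, counts in BEAT_COUNTS.items():
    print(f"{rec}: {100 * premature_fraction(counts):.2f}% premature beats")
```

Such per-record fractions may help when choosing sampling or loss-weighting strategies, since most records contain very few premature beats.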
challenging factors for accurate detection of SPB and PVC: amplitude variation; morphological variation; noise
Challenge official website [1].
Note
the records can roughly be classified into 4 groups:
=====  ==================
group  records
=====  ==================
N      A01, A03, A05, A06
V      A02, A08
S      A09, A10
VS     A04, A07
=====  ==================
as premature beats and atrial fibrillation can co-exist (as shown by the following code, using data from CINC2020), the situation becomes more complicated.
>>> from utils.scoring_aux_data import dx_cooccurrence_all
>>> dx_cooccurrence_all.loc["AF", ["PAC","PVC","SVPB","VPB"]]
PAC     20
PVC     19
SVPB     4
VPB     20
Name: AF, dtype: int64
this can also be seen in this dataset, for example via the following code:
>>> from data_reader import CPSC2020Reader as CR
>>> db_dir = "/media/cfs/wenhao71/data/CPSC2020/TrainingSet/"
>>> dr = CR(db_dir)
>>> rec = dr.all_records[1]
>>> dr.plot(rec, sampfrom=0, sampto=4000, ticks_granularity=2)
PVC and SPB can also co-exist, as illustrated via the following code (from CINC2020):
>>> from utils.scoring_aux_data import dx_cooccurrence_all
>>> dx_cooccurrence_all.loc[["PVC","VPB"], ["PAC","SVPB",]]
     PAC  SVPB
PVC   14     1
VPB   27     0
and also from the following code:
>>> import numpy as np
>>> from itertools import product
>>> for rec in dr.all_records:
...     ann = dr.load_ann(rec)
...     spb = ann["SPB_indices"]
...     pvc = ann["PVC_indices"]
...     if len(np.diff(spb)) > 0:
...         print(f"{rec}: min dist among SPB = {np.min(np.diff(spb))}")
...     if len(np.diff(pvc)) > 0:
...         print(f"{rec}: min dist among PVC = {np.min(np.diff(pvc))}")
...     diff = [s-p for s,p in product(spb, pvc)]
...     if len(diff) > 0:
...         print(f"{rec}: min dist between SPB and PVC = {np.min(np.abs(diff))}")
A01: min dist among SPB = 630
A02: min dist among SPB = 696
A02: min dist among PVC = 87
A02: min dist between SPB and PVC = 562
A03: min dist among SPB = 7044
A03: min dist among PVC = 151
A03: min dist between SPB and PVC = 3750
A04: min dist among SPB = 175
A04: min dist among PVC = 156
A04: min dist between SPB and PVC = 178
A05: min dist among SPB = 182
A05: min dist between SPB and PVC = 22320
A06: min dist among SPB = 455158
A07: min dist among SPB = 603
A07: min dist among PVC = 153
A07: min dist between SPB and PVC = 257
A08: min dist among SPB = 2903029
A08: min dist among PVC = 106
A08: min dist between SPB and PVC = 350
A09: min dist among SPB = 180
A09: min dist among PVC = 7719290
A09: min dist between SPB and PVC = 1271
A10: min dist among SPB = 148
A10: min dist among PVC = 708
A10: min dist between SPB and PVC = 177
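The pairwise product in the loop above grows with the product of the two index counts, which can be large for records with many SPBs and PVCs. As a possible alternative, the minimum SPB-to-PVC distance can be found with a binary search per index. This is a minimal sketch using only the standard library; min_cross_distance is an illustrative helper, not a torch_ecg function.

```python
from bisect import bisect_left


def min_cross_distance(a, b):
    """Return the smallest |x - y| over x in a and y in b, or None if either is empty."""
    if len(a) == 0 or len(b) == 0:
        return None
    b_sorted = sorted(b)
    best = None
    for x in a:
        pos = bisect_left(b_sorted, x)
        # Only the two neighbours of the insertion point can be closest to x.
        for j in (pos - 1, pos):
            if 0 <= j < len(b_sorted):
                dist = abs(x - b_sorted[j])
                if best is None or dist < best:
                    best = dist
    return best
```

Under these assumptions, calling min_cross_distance(spb, pvc) inside the loop would replace the product-based computation while returning the same minimum.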
Usage
ECG arrhythmia (PVC, SPB) detection
Issues
currently, with xqrs as the QRS detector, far more R peaks than annotated beats (over 1000 more) are detected for A02, A07, A08, which might be caused by motion artefacts (or AF?); far fewer (over 1000 fewer) are detected for A04. Numeric details are as follows:
====  ===  ===============  =============
rec   AF?  # beats by xqrs  # Total beats
====  ===  ===============  =============
A01   No   109,502          109,086
A02   Yes  119,562          103,490
A03   Yes  135,912          137,631
A04   No   92,746           100,302
A05   No   94,674           94,640
A06   No   77,955           77,627
A07   No   98,390           91,956
A08   Yes  126,908          118,311
A09   No   89,972           89,693
A10   No   83,509           82,061
====  ===  ===============  =============
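The size of each discrepancy can be read off the table with a small helper. This is a minimal sketch based solely on the counts listed above; XQRS_VS_ANNOTATED and flag_discrepancies are illustrative names, and the threshold values are assumptions chosen for the example.

```python
# (beats detected by xqrs, annotated total beats), copied from the table above.
XQRS_VS_ANNOTATED = {
    "A01": (109502, 109086),
    "A02": (119562, 103490),
    "A03": (135912, 137631),
    "A04": (92746, 100302),
    "A05": (94674, 94640),
    "A06": (77955, 77627),
    "A07": (98390, 91956),
    "A08": (126908, 118311),
    "A09": (89972, 89693),
    "A10": (83509, 82061),
}


def flag_discrepancies(table, threshold=1000):
    """Return {rec: detected - annotated} for records whose |difference| exceeds threshold."""
    flagged = {}
    for rec, (detected, annotated) in table.items():
        diff = detected - annotated
        if abs(diff) > threshold:
            flagged[rec] = diff
    return flagged
```

With a threshold of 1000, A03 and A10 are flagged in addition to A02, A04, A07 and A08; raising the threshold to 5000 leaves only the latter four records.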
(fixed by an official update) A04 has duplicate “PVC_indices” (13534856, 27147621, 35141190 each appear twice). Before the correction of load_ann:
>>> from collections import Counter
>>> from data_reader import CPSC2020Reader
>>> db_dir = "/mnt/wenhao71/data/CPSC2020/TrainingSet/"
>>> data_gen = CPSC2020Reader(db_dir=db_dir,working_dir=db_dir)
>>> rec = 4
>>> ann = data_gen.load_ann(rec)
>>> Counter(ann["PVC_indices"]).most_common()[:4]
[(13534856, 2), (27147621, 2), (35141190, 2), (848, 1)]
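Duplicate indices of this kind can be detected and removed generically. The following is a minimal sketch, not the actual correction applied in load_ann; find_duplicates and deduplicate are illustrative helpers that accept any iterable of integer indices.

```python
from collections import Counter


def find_duplicates(indices):
    """Return the sorted index values that occur more than once."""
    return sorted(v for v, count in Counter(indices).items() if count > 1)


def deduplicate(indices):
    """Return the unique indices in ascending order."""
    return sorted(set(indices))
```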
when extracting morphological features using augmented rpeaks for A04, the warning
RuntimeWarning: invalid value encountered in double_scalars
is raised for
\[R\_value = (R\_value - y\_min) / (y\_max - y\_min)\]
and for
\[y\_values[n] = (y\_values[n] - y\_min) / (y\_max - y\_min).\]
This is caused by the 13882273-th sample, which is contained in “PVC_indices”; however, whether it is a PVC beat or just a motion artefact is in doubt!
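One plausible trigger for this warning (an assumption, not confirmed by the source) is a segment in which y_max equals y_min, so that the normalisation divides zero by zero. A guarded min-max normalisation could look like the following minimal sketch; min_max_normalize is an illustrative helper and returning zeros for a constant segment is just one possible convention.

```python
def min_max_normalize(values):
    """Scale values to [0, 1]; return all zeros when the segment is constant."""
    y_min, y_max = min(values), max(values)
    span = y_max - y_min
    if span == 0:
        # Constant segment: the unguarded formula would compute 0 / 0.
        return [0.0 for _ in values]
    return [(v - y_min) / span for v in values]
```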
References
Citation
10.1166/jmihi.2020.3289
- Parameters:
db_dir (path-like, optional) – Storage path of the database. If not specified, data will be fetched from Physionet.
working_dir (path-like, optional) – Working directory, to store intermediate files and log files.
verbose (int, default 1) – Level of logging verbosity.
kwargs (dict, optional) – Auxiliary keyword arguments.
- property database_info: DataBaseInfo
The DataBaseInfo object of the database.
- get_absolute_path(rec: str | int, extension: str | None = None, ann: bool = False) → Path
Get the absolute path of the record rec.
- Parameters:
- Returns:
abs_path – Absolute path of the file.
- Return type:
- load_ann(rec: int | str, sampfrom: int | None = None, sampto: int | None = None) → Dict[str, ndarray]
Load the annotations of the record rec.
- Parameters:
- Returns:
ann – Annotation dictionary with items (ndarray) “SPB_indices” and “PVC_indices”, which record the indices of SPBs and PVCs.
- Return type:
- load_data(rec: int | str, sampfrom: int | None = None, sampto: int | None = None, data_format: str = 'channel_first', units: str = 'mV', fs: Real | None = None, return_fs: bool = False) → ndarray | Tuple[ndarray, Real]
Load the ECG data of the record rec.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
sampfrom (int, optional) – Start index of the data to be loaded.
sampto (int, optional) – End index of the data to be loaded.
data_format (str, default "channel_first") – Format of the ECG data, “channel_last” (alias “lead_last”), or “channel_first” (alias “lead_first”), or “flat” (alias “plain”).
units (str or None, default "mV") – Units of the output signal, can also be “μV” (with aliases “uV”, “muV”).
fs (numbers.Real, optional) – Sampling frequency of the output signal. If not None, the loaded data will be resampled to this frequency; if None, the loaded data will be returned as is.
return_fs (bool, default False) – Whether to return the sampling frequency of the output signal.
- Returns:
data (numpy.ndarray) – The loaded ECG data.
data_fs (numbers.Real, optional) – Sampling frequency of the output signal. Returned if return_fs is True.
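Since sampfrom and sampto are sample indices, time offsets have to be converted using the sampling frequency. The following is a minimal sketch assuming the native 400 Hz stated in the ABOUT section; seconds_to_samples and sample_window are illustrative helpers, not part of torch_ecg.

```python
FS = 400  # Hz, native sampling frequency stated in the ABOUT section


def seconds_to_samples(seconds, fs=FS):
    """Convert a time offset in seconds to a sample index."""
    return int(round(seconds * fs))


def sample_window(start_s, duration_s, fs=FS):
    """Return (sampfrom, sampto) covering duration_s seconds starting at start_s."""
    sampfrom = seconds_to_samples(start_s, fs)
    return sampfrom, sampfrom + seconds_to_samples(duration_s, fs)
```

For instance, the first 10 seconds of a record correspond to sampfrom=0 and sampto=4000, and the resulting pair can be passed as the sampfrom and sampto arguments.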
- locate_premature_beats(rec: int | str, premature_type: str | None = None, window: Real = 10, sampfrom: int | None = None, sampto: int | None = None) → List[List[int]]
Locate the sample indices of premature beats in a record.
The locations are in the form of a list of lists, and each list contains the interval of sample indices of premature beats.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
premature_type (str, optional) – Premature beat type, can be one of “SPB”, “PVC”. If not specified, both SPBs and PVCs will be located.
window (numbers.Real, default 10) – Window length of each premature beat, with units in seconds.
sampfrom (int, optional) – Start index of the premature beats to locate.
sampto (int, optional) – End index of the premature beats to locate.
- Returns:
premature_intervals – List of intervals of premature beats.
- Return type:
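The idea of turning beat indices into sample intervals can be illustrated as follows. This is a minimal sketch under stated assumptions, not the library's implementation: it assumes each window is centred on the beat, that overlapping windows are merged, and that the sampling frequency is 400 Hz. The names intervals_around and sig_len are hypothetical.

```python
def intervals_around(indices, window_s=10, fs=400, sig_len=None):
    """Build merged [start, end] sample intervals centred on each beat index."""
    half = int(window_s * fs / 2)
    merged = []
    for idx in sorted(indices):
        start = max(idx - half, 0)
        end = idx + half if sig_len is None else min(idx + half, sig_len)
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged
```

With these assumptions, two premature beats less than one window apart end up in a single interval, which keeps the returned list short for records with dense ectopy.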
- plot(rec: int | str, data: ndarray | None = None, ann: Dict[str, ndarray] | None = None, ticks_granularity: int = 0, sampfrom: int | None = None, sampto: int | None = None, rpeak_inds: Sequence[int] | ndarray | None = None) → None
Plot the ECG signal of a record.
- Parameters:
rec (str or int) – Record name or index of the record in all_records.
data (numpy.ndarray, optional) – ECG signal to plot. If given, data of rec will not be used. This is useful when plotting filtered data.
ann (dict, optional) – Annotations for data, covering those from annotation files, with items “SPB_indices”, “PVC_indices”, each of which is a ndarray. Ignored if data is None.
ticks_granularity (int, default 0) – Granularity to plot axis ticks, the higher the more ticks. 0 (no ticks) --> 1 (major ticks) --> 2 (major + minor ticks).
sampfrom (int, optional) – Start index of the data to plot.
sampto (int, optional) – End index of the data to plot.
rpeak_inds (array_like, optional) – Indices of R peaks. If data is None, then indices should be the absolute indices in the record.