CPSC2020

Bases: CPSCDataBase

The 3rd China Physiological Signal Challenge 2020: Searching for Premature Ventricular Contraction (PVC) and Supraventricular Premature Beat (SPB) from Long-term ECGs

ABOUT

Training data consist of 10 single-lead ECG recordings collected from arrhythmia patients; each recording lasts about 24 hours.
Data and annotations are stored in v5 .mat files.
A02, A03 and A08 are from patients with atrial fibrillation.
Sampling frequency = 400 Hz.

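Detailed information:

| rec | AF? | Length (h) | # N beats | # V beats | # S beats | # Total beats |
|-----|-----|-----------:|----------:|----------:|----------:|--------------:|
| A01 | No  | 25.89 | 109,062 | 0      | 24    | 109,086 |
| A02 | Yes | 22.83 | 98,936  | 4,554  | 0     | 103,490 |
| A03 | Yes | 24.70 | 137,249 | 382    | 0     | 137,631 |
| A04 | No  | 24.51 | 77,812  | 19,024 | 3,466 | 100,302 |
| A05 | No  | 23.57 | 94,614  | 1      | 25    | 94,640  |
| A06 | No  | 24.59 | 77,621  | 0      | 6     | 77,627  |
| A07 | No  | 23.11 | 73,325  | 15,150 | 3,481 | 91,956  |
| A08 | Yes | 25.46 | 115,518 | 2,793  | 0     | 118,311 |
| A09 | No  | 25.84 | 88,229  | 2      | 1,462 | 89,693  |
| A10 | No  | 23.64 | 72,821  | 169    | 9,071 | 82,061  |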
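To make the table easier to sanity-check, here is a small sketch that transcribes it into a dictionary (the layout and the helper name `approx_num_samples` are illustrative, not part of the reader's API), verifies that N + V + S equals the total for every record, and converts each length into an approximate number of samples at 400 Hz.

```python
# Values transcribed from the "Detailed information" table above.
# rec: (AF, length in hours, N beats, V beats, S beats, total beats)
RECORDS = {
    "A01": (False, 25.89, 109_062, 0, 24, 109_086),
    "A02": (True, 22.83, 98_936, 4_554, 0, 103_490),
    "A03": (True, 24.70, 137_249, 382, 0, 137_631),
    "A04": (False, 24.51, 77_812, 19_024, 3_466, 100_302),
    "A05": (False, 23.57, 94_614, 1, 25, 94_640),
    "A06": (False, 24.59, 77_621, 0, 6, 77_627),
    "A07": (False, 23.11, 73_325, 15_150, 3_481, 91_956),
    "A08": (True, 25.46, 115_518, 2_793, 0, 118_311),
    "A09": (False, 25.84, 88_229, 2, 1_462, 89_693),
    "A10": (False, 23.64, 72_821, 169, 9_071, 82_061),
}
FS = 400  # sampling frequency in Hz


def approx_num_samples(length_h: float, fs: int = FS) -> int:
    """Approximate number of samples in a record lasting length_h hours."""
    return round(length_h * 3600 * fs)


for rec, (af, length_h, n, v, s, total) in RECORDS.items():
    # N + V + S should equal the total for every record.
    assert n + v + s == total, rec
    print(f"{rec}: ~{approx_num_samples(length_h):,} samples, "
          f"{total / length_h:,.0f} beats/h, AF={af}")
```

For example, A01 (25.89 h) corresponds to roughly 37.3 million samples, which gives a sense of scale for the raw sample indices used elsewhere on this page.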

Challenging factors for accurate detection of SPB and PVC: amplitude variation, morphological variation, and noise.
See the Challenge official website [1].

Note

The records can roughly be classified into 4 groups:

| group | records |
|-------|---------|
| N     | A01, A03, A05, A06 |
| V     | A02, A08 |
| S     | A09, A10 |
| VS    | A04, A07 |

As premature beats and atrial fibrillation can co-exist (as shown by the following code, using data from CINC2020), the situation becomes more complicated.

>>> from utils.scoring_aux_data import dx_cooccurrence_all
>>> dx_cooccurrence_all.loc["AF", ["PAC","PVC","SVPB","VPB"]]
PAC     20
PVC     19
SVPB     4
VPB     20
Name: AF, dtype: int64

This can also be seen in this dataset, for example via the following code:

>>> from data_reader import CPSC2020Reader as CR
>>> db_dir = "/media/cfs/wenhao71/data/CPSC2020/TrainingSet/"
>>> dr = CR(db_dir)
>>> rec = dr.all_records[1]
>>> dr.plot(rec, sampfrom=0, sampto=4000, ticks_granularity=2)

PVC and SPB can also co-exist, as illustrated by the following code (using data from CINC2020):

>>> from utils.scoring_aux_data import dx_cooccurrence_all
>>> dx_cooccurrence_all.loc[["PVC","VPB"], ["PAC","SVPB",]]
     PAC  SVPB
PVC   14     1
VPB   27     0

This is also seen in this dataset via the following code (distances are in samples):

>>> import numpy as np
>>> from itertools import product
>>> for rec in dr.all_records:
...     ann = dr.load_ann(rec)
...     spb = ann["SPB_indices"]
...     pvc = ann["PVC_indices"]
...     if len(spb) > 1:
...         print(f"{rec}: min dist among SPB = {np.min(np.diff(spb))}")
...     if len(pvc) > 1:
...         print(f"{rec}: min dist among PVC = {np.min(np.diff(pvc))}")
...     diff = [s - p for s, p in product(spb, pvc)]
...     if len(diff) > 0:
...         print(f"{rec}: min dist between SPB and PVC = {np.min(np.abs(diff))}")
A01: min dist among SPB = 630
A02: min dist among SPB = 696
A02: min dist among PVC = 87
A02: min dist between SPB and PVC = 562
A03: min dist among SPB = 7044
A03: min dist among PVC = 151
A03: min dist between SPB and PVC = 3750
A04: min dist among SPB = 175
A04: min dist among PVC = 156
A04: min dist between SPB and PVC = 178
A05: min dist among SPB = 182
A05: min dist between SPB and PVC = 22320
A06: min dist among SPB = 455158
A07: min dist among SPB = 603
A07: min dist among PVC = 153
A07: min dist between SPB and PVC = 257
A08: min dist among SPB = 2903029
A08: min dist among PVC = 106
A08: min dist between SPB and PVC = 350
A09: min dist among SPB = 180
A09: min dist among PVC = 7719290
A09: min dist between SPB and PVC = 1271
A10: min dist among SPB = 148
A10: min dist among PVC = 708
A10: min dist between SPB and PVC = 177
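The loop above compares every SPB with every PVC via `itertools.product`, which is quadratic in the number of beats; for A04 (19,024 V beats and 3,466 S beats according to the table) that is about 66 million pairs. A sketch of the same statistics using `np.searchsorted` on a sorted array, which needs only O((n + m) log m) work; `min_gap_within` and `min_gap_between` are hypothetical helper names, not part of the reader.

```python
from typing import Optional, Sequence

import numpy as np


def min_gap_within(indices: Sequence[int]) -> Optional[int]:
    """Smallest distance (in samples) between two beats of the same type."""
    idx = np.sort(np.asarray(indices, dtype=np.int64))
    if idx.size < 2:
        return None
    return int(np.min(np.diff(idx)))


def min_gap_between(a: Sequence[int], b: Sequence[int]) -> Optional[int]:
    """Smallest |x - y| with x in a and y in b, without building all pairs."""
    a = np.asarray(a, dtype=np.int64)
    b = np.sort(np.asarray(b, dtype=np.int64))
    if a.size == 0 or b.size == 0:
        return None
    # Insertion position of each x in the sorted b; the nearest element of b
    # is either just before that position or at it.
    pos = np.searchsorted(b, a)
    left = b[np.clip(pos - 1, 0, b.size - 1)]
    right = b[np.clip(pos, 0, b.size - 1)]
    return int(np.min(np.minimum(np.abs(a - left), np.abs(a - right))))


spb = [100, 900, 2000]
pvc = [500, 1990]
print(min_gap_within(spb))        # 800
print(min_gap_between(spb, pvc))  # 10
```

Dividing a distance by the 400 Hz sampling frequency gives seconds; for example, the 87-sample minimum PVC spacing reported for A02 is about 0.22 s.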

Usage

ECG arrhythmia (PVC, SPB) detection

Issues

Currently, when xqrs is used as the QRS detector, several thousand more R peaks than the reference total are detected for A02, A07 and A08, which might be caused by motion artefacts (or AF?), while several thousand fewer are detected for A04. Numeric details are as follows:

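| rec | AF? | # beats by xqrs | # Total beats |
|-----|-----|----------------:|--------------:|
| A01 | No  | 109,502 | 109,086 |
| A02 | Yes | 119,562 | 103,490 |
| A03 | Yes | 135,912 | 137,631 |
| A04 | No  | 92,746  | 100,302 |
| A05 | No  | 94,674  | 94,640  |
| A06 | No  | 77,955  | 77,627  |
| A07 | No  | 98,390  | 91,956  |
| A08 | Yes | 126,908 | 118,311 |
| A09 | No  | 89,972  | 89,693  |
| A10 | No  | 83,509  | 82,061  |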
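The discrepancies can be read off directly by subtracting the two columns. A short sketch, with values transcribed from the table above and a hypothetical helper name, that ranks records by the difference between xqrs detections and the reference total:

```python
from typing import Dict, Tuple

# (beats detected by xqrs, reference total) per record, from the table above.
XQRS_VS_REF: Dict[str, Tuple[int, int]] = {
    "A01": (109_502, 109_086),
    "A02": (119_562, 103_490),
    "A03": (135_912, 137_631),
    "A04": (92_746, 100_302),
    "A05": (94_674, 94_640),
    "A06": (77_955, 77_627),
    "A07": (98_390, 91_956),
    "A08": (126_908, 118_311),
    "A09": (89_972, 89_693),
    "A10": (83_509, 82_061),
}


def beat_count_discrepancy(counts: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Map each record to detected - reference (positive means over-detection)."""
    return {rec: det - ref for rec, (det, ref) in counts.items()}


diff = beat_count_discrepancy(XQRS_VS_REF)
# Largest absolute discrepancies first.
for rec, d in sorted(diff.items(), key=lambda kv: -abs(kv[1])):
    print(f"{rec}: {d:+,}")
```

A02 (+16,072), A08 (+8,597), A04 (-7,556) and A07 (+6,434) stand out; A03 (-1,719) and A10 (+1,448) also differ by more than 1,000 beats.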

(Fixed by an official update.) Before the correction of load_ann, A04 had duplicate “PVC_indices”: 13534856, 27147621 and 35141190 each appeared twice.

>>> from collections import Counter
>>> from data_reader import CPSC2020Reader
>>> db_dir = "/mnt/wenhao71/data/CPSC2020/TrainingSet/"
>>> data_gen = CPSC2020Reader(db_dir=db_dir, working_dir=db_dir)
>>> rec = "A04"
>>> ann = data_gen.load_ann(rec)
>>> Counter(ann["PVC_indices"]).most_common()[:4]
[(13534856, 2), (27147621, 2), (35141190, 2), (848, 1)]
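Since the duplicates were fixed upstream, this only matters for older copies of the annotation files. A defensive sketch (the helper names are hypothetical, and the toy array is built from the values printed above rather than loaded from the real annotation) that reports repeated indices with `Counter` and drops them with `np.unique`:

```python
from collections import Counter
from typing import List, Sequence

import numpy as np


def duplicated_indices(indices: Sequence[int]) -> List[int]:
    """Sample indices that occur more than once, in ascending order."""
    counts = Counter(np.asarray(indices).tolist())
    return sorted(i for i, c in counts.items() if c > 1)


def dedupe_indices(indices: Sequence[int]) -> np.ndarray:
    """Sorted unique sample indices (np.unique both sorts and deduplicates)."""
    return np.unique(np.asarray(indices, dtype=np.int64))


# Toy array built from the values printed above, not the real annotation.
pvc = np.array([848, 13534856, 13534856, 27147621, 27147621, 35141190, 35141190])
print(duplicated_indices(pvc))       # [13534856, 27147621, 35141190]
print(dedupe_indices(pvc).tolist())  # [848, 13534856, 27147621, 35141190]
```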

When extracting morphological features using augmented R peaks for A04, the warning
```
RuntimeWarning: invalid value encountered in double_scalars
```
is raised for

\[R_{\text{value}} = \frac{R_{\text{value}} - y_{\min}}{y_{\max} - y_{\min}}\]

and for

\[y_{\text{values}}[n] = \frac{y_{\text{values}}[n] - y_{\min}}{y_{\max} - y_{\min}}.\]

This warning signals a NaN result, most likely from a 0/0 division, which happens when y_max equals y_min, i.e. when the segment is flat. It is caused by sample 13882273, which is contained in “PVC_indices”; however, whether it is a PVC beat or just a motion artefact is in doubt!
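A common guard against this warning is to check the denominator before normalizing. A minimal sketch; `minmax_normalize` is a hypothetical helper, and mapping flat segments to zeros is an assumed design choice (dropping such beats would be equally valid):

```python
import numpy as np


def minmax_normalize(y, eps: float = 1e-12) -> np.ndarray:
    """Scale y to [0, 1]; a flat segment (y_max == y_min) maps to zeros
    instead of producing 0/0 = NaN and the RuntimeWarning."""
    y = np.asarray(y, dtype=float)
    y_min, y_max = y.min(), y.max()
    span = y_max - y_min
    if span < eps:
        # Degenerate segment: the normalization is undefined, return zeros.
        return np.zeros_like(y)
    return (y - y_min) / span


print(minmax_normalize([1, 2, 3]).tolist())  # [0.0, 0.5, 1.0]
print(minmax_normalize([5, 5, 5]).tolist())  # [0.0, 0.0, 0.0]
```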

References

Citation

10.1166/jmihi.2020.3289

Parameters:

db_dir (path-like, optional) – Storage path of the database. If not specified, data will be fetched from PhysioNet.
working_dir (path-like, optional) – Working directory, to store intermediate files and log files.
verbose (int, default 1) – Level of logging verbosity.
kwargs (dict, optional) – Auxiliary keyword arguments.

property database_info: DataBaseInfo

The DataBaseInfo object of the database.

get_absolute_path(rec: str | int, extension: str | None = None, ann: bool = False) → Path

Get the absolute path of the record rec.

Parameters:

rec (str or int) – Record name or index of the record in all_records.
extension (str, optional) – Extension of the file.
ann (bool, default False) – Whether to get the annotation file path or not.

Returns:

abs_path – Absolute path of the file.

Return type:

pathlib.Path

get_subject_id(rec: int | str) → int

Attach a unique subject ID to the record.

Parameters:

rec (str or int) – Record name or index of the record in all_records.

Returns:

pid – The subject ID corresponding to rec.

Return type:

int

load_ann(rec: int | str, sampfrom: int | None = None, sampto: int | None = None) → Dict[str, ndarray]

Load the annotations of the record rec.

Parameters:

rec (str or int) – Record name or index of the record in all_records.
sampfrom (int, optional) – Start index of the data to be loaded.
sampto (int, optional) – End index of the data to be loaded.

Returns:

ann – Annotation dictionary with items (ndarray) “SPB_indices” and “PVC_indices”, which record the indices of SPBs and PVCs.

Return type:

dict
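The sampfrom/sampto window of load_ann can be pictured as a filter on the annotation arrays. The docstring does not say whether the returned indices are re-based to sampfrom, so the hypothetical helper below exposes that as a `relative` flag; it is an illustrative sketch, not the reader's code.

```python
from typing import Dict, Optional

import numpy as np


def slice_ann(ann: Dict[str, np.ndarray], sampfrom: Optional[int] = None,
              sampto: Optional[int] = None, relative: bool = True) -> Dict[str, np.ndarray]:
    """Keep annotation indices in [sampfrom, sampto); optionally shift them
    so that sampfrom becomes index 0 (an assumption, see the lead-in)."""
    start = 0 if sampfrom is None else sampfrom
    out = {}
    for key, idx in ann.items():
        idx = np.asarray(idx, dtype=np.int64)
        mask = idx >= start
        if sampto is not None:
            mask &= idx < sampto
        kept = idx[mask]
        out[key] = kept - start if relative else kept
    return out


ann = {"SPB_indices": np.array([10, 500, 1200]), "PVC_indices": np.array([300, 900])}
print(slice_ann(ann, sampfrom=400, sampto=1000))
```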

Load the ECG data of the record rec.

Parameters:

rec (str or int) – Record name or index of the record in all_records.
sampfrom (int, optional) – Start index of the data to be loaded.
sampto (int, optional) – End index of the data to be loaded.
data_format (str, default "channel_first") – Format of the ECG data, “channel_last” (alias “lead_last”), or “channel_first” (alias “lead_first”), or “flat” (alias “plain”).
units (str or None, default "mV") – Units of the output signal, can also be “μV” (with aliases “uV”, “muV”).
fs (numbers.Real, optional) – Frequency of the output signal. If not None, the loaded data will be resampled to this frequency; if None, the loaded data will be returned as is.
return_fs (bool, default False) – Whether to return the sampling frequency of the output signal.

Returns:

data (numpy.ndarray) – The loaded ECG data.
data_fs (numbers.Real, optional) – Sampling frequency of the output signal. Returned if return_fs is True.
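The data_format and units options above only change the layout and scale of the returned array. A minimal sketch of what they mean for a single-lead record, assuming the raw signal is held in mV as a 1-D array; `format_single_lead` is a hypothetical helper, not the reader's implementation, and resampling via fs is left out.

```python
import numpy as np


def format_single_lead(sig, data_format: str = "channel_first", units: str = "mV") -> np.ndarray:
    """Arrange a single-lead signal (given in mV) into the requested layout and units."""
    sig = np.asarray(sig, dtype=float).reshape(-1)  # flat, shape (n_samples,)
    u = units.lower()
    if u in ("μv", "µv", "uv", "muv"):
        sig = sig * 1000.0  # 1 mV = 1000 μV
    elif u != "mv":
        raise ValueError(f"unsupported units: {units}")
    fmt = data_format.lower()
    if fmt in ("channel_first", "lead_first"):
        return sig[np.newaxis, :]  # shape (1, n_samples)
    if fmt in ("channel_last", "lead_last"):
        return sig[:, np.newaxis]  # shape (n_samples, 1)
    if fmt in ("flat", "plain"):
        return sig  # shape (n_samples,)
    raise ValueError(f"unsupported data_format: {data_format}")


print(format_single_lead([0.1, 0.2, 0.3]).shape)                  # (1, 3)
print(format_single_lead([0.1, 0.2, 0.3], "channel_last").shape)  # (3, 1)
```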

locate_premature_beats(rec: int | str, premature_type: str | None = None, window: Real = 10, sampfrom: int | None = None, sampto: int | None = None) → List[List[int]]

Locate the sample indices of premature beats in a record.

The locations are in the form of a list of lists, and each list contains the interval of sample indices of premature beats.

Parameters:

rec (str or int) – Record name or index of the record in all_records.
premature_type (str, optional) – Premature beat type, can be one of “SPB”, “PVC”. If not specified, both SPBs and PVCs will be located.
window (numbers.Real, default 10) – Window length of each premature beat, with units in seconds.
sampfrom (int, optional) – Start index of the premature beats to locate.
sampto (int, optional) – End index of the premature beats to locate.

Returns:

premature_intervals – List of intervals of premature beats.

Return type:

list
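One plausible way to turn premature-beat locations into intervals is sketched below, assuming each beat is surrounded by a window of `window` seconds centred on it and that overlapping windows are merged. `merge_beat_windows` and its centring and merging rule are assumptions, not necessarily what locate_premature_beats does.

```python
from typing import List, Optional, Sequence


def merge_beat_windows(beat_indices: Sequence[int], fs: float = 400, window: float = 10.0,
                       siglen: Optional[int] = None) -> List[List[int]]:
    """Centre a window of `window` seconds on each beat, clip it to the signal,
    and merge overlapping windows into [start, end] sample intervals."""
    half = int(round(window * fs / 2))
    intervals: List[List[int]] = []
    for idx in sorted(beat_indices):
        start = max(0, idx - half)
        end = idx + half if siglen is None else min(siglen, idx + half)
        if intervals and start <= intervals[-1][1]:
            # Overlaps the previous interval: extend it instead of adding a new one.
            intervals[-1][1] = max(intervals[-1][1], end)
        else:
            intervals.append([start, end])
    return intervals


print(merge_beat_windows([5000, 6000, 20000]))  # [[3000, 8000], [18000, 22000]]
```

With fs = 400 and window = 10, each isolated beat contributes a 4,000-sample interval before merging.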

Plot the ECG signal of a record.

Parameters:

rec (str or int) – Record name or index of the record in all_records.
data (numpy.ndarray, optional) – ECG signal to plot. If given, data of rec will not be used. This is useful when plotting filtered data.
ann (dict, optional) – Annotations for data, covering those from annotation files, with items “SPB_indices”, “PVC_indices”, each of which is a ndarray. Ignored if data is None.
ticks_granularity (int, default 0) – Granularity to plot axis ticks, the higher the more ticks. 0 (no ticks) –> 1 (major ticks) –> 2 (major + minor ticks)
sampfrom (int, optional) – Start index of the data to plot.
sampto (int, optional) – End index of the data to plot.
rpeak_inds (array_like, optional) – Indices of R peaks. If data is None, then indices should be the absolute indices in the record.

train_test_split_rec(test_rec_num: int = 2) → Dict[str, List[str]]

Split the records into train set and test (val) set.

Parameters:

test_rec_num (int, default 2) – Number of records for the test (val) set.

Returns:

split_res – Split result dictionary, with items “train” and “test”, both of which are lists of record names.

Return type:

dict
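train_test_split_rec works at the record level, so all beats of a record land on the same side of the split. As a sketch of one reasonable strategy (an assumption, not the reader's documented behaviour), the hypothetical `split_by_group` below draws test records round-robin from the rough N/V/S/VS groups listed in the Note, so that a small test set still covers different premature-beat profiles.

```python
import random
from typing import Dict, List

# Rough record groups, copied from the Note above.
GROUPS: Dict[str, List[str]] = {
    "N": ["A01", "A03", "A05", "A06"],
    "V": ["A02", "A08"],
    "S": ["A09", "A10"],
    "VS": ["A04", "A07"],
}


def split_by_group(groups: Dict[str, List[str]], test_rec_num: int = 2,
                   seed: int = 0) -> Dict[str, List[str]]:
    """Pick test records one group at a time; the rest form the train set."""
    rng = random.Random(seed)
    # Shuffle the records inside each group, then shuffle the group order.
    pools = {g: rng.sample(recs, len(recs)) for g, recs in groups.items()}
    order = rng.sample(sorted(pools), len(pools))
    test: List[str] = []
    # Round-robin over groups so the test set spans as many groups as possible.
    while len(test) < test_rec_num and any(pools.values()):
        for g in order:
            if pools[g] and len(test) < test_rec_num:
                test.append(pools[g].pop())
    all_recs = sorted(r for recs in groups.values() for r in recs)
    train = [r for r in all_recs if r not in test]
    return {"train": train, "test": sorted(test)}


print(split_by_group(GROUPS, test_rec_num=2, seed=0))
```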

property url: str

URL(s) for downloading the database.
