torch_ecg.utils.stratified_train_test_split
- torch_ecg.utils.stratified_train_test_split(df: DataFrame, stratified_cols: Sequence[str], test_ratio: float = 0.2, reset_index: bool = False) -> Tuple[DataFrame, DataFrame]
Perform stratified train-test split on the dataframe.
For example, suppose the dataframe has columns sex and nationality, among others, where sex takes the values male and female and nationality takes the values Chinese and American. With stratified_cols = ["sex", "nationality"] and test_ratio = 0.2, approximately 20% of the male and 20% of the female subjects are placed in the test set, and likewise approximately 20% of the Chinese and 20% of the American subjects.
- Parameters:
df (pandas.DataFrame) – The dataframe to be split.
stratified_cols (Sequence[str]) – Columns to be stratified, assuming each column is a categorical variable. Each class in any of the columns will be split into train and test sets with an approximate ratio of test_ratio.
test_ratio (float, default 0.2) – Fraction of the rows of the whole dataframe to place in the test set.
reset_index (bool, default False) – Whether to reset the index of the dataframes.
- Returns:
df_train (pandas.DataFrame) – The dataframe of the train set.
df_test (pandas.DataFrame) – The dataframe of the test set.
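The stratification property described above can be illustrated with a minimal pandas-only sketch. This is not the torch_ecg implementation, which may use a different algorithm; the helper name `sketch_stratified_split` and its `seed` parameter are hypothetical and exist only for this illustration. The sketch assumes a unique index and stratifies on the joint combination of the columns, which is a stricter condition than the per-column behaviour documented here.

```python
import pandas as pd


def sketch_stratified_split(df, stratified_cols, test_ratio=0.2, seed=0):
    """Illustrative only (hypothetical helper, not part of torch_ecg)."""
    # Draw a fraction `test_ratio` of the rows from every combination of
    # classes in `stratified_cols`.
    df_test = df.groupby(list(stratified_cols), group_keys=False).sample(
        frac=test_ratio, random_state=seed
    )
    # Rows that were not drawn form the train set (assumes a unique index).
    df_train = df.drop(index=df_test.index)
    return df_train, df_test


# Toy data: 100 subjects, 25 for each (sex, nationality) combination.
df = pd.DataFrame(
    {
        "sex": ["male", "female"] * 50,
        "nationality": ["Chinese"] * 50 + ["American"] * 50,
    }
)

df_train, df_test = sketch_stratified_split(df, ["sex", "nationality"], test_ratio=0.2)

# Per-class share of the test set; every class should be close to test_ratio.
for col in ["sex", "nationality"]:
    print(df_test[col].value_counts() / df[col].value_counts())
```

On this balanced toy data every class contributes the same share of its rows to the test set. With small or unevenly sized classes the shares are only approximately equal to test_ratio, which matches the "approximately" wording in the description.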