FedNLPDataset
- class fl_sim.data_processing.FedNLPDataset(datadir: Path | str | None = None, seed: int = 0, **extra_config: Any)
Bases: FedDataset, ABC
Base class for all federated NLP datasets.
Methods that have to be implemented by subclasses:
get_dataloader
_preload
evaluate
get_word_dict
Properties that have to be implemented by subclasses:
url
candidate_models
doi
- Parameters:
datadir (Union[str, pathlib.Path], optional) – The directory to store the dataset. If None, use the default directory.
seed (int, default 0) – The random seed.
**extra_config (dict, optional) – Extra configurations.
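The abstract-member contract listed above can be sketched without fl_sim installed. The block below is only an illustration: `NLPDatasetContract` and `ToyDataset` are hypothetical stand-ins, not part of fl_sim, and every signature except that of `get_dataloader` is an assumption. Plain lists of batches stand in for `torch.utils.data.DataLoader` objects.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


def _batches(items: list, bs: int) -> list:
    """Split items into consecutive chunks of at most bs elements."""
    return [items[i:i + bs] for i in range(0, len(items), bs)]


class NLPDatasetContract(ABC):
    """Hypothetical stand-in mirroring the abstract members listed above."""

    @abstractmethod
    def get_dataloader(self, train_bs: int, test_bs: int,
                       client_idx: Optional[int] = None) -> Tuple[list, list]: ...

    @abstractmethod
    def _preload(self) -> None: ...  # assumed signature

    @abstractmethod
    def evaluate(self, predictions: list, truths: list) -> Dict[str, float]: ...  # assumed

    @abstractmethod
    def get_word_dict(self) -> Dict[str, int]: ...  # assumed

    @property
    @abstractmethod
    def url(self) -> str: ...

    @property
    @abstractmethod
    def candidate_models(self) -> dict: ...

    @property
    @abstractmethod
    def doi(self) -> List[str]: ...


class ToyDataset(NLPDatasetContract):
    def __init__(self) -> None:
        self._preload()

    def _preload(self) -> None:
        # Two clients, each holding a few token-id sequences.
        self._train = {0: [[1, 2], [2, 3]], 1: [[3, 1]]}
        self._test = {0: [[1, 3]], 1: [[2, 2]]}

    def get_dataloader(self, train_bs, test_bs, client_idx=None):
        if client_idx is None:
            # Global view: concatenate all clients' data.
            train = [s for c in sorted(self._train) for s in self._train[c]]
            test = [s for c in sorted(self._test) for s in self._test[c]]
        else:
            train, test = self._train[client_idx], self._test[client_idx]
        return _batches(train, train_bs), _batches(test, test_bs)

    def evaluate(self, predictions, truths):
        correct = sum(p == t for p, t in zip(predictions, truths))
        return {"acc": correct / len(truths)}

    def get_word_dict(self):
        return {"<pad>": 0, "a": 1, "b": 2, "c": 3}

    @property
    def url(self):
        return "https://example.com/toy.zip"  # placeholder, not a real dataset

    @property
    def candidate_models(self):
        return {}

    @property
    def doi(self):
        return []
```

Because every listed member is abstract in the stand-in, instantiating it directly, or a subclass that omits one of them, raises `TypeError`.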
- abstract get_dataloader(train_bs: int, test_bs: int, client_idx: int | None = None) → Tuple[DataLoader, DataLoader]
Get the dataloaders for client client_idx, or the global dataloaders if client_idx is None.
- load_partition_data(batch_size: int | None = None) → tuple
Partition the data across all local clients.
- Parameters:
batch_size (int, optional) – Batch size for dataloader. If None, use the default batch size.
- Returns:
train_clients_num (int) – Number of training clients.
train_data_num (int) – Number of training samples.
test_data_num (int) – Number of testing samples.
train_data_global (torch.utils.data.DataLoader) – Global training dataloader.
test_data_global (torch.utils.data.DataLoader) – Global testing dataloader.
data_local_num_dict (dict) – Number of local training samples for each client.
train_data_local_dict (dict) – Local training dataloader for each client.
test_data_local_dict (dict) – Local testing dataloader for each client.
vocab_len (int) – Length of the vocabulary.
- Return type:
tuple
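Since load_partition_data returns a plain tuple of nine positional fields, naming them can make call sites easier to read. The sketch below is one way to do that with `typing.NamedTuple`; `PartitionData` is a hypothetical helper, not part of fl_sim, and keying the per-client dicts by integer client index is an assumption. A hand-built tuple stands in for the real return value.

```python
from typing import Any, Dict, NamedTuple


class PartitionData(NamedTuple):
    """Hypothetical wrapper; field order follows the Returns list above."""
    train_clients_num: int
    train_data_num: int
    test_data_num: int
    train_data_global: Any      # a DataLoader in real use
    test_data_global: Any       # a DataLoader in real use
    data_local_num_dict: Dict[int, int]
    train_data_local_dict: Dict[int, Any]
    test_data_local_dict: Dict[int, Any]
    vocab_len: int


# Stand-in for the tuple that load_partition_data would return.
raw = (
    2, 3, 2,
    ["global-train"], ["global-test"],
    {0: 2, 1: 1},
    {0: ["client0-train"], 1: ["client1-train"]},
    {0: ["client0-test"], 1: ["client1-test"]},
    4,
)
data = PartitionData(*raw)

# Fields are now accessible by name instead of by position.
for client_idx, n_samples in data.data_local_num_dict.items():
    loader = data.train_data_local_dict[client_idx]
    print(client_idx, n_samples, loader)
```

In real use, `PartitionData(*dataset.load_partition_data())` would do the same unpacking, provided the tuple order matches the list above.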
- load_partition_data_distributed(process_id: int, batch_size: int | None = None) → tuple
Get the local dataloader at client process_id, or get the global dataloader.
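- Parameters:
process_id (int) – Index of the client process.
batch_size (int, optional) – Batch size for dataloader. If None, use the default batch size.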
- Returns:
train_clients_num (int) – Number of training clients.
train_data_num (int) – Number of training samples.
train_data_global (torch.utils.data.DataLoader or None) – Global training dataloader.
test_data_global (torch.utils.data.DataLoader or None) – Global testing dataloader.
local_data_num (int) – Number of local training samples.
train_data_local (torch.utils.data.DataLoader or None) – Local training dataloader.
test_data_local (torch.utils.data.DataLoader or None) – Local testing dataloader.
vocab_len (int) – Length of the vocabulary.
- Return type:
tuple
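Each dataloader in this tuple may be None, so callers likely need to check which ones are populated before iterating. The following is a minimal sketch of that check; `pick_loaders` is a hypothetical helper, not part of fl_sim, and preferring the local loaders over the global ones is an assumption, since the conditions under which each entry is None are not stated here.

```python
from typing import Any, Tuple


def pick_loaders(result: tuple) -> Tuple[Any, Any]:
    """Return (train, test) loaders, preferring local ones when present."""
    (
        _train_clients_num,
        _train_data_num,
        train_data_global,
        test_data_global,
        _local_data_num,
        train_data_local,
        test_data_local,
        _vocab_len,
    ) = result  # order follows the Returns list above
    train = train_data_local if train_data_local is not None else train_data_global
    test = test_data_local if test_data_local is not None else test_data_global
    return train, test


# Stand-ins for two possible return values: one with only global loaders,
# one with only local loaders. Lists replace real DataLoader objects.
global_only = (2, 3, ["g-train"], ["g-test"], 0, None, None, 4)
local_only = (2, 3, None, None, 2, ["l-train"], ["l-test"], 4)

print(pick_loaders(global_only))
print(pick_loaders(local_only))
```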