FedShakespeare
- class fl_sim.data_processing.FedShakespeare(datadir: Path | str | None = None, seed: int = 0, **extra_config: Any)
Bases: FedNLPDataset
Federated Shakespeare dataset.
The Shakespeare dataset is built from the collected works of William Shakespeare and is used for next-character prediction. FedML [1] loaded the data through the TensorFlow Federated (TFF) shakespeare load_data API [2] and saved the unzipped data into HDF5 files.
The data partition is the same as in TFF, with the following statistics:
DATASET      TRAIN CLIENTS  TRAIN EXAMPLES  TEST CLIENTS  TEST EXAMPLES
SHAKESPEARE  715            16,068          715           2,356
Each client corresponds to a speaking role with at least two lines.
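To make the next-character prediction task concrete, the following is a minimal sketch of how a passage can be cut into (input, target) windows in which the target is the input shifted by one character. The function name next_char_pairs and the window length are illustrative assumptions; the actual windowing used by TFF and FedML may differ.

```python
from typing import List, Tuple


def next_char_pairs(text: str, seq_len: int) -> List[Tuple[str, str]]:
    """Cut text into non-overlapping windows of (input, target) strings.

    Hypothetical helper: each target is its input shifted right by one
    character, so position i of the target is the character that follows
    position i of the input.
    """
    pairs = []
    # Each window needs seq_len + 1 characters: seq_len inputs plus one lookahead.
    for start in range(0, len(text) - seq_len, seq_len):
        chunk = text[start : start + seq_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


pairs = next_char_pairs("To be, or not to be", seq_len=6)
for source, target in pairs:
    print(repr(source), "->", repr(target))
```

A model trained on such pairs predicts, at every position, the character that comes next in the speaking role's lines.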
- Parameters:
datadir (Union[str, pathlib.Path], optional) – The directory to store the dataset. If None, use the default directory.
seed (int, default 0) – The random seed.
**extra_config (dict, optional) – Extra configurations.
References
- evaluate(probs: Tensor, truths: Tensor) → Dict[str, float]
Evaluation using predictions and ground truth.
- Parameters:
probs (torch.Tensor) – Predicted probabilities.
truths (torch.Tensor) – Ground truth labels.
- Returns:
Evaluation results.
- Return type:
Dict[str, float]
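The page does not state which metrics evaluate reports. As an illustration only, the sketch below assumes one of them is top-1 accuracy and shows how a mapping from metric name to float can be derived from predicted probabilities and ground-truth labels. It uses plain Python lists instead of torch.Tensor so that it runs without PyTorch; the function name top1_accuracy and the key "acc" are hypothetical.

```python
from typing import Dict, Sequence


def top1_accuracy(
    probs: Sequence[Sequence[float]], truths: Sequence[int]
) -> Dict[str, float]:
    """Return {"acc": fraction of rows whose argmax equals the label}.

    Hypothetical stand-in for a metric computation; not the library's code.
    """
    if len(probs) != len(truths):
        raise ValueError("probs and truths must have the same length")
    if not truths:
        return {"acc": 0.0}
    correct = 0
    for row, truth in zip(probs, truths):
        # Index of the largest probability in this row (argmax).
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == truth:
            correct += 1
    return {"acc": correct / len(truths)}


metrics = top1_accuracy(
    probs=[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1], [0.2, 0.2, 0.6]],
    truths=[1, 2, 2],
)
print(metrics)
```

In this toy input the first and third rows are predicted correctly and the second is not.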
- get_dataloader(train_bs: int | None = None, test_bs: int | None = None, client_idx: int | None = None) → Tuple[DataLoader, DataLoader]
Get the local dataloaders of client client_idx, or the global dataloaders if client_idx is None.
- Parameters:
train_bs (int, optional) – Batch size for training dataloader. If None, use default batch size.
test_bs (int, optional) – Batch size for testing dataloader. If None, use default batch size.
client_idx (int, optional) – Index of the client to get dataloader. If None, get the dataloader containing all data. Usually used for centralized training.
- Returns:
train_dl (torch.utils.data.DataLoader) – Training dataloader.
test_dl (torch.utils.data.DataLoader) – Testing dataloader.
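The difference between per-client and global loaders can be sketched as follows. The sketch relies only on the documented get_dataloader(train_bs, test_bs, client_idx) signature; StubDataset is a hypothetical stand-in that returns plain lists in place of torch.utils.data.DataLoader objects, so the example runs without fl_sim or PyTorch installed. The helper collect_client_loaders is likewise an assumption made for illustration.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


class StubDataset:
    """Hypothetical stand-in exposing the documented get_dataloader signature."""

    def __init__(self, num_clients: int) -> None:
        # One tiny (train, test) pair per client; lists stand in for DataLoaders.
        self.client_data = {
            i: ([f"train-{i}"], [f"test-{i}"]) for i in range(num_clients)
        }

    def get_dataloader(
        self,
        train_bs: Optional[int] = None,
        test_bs: Optional[int] = None,
        client_idx: Optional[int] = None,
    ) -> Tuple[List[str], List[str]]:
        if client_idx is None:
            # Global view: concatenate the data of all clients.
            train = [x for tr, _ in self.client_data.values() for x in tr]
            test = [x for _, te in self.client_data.values() for x in te]
            return train, test
        return self.client_data[client_idx]


def collect_client_loaders(
    dataset: Any, client_ids: Iterable[int], train_bs: Optional[int] = None
) -> Dict[int, Tuple[Any, Any]]:
    """Request (train, test) loaders for each selected client."""
    return {
        idx: dataset.get_dataloader(train_bs=train_bs, client_idx=idx)
        for idx in client_ids
    }


ds = StubDataset(num_clients=3)
per_client = collect_client_loaders(ds, client_ids=[0, 2])
global_train, global_test = ds.get_dataloader(client_idx=None)
```

With a real FedShakespeare instance, the same calls would return DataLoader pairs: one pair per sampled client in a federated round, or a single pair over all data for centralized training.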
- preprocess(sentences: Sequence[str], max_seq_len: int | None = None) → List[List[int]]
Preprocess a list of sentences.
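The page gives only the signature, so the following is a rough sketch of what character-level preprocessing of this shape could look like: each sentence is mapped to a list of integer ids and, when max_seq_len is given, truncated or padded to that length. The vocabulary, the PAD_ID and OOV_ID values, the lowercasing step, and the name preprocess_sketch are all assumptions and are not taken from the library.

```python
from typing import List, Optional, Sequence

# Hypothetical vocabulary: id 0 is padding, id 1 is out-of-vocabulary.
PAD_ID = 0
OOV_ID = 1
CHARS = "abcdefghijklmnopqrstuvwxyz ,.'"
CHAR_TO_ID = {ch: i + 2 for i, ch in enumerate(CHARS)}


def preprocess_sketch(
    sentences: Sequence[str], max_seq_len: Optional[int] = None
) -> List[List[int]]:
    """Encode sentences as character ids, optionally fixed to max_seq_len."""
    encoded = []
    for sentence in sentences:
        # Lowercasing is a simplification chosen for this sketch only.
        ids = [CHAR_TO_ID.get(ch, OOV_ID) for ch in sentence.lower()]
        if max_seq_len is not None:
            ids = ids[:max_seq_len]  # truncate long sentences
            ids += [PAD_ID] * (max_seq_len - len(ids))  # pad short ones
        encoded.append(ids)
    return encoded


encoded = preprocess_sketch(["to be", "or"], max_seq_len=4)
print(encoded)
```

Fixing the length this way lets the encoded sentences be stacked into a single batch.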