FedShakespeare


class fl_sim.data_processing.FedShakespeare(datadir: Path | str | None = None, seed: int = 0, **extra_config: Any)

Bases: FedNLPDataset

Federated Shakespeare dataset.

The Shakespeare dataset is built from the collected works of William Shakespeare and is used for next-character prediction. FedML [1] loaded the data through the TensorFlow Federated (TFF) shakespeare load_data API [2] and saved the unzipped data as HDF5 files.
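As an illustration of what next-character prediction means for the training examples, the sketch below splits a text into (input, target) windows in which the target is the input shifted by one character. The function name next_char_pairs and the non-overlapping windowing are assumptions made for this example; they are not taken from fl_sim or TFF.

```python
from typing import List, Tuple


def next_char_pairs(text: str, seq_len: int) -> List[Tuple[str, str]]:
    """Split text into (input, target) windows; target is input shifted by one character."""
    pairs = []
    # Each window needs seq_len + 1 characters: seq_len inputs plus one extra for the last target.
    for start in range(0, len(text) - seq_len, seq_len):
        chunk = text[start : start + seq_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


pairs = next_char_pairs("To be, or not to be", seq_len=6)
for source, target in pairs:
    print(repr(source), "->", repr(target))
```

At every position the model sees the input characters up to that point and is trained to predict the character that follows.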

The data partition is the same as in TFF, with the following statistics.

DATASET	TRAIN CLIENTS	TRAIN EXAMPLES	TEST CLIENTS	TEST EXAMPLES
SHAKESPEARE	715	16,068	715	2,356

Each client corresponds to a speaking role with at least two lines.
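The partition rule above can be sketched as follows. This is a toy reconstruction, not the TFF or FedML preprocessing code: the function partition_by_role, the (role, line) input format, and the sample data are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_by_role(lines: List[Tuple[str, str]], min_lines: int = 2) -> Dict[str, List[str]]:
    """Group (role, line) pairs by role and keep roles with at least min_lines lines."""
    by_role: Dict[str, List[str]] = defaultdict(list)
    for role, line in lines:
        by_role[role].append(line)
    return {role: texts for role, texts in by_role.items() if len(texts) >= min_lines}


# Toy input: each entry is (speaking role, spoken line).
toy_play = [
    ("HAMLET", "To be, or not to be, that is the question."),
    ("OPHELIA", "Good my lord, how does your honour for this many a day?"),
    ("HAMLET", "I humbly thank you; well, well, well."),
    ("GHOST", "Mark me."),
]
clients = partition_by_role(toy_play)
print(sorted(clients))
```

Roles with a single line are dropped, so every remaining client holds at least two lines of text.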

Parameters:

datadir (Union[str, pathlib.Path], optional) – Directory in which the dataset is stored. If None, the default directory is used.
seed (int, default 0) – The random seed.
**extra_config (dict, optional) – Extra configurations.

References

property candidate_models: Dict[str, Module]

A set of candidate models.

char_to_id(char: str) → int

Convert a character to an integer index.
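A character-to-index lookup of this kind, together with its inverse, might look like the sketch below. The vocabulary, the special tokens, and the fallback to an unknown-character index are assumptions for illustration; the actual dataset defines its own character set.

```python
from typing import Dict, List

# Hypothetical vocabulary; the real dataset defines its own characters and special tokens.
VOCAB: List[str] = ["<pad>", "<unk>"] + list("abcdefghijklmnopqrstuvwxyz ,.")
CHAR_TO_ID: Dict[str, int] = {ch: i for i, ch in enumerate(VOCAB)}


def char_to_id(char: str) -> int:
    """Map a character to its index, falling back to the <unk> index."""
    return CHAR_TO_ID.get(char, CHAR_TO_ID["<unk>"])


def id_to_char(idx: int) -> str:
    """Inverse lookup: map an index back to its character."""
    return VOCAB[idx]


print(char_to_id("a"), char_to_id("#"), id_to_char(char_to_id("z")))
```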

property doi: List[str]

DOI(s) related to the dataset.

evaluate(probs: Tensor, truths: Tensor) → Dict[str, float]

Evaluation using predictions and ground truth.

Parameters:

probs (torch.Tensor) – Predicted probabilities.
truths (torch.Tensor) – Ground truth labels.

Returns:

Evaluation results.

Return type:

Dict[str, float]
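The page does not state which metrics evaluate returns. As a rough sketch of the general idea, the example below computes top-1 accuracy and mean negative log-likelihood from predicted probabilities. It uses plain Python lists rather than torch.Tensor so that it runs without dependencies, and the metric names "acc" and "loss" are assumptions.

```python
import math
from typing import Dict, List


def evaluate(probs: List[List[float]], truths: List[int]) -> Dict[str, float]:
    """Compute top-1 accuracy and mean negative log-likelihood from class probabilities."""
    correct = 0
    nll = 0.0
    for row, truth in zip(probs, truths):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(predicted == truth)
        nll -= math.log(max(row[truth], 1e-12))  # clamp to avoid log(0)
    n = len(truths)
    return {"acc": correct / n, "loss": nll / n}


metrics = evaluate([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.5, 0.4, 0.1]], [0, 2, 1])
print(metrics)
```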

get_dataloader(train_bs: int | None = None, test_bs: int | None = None, client_idx: int | None = None) → Tuple[DataLoader, DataLoader]

Get the local dataloaders of client client_idx, or the global dataloaders if client_idx is None.

Parameters:

train_bs (int, optional) – Batch size for training dataloader. If None, use default batch size.
test_bs (int, optional) – Batch size for testing dataloader. If None, use default batch size.
client_idx (int, optional) – Index of the client to get dataloader. If None, get the dataloader containing all data. Usually used for centralized training.

Returns:

train_dl (torch.utils.data.DataLoader) – Training dataloader.
test_dl (torch.utils.data.DataLoader) – Testing dataloader.
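The local-versus-global selection described above can be sketched without PyTorch. The helpers select_samples and batches and the toy data are hypothetical stand-ins; the real method returns torch.utils.data.DataLoader objects.

```python
from typing import Dict, Iterator, List, Optional, Sequence


def select_samples(client_data: Dict[int, List[str]], client_idx: Optional[int] = None) -> List[str]:
    """Return one client's samples, or all samples pooled when client_idx is None."""
    if client_idx is None:
        return [s for idx in sorted(client_data) for s in client_data[idx]]
    return list(client_data[client_idx])


def batches(samples: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches; the last one may be smaller."""
    for start in range(0, len(samples), batch_size):
        yield list(samples[start : start + batch_size])


toy = {0: ["a", "b", "c"], 1: ["d", "e"]}
local_batches = list(batches(select_samples(toy, client_idx=0), batch_size=2))
global_batches = list(batches(select_samples(toy), batch_size=2))
print(local_batches, global_batches)
```

Passing no client index pools every client's data, which corresponds to the centralized-training case.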

get_word_dict() → Dict[str, int]

Get the word dictionary.

id_to_word(idx: int) → str

Convert an integer index to a character.

preprocess(sentences: Sequence[str], max_seq_len: int | None = None) → List[List[int]]

Preprocess a list of sentences.

Parameters:

sentences (Sequence[str]) – List of sentences to be preprocessed.
max_seq_len (int, optional) – Maximum sequence length. If None, use default sequence length.

Returns:

List of tokenized sentences.

Return type:

List[List[int]]
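A minimal sketch of character-level preprocessing follows, assuming that each sentence is encoded character by character and then truncated or right-padded to max_seq_len. The padding index, the unknown-character index, and the toy vocabulary are hypothetical; the library's actual tokenization may differ.

```python
from typing import Dict, List, Sequence

PAD_ID = 0  # hypothetical padding index
UNK_ID = 1  # hypothetical index for out-of-vocabulary characters
CHAR_IDS: Dict[str, int] = {ch: i + 2 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}


def preprocess(sentences: Sequence[str], max_seq_len: int = 8) -> List[List[int]]:
    """Encode each sentence per character, then truncate or right-pad to max_seq_len."""
    encoded = []
    for sentence in sentences:
        ids = [CHAR_IDS.get(ch, UNK_ID) for ch in sentence[:max_seq_len]]
        ids += [PAD_ID] * (max_seq_len - len(ids))
        encoded.append(ids)
    return encoded


tokens = preprocess(["to be", "or not to be"], max_seq_len=8)
print(tokens)
```

Every output sequence has the same length, which is what allows the sequences to be stacked into a batch.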

property url: str

URL for downloading the dataset.

view_sample(client_idx: int, sample_idx: int | None = None) → None

View a sample from the dataset.

Parameters:

client_idx (int) – Index of the client on which the sample is located.
sample_idx (int, optional) – Index of the sample in the client.

Return type:

None

property words: List[str]

Get the word list.
