plismbench.engine.extract.extract_from_h5 module

Download the PLISM tiles dataset as .h5 files and extract features with a given model.

class plismbench.engine.extract.extract_from_h5.H5Dataset(file_path: Path)

Bases: Dataset

Dataset wrapper that iterates over the contents of a .h5 file.

Parameters:

file_path (pathlib.Path) – Path to the .h5 file.
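As an illustration only, a dataset wrapper of this kind could look like the sketch below. The class name TileH5Dataset and the file layout (one HDF5 dataset per tile, keyed by tile id, holding a float32 image) are assumptions made for the example; the layout actually read by H5Dataset is not described here and may differ.

```python
import tempfile
from pathlib import Path

import h5py
import numpy as np
import torch
from torch.utils.data import Dataset


class TileH5Dataset(Dataset):
    """Hypothetical stand-in for H5Dataset; the file layout is an assumption."""

    def __init__(self, file_path: Path) -> None:
        self.file_path = file_path
        # Read the keys once and close the file, so the dataset object
        # holds no open handle when it is copied to DataLoader workers.
        with h5py.File(file_path, "r") as h5_file:
            self.tile_ids = sorted(h5_file.keys())

    def __len__(self) -> int:
        return len(self.tile_ids)

    def __getitem__(self, idx: int) -> tuple[str, torch.Tensor]:
        tile_id = self.tile_ids[idx]
        # Re-open the file for each item: simple, at some cost in speed.
        with h5py.File(self.file_path, "r") as h5_file:
            image = h5_file[tile_id][()]  # load the full array into memory
        return tile_id, torch.from_numpy(image)


# Build a tiny throwaway file to exercise the class.
demo_path = Path(tempfile.mkdtemp()) / "demo_slide.h5"
with h5py.File(demo_path, "w") as h5_file:
    for i in range(3):
        tile = np.full((4, 4, 3), i / 10, dtype=np.float32)
        h5_file.create_dataset(f"tile_{i}", data=tile)

dataset = TileH5Dataset(demo_path)
first_id, first_image = dataset[0]
```

Re-opening the file in __getitem__ is a design choice for the sketch: it avoids sharing an open HDF5 handle across worker processes.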

plismbench.engine.extract.extract_from_h5.collate(batch: list[tuple[str, torch.Tensor]], transform: Callable[[np.ndarray], torch.Tensor]) -> tuple[list[str], torch.Tensor]

Return tile ids and transformed images.

Parameters:
  • batch (list[tuple[str, torch.Tensor]]) – List of length batch_size made of tuples. Each tuple holds a tile_id and the corresponding image. The image is a torch.float32 tensor with values between 0 and 1.

  • transform (collections.abc.Callable[[numpy.ndarray], torch.Tensor]) – Transform function taking a numpy.ndarray image as input.

Returns:

output – A tuple made of the tile ids and the transformed input images.

Return type:

tuple[list[str], torch.Tensor]
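The behaviour described above can be sketched as follows. This is not the library's implementation: collate_sketch and to_channels_first are hypothetical names, and converting each tensor to a numpy.ndarray before calling transform is an assumption inferred from the documented types.

```python
from collections.abc import Callable

import numpy as np
import torch


def collate_sketch(
    batch: list[tuple[str, torch.Tensor]],
    transform: Callable[[np.ndarray], torch.Tensor],
) -> tuple[list[str], torch.Tensor]:
    """Split a batch into tile ids and a stacked tensor of transformed images."""
    tile_ids = [tile_id for tile_id, _ in batch]
    # Assumption: the transform expects numpy input, so convert each tensor first.
    images = torch.stack([transform(image.numpy()) for _, image in batch])
    return tile_ids, images


def to_channels_first(image: np.ndarray) -> torch.Tensor:
    """Toy transform: (H, W, C) array to (C, H, W) tensor."""
    return torch.from_numpy(image).permute(2, 0, 1)


# Two fake tiles with float32 values in [0, 1).
batch = [(f"tile_{i}", torch.rand(4, 4, 3)) for i in range(2)]
tile_ids, images = collate_sketch(batch, to_channels_first)
```

Here tile_ids is a plain list of strings and images has one leading batch dimension, which matches the documented return type.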

plismbench.engine.extract.extract_from_h5.get_dataloader(slide_h5_path: Path, transform: Callable[[np.ndarray], torch.Tensor], batch_size: int = 32, workers: int = 8) -> DataLoader

Get a DataLoader over the PLISM tiles of a slide, with transform applied to each image.

Parameters:
  • slide_h5_path (pathlib.Path) – Path to the .h5 file containing the tiles for a given slide.

  • transform (collections.abc.Callable[[numpy.ndarray], torch.Tensor]) – Transform function taking a numpy.ndarray image as input.

  • batch_size (int = 32) – Batch size for feature extraction.

  • workers (int = 8) – Number of workers to load images.

Returns:

dataloader – DataLoader yielding (tile_ids, images) batches. See the collate function for details.

Return type:

DataLoader
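One plausible way to wire a transform into a DataLoader is to bind it to the collate function with functools.partial, as in the minimal sketch below. All names (InMemoryTiles, collate_sketch, to_channels_first, get_dataloader_sketch) are hypothetical, and an in-memory dataset stands in for the .h5 file so that the example runs on its own; how get_dataloader is actually built is not shown here.

```python
from functools import partial

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class InMemoryTiles(Dataset):
    """Hypothetical stand-in for the .h5-backed dataset."""

    def __init__(self, n_tiles: int) -> None:
        self.images = torch.rand(n_tiles, 4, 4, 3)  # float32 in [0, 1)

    def __len__(self) -> int:
        return len(self.images)

    def __getitem__(self, idx: int) -> tuple[str, torch.Tensor]:
        return f"tile_{idx}", self.images[idx]


def to_channels_first(image: np.ndarray) -> torch.Tensor:
    """Toy transform: (H, W, C) array to (C, H, W) tensor."""
    return torch.from_numpy(image).permute(2, 0, 1)


def collate_sketch(batch, transform):
    """Return tile ids and a stacked tensor of transformed images."""
    tile_ids = [tile_id for tile_id, _ in batch]
    images = torch.stack([transform(image.numpy()) for _, image in batch])
    return tile_ids, images


def get_dataloader_sketch(dataset, transform, batch_size=32, workers=8):
    """Build a DataLoader whose collate step applies the transform."""
    return DataLoader(
        dataset,
        batch_size=batch_size,
        num_workers=workers,
        shuffle=False,  # keep tile order stable for feature export
        collate_fn=partial(collate_sketch, transform=transform),
    )


# workers=0 keeps loading in the main process for this small demo.
loader = get_dataloader_sketch(
    InMemoryTiles(5), to_channels_first, batch_size=2, workers=0
)
batches = [(ids, tuple(images.shape)) for ids, images in loader]
```

With five tiles and a batch size of 2, the loader yields three batches, the last one holding a single tile.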

plismbench.engine.extract.extract_from_h5.run_extract_h5(feature_extractor_name: str, batch_size: int, device: int, export_dir: Path, download_dir: Path, overwrite: bool = False, workers: int = 8) -> None

Run feature extraction.