plismbench.engine.extract.extract_from_h5 module
Download the PLISM tiles dataset as .h5 files and extract features for a given model.
- class plismbench.engine.extract.extract_from_h5.H5Dataset(file_path: Path)
Bases: Dataset
Dataset wrapper iterating over the content of a .h5 file.
- Parameters:
file_path (pathlib.Path) – Path to the .h5 file.
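The indexing pattern behind such a dataset wrapper can be sketched without torch or h5py. This is a minimal illustration, not the library's implementation: `KeyedTileDataset` is a hypothetical name, a plain dict stands in for the opened .h5 file, and the (tile_id, image) item shape is assumed from the `collate` documentation.

```python
from collections.abc import Mapping, Sequence


class KeyedTileDataset:
    """Hypothetical stand-in for H5Dataset: maps an index to (tile_id, image)."""

    def __init__(self, store: Mapping[str, Sequence[float]]) -> None:
        # A plain mapping plays the role of the opened .h5 file (assumption).
        self.store = store
        # Sort the keys once so that integer indexing is deterministic.
        self.tile_ids = sorted(store)

    def __len__(self) -> int:
        return len(self.tile_ids)

    def __getitem__(self, index: int) -> tuple[str, Sequence[float]]:
        tile_id = self.tile_ids[index]
        return tile_id, self.store[tile_id]


dataset = KeyedTileDataset({"tile_1": [0.5, 0.25], "tile_0": [0.0, 1.0]})
first_item = dataset[0]
```

Exposing `__len__` and `__getitem__` is what lets a loader draw items by index and group them into batches.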
- plismbench.engine.extract.extract_from_h5.collate(batch: list[tuple[str, torch.Tensor]], transform: Callable[[np.ndarray], torch.Tensor]) → tuple[list[str], torch.Tensor]
Return tile ids and transformed images.
- Parameters:
batch (list[tuple[str, torch.Tensor]]) – List of length batch_size made of tuples. Each tuple holds a tile_id and the corresponding image. The image is a torch.float32 tensor (between 0 and 1).
transform (collections.abc.Callable[[numpy.ndarray], torch.Tensor]) – Transform function taking a numpy.ndarray image as input.
- Returns:
output – A tuple made of tile ids and transformed input images.
- Return type:
tuple[list[str], torch.Tensor]
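Because `collate` takes a `transform` argument in addition to the batch, it has to be bound to a transform before it can serve as a single-argument collate function. One common way to do that is `functools.partial`. The sketch below assumes that approach and uses plain Python lists instead of tensors; `collate_sketch` and `scale_to_255` are hypothetical names, and the real function returns a torch.Tensor rather than a list.

```python
from functools import partial


def collate_sketch(batch, transform):
    """Split a batch of (tile_id, image) tuples and transform each image."""
    tile_ids = [tile_id for tile_id, _ in batch]
    # The real function returns a tensor; a list keeps this sketch torch-free.
    images = [transform(image) for _, image in batch]
    return tile_ids, images


def scale_to_255(image):
    """Hypothetical transform: rescale values from [0, 1] to integers in [0, 255]."""
    return [round(value * 255) for value in image]


batch = [("tile_0", [0.0, 1.0]), ("tile_1", [0.2, 0.4])]

# Bind the transform so the result takes only the batch as its argument.
collate_fn = partial(collate_sketch, transform=scale_to_255)
tile_ids, images = collate_fn(batch)
```

After binding, `collate_fn(batch)` has the one-argument shape that a data loader expects of a collate function.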
- plismbench.engine.extract.extract_from_h5.get_dataloader(slide_h5_path: Path, transform: Callable[[np.ndarray], torch.Tensor], batch_size: int = 32, workers: int = 8) → DataLoader
Get a dataloader over the PLISM tiles dataset, with images transformed by the transform function.
- Parameters:
slide_h5_path (pathlib.Path) – Path to the .h5 file containing tiles for a given slide.
transform (collections.abc.Callable[[numpy.ndarray], torch.Tensor]) – Transform function taking a numpy.ndarray image as input.
batch_size (int = 32) – Batch size for feature extraction.
workers (int = 8) – Number of workers used to load images.
- Returns:
dataloader – DataLoader returning (tile_ids, images). See the collate function for details.
- Return type:
DataLoader
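The (tile_ids, images) batches that the returned dataloader yields can be pictured with a single-process sketch. Everything here is an illustrative assumption rather than the library's code: `iter_batches` and `split_batch` are hypothetical names, worker processes are left out, and the final partial batch is kept, which matches the torch DataLoader default, although whether `get_dataloader` changes that is not stated here.

```python
from collections.abc import Callable, Iterator, Sequence


def iter_batches(
    items: Sequence[tuple[str, list[float]]],
    collate_fn: Callable[[list], tuple[list[str], list]],
    batch_size: int = 32,
) -> Iterator[tuple[list[str], list]]:
    """Yield collated batches of at most batch_size items, in order."""
    for start in range(0, len(items), batch_size):
        # Slicing past the end is safe and yields a shorter final batch.
        yield collate_fn(list(items[start:start + batch_size]))


def split_batch(batch):
    """Hypothetical collate function: separate tile ids from images."""
    return [tile_id for tile_id, _ in batch], [image for _, image in batch]


tiles = [(f"tile_{i}", [i / 10]) for i in range(5)]
batches = list(iter_batches(tiles, split_batch, batch_size=2))
batch_sizes = [len(tile_ids) for tile_ids, _ in batches]
```

With five tiles and a batch size of 2, the loop produces two full batches followed by one batch containing the remaining tile.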