plismbench.engine.extract.extract_from_png module

Stream the PLISM tiles dataset and extract features on the fly for a given model.

plismbench.engine.extract.extract_from_png.collate(batch: list[dict[str, str | Image.Image]], transform: Callable[[np.ndarray], torch.Tensor]) → tuple[list[str], list[str], torch.Tensor]

Return slide ids, tile ids and transformed images.

Parameters:
  • batch (list[dict[str, str | Image.Image]]) – List of length batch_size made of dictionaries. Each dictionary is a single input with keys ‘slide_id’, ‘tile_id’ and ‘png’. The image is a PIL.Image.Image with uint8 pixel values (0-255).

  • transform (collections.abc.Callable[[numpy.ndarray], torch.Tensor]) – Transform function taking a numpy.ndarray image as input. Each PIL.Image.Image is converted to an array before this function is called.

Returns:

output – A tuple made of slide ids, tile ids and transformed input images.

Return type:

tuple[list[str], list[str], torch.Tensor]
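
The behaviour described above can be sketched as follows. This is a minimal illustration written from the documented signature, not the module's actual source: the names collate_sketch and to_chw_float are hypothetical, and stacking the transformed tiles with torch.stack is an assumption about how the batch tensor is built.

```python
from __future__ import annotations

from collections.abc import Callable

import numpy as np
import torch
from PIL import Image


def collate_sketch(
    batch: list[dict[str, str | Image.Image]],
    transform: Callable[[np.ndarray], torch.Tensor],
) -> tuple[list[str], list[str], torch.Tensor]:
    """Hypothetical re-implementation of the documented collate behaviour."""
    slide_ids = [sample["slide_id"] for sample in batch]
    tile_ids = [sample["tile_id"] for sample in batch]
    # Convert each PIL image to a numpy array *before* calling the transform,
    # as the parameter description states.
    tensors = [transform(np.array(sample["png"])) for sample in batch]
    # Assumption: every transformed tile has the same shape, so they stack
    # into a single (batch_size, ...) tensor.
    return slide_ids, tile_ids, torch.stack(tensors)


def to_chw_float(image: np.ndarray) -> torch.Tensor:
    """Hypothetical transform: HWC uint8 array -> CHW float32 tensor in [0, 1]."""
    return torch.from_numpy(image).permute(2, 0, 1).float() / 255.0


# Two dummy 4x4 RGB tiles standing in for streamed PLISM samples.
example_batch = [
    {"slide_id": "slide_a", "tile_id": "tile_0", "png": Image.new("RGB", (4, 4), (255, 0, 0))},
    {"slide_id": "slide_a", "tile_id": "tile_1", "png": Image.new("RGB", (4, 4), (0, 0, 255))},
]
slide_ids, tile_ids, images = collate_sketch(example_batch, to_chw_float)
```

Because the function takes transform as a second argument, one plausible way to use it with torch.utils.data.DataLoader is to bind the transform first, for example collate_fn=functools.partial(collate_sketch, transform=to_chw_float).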

plismbench.engine.extract.extract_from_png.run_extract_streaming(feature_extractor_name: str, batch_size: int, device: int, export_dir: Path, overwrite: bool = False) → None

Run feature extraction with streaming.