rikai.contrib.datasets package

Submodules

rikai.contrib.datasets.coco module

Convert the COCO dataset into Rikai format.

https://cocodataset.org/#home

rikai.contrib.datasets.coco.convert(spark: SparkSession, dataset_root: str, limit: int = 0, asset_dir: Optional[str] = None) DataFrame

Convert a COCO dataset into a Rikai dataset.

This function expects the COCO dataset to be stored in a directory with the following structure:

  • dataset
    • annotations
      • captions_train2017.json
      • instances_train2017.json
      • …

    • train2017

    • val2017

    • test2017

Parameters
  • spark (SparkSession) – A live Spark session.

  • dataset_root (str) – The root directory of the dataset.

  • limit (int, optional) – The number of images from each split to convert.

  • asset_dir (str, optional) – The directory in which to store image assets; can be an S3 directory.

Returns

A Spark DataFrame.

Return type

DataFrame
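
Example

The following is a minimal sketch of how convert() might be called, not a definitive recipe. The helpers missing_coco_dirs and convert_coco are hypothetical names introduced for illustration only. The sketch assumes that all four sub-directories listed above must be present (the documentation does not say whether any are optional), that pyspark and rikai are installed, and that a default SparkSession is sufficient. A real deployment may need additional Spark configuration for Rikai, which is not shown here.

```python
from pathlib import Path

# Sub-directory names taken from the layout described above.
EXPECTED_SUBDIRS = ("annotations", "train2017", "val2017", "test2017")


def missing_coco_dirs(dataset_root: str) -> list:
    """Hypothetical helper: return the expected sub-directories that are absent."""
    root = Path(dataset_root)
    return [name for name in EXPECTED_SUBDIRS if not (root / name).is_dir()]


def convert_coco(dataset_root: str, limit: int = 10):
    """Hypothetical wrapper: check the layout, then call convert()."""
    missing = missing_coco_dirs(dataset_root)
    if missing:
        # Assumption: every listed sub-directory is treated as required.
        raise FileNotFoundError(f"{dataset_root} is missing: {missing}")

    # Imported lazily so the layout check can run without Spark installed.
    from pyspark.sql import SparkSession
    from rikai.contrib.datasets.coco import convert

    # Assumption: a default session is enough; extra Rikai settings may be needed.
    spark = SparkSession.builder.appName("coco-to-rikai").getOrCreate()

    # limit caps the number of images converted from each split.
    return convert(spark, dataset_root, limit=limit)
```

Calling convert_coco("/path/to/dataset") would then return the Spark DataFrame produced by convert(). Checking the layout first is a design choice of this sketch: it reports a missing directory before a Spark session is started.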

Module contents

Convert common datasets into Rikai formats