rikai package

Subpackages

Submodules

rikai.conf module

This is a temporary mechanism that wraps the pandas option machinery. A permanent solution will be needed later, for two main reasons:

  1. Users can accidentally clear options via the pandas API.

  2. Better type handling is needed to help bridge JVM-Python communication (GH134).

rikai.exceptions module

exception rikai.exceptions.ColumnNotFoundError

Bases: OSError

rikai.io module

rikai.io.copy(source: str, dest: str) str

Copy a file from source to destination, and return the URI of the copied file.

Parameters
  • source (str) – The source URI to copy from

  • dest (str) – The destination URI or the destination directory. If dest ends with a “/”, it is treated as a directory.

Returns

The URI of the destination.

Return type

str
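The trailing-slash rule for dest can be illustrated with a small standalone sketch. This is not Rikai's implementation: resolve_dest is a hypothetical helper, and it assumes (the documentation above does not say so) that a file copied into a directory keeps the base name of the source.

```python
import posixpath
from urllib.parse import urlparse


def resolve_dest(source: str, dest: str) -> str:
    """Hypothetical helper: compute the final destination URI.

    Assumption (not confirmed by the rikai docs): when ``dest`` names a
    directory, the copied file keeps the base name of ``source``.
    """
    if dest.endswith("/"):
        # A trailing "/" marks dest as a directory.
        filename = posixpath.basename(urlparse(source).path)
        return dest + filename
    # Otherwise dest is already the full URI of the copied file.
    return dest


print(resolve_dest("s3://bucket/raw/cat.jpg", "s3://other/images/"))
print(resolve_dest("s3://bucket/raw/cat.jpg", "s3://other/images/pet.jpg"))
```

Under that assumption, the first call resolves to a file inside the directory, while the second returns dest unchanged.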

rikai.io.exists(uri: Union[str, Path], http_auth: Optional[Union[AuthBase, Tuple[str, str]]] = None, http_headers: Optional[Dict] = None) bool

Returns True if the URI/file exists.

rikai.io.open_output_stream(uri: str) BinaryIO
rikai.io.open_uri(uri: Union[str, Path], mode: str = 'rb', http_auth: Optional[Union[AuthBase, Tuple[str, str]]] = None, http_headers: Optional[Dict] = None) IO

Open a URI for reading.

It supports the following URI patterns:

  • File System: /path/to/file or file:///path/to/file

  • AWS S3: s3://

  • Google Cloud Storage: gs://

  • Http(s): http:// or https://

Parameters
  • uri (str or Path) – URI of the object

  • mode (str) – The file mode used to open the URI

  • http_auth (requests.auth.AuthBase or a tuple of (user, pass), optional) – HTTP credentials / auth provider when downloading via http(s) protocols.

  • http_headers (Dict, optional) – HTTP headers.

Returns

A file-like object for sequential read.

Return type

File
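As a rough illustration of the URI patterns listed above, the sketch below maps a URI to a backend name by its scheme. The helper classify_uri and the _BACKENDS table are hypothetical and exist only for this example; how rikai.io.open_uri actually dispatches is not shown in this reference.

```python
from pathlib import Path
from typing import Union
from urllib.parse import urlparse

# Illustrative labels only; they mirror the list of supported patterns.
_BACKENDS = {
    "": "File System",       # plain paths such as /path/to/file
    "file": "File System",   # file:///path/to/file
    "s3": "AWS S3",
    "gs": "Google Cloud Storage",
    "http": "Http(s)",
    "https": "Http(s)",
}


def classify_uri(uri: Union[str, Path]) -> str:
    """Hypothetical helper: name the storage backend for a URI."""
    scheme = urlparse(str(uri)).scheme
    try:
        return _BACKENDS[scheme]
    except KeyError:
        raise ValueError(f"Unsupported URI scheme: {scheme!r}") from None


for example in ("/path/to/file", "s3://bucket/key", "https://example.com/a.png"):
    print(example, "->", classify_uri(example))
```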

rikai.logging module

rikai.mixin module

Mixins

class rikai.mixin.Asset(data: Optional[bytes] = None, uri: Optional[Union[str, Path]] = None)

Bases: ABC

Cloud asset mixin.

Rikai uses assets to store certain blobs in cloud storage, to facilitate functionality such as fast queries and example inspection.

An asset is also a cell in a DataFrame for analytics. It offers both fast queries on the columnar format and easy tooling to access the actual data.

data

Embedded data

Type

bytes, optional

uri

URI of the external storage.

Type

str

property is_embedded: bool

Returns True if this Asset has embedded data.

open(mode='rb') BinaryIO

Open the asset and return it as a random-access file object.
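The distinction between embedded data and an external URI can be sketched with a standalone class. SketchAsset is a hypothetical stand-in, not Rikai's code, and it assumes for simplicity that a non-embedded uri is a local file path.

```python
import io
from abc import ABC
from typing import BinaryIO, Optional


class SketchAsset(ABC):
    """Hypothetical stand-in for the Asset mixin (not rikai's code)."""

    def __init__(self, data: Optional[bytes] = None, uri: Optional[str] = None):
        self.data = data
        self.uri = uri

    @property
    def is_embedded(self) -> bool:
        # Embedded means the bytes travel with the object itself.
        return self.data is not None

    def open(self, mode: str = "rb") -> BinaryIO:
        if self.is_embedded:
            # BytesIO supports seek(), so the result is random-access.
            return io.BytesIO(self.data)
        # Assumption for this sketch: uri is a local file path.
        return open(self.uri, mode)


asset = SketchAsset(data=b"abc")
with asset.open() as f:
    f.seek(1)
    print(f.read())
```

Wrapping embedded bytes in an in-memory buffer lets callers treat both kinds of asset through the same file-like interface.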

class rikai.mixin.Displayable

Bases: ABC

Mixin for notebook visualization

abstract display(**kwargs) IPython.display.DisplayObject

Return an IPython.display.DisplayObject

class rikai.mixin.Drawable

Bases: ABC

Mixin for a class that is drawable

class rikai.mixin.Pretrained

Bases: ABC

Mixin for pretrained model

abstract pretrained_model() Any
class rikai.mixin.ToDict

Bases: ABC

ToDict Mixin

abstract to_dict() dict
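
How a type might adopt such a mixin can be sketched as follows. The ToDict class here is a local stand-in that mirrors the documented abstract method so the example runs without Rikai installed; Point is a hypothetical type used only for illustration.

```python
from abc import ABC, abstractmethod


class ToDict(ABC):
    """Local stand-in mirroring the documented mixin."""

    @abstractmethod
    def to_dict(self) -> dict:
        ...


class Point(ToDict):
    """Hypothetical type, used only for illustration."""

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    def to_dict(self) -> dict:
        # Expose the fields as a plain dictionary.
        return {"x": self.x, "y": self.y}


print(Point(1, 2).to_dict())
```

Because to_dict is abstract, a subclass that does not implement it cannot be instantiated.
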
class rikai.mixin.ToNumpy

Bases: ABC

ToNumpy Mixin.

abstract to_numpy() ndarray

Returns the content as a numpy ndarray.

rikai.numpy module

This module makes numpy.ndarray interoperable with rikai, from feature engineering in Spark to training in TensorFlow and PyTorch.

>>> # Feature Engineering in Spark
>>> from pyspark.sql import Row
>>> from rikai import numpy as np
>>> df = spark.createDataFrame([Row(mask=np.array([1, 2, 3, 4]))])
>>> df.write.format("rikai").save("s3://path/to/features")

When the rikai data is used in training, the serialized numpy data is automatically converted into the appropriate format, e.g., torch.Tensor in PyTorch:

>>> from rikai.pytorch.data import Dataset
>>> data_loader = Dataset("s3://path/to/features")
>>> next(iter(data_loader))
{'mask': tensor([1, 2, 3, 4])}
rikai.numpy.array(obj, *args, **kwargs) ndarray

Create a numpy array using the same API as numpy.array().

See also

numpy.array()

rikai.numpy.empty(shape, dtype=<class 'float'>, order='C') ndarray

Return an empty np.ndarray.

The returned array can be directly used in a Spark DataFrame.

See also

numpy.empty()

rikai.numpy.view(data: ndarray) ndarray

Create a Spark/Parquet compatible view for a numpy array.

Parameters

data (np.ndarray) – A raw numpy array

Returns

A numpy array view that is compatible with Spark user-defined types (UDTs).

Return type

np.ndarray

Example

>>> import numpy as np
>>> from pyspark.sql import Row
>>> from rikai.numpy import view
>>>
>>> arr = np.array([1, 2, 3], dtype=np.int64)
>>> df = spark.createDataFrame([Row(id=1, mask=view(arr))])
>>> df.write.format("rikai").save("s3://foo/bar")

rikai.viz module

class rikai.viz.Style(**kwarg)

Bases: Drawable

Style for a drawable component.

Examples

>>> from rikai.viz import Style
>>> from rikai.types import Box2d, Image
...
>>> img = Image(uri="s3://....")
>>> bbox1, bbox2 = Box2d(1, 2, 3, 4), Box2d(3, 4, 5, 6)
>>> bbox_style = Style(color="yellow", width=4)
>>> img | bbox_style(bbox1) | bbox_style(bbox2)
class rikai.viz.Text(text: str, xy: Tuple[int, int], color: str = 'red')

Bases: Drawable

Render text.

Parameters
  • text (str) – The text content to be rendered

  • xy (Tuple[int, int]) – The location to render the text

  • color (str, optional) – The RGB color string to render the text

Module contents

Rikai Feature Store