rikai.spark.sql.codegen package

Submodules

rikai.spark.sql.codegen.base module

class rikai.spark.sql.codegen.base.Registry

Bases: ABC

Base class of a Model Registry

abstract make_model_spec(raw_spec: ModelSpec)

Make a ModelSpec from the raw model spec

Parameters

raw_spec (ModelSpec) – The raw model spec

resolve(raw_spec: ModelSpec)

Resolve a model from the raw model spec.

Parameters

raw_spec (ModelSpec) – The raw model spec to resolve
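The split between an abstract make_model_spec and a concrete resolve suggests a template-method layout, which the following minimal sketch illustrates. It does not import rikai: SketchRegistry and InMemoryRegistry are hypothetical stand-ins, plain dicts stand in for ModelSpec, and the body of resolve is an assumption (the real method may do more than delegate).

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class SketchRegistry(ABC):
    """Hypothetical stand-in for Registry; not the rikai class."""

    @abstractmethod
    def make_model_spec(self, raw_spec: Dict[str, Any]) -> Dict[str, Any]:
        """Subclasses turn the raw spec into a registry-specific spec."""

    def resolve(self, raw_spec: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: resolve() simply delegates to make_model_spec().
        # The real method may perform further steps on the result.
        return self.make_model_spec(raw_spec)


class InMemoryRegistry(SketchRegistry):
    """Hypothetical registry that only checks for a 'name' key."""

    def make_model_spec(self, raw_spec: Dict[str, Any]) -> Dict[str, Any]:
        if "name" not in raw_spec:
            raise ValueError("raw spec needs a 'name'")
        # Return a copy so the caller's dict is left untouched.
        return {**raw_spec, "validated": True}


spec = InMemoryRegistry().resolve({"name": "demo"})
```

Because make_model_spec is abstract, the base class itself cannot be instantiated; only subclasses that implement it, such as DummyRegistry and FileSystemRegistry in this package, can be used directly.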

rikai.spark.sql.codegen.dummy module

class rikai.spark.sql.codegen.dummy.DummyModelSpec(raw_spec: ModelSpec, validate: bool = True)

Bases: ModelSpec

load_model()

Load the model artifact specified in this spec

validate()

Validate model spec

Raises

SpecError – If the spec is not well-formatted.

class rikai.spark.sql.codegen.dummy.DummyRegistry

Bases: Registry

Dummy Model Registry without URI

make_model_spec(raw_spec: dict)

Make a ModelSpec from the raw model spec

Parameters

raw_spec (dict) – The raw model spec

rikai.spark.sql.codegen.fs module

class rikai.spark.sql.codegen.fs.FileSystemRegistry

Bases: Registry

FileSystem-based Model Registry

make_model_spec(raw_spec: dict)

Make a ModelSpec from the raw model spec

Parameters

raw_spec (dict) – The raw model spec

rikai.spark.sql.codegen.mlflow_logger module

Custom MLflow model logger that ensures models are logged with the tags Rikai SQL ML needs

class rikai.spark.sql.codegen.mlflow_logger.MlflowLogger(flavor: str)

Bases: object

An alternative model logger to use during training in place of the vanilla MLflow logger.

log_model(model: Any, artifact_path: str, schema: Optional[str] = None, registered_model_name: Optional[str] = None, customized_flavor: Optional[str] = None, model_type: Optional[str] = None, labels: Optional[dict] = None, **kwargs)

Convenience function to log the model with tags needed by rikai. This should be called during training when the model is produced.

Parameters
  • model (Any) – The model artifact object

  • artifact_path (str) – The relative (to the run) artifact path

  • schema (str, default None) – Output schema (pyspark DataType)

  • registered_model_name (str, default None) – Model name in the MLflow model registry

  • model_type (str, default None) – Model type

  • kwargs (dict) – Passed to mlflow.<flavor>.log_model

Examples

import mlflow
import rikai.mlflow

# Log PyTorch model
with mlflow.start_run() as run:

    # Training loop
    # ...

    # Assume `model` is the trained model from the training loop
    rikai.mlflow.pytorch.log_model(model, "model",
            model_type="ssd",
            registered_model_name="MyPytorchModel")

For more details, see the MLflow documentation.

rikai.spark.sql.codegen.mlflow_registry module

rikai.spark.sql.codegen.pytorch module

rikai.spark.sql.codegen.pytorch.generate_inference_func(payload: ModelSpec)
rikai.spark.sql.codegen.pytorch.generate_udf(payload: ModelSpec)
rikai.spark.sql.codegen.pytorch.load_model_from_uri(uri: str)
rikai.spark.sql.codegen.pytorch.move_tensor_to_device(data, device)

rikai.spark.sql.codegen.sklearn module

rikai.spark.sql.codegen.sklearn.generate_udf(spec: ModelSpec)

Construct a UDF to run sklearn model.

Parameters

spec (ModelSpec) – the model specifications object

Return type

A Spark Pandas UDF.

rikai.spark.sql.codegen.tensorflow module

rikai.spark.sql.codegen.testing module

class rikai.spark.sql.codegen.testing.TestModel(name: str, uri: str, options: Dict[str, str])

Bases: object

codegen(spark: SparkSession, temporary: bool)

Codegen for TestModel

Parameters
  • spark (SparkSession) – SparkSession

  • temporary (bool) – Whether to generate temporary functions for this model.

Module contents

rikai.spark.sql.codegen.command_from_spec(registry_class: str, row_spec: dict)