Text Classification

class ibm_watsonx_ai.foundation_models.classifications.TextClassification(credentials=None, project_id=None, space_id=None, api_client=None)[source]

Bases: WMLResource

Instantiate the text classification service.

Parameters:
  • credentials (Credentials, optional) – credentials to the watsonx.ai instance

  • project_id (str, optional) – ID of the project, defaults to None

  • space_id (str, optional) – ID of the space, defaults to None

  • api_client (APIClient, optional) – initialized APIClient object with a project ID or space ID already set. If passed, credentials and project_id/space_id are not required. Defaults to None

Raises:

InvalidMultipleArguments – raised when neither api_client nor credentials (together with space_id or project_id) is provided

Example:

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models.classifications import TextClassification

text_classification = TextClassification(
    credentials=Credentials(
        api_key=IAM_API_KEY,  # placeholder for your IAM API key
        url="https://us-south.ml.cloud.ibm.com",
    ),
    project_id="*****",
)
cancel_job(classification_job_id)[source]

Cancel a text classification job.

Parameters:

classification_job_id (str) – ID of text classification job

Returns:

“SUCCESS” if the cancel succeeds

Return type:

str

Example:

text_classification.cancel_job(classification_job_id="<CLASSIFICATION_JOB_ID>")
delete_job(classification_job_id)[source]

Delete a text classification job.

Parameters:

classification_job_id (str) – ID of text classification job

Returns:

“SUCCESS” if the deletion succeeds

Return type:

str

Example:

text_classification.delete_job(classification_job_id="<CLASSIFICATION_JOB_ID>")
get_job_details(classification_job_id=None, limit=None)[source]

Return text classification job details. If classification_job_id is None, return the details of all text classification jobs.

Parameters:
  • classification_job_id (str, optional) – ID of the text classification job, defaults to None

  • limit (int, optional) – limit number of fetched records, defaults to None

Returns:

details of the text classification job

Return type:

dict

Example:

text_classification.get_job_details(classification_job_id="<CLASSIFICATION_JOB_ID>")
classmethod get_job_id(classification_details)[source]

Get the unique ID of a stored classification request.

Parameters:

classification_details (dict) – metadata of the stored classification

Returns:

unique ID of the stored classification request

Return type:

str

Example:

classification_details = text_classification.run_job(...)
classification_job_id = text_classification.get_job_id(classification_details)
get_results(classification_job_id)[source]

Get the text classification results.

Parameters:

classification_job_id (str) – ID of text classification job

Returns:

text classification job results

Return type:

dict

Example:

results = text_classification.get_results(classification_job_id="<CLASSIFICATION_JOB_ID>")
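A small guard can avoid requesting results for a job that is still running. The sketch below is illustrative only: fetch_results_if_completed is a hypothetical helper, not part of the library, and it assumes that results are only worth fetching once get_status reports "completed". The service argument is any object exposing get_status and get_results, such as a TextClassification instance.

```python
from typing import Any, Optional


def fetch_results_if_completed(service: Any, classification_job_id: str) -> Optional[dict]:
    """Return the job results, or None when the job has not completed."""
    status = service.get_status(classification_job_id=classification_job_id)
    if status != "completed":
        # Assumption: results are only worth fetching once the status is "completed".
        return None
    return service.get_results(classification_job_id=classification_job_id)
```

With a real service object this would be called as fetch_results_if_completed(text_classification, "<CLASSIFICATION_JOB_ID>").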
get_status(classification_job_id)[source]

Get the text classification status.

Parameters:

classification_job_id (str) – ID of text classification job

Returns:

text classification job status, possible values: [submitted, uploading, running, downloading, downloaded, completed, failed]

Return type:

str

Example:

status = text_classification.get_status(classification_job_id="<CLASSIFICATION_JOB_ID>")
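Because a job moves through several statuses before it finishes, callers typically poll get_status. The following is a minimal sketch of such a loop, not a library feature: wait_for_job, poll_interval, and timeout are hypothetical names, and treating only "completed" and "failed" as terminal is an assumption based on the status list above.

```python
import time
from typing import Callable

# Assumption: only these two documented statuses are treated as terminal.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    get_status: Callable[[str], str],
    classification_job_id: str,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll get_status until a terminal status is returned or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(classification_job_id)
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job {classification_job_id} still '{status}' after {timeout} s"
            )
        # Sleep between polls to avoid sending requests in a tight loop.
        time.sleep(poll_interval)
```

Passing the bound method keeps the helper independent of the client: wait_for_job(text_classification.get_status, "<CLASSIFICATION_JOB_ID>").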
list_jobs(limit=None)[source]

List text classification jobs. If limit is None, all jobs will be listed.

Parameters:

limit (int, optional) – limit number of fetched records, defaults to None

Returns:

text classification jobs information as a pandas DataFrame

Return type:

pandas.DataFrame

Example:

text_classification.list_jobs()
run_job(document_reference, parameters, custom=None)[source]

Start a request to classify text in the document.

Parameters:
  • document_reference (DataConnection) – reference to the document in the bucket from which text will be classified

  • parameters (TextClassificationParameters or dict) – the parameters for the text classification

  • custom (dict, optional) – user defined properties for the text classification, defaults to None

Returns:

text classification response

Return type:

dict

Example:

from ibm_watsonx_ai.helpers import DataConnection, S3Location
from ibm_watsonx_ai.foundation_models.schema import (
    TextClassificationParameters,
    TextClassificationSemanticConfig,
    SchemasMergeStrategy,
    ClassificationMode,
    OCRMode,
)

document_reference = DataConnection(
    connection_asset_id="<connection_id>",
    location=S3Location(bucket="<bucket_name>", path="path/to/file"),
)

response = text_classification.run_job(
    document_reference=document_reference,
    parameters=TextClassificationParameters(
        ocr_mode=OCRMode.ENABLED,
        classification_mode=ClassificationMode.EXACT,
        auto_rotation_correction=True,
        languages=["en"],
        semantic_config=TextClassificationSemanticConfig(
            schemas_merge_strategy=SchemasMergeStrategy.MERGE,
            schemas=[...],
        ),
    ),
    custom={},
)

Enums

class ibm_watsonx_ai.foundation_models.schema.SchemasMergeStrategy(value)[source]

Bases: StrEnum

Strategy for schemas merge.

MERGE = 'merge'
REPLACE = 'replace'
class ibm_watsonx_ai.foundation_models.schema.OCRMode(value)[source]

Bases: StrEnum

DISABLED = 'disabled'
ENABLED = 'enabled'
FORCED = 'forced'
class ibm_watsonx_ai.foundation_models.schema.ClassificationMode(value)[source]

Bases: StrEnum

BINARY = 'binary'
EXACT = 'exact'
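The enums above derive from StrEnum, so their members carry plain string values. The sketch below uses a local stand-in class that mirrors the documented OCRMode values (it is not the library class) and assumes the library's StrEnum behaves like a str-mixed Enum with respect to equality and lookup by value.

```python
from enum import Enum


class OCRMode(str, Enum):
    """Local stand-in mirroring the documented OCRMode values (not the library class)."""

    DISABLED = "disabled"
    ENABLED = "enabled"
    FORCED = "forced"


# Members compare equal to their plain string values.
assert OCRMode.ENABLED == "enabled"

# A raw string (for example, read from a config file) can be converted by value.
mode = OCRMode("forced")
assert mode is OCRMode.FORCED

# An unknown value raises ValueError, which surfaces typos early.
try:
    OCRMode("enable")
except ValueError as err:
    error_message = str(err)
```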
class ibm_watsonx_ai.foundation_models.schema.TextClassificationSemanticConfig(schemas_merge_strategy=None, schemas=None)[source]

Bases: BaseSchema

Semantic configuration for text classification.

Parameters:
  • schemas_merge_strategy (SchemasMergeStrategy, optional) – strategy for schemas merge

  • schemas (list[dict], optional) – schemas

schemas = None
schemas_merge_strategy = None
class ibm_watsonx_ai.foundation_models.schema.TextClassificationParameters(ocr_mode=None, classification_mode=None, auto_rotation_correction=None, languages=None, semantic_config=None)[source]

Bases: BaseSchema

Parameters used for text classification.

Parameters:
  • ocr_mode (OCRMode, optional) – whether OCR should be used when processing a document; an empty value lets the service select the best option for your processing mode

  • classification_mode (ClassificationMode, optional) – classification mode: exact returns the exact name of the schema the document is classified to, while binary only reports whether the document is classified to a known schema

  • auto_rotation_correction (bool, optional) – whether the service should attempt to fix a rotated page or image

  • languages (list[str], optional) – languages expected in the document; language codes follow ISO 639 where possible. See the REST API documentation for the currently supported languages

  • semantic_config (TextClassificationSemanticConfig, optional) – additional configuration settings for the Semantic KVP model

auto_rotation_correction = None
classification_mode = None
languages = None
ocr_mode = None
semantic_config = None
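Since run_job accepts parameters as either TextClassificationParameters or a dict, a plain dictionary can stand in for the schema object. The sketch below is an assumption-laden illustration: build_classification_parameters is a hypothetical helper, and it assumes the dict keys mirror the field names listed above and that unset (None) fields can simply be omitted.

```python
from typing import Any, Dict, List, Optional


def build_classification_parameters(
    ocr_mode: Optional[str] = None,
    classification_mode: Optional[str] = None,
    auto_rotation_correction: Optional[bool] = None,
    languages: Optional[List[str]] = None,
    semantic_config: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build a parameters dict, leaving out every field that was not set."""
    candidate = {
        # Assumption: keys mirror the documented TextClassificationParameters fields.
        "ocr_mode": ocr_mode,
        "classification_mode": classification_mode,
        "auto_rotation_correction": auto_rotation_correction,
        "languages": languages,
        "semantic_config": semantic_config,
    }
    return {key: value for key, value in candidate.items() if value is not None}


parameters = build_classification_parameters(
    ocr_mode="enabled",
    classification_mode="exact",
    languages=["en"],
)
```

Omitting unset fields keeps the request minimal and leaves the choice of defaults to the service.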