Text Extractions

class ibm_watsonx_ai.foundation_models.extractions.TextExtractions(credentials=None, project_id=None, space_id=None, api_client=None)

Bases: WMLResource

Instantiate the Text Extraction service.

Parameters:
  • credentials (ibm_watsonx_ai.Credentials | None, optional) – credentials for the Watson Machine Learning instance, defaults to None

  • project_id (str | None, optional) – ID of the Watson Studio project, defaults to None

  • space_id (str | None, optional) – ID of the Watson Studio space, defaults to None

  • api_client (APIClient | None, optional) – initialized APIClient object with a project or space ID already set. If passed, credentials and project_id/space_id are not required. Defaults to None

Raises:
  • InvalidMultipleArguments – if space_id and project_id or credentials and api_client are provided simultaneously

  • WMLClientError – if CPD version is less than 5.0
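The first rule above (mutually exclusive arguments) can be checked before the service object is constructed. The following is a minimal sketch only: `check_exclusive_args` is a hypothetical helper, not part of the library, and it raises a plain `ValueError` rather than the library's `InvalidMultipleArguments`.

```python
def check_exclusive_args(credentials=None, project_id=None, space_id=None, api_client=None):
    """Hypothetical pre-check mirroring the documented InvalidMultipleArguments rule.

    This is an illustration, not the library's implementation.
    """
    # project_id and space_id must not be given together
    if project_id is not None and space_id is not None:
        raise ValueError("Provide either project_id or space_id, not both.")
    # credentials and api_client must not be given together
    if credentials is not None and api_client is not None:
        raise ValueError("Provide either credentials or api_client, not both.")


# A valid combination passes silently.
check_exclusive_args(credentials="<credentials>", project_id="<project_id>")
```

Running such a check first gives an early, local failure before any request is made; the constructor still performs its own validation.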

Example

from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models.extractions import TextExtractions

extraction = TextExtractions(
    credentials=Credentials(
        api_key="***",
        url="https://us-south.ml.cloud.ibm.com",
    ),
    project_id="*****",
)

delete_job(extraction_id)

Delete a text extraction job.

Parameters:

extraction_id (str) – ID of the text extraction job

Returns:

“SUCCESS” if the deletion succeeds

Return type:

str

Example

extraction.delete_job(extraction_id="<extraction_id>")

static get_id(extraction_details)

Get the unique ID of a stored extraction request.

Parameters:

extraction_details (dict) – metadata of the stored extraction

Returns:

unique ID of the stored extraction request

Return type:

str

Example

extraction_details = extraction.get_job_details(extraction_id)
extraction_id = extraction.get_id(extraction_details)
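To make the lookup concrete, the sketch below shows what reading the ID from a details dictionary might look like. It assumes the ID is stored under `metadata.id`, which this page does not confirm; `get_id_sketch` and `sample_details` are hypothetical names used only for illustration.

```python
def get_id_sketch(extraction_details: dict) -> str:
    """Hypothetical stand-in for get_id.

    Assumed layout (not confirmed here): {"metadata": {"id": "<extraction_id>"}, ...}
    """
    try:
        return extraction_details["metadata"]["id"]
    except (KeyError, TypeError) as exc:
        # Missing key, or "metadata" is not a dict
        raise ValueError("extraction_details has no metadata.id field") from exc


# Hand-written sample, not a real server response.
sample_details = {"metadata": {"id": "abc-123"}, "entity": {}}
print(get_id_sketch(sample_details))  # prints abc-123
```

In real code, prefer `extraction.get_id(...)`, which does not depend on this assumed layout.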
get_job_details(extraction_id=None, limit=None)

Return text extraction job details. If extraction_id is None, return details of all text extraction jobs.

Parameters:
  • extraction_id (str | None, optional) – ID of the text extraction job, defaults to None

  • limit (int | None, optional) – limit number of fetched records, defaults to None

Returns:

Text extraction job details

Return type:

dict

Example

extraction.get_job_details(extraction_id="<extraction_id>")
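Because a job runs asynchronously, a caller will often poll the job details until the job finishes. The sketch below is one way to do that under stated assumptions: the status is assumed to live at `entity.results.status`, and `"completed"` and `"failed"` are assumed to be the terminal states. Neither is confirmed on this page. `wait_for_job` and `fake_get_details` are hypothetical names; the fake stands in for `extraction.get_job_details` so the sketch runs offline.

```python
import time
from typing import Callable


def wait_for_job(
    get_details: Callable[[], dict],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    terminal_states=("completed", "failed"),  # assumed status values
) -> dict:
    """Call get_details repeatedly until an assumed terminal status is seen."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        details = get_details()
        # Assumed location of the status field in the job details.
        status = details.get("entity", {}).get("results", {}).get("status")
        if status in terminal_states:
            return details
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still in state {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)


# Offline stand-in for extraction.get_job_details(extraction_id=...).
_fake_states = iter(["submitted", "running", "completed"])


def fake_get_details() -> dict:
    return {"entity": {"results": {"status": next(_fake_states)}}}


final = wait_for_job(fake_get_details, poll_seconds=0.0)
print(final["entity"]["results"]["status"])  # prints completed
```

With a real service object, the callable would be something like `lambda: extraction.get_job_details(extraction_id="<extraction_id>")`. Passing the getter in as a callable keeps the loop independent of the client and easy to test.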
get_results_reference(extraction_id)

Get a DataConnection instance that references the results stored on COS.

Parameters:

extraction_id (str) – ID of the text extraction job

Returns:

DataConnection pointing to the location of the text extraction job results

Return type:

DataConnection

Example

results_reference = extraction.get_results_reference(extraction_id="<extraction_id>")

list_jobs(limit=None)

List text extraction jobs. If limit is None, all jobs will be listed.

Parameters:

limit (int | None, optional) – limit number of fetched records, defaults to None

Returns:

pandas DataFrame with text extraction jobs information

Return type:

pandas.DataFrame

Example

extraction.list_jobs()

run_job(document_reference, results_reference, steps=None)

Start a request to extract text and metadata from a document.

Parameters:
  • document_reference (DataConnection) – Reference to the document in the bucket from which text will be extracted

  • results_reference (DataConnection) – Reference to the location in the bucket where results will be saved

  • steps (dict | None, optional) – The steps for the text extraction pipeline, defaults to None

Returns:

Raw response from server with text extraction job details

Return type:

dict

Example

from ibm_watsonx_ai.metanames import TextExtractionsMetaNames
from ibm_watsonx_ai.helpers import DataConnection, S3Location

document_reference = DataConnection(
    connection_asset_id="<connection_id>",
    location=S3Location(bucket="<bucket_name>", path="path/to/file"),
    )

results_reference = DataConnection(
    connection_asset_id="<connection_id>",
    location=S3Location(bucket="<bucket_name>", path="path/to/file"),
    )

response = extraction.run_job(
    document_reference=document_reference,
    results_reference=results_reference,
    steps={
        TextExtractionsMetaNames.OCR: {
            "process_image": True,
            "languages_list": ["en", "fr"],
        },
        TextExtractionsMetaNames.TABLE_PROCESSING: {"enabled": True},
    },
)
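When the pipeline options come from configuration, it can help to build the `steps` argument in one place. The sketch below is a hypothetical helper (`build_steps` is not part of the library). The option names `process_image`, `languages_list` and `enabled` are taken from the example above. The step keys are passed in by the caller, for instance `TextExtractionsMetaNames.OCR` and `TextExtractionsMetaNames.TABLE_PROCESSING`, so the helper assumes nothing about their values.

```python
def build_steps(ocr_key, table_key, languages=None, process_image=None, tables=None):
    """Hypothetical helper that assembles the `steps` dict for run_job.

    ocr_key / table_key: the step keys to use, e.g. the TextExtractionsMetaNames members.
    Returns None when no option is set, matching the documented default steps=None.
    """
    steps = {}
    ocr = {}
    if process_image is not None:
        ocr["process_image"] = bool(process_image)
    if languages:
        ocr["languages_list"] = list(languages)
    if ocr:
        steps[ocr_key] = ocr
    if tables is not None:
        steps[table_key] = {"enabled": bool(tables)}
    return steps or None


# Placeholder keys keep the sketch runnable without the library installed.
print(build_steps("OCR_KEY", "TABLE_KEY", languages=["en", "fr"], process_image=True, tables=True))
```

The result can be passed as `steps=` to `run_job`. Returning `None` for an empty configuration leaves the choice of pipeline to the service.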

Enums

class metanames.TextExtractionsMetaNames

Set of MetaNames for Text Extraction Steps.

Available MetaNames:

MetaName          Type  Required  Example value
OCR               dict  N         {'process_image': True, 'languages_list': ['en']}
TABLE_PROCESSING  dict  N         {'enabled': True}

Note

For more details about Text Extraction Steps see https://cloud.ibm.com/apidocs/watsonx-ai