Text Extractions
- class ibm_watsonx_ai.foundation_models.extractions.TextExtractions(credentials=None, project_id=None, space_id=None, api_client=None)
Bases:
WMLResource
Instantiate the Text Extraction service.
- Parameters:
credentials (ibm_watsonx_ai.Credentials | None, optional) – credentials to Watson Machine Learning instance
project_id (str | None, optional) – ID of the Watson Studio project, defaults to None
space_id (str | None, optional) – ID of the Watson Studio space, defaults to None
api_client (APIClient | None, optional) – initialized APIClient object with a project or space ID set. If passed, credentials and project_id/space_id are not required, defaults to None
- Raises:
InvalidMultipleArguments – if space_id and project_id or credentials and api_client are provided simultaneously
WMLClientError – if CPD version is less than 5.0
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models.extractions import TextExtractions

extraction = TextExtractions(
    credentials=Credentials(
        api_key="***",
        url="https://us-south.ml.cloud.ibm.com",
    ),
    project_id="*****",
)
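The argument rules listed under Raises can be illustrated with a small stand-alone check. This is only a sketch of the documented behaviour, not the library's implementation: the function name check_exclusive_arguments is hypothetical, and ValueError stands in for InvalidMultipleArguments so that the snippet runs without the package installed.

```python
from typing import Any, Optional


def check_exclusive_arguments(
    credentials: Optional[Any] = None,
    project_id: Optional[str] = None,
    space_id: Optional[str] = None,
    api_client: Optional[Any] = None,
) -> None:
    """Mirror the documented rule: some arguments must not be combined.

    Hypothetical helper. ValueError is a stand-in for the library's
    InvalidMultipleArguments exception.
    """
    # project_id and space_id select different scopes, so only one is allowed.
    if project_id is not None and space_id is not None:
        raise ValueError("Provide either project_id or space_id, not both.")
    # An initialized api_client already carries credentials.
    if credentials is not None and api_client is not None:
        raise ValueError("Provide either credentials or api_client, not both.")
```

Calling check_exclusive_arguments(project_id="p", space_id="s") raises, while passing only one member of each pair returns silently.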
- delete_job(extraction_id)
Delete text extraction job.
- Returns:
“SUCCESS” if the deletion succeeds
- Return type:
str
Example
extraction.delete_job(extraction_id="<extraction_id>")
- static get_id(extraction_details)
Get the unique ID of a stored extraction request.
- Parameters:
extraction_details (dict) – metadata of the stored extraction
- Returns:
unique ID of the stored extraction request
- Return type:
str
Example
extraction_details = extraction.get_job_details(extraction_id)
extraction_id = extraction.get_id(extraction_details)
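To make the lookup concrete, the following sketch reads the ID from a details dictionary. It assumes the ID is stored under a "metadata" → "id" path, which this page does not confirm; the function name get_extraction_id and the sample payload are hypothetical.

```python
def get_extraction_id(extraction_details: dict) -> str:
    """Return the job ID from a details dict.

    Assumption: the ID lives at extraction_details["metadata"]["id"].
    Check an actual response before relying on this layout.
    """
    try:
        return extraction_details["metadata"]["id"]
    except (KeyError, TypeError) as exc:
        raise ValueError("extraction_details has no metadata.id field") from exc


# Hypothetical payload used only to exercise the helper.
sample_details = {"metadata": {"id": "<extraction_id>"}, "entity": {}}
extraction_id = get_extraction_id(sample_details)
```

In practice the static get_id method shown above should be preferred, since it tracks the library's own response format.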
- get_job_details(extraction_id=None, limit=None)
Return text extraction job details. If extraction_id is None, return details of all text extraction jobs.
- Parameters:
extraction_id (str | None, optional) – ID of the text extraction job, defaults to None
limit (int | None, optional) – limit number of fetched records, defaults to None
- Returns:
Text extraction job details
- Return type:
dict
Example
extraction.get_job_details(extraction_id="<extraction_id>")
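Because a job runs asynchronously, a common pattern is to poll get_job_details until the job finishes. The sketch below shows one way to do that. The helper wait_for_job is hypothetical, and both the status path ("entity" → "results" → "status") and the terminal state names are assumptions that should be checked against a real response. The details getter is passed in as a callable so the loop can be exercised without a service connection.

```python
import time
from typing import Callable

# Assumed terminal states; verify against the actual API response.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    get_details: Callable[[str], dict],
    extraction_id: str,
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> dict:
    """Poll get_details(extraction_id) until a terminal status is reported.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        details = get_details(extraction_id)
        # Assumed location of the job status inside the details dict.
        status = details.get("entity", {}).get("results", {}).get("status")
        if status in TERMINAL_STATES:
            return details
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {extraction_id} did not finish within {timeout} seconds"
            )
        time.sleep(poll_interval)
```

With a real client this could be called as wait_for_job(lambda job_id: extraction.get_job_details(extraction_id=job_id), "<extraction_id>").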
- get_results_reference(extraction_id)
Get a DataConnection instance that references the results stored on COS.
- Parameters:
extraction_id (str) – ID of the text extraction job
- Returns:
Data Connection to text extraction job results location.
- Return type:
DataConnection
Example
results_reference = extraction.get_results_reference(extraction_id="<extraction_id>")
- list_jobs(limit=None)
List text extraction jobs. If limit is None, all jobs will be listed.
- Parameters:
limit (int | None, optional) – limit number of fetched records, defaults to None
- Returns:
pandas DataFrame with text extraction jobs information
- Return type:
pandas.DataFrame
Example
extraction.list_jobs()
- run_job(document_reference, results_reference, steps=None)
Start a request to extract text and metadata from a document.
- Parameters:
document_reference (DataConnection) – reference to the document in the bucket from which text will be extracted
results_reference (DataConnection) – reference to the location in the bucket where results will be saved
steps (dict | None, optional) – the steps for the text extraction pipeline, defaults to None
- Returns:
Raw response from server with text extraction job details
- Return type:
dict
Example
from ibm_watsonx_ai.metanames import TextExtractionsMetaNames
from ibm_watsonx_ai.helpers import DataConnection, S3Location

document_reference = DataConnection(
    connection_asset_id="<connection_id>",
    location=S3Location(bucket="<bucket_name>", path="path/to/file"),
)

results_reference = DataConnection(
    connection_asset_id="<connection_id>",
    location=S3Location(bucket="<bucket_name>", path="path/to/file"),
)

response = extraction.run_job(
    document_reference=document_reference,
    results_reference=results_reference,
    steps={
        TextExtractionsMetaNames.OCR: {
            "process_image": True,
            "languages_list": ["en", "fr"],
        },
        TextExtractionsMetaNames.TABLE_PROCESSING: {"enabled": True},
    },
)
Enums
- class metanames.TextExtractionsMetaNames
Set of MetaNames for Text Extraction Steps.
Available MetaNames:
MetaName          Type  Required  Example value
OCR               dict  N         {'process_images': True, 'language_list': ['en']}
TABLE_PROCESSING  dict  N         {'enabled': True}
Note
For more details about Text Extraction Steps see https://cloud.ibm.com/apidocs/watsonx-ai
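The table above says that both steps are optional and that each takes a dict. A pre-flight check along those lines might look like the sketch below. The helper validate_steps is hypothetical, and the plain strings "OCR" and "TABLE_PROCESSING" are stand-ins for the TextExtractionsMetaNames constants, whose underlying values are not shown on this page.

```python
from typing import Optional

# Stand-ins for TextExtractionsMetaNames.OCR and
# TextExtractionsMetaNames.TABLE_PROCESSING (actual values not shown here).
ALLOWED_STEPS = {"OCR", "TABLE_PROCESSING"}


def validate_steps(steps: Optional[dict]) -> dict:
    """Check a steps mapping against the MetaNames table.

    Every step is optional, so None and {} are both accepted.
    Returns the steps as a dict (empty when None was given).
    """
    if steps is None:
        return {}
    for name, config in steps.items():
        if name not in ALLOWED_STEPS:
            raise ValueError(f"Unknown text extraction step: {name!r}")
        if not isinstance(config, dict):
            raise TypeError(f"Step {name!r} must be configured with a dict")
    return dict(steps)
```

A check like this only catches unknown step names and wrong value types. The contents of each step dict are left to the service to validate.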