Text Extractions
- class ibm_watsonx_ai.foundation_models.extractions.TextExtractions(credentials=None, project_id=None, space_id=None, api_client=None)
Bases: WMLResource
Instantiate the Text Extraction service.
- Parameters:
credentials (Credentials, optional) – credentials to the Watson Machine Learning instance
project_id (str, optional) – ID of the Watson Studio project, defaults to None
space_id (str, optional) – ID of the Watson Studio space, defaults to None
api_client (APIClient, optional) – initialized APIClient object with a set project ID or space ID. If passed, credentials and project_id/space_id are not required, defaults to None
- Raises:
InvalidMultipleArguments – raised if space_id and project_id or credentials and api_client are provided simultaneously
WMLClientError – raised if the CPD version is less than 5.0
Example:
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models.extractions import TextExtractions

extraction = TextExtractions(
    credentials=Credentials(
        api_key="***",
        url="https://us-south.ml.cloud.ibm.com",
    ),
    project_id="*****",
)
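The argument rules listed under Raises can be checked before the client is constructed. The following is a minimal sketch, not the library's own validation: `check_init_args` is a hypothetical helper, and it raises a plain `ValueError` rather than `InvalidMultipleArguments`.

```python
from typing import Optional


def check_init_args(
    credentials: Optional[object] = None,
    project_id: Optional[str] = None,
    space_id: Optional[str] = None,
    api_client: Optional[object] = None,
) -> None:
    """Hypothetical pre-check mirroring the documented argument rules."""
    # project_id and space_id are mutually exclusive.
    if project_id is not None and space_id is not None:
        raise ValueError("pass either project_id or space_id, not both")
    # credentials and api_client are mutually exclusive.
    if credentials is not None and api_client is not None:
        raise ValueError("pass either credentials or api_client, not both")
```

Calling `check_init_args(credentials=creds, project_id="<project_id>")` returns silently, while passing both `project_id` and `space_id` raises.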
- delete_job(extraction_id)
Delete a text extraction job.
- Parameters:
extraction_id (str) – ID of the text extraction job
- Returns:
“SUCCESS” if the deletion succeeds
- Return type:
str
Example:
extraction.delete_job(extraction_id="<extraction_id>")
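When several jobs need cleaning up, the documented “SUCCESS” return value can be used to track which deletions went through. This is a sketch under assumptions: `delete_jobs` and `SupportsDeleteJob` are hypothetical names, and since only the success value is documented here, any other return value is treated as a failure.

```python
from typing import Iterable, List, Protocol


class SupportsDeleteJob(Protocol):
    """Anything with a delete_job method, such as a TextExtractions instance."""

    def delete_job(self, extraction_id: str) -> str: ...


def delete_jobs(
    extraction: SupportsDeleteJob, extraction_ids: Iterable[str]
) -> List[str]:
    """Delete each job and return the IDs that did not report "SUCCESS"."""
    not_deleted = []
    for extraction_id in extraction_ids:
        status = extraction.delete_job(extraction_id=extraction_id)
        if status != "SUCCESS":
            not_deleted.append(extraction_id)
    return not_deleted
```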
- static get_id(extraction_details)
Get the unique ID of a stored extraction request.
- Parameters:
extraction_details (dict) – metadata of the stored extraction
- Returns:
unique ID of the stored extraction request
- Return type:
str
Example:
extraction_details = extraction.get_job_details(extraction_id)
extraction_id = extraction.get_id(extraction_details)
- get_job_details(extraction_id=None, limit=None)
Return text extraction job details. If extraction_id is None, returns the details of all text extraction jobs.
- Parameters:
extraction_id (str | None, optional) – ID of the text extraction job, defaults to None
limit (int | None, optional) – limit number of fetched records, defaults to None
- Returns:
details of the text extraction job
- Return type:
dict
Example:
extraction.get_job_details(extraction_id="<extraction_id>")
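Because an extraction job runs asynchronously, a common pattern is to poll get_job_details until the job reaches a final state. The sketch below is hypothetical: the location of the status field (`entity` → `results` → `status`) and the state names `"completed"` and `"failed"` are assumptions, so both are exposed as parameters that can be replaced.

```python
import time
from typing import Any, Callable, Dict, FrozenSet


def default_status(details: Dict[str, Any]) -> str:
    # ASSUMPTION: the job state is stored at entity -> results -> status.
    return details["entity"]["results"]["status"]


def wait_for_job(
    get_job_details: Callable[..., Dict[str, Any]],
    extraction_id: str,
    status_of: Callable[[Dict[str, Any]], str] = default_status,
    terminal_states: FrozenSet[str] = frozenset({"completed", "failed"}),
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal state, then return its details."""
    for _ in range(max_polls):
        details = get_job_details(extraction_id=extraction_id)
        if status_of(details) in terminal_states:
            return details
        sleep(poll_seconds)
    raise TimeoutError(f"job {extraction_id} not finished after {max_polls} polls")
```

With a real client this would be called as `wait_for_job(extraction.get_job_details, "<extraction_id>")`.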
- get_results_reference(extraction_id)
Get a DataConnection instance that is a reference to the results stored on COS.
- Parameters:
extraction_id (str) – ID of text extraction job
- Returns:
location of the Data Connection to text extraction job results
- Return type:
DataConnection
Example:
results_reference = extraction.get_results_reference(extraction_id="<extraction_id>")
- list_jobs(limit=None)
List text extraction jobs. If limit is None, all jobs will be listed.
- Parameters:
limit (int | None, optional) – limit number of fetched records, defaults to None
- Returns:
pandas DataFrame with information about the text extraction jobs
- Return type:
pandas.DataFrame
Example:
extraction.list_jobs()
- run_job(document_reference, results_reference, steps=None, results_format='json')
Start a request to extract text and metadata from a document.
- Parameters:
document_reference (DataConnection) – reference to the document in the bucket from which text will be extracted
results_reference (DataConnection) – reference to the location in the bucket where results will be saved
steps (dict | None, optional) – steps for the text extraction pipeline, defaults to None
results_format (Literal["json", "markdown"], optional) – results format for the text extraction, defaults to “json”
- Returns:
raw response from the server with the text extraction job details
- Return type:
dict
Example:
from ibm_watsonx_ai.metanames import TextExtractionsMetaNames
from ibm_watsonx_ai.helpers import DataConnection, S3Location

# Document from which text will be extracted
document_reference = DataConnection(
    connection_asset_id="<connection_id>",
    location=S3Location(bucket="<bucket_name>", path="path/to/file"),
)

# Location where the results will be saved
results_reference = DataConnection(
    connection_asset_id="<connection_id>",
    location=S3Location(bucket="<bucket_name>", path="path/to/file"),
)

# results_format is an argument of run_job, not a key of the steps dict
response = extraction.run_job(
    document_reference=document_reference,
    results_reference=results_reference,
    steps={
        TextExtractionsMetaNames.OCR: {
            "process_image": True,
            "languages_list": ["en", "fr"],
        },
        TextExtractionsMetaNames.TABLE_PROCESSING: {"enabled": True},
    },
    results_format="markdown",
)
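Since results_format accepts only "json" or "markdown", it can help to validate it locally before a request is sent. The following is a minimal sketch: `build_run_job_kwargs` is a hypothetical helper that only assembles keyword arguments, and it says nothing about how the library itself validates them.

```python
from typing import Any, Dict, Optional

# Allowed values taken from the documented Literal["json", "markdown"].
ALLOWED_RESULTS_FORMATS = ("json", "markdown")


def build_run_job_kwargs(
    document_reference: Any,
    results_reference: Any,
    steps: Optional[Dict[str, Any]] = None,
    results_format: str = "json",
) -> Dict[str, Any]:
    """Hypothetical helper: check results_format and assemble run_job kwargs."""
    if results_format not in ALLOWED_RESULTS_FORMATS:
        raise ValueError(
            f"results_format must be one of {ALLOWED_RESULTS_FORMATS}, "
            f"got {results_format!r}"
        )
    return {
        "document_reference": document_reference,
        "results_reference": results_reference,
        "steps": steps,
        "results_format": results_format,
    }
```

The result can be unpacked into the call, for example `extraction.run_job(**build_run_job_kwargs(document_reference, results_reference, results_format="markdown"))`.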
Enums
- class metanames.TextExtractionsMetaNames
Set of MetaNames for Text Extraction Steps.
Available MetaNames:
MetaName          Type  Required  Example value
OCR               dict  N         {'process_images': True, 'languages_list': ['en']}
TABLE_PROCESSING  dict  N         {'enabled': True}
Note
For more details about Text Extraction Steps, see https://cloud.ibm.com/apidocs/watsonx-ai
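The table above implies two simple rules for a steps dict: only the listed MetaNames are expected as keys, and each value is a dict. A sketch of a local check follows. `StepNames` is a stand-in for TextExtractionsMetaNames so the example runs without the library, and its string values are assumptions rather than confirmed constants.

```python
from typing import Any, Mapping


class StepNames:
    """Stand-in for TextExtractionsMetaNames; the string values are assumed."""

    OCR = "ocr"
    TABLE_PROCESSING = "tables_processing"


# Expected value type per step, following the table (both are dict).
KNOWN_STEPS = {StepNames.OCR: dict, StepNames.TABLE_PROCESSING: dict}


def validate_steps(steps: Mapping[str, Any]) -> None:
    """Raise if a step name is unknown or its value has the wrong type."""
    for name, value in steps.items():
        if name not in KNOWN_STEPS:
            raise KeyError(f"unknown step: {name!r}")
        if not isinstance(value, KNOWN_STEPS[name]):
            raise TypeError(f"step {name!r} must be a dict")
```

Because neither step is required, an empty dict passes the check.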