Document Conversion with Docling
The primary purpose of Docling is document conversion. Docling lets us convert documents from various formats into formats that are more useful in AI applications, while preserving document structure.
This lab walks through the different document conversion options Docling offers, as well as some enrichment features. We will also explore the converted documents to examine how Docling stores metadata to preserve document structure.
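As a preview of what the notebook does, here is a minimal sketch of a basic conversion using Docling's DocumentConverter. It is not taken from the notebook: the function name convert_to_markdown and its converter parameter are hypothetical, and the sketch assumes docling is installed in the active environment.

```python
def convert_to_markdown(source, converter=None):
    """Convert one document (local path or URL) and return it as Markdown.

    `converter` is a hypothetical convenience parameter: pass an existing
    DocumentConverter to reuse it across files; otherwise one is created.
    """
    if converter is None:
        # Imported lazily so this module still loads where docling is absent.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # convert() returns a result object whose .document holds the parsed,
    # structure-preserving representation of the input.
    result = converter.convert(source)

    # Export that structured document to a Markdown string.
    return result.document.export_to_markdown()
```

Calling convert_to_markdown with the path of a local PDF should return a Markdown string in which headings, lists, and tables are kept as structure rather than flattened text. Reusing one converter for several files avoids repeating its setup cost.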
Prerequisites
This lab is a Jupyter notebook. Please follow the instructions in prework for the prerequisites to run the lab.
Lab
Launch Jupyter Lab by running the following commands from the opentech directory of your cloned beeai-workshop repo.
- Create doclingkernel, which will have the dependencies preinstalled in our virtual environment.

    uv run --directory docling ipython kernel install --user --env VIRTUAL_ENV .venv --name=doclingkernel

- Use uv to run Jupyter Lab. The directory and allow_hidden options give us access to the .venv modules.

    uv run --directory docling jupyter lab --ContentsManager.allow_hidden=True

- In Jupyter Lab in your browser, walk through the notebook:
  - Jupyter Lab will open in your browser
  - Navigate to the notebooks folder
  - Open Conversion.ipynb
  - Use the play button to walk through the notebook
  - Be sure to read the text, the code, and the output
  - Exit your browser tab
  - Exit your Jupyter Lab server by entering CTRL-C, CTRL-C in the terminal where it is running
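If doclingkernel does not show up in Jupyter Lab's kernel picker, one way to check that the kernel install command above took effect is to look for its kernel.json file. The sketch below is a hedged illustration, not part of the lab: the helper names are hypothetical, and the directories listed are assumed to be Jupyter's usual per-user defaults, which can differ if JUPYTER_DATA_DIR is set.

```python
import json
from pathlib import Path
from typing import List, Optional


def candidate_kernel_dirs(home: Path) -> List[Path]:
    """Usual per-user kernelspec directories (assumed Jupyter defaults)."""
    return [
        home / ".local" / "share" / "jupyter" / "kernels",     # Linux
        home / "Library" / "Jupyter" / "kernels",              # macOS
        home / "AppData" / "Roaming" / "jupyter" / "kernels",  # Windows
    ]


def read_kernel_spec(name: str, kernel_dirs: List[Path]) -> Optional[dict]:
    """Return the parsed kernel.json for `name`, or None if it is not found."""
    for base in kernel_dirs:
        spec_file = base / name / "kernel.json"
        if spec_file.is_file():
            return json.loads(spec_file.read_text(encoding="utf-8"))
    return None


if __name__ == "__main__":
    spec = read_kernel_spec("doclingkernel", candidate_kernel_dirs(Path.home()))
    if spec is None:
        print("doclingkernel not found; try re-running the kernel install command")
    else:
        # The --env option from the install command is stored under "env".
        print("display name:", spec.get("display_name"))
        print("env:", spec.get("env", {}))
```

Running jupyter kernelspec list from the same directory gives similar information without any extra code.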