This sample code uses the Unstructured Open Source Library.
Objectives
- Extract text and metadata from a PDF file using the Unstructured.io Python SDK.
- Process and store this data in a Databricks Delta Table.
- Retrieve data from the Delta Table using the Unstructured.io Delta Table Connector.
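The extraction objective can be sketched roughly as follows. This assumes the open source `unstructured` package is installed with its PDF extras (for example `pip install "unstructured[pdf]"`). `partition_pdf` and `Element.to_dict()` come from that library. The helper names `extract_pdf_elements` and `save_elements_json` are hypothetical and exist only for this illustration.

```python
import json
from typing import Any


def extract_pdf_elements(pdf_path: str) -> list[dict[str, Any]]:
    """Partition a PDF and return each detected element as a plain dict (hypothetical helper)."""
    # Imported inside the function so this module still loads in environments
    # where the unstructured package is not installed.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=pdf_path)
    # Element.to_dict() yields keys such as "type", "element_id", "text", and "metadata".
    return [element.to_dict() for element in elements]


def save_elements_json(records: list[dict[str, Any]], json_path: str) -> str:
    """Write the element dicts to a JSON file and return the JSON text (hypothetical helper)."""
    payload = json.dumps(records, indent=2, ensure_ascii=False)
    with open(json_path, "w", encoding="utf-8") as f:
        f.write(payload)
    return payload
```

For example, `save_elements_json(extract_pdf_elements("example.pdf"), "output.json")` would write the extracted text and metadata to `output.json`. Both file names are placeholders.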
Prerequisites
- Unstructured Python SDK
- Databricks account and workspace
- AWS S3 for Delta Table storage
Processing and Storing Data in a Databricks Delta Table
- Initialize a PySpark session
- Convert the JSON output into a DataFrame
- Store the DataFrame as a Delta Table
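The three steps above might look roughly like the sketch below. It assumes the element dicts produced by the Unstructured library (keys `type`, `element_id`, `text`, `metadata`) and a Spark environment with Delta Lake support, such as a Databricks cluster. The column layout, the helper names `flatten_elements` and `write_delta_table`, and the table path are hypothetical choices for illustration, not the tutorial's exact code.

```python
import json
from typing import Any

# Hypothetical column layout for the Delta table used in this sketch.
COLUMNS = ["element_id", "type", "text", "filename", "page_number", "metadata_json"]


def flatten_elements(records: list[dict[str, Any]]) -> list[tuple]:
    """Flatten Unstructured element dicts into tuples ordered like COLUMNS."""
    rows = []
    for rec in records:
        metadata = rec.get("metadata") or {}
        rows.append((
            rec.get("element_id"),
            rec.get("type"),
            rec.get("text", ""),
            metadata.get("filename"),
            metadata.get("page_number"),
            # Store the full metadata as one JSON string so the schema stays fixed
            # even when elements carry different metadata keys.
            json.dumps(metadata, sort_keys=True),
        ))
    return rows


def write_delta_table(records: list[dict[str, Any]], table_path: str) -> None:
    """Initialize PySpark, build a DataFrame, and save it in Delta format."""
    # Lazy imports: requires pyspark plus Delta Lake support (built into Databricks).
    from pyspark.sql import SparkSession
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    # Step 1: on Databricks, getOrCreate() returns the notebook's existing session.
    spark = SparkSession.builder.appName("unstructured-to-delta").getOrCreate()

    # Step 2: an explicit schema avoids inference errors when a column is all None.
    schema = StructType([
        StructField(name, IntegerType() if name == "page_number" else StringType(), True)
        for name in COLUMNS
    ])
    df = spark.createDataFrame(flatten_elements(records), schema=schema)

    # Step 3: write the DataFrame as a Delta table at the given path (e.g. on S3).
    df.write.format("delta").mode("overwrite").save(table_path)
```

A call such as `write_delta_table(records, "s3://<your-bucket>/delta/elements")` would store the rows, and `spark.read.format("delta").load(...)` on the same path reads them back with plain Spark. Keeping `metadata_json` as a single string column is a design choice that trades queryability for a stable schema.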

