Free Unstructured API
This page describes how to obtain an API key for the free Unstructured API, lists the limitations of the free Unstructured API, and provides a quickstart example.
Get an API key
The Free Unstructured API requires authentication via an API key. Here’s how you can obtain your API key:
- Go to https://unstructured.io/api-key-free.
- Fill out the registration form. Make sure your contact information (especially your email address) is valid.
- Check the I agree box if you consent to Unstructured contacting you about our products and services.
- Click the Terms and Conditions link, read it, and check the related box to agree.
- Click Submit. You will receive a Free Unstructured API key at the email address you provided. Store your API key in a secure location. Do not share it with others.
- For the Free Unstructured API, the API URL is https://api.unstructured.io/general/v0/general
Free Unstructured API keys do not work with the Unstructured Serverless API. If you try to use a Free Unstructured API key with an Unstructured Serverless API URL, the call will fail. Use your Free Unstructured API URL instead.
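Because a mismatched key and URL cause calls to fail, it can help to check your settings before sending any files. The following is a minimal sketch, not part of any Unstructured library: it assumes the key and URL are stored in the UNSTRUCTURED_API_KEY and UNSTRUCTURED_API_URL environment variables used in the quickstart on this page, and check_free_api_settings is a hypothetical helper name.

```python
import os

# The Free Unstructured API URL given on this page.
FREE_API_URL = "https://api.unstructured.io/general/v0/general"


def check_free_api_settings(env=None):
    """Return a list of problems found; an empty list means the settings look usable.

    Hypothetical helper: it only inspects local settings and never calls the API.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("UNSTRUCTURED_API_KEY"):
        problems.append("UNSTRUCTURED_API_KEY is not set")
    url = env.get("UNSTRUCTURED_API_URL", "")
    # Tolerate a trailing slash, but otherwise require the free endpoint.
    if url.rstrip("/") != FREE_API_URL:
        problems.append(
            f"UNSTRUCTURED_API_URL is {url!r}, expected {FREE_API_URL!r}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_free_api_settings():
        print("warning:", problem)
```

The check is purely local, so it cannot tell whether the key itself is valid; it only catches a missing key or a URL that points somewhere other than the free endpoint.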
Free Unstructured API limitations
The Free Unstructured API is designed for prototyping purposes, and not for production use:
- The API usage is limited to 1000 pages per month.
- Unlike the users of Unstructured Serverless API, users of the Free Unstructured API do not get their own dedicated infrastructure.
- The data sent over the Free Unstructured API can be used for model training purposes, and other service improvements.
If you require a production-ready API, consider using the Unstructured Serverless API instead.
We calculate a page as follows:
- For these file types, a page is a page, slide, or image: .pdf, .pptx, and .tiff.
- For .docx files that have page metadata, we calculate the number of pages based on that metadata.
- For all other file types, we calculate the number of pages as the file’s size divided by 100 KB.
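To get a rough sense of how quickly a batch of files would use up the monthly limit, the size-based rule above can be sketched in Python. This is an illustration only: the page does not say how fractions are rounded or whether 1 KB means 1000 or 1024 bytes, so rounding up, a one-page minimum, and 1024-byte kilobytes are all assumptions, and the function names are hypothetical.

```python
import math
from pathlib import Path

PAGE_SIZE_BYTES = 100 * 1024   # assumption: 100 KB = 100 * 1024 bytes
MONTHLY_PAGE_LIMIT = 1000      # limit stated on this page

# File types that are counted by actual pages, slides, or images.
PAGE_COUNTED_TYPES = {".pdf", ".pptx", ".tiff"}


def estimate_pages_from_size(size_bytes):
    """Estimate pages as file size / 100 KB (assumption: round up, minimum 1)."""
    return max(1, math.ceil(size_bytes / PAGE_SIZE_BYTES))


def estimate_pages(path):
    """Return a size-based estimate, or None when the size rule does not apply.

    .pdf, .pptx, and .tiff are counted by real pages, and .docx files may
    carry page metadata, so this sketch does not guess for those types.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix in PAGE_COUNTED_TYPES or suffix == ".docx":
        return None
    return estimate_pages_from_size(path.stat().st_size)


if __name__ == "__main__":
    # A 250 KB text file would count as 3 pages under these assumptions.
    print(estimate_pages_from_size(250 * 1024))
```

Summing these estimates over a directory and comparing the total against MONTHLY_PAGE_LIMIT gives an approximate budget check; the count that Unstructured actually records may differ.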
Quickstart
These examples use your local machine. They send source (input) files from your local machine to the Free Unstructured API, which delivers the processed data to a destination (output) location, also on your local machine. Data is processed on Unstructured-hosted compute resources.
Unstructured Ingest CLI
To work with the Free Unstructured API by using the Unstructured Ingest CLI, you will need to:
- Install Python, and then install the CLI package:
  pip install unstructured-ingest
- Set the UNSTRUCTURED_API_KEY environment variable to your Free Unstructured API key.
- Set the UNSTRUCTURED_API_URL environment variable to your Free Unstructured API URL, which is https://api.unstructured.io/general/v0/general
- Have some compatible files on your local machine to be processed. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
Now, use the CLI to call the API, replacing:
- <path/to/input> with the source (input) path to the directory on your local machine that contains the compatible files for Unstructured to process on its hosted compute resources.
- <path/to/output> with the destination (output) path to the directory on your local machine that will contain the processed data that Unstructured returns from its hosted compute resources.
unstructured-ingest \
  local \
    --input-path <path/to/input> \
    --output-dir <path/to/output> \
    --partition-by-api \
    --api-key $UNSTRUCTURED_API_KEY \
    --partition-endpoint $UNSTRUCTURED_API_URL \
    --strategy hi_res \
    --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}"
After the command successfully runs, see the results in the specified output path on your local machine.
Unstructured Ingest Python library
To work with the Free Unstructured API by using the Unstructured Ingest Python library, you will need to:
- Install Python, and then install the Python library package:
  pip install unstructured-ingest
- Set the following environment variables:
  - Set UNSTRUCTURED_API_KEY to your Free Unstructured API key.
  - Set UNSTRUCTURED_API_URL to your Free Unstructured API URL, which is https://api.unstructured.io/general/v0/general
- Have some compatible files on your local machine to be processed. See the list of supported file types. If you do not have any files available, you can download some from the example-docs folder in the Unstructured repo on GitHub.
Now, use the Python library to call the API. The following script reads its paths from two environment variables, so set:
- LOCAL_FILE_INPUT_DIR to the source (input) path to the directory on your local machine that contains the compatible files for Unstructured to process on its hosted compute resources.
- LOCAL_FILE_OUTPUT_DIR to the destination (output) path to the directory on your local machine that will contain the processed data that Unstructured returns from its hosted compute resources.
import os

from unstructured_ingest.v2.pipeline.pipeline import Pipeline
from unstructured_ingest.v2.interfaces import ProcessorConfig
from unstructured_ingest.v2.processes.connectors.local import (
    LocalIndexerConfig,
    LocalDownloaderConfig,
    LocalConnectionConfig,
    LocalUploaderConfig
)
from unstructured_ingest.v2.processes.partitioner import PartitionerConfig

if __name__ == "__main__":
    Pipeline.from_configs(
        context=ProcessorConfig(),
        indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
        downloader_config=LocalDownloaderConfig(),
        source_connection_config=LocalConnectionConfig(),
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
            strategy="hi_res",
            additional_partition_args={
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        ),
        uploader_config=LocalUploaderConfig(output_dir=os.getenv("LOCAL_FILE_OUTPUT_DIR"))
    ).run()
After the script successfully runs, see the results in the specified output path on your local machine.
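To inspect the results programmatically rather than by opening files, a short script can tally the element types in the output directory. This is a sketch under stated assumptions: it assumes each output file is a .json file containing a list of element objects with a "type" field, and that the output path is available in the LOCAL_FILE_OUTPUT_DIR environment variable read by the script above. The helper names count_types and count_types_in_dir are hypothetical and not part of the library.

```python
import json
import os
from collections import Counter
from pathlib import Path


def count_types(elements):
    """Count elements by their "type" field, skipping anything that is not a dict."""
    counts = Counter()
    for element in elements:
        if isinstance(element, dict):
            counts[element.get("type", "unknown")] += 1
    return counts


def count_types_in_dir(output_dir):
    """Aggregate element-type counts over every .json file under output_dir."""
    totals = Counter()
    for json_path in sorted(Path(output_dir).rglob("*.json")):
        with json_path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: each file holds a JSON list of elements; skip other shapes.
        if isinstance(data, list):
            totals.update(count_types(data))
    return totals


if __name__ == "__main__":
    output_dir = os.getenv("LOCAL_FILE_OUTPUT_DIR")
    if output_dir:
        for element_type, count in count_types_in_dir(output_dir).most_common():
            print(f"{element_type}: {count}")
```

If the output files turn out to have a different shape, the isinstance checks make the script skip them instead of failing, so an empty result is a hint to look at one file by hand.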
Telemetry
We’ve partnered with Scarf to collect anonymized user statistics to understand which features our community is using and how to prioritize product decision-making in the future.
To learn more about how we collect and use this data, please read our Privacy Policy.
To opt out of this data collection, set the environment variable SCARF_NO_ANALYTICS=true before running any commands that call Free Unstructured API services.
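When you call the API from a Python script instead of a shell, one way to apply the opt-out is to set the variable at the very top of the script. This is a minimal sketch; it assumes the variable only needs to be present in the process environment before any Unstructured code is imported or run.

```python
import os

# Opt out of Scarf analytics for this process and any child processes it starts.
# Place this before importing or invoking any Unstructured code.
os.environ["SCARF_NO_ANALYTICS"] = "true"
```

Setting the variable in your shell profile instead applies it to every command, so you do not have to repeat it in each script.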