Elasticsearch
Batch process all your records to store structured outputs in Elasticsearch.
You will need:
The Elasticsearch prerequisites:
- An Elasticsearch instance, such as an Elastic Cloud service instance or a self-managed Elasticsearch instance.
- The name of the index on the instance. See Create index and Get index.
The Elasticsearch index that you use must have a schema that is compatible with the schema of the documents that Unstructured produces for you. Unstructured cannot provide a schema that is guaranteed to work in all circumstances. This is because these schemas will vary based on your source files’ types; how you want Unstructured to partition, chunk, and generate embeddings; any custom post-processing code that you run; and other factors.
You can adapt the following index schema example for your own needs:
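As a rough illustration only, and not a schema published by Unstructured, an index body could be assembled along these lines. The field names (element_id, text, type, embeddings, metadata) are assumptions about typical Unstructured output, the helper build_index_mapping is hypothetical, and the embedding dimension must match whatever embedding model you use.

```python
def build_index_mapping(embedding_dims: int = 384) -> dict:
    """Return a hypothetical Elasticsearch index body for Unstructured-style documents.

    Every field name here is an assumption; adjust it to the documents
    that your own pipeline actually produces.
    """
    if embedding_dims <= 0:
        raise ValueError("embedding_dims must be a positive integer")
    return {
        "mappings": {
            "properties": {
                # Identifier of each element; exact-match lookups only.
                "element_id": {"type": "keyword"},
                # Extracted text; analyzed for full-text search.
                "text": {"type": "text"},
                # Element category, for example a title or a table.
                "type": {"type": "keyword"},
                # Vector field; dims must equal the embedding model's output size.
                "embeddings": {"type": "dense_vector", "dims": embedding_dims},
                # Free-form metadata; subfields are left to dynamic mapping.
                "metadata": {"type": "object"},
            }
        }
    }
```

The returned dictionary can be serialized with json.dumps and used as the request body when you create the index. Leaving metadata as a plain object is a deliberate simplification, since its subfields vary with source file type and processing options.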
See also:
- If you’re connecting to an Elastic Cloud instance, the Cloud ID and API key. To get these, see your Elasticsearch Service web console.
- If you’re connecting to a self-managed instance, the instance’s hostname and port number. See Networking.
- If you’re using basic authentication to the instance, the user’s name and password.
- If you’re using token-based authentication to the instance, the bearer token or API key for the instance. See Token-based authentication services and Create API key.
- If you’re using certificates, the path to the Certificate Authority (CA) file on the instance, and the certificate fingerprint. See SSL certificate API and “Where can I see my Certificate Fingerprint?”
The Elasticsearch connector dependencies:
You might also need to install additional dependencies, depending on your needs. Learn more.
The following environment variables:
ELASTICSEARCH_HOST - The hostname and port number, defined as <hostname>:<port-number> and represented by --hosts (CLI) or hosts (Python).
ELASTICSEARCH_INDEX_NAME - The name of the search index, represented by --index-name (CLI) or index_name (Python).
If you’re using Elastic Cloud to connect to Elasticsearch:
ELASTIC_CLOUD_ID - The instance’s Cloud ID, represented by --cloud-id (CLI) or cloud_id (Python).
ELASTIC_CLOUD_API_KEY - The corresponding Cloud API key value, represented by --api-key-id (CLI) or api_key_id (Python).
If you’re using basic authentication to the instance:
ELASTICSEARCH_USERNAME - The user’s name, represented by --username (CLI) or username (Python).
ELASTICSEARCH_PASSWORD - The user’s password, represented by --password (CLI) or password (Python).
If you’re using token-based authentication to the instance instead:
ELASTICSEARCH_BEARER_TOKEN - The bearer token’s value, represented by --bearer-auth (CLI) or bearer_auth (Python).
ELASTIC_CLOUD_API_KEY_ID - The API key’s value, represented by --api-key (CLI) or api_key (Python).
If you’re using certificates:
ELASTICSEARCH_CA_CERTS - The path to the Certificate Authority (CA) file, represented by --ca-certs (CLI) or ca_certs (Python).
ELASTICSEARCH_SSL_ASSERT_FINGERPRINT - The certificate’s fingerprint, represented by --ssl-assert-fingerprint (CLI) or ssl_assert_fingerprint (Python).
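The mapping between environment variables and Python parameter names listed above can be captured in a small helper. This is a minimal sketch, not part of the Unstructured SDK: the functions split_host and collect_settings are hypothetical, and treating the index name as mandatory is an assumption made for illustration.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Environment variable -> Python parameter name, copied from the list above.
ENV_TO_PARAM = {
    "ELASTICSEARCH_HOST": "hosts",
    "ELASTICSEARCH_INDEX_NAME": "index_name",
    "ELASTIC_CLOUD_ID": "cloud_id",
    "ELASTIC_CLOUD_API_KEY": "api_key_id",
    "ELASTICSEARCH_USERNAME": "username",
    "ELASTICSEARCH_PASSWORD": "password",
    "ELASTICSEARCH_BEARER_TOKEN": "bearer_auth",
    "ELASTIC_CLOUD_API_KEY_ID": "api_key",
    "ELASTICSEARCH_CA_CERTS": "ca_certs",
    "ELASTICSEARCH_SSL_ASSERT_FINGERPRINT": "ssl_assert_fingerprint",
}


def split_host(value: str) -> Tuple[str, int]:
    """Split '<hostname>:<port-number>' into (hostname, port)."""
    hostname, sep, port = value.rpartition(":")
    if not sep or not hostname or not port.isdecimal():
        raise ValueError(f"expected <hostname>:<port-number>, got {value!r}")
    return hostname, int(port)


def collect_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Gather the variables that are set, keyed by Python parameter name.

    Hypothetical helper: unset or empty variables are skipped.
    """
    env = os.environ if env is None else env
    settings = {
        param: env[name] for name, param in ENV_TO_PARAM.items() if env.get(name)
    }
    # Assumption: an index name is always needed, whichever auth mode is used.
    if "index_name" not in settings:
        raise ValueError("ELASTICSEARCH_INDEX_NAME is not set")
    # Check the host format early so a typo fails before any network call.
    if "hosts" in settings:
        split_host(settings["hosts"])
    return settings
```

Calling collect_settings() with no arguments reads from the current process environment; passing a plain dictionary instead makes the helper easy to exercise in tests.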
These environment variables:
UNSTRUCTURED_API_KEY - Your Unstructured API key value.
UNSTRUCTURED_API_URL - Your Unstructured API URL.
Now call the Unstructured CLI or Python SDK. You can use any supported source connector. This example uses the local source connector: