Models

Basic usage:

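from unstructured.partition.auto import partition
filename = "example-docs/pdf/layout-parser-paper-fast.pdf"

elements = partition(filename=filename,
                     strategy="hi_res",
                     hi_res_model_name="yolox")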

To use any model with the partition function, set the strategy to hi_res as shown above.
To keep the unstructured and unstructured-api libraries consistent, the model_name parameter is deprecated. Use the hi_res_model_name parameter to specify a model.

The hi_res_model_name parameter accepts the values yolox and detectron2_onnx.
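
As a minimal sketch of the parameter rules above, the helper below maps the deprecated model_name keyword to hi_res_model_name and rejects values other than the two listed. The function name normalize_model_kwargs and the SUPPORTED_MODELS set are hypothetical, introduced only for illustration; they are not part of the unstructured API.

```python
import warnings

# The two values this page lists for hi_res_model_name.
SUPPORTED_MODELS = {"yolox", "detectron2_onnx"}


def normalize_model_kwargs(**kwargs):
    """Return partition keyword arguments that use hi_res_model_name only.

    Hypothetical helper for illustration; not part of unstructured.
    """
    normalized = dict(kwargs)
    if "model_name" in normalized:
        legacy = normalized.pop("model_name")
        warnings.warn(
            "model_name is deprecated; use hi_res_model_name instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Keep an explicitly given hi_res_model_name over the legacy value.
        normalized.setdefault("hi_res_model_name", legacy)
    name = normalized.get("hi_res_model_name")
    if name is not None:
        if name not in SUPPORTED_MODELS:
            raise ValueError(f"unsupported model: {name!r}")
        # A model is only used with the hi_res strategy, so default to it
        # when the caller did not set a strategy.
        normalized.setdefault("strategy", "hi_res")
    return normalized
```

The result can be unpacked into the call, for example partition(**normalize_model_kwargs(filename=filename, model_name="yolox")).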

Using a Non-Default Model

Unstructured downloads the model named in the UNSTRUCTURED_HI_RES_MODEL_NAME environment variable. If the variable is not defined, it downloads the default model. There are three ways to use a non-default model:

Store the model name in the environment variable

import os
from unstructured.partition.pdf import partition_pdf

os.environ["UNSTRUCTURED_HI_RES_MODEL_NAME"] = "yolox"

out_yolox = partition_pdf("example-docs/pdf/layout-parser-paper-fast.pdf", strategy="hi_res")

Pass the model name to the partition function

from unstructured.partition.auto import partition

filename = "example-docs/pdf/layout-parser-paper-fast.pdf"

elements = partition(filename=filename,
                     strategy="hi_res",
                     hi_res_model_name="yolox")

Use the unstructured-inference library

from unstructured_inference.models.base import get_model
from unstructured_inference.inference.layout import DocumentLayout

model = get_model("yolox")
layout = DocumentLayout.from_file("sample-docs/layout-parser-paper.pdf", detection_model=model)
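
The model selection described in this section can be approximated with the sketch below. It assumes that a name passed explicitly takes precedence over the environment variable, which this page does not state, and it takes the default as a parameter because the default model name is not given here. The names resolve_model_name, fallback, and environ are hypothetical and not part of unstructured.

```python
import os

ENV_VAR = "UNSTRUCTURED_HI_RES_MODEL_NAME"


def resolve_model_name(explicit=None, fallback=None, environ=None):
    """Pick a model name: explicit argument, then environment, then fallback.

    The precedence order is an assumption made for this sketch.
    """
    env = os.environ if environ is None else environ
    if explicit:
        return explicit
    # An unset or empty environment variable falls through to the fallback.
    return env.get(ENV_VAR) or fallback
```

Passing a plain dict as environ makes the lookup easy to check without changing the real process environment.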
