Index: --input-path command option; <Prefix>IndexerConfig class (where <Prefix> represents the connector provider’s name, such as Azure for Azure).
Post-Index Filter: FiltererConfig class.
Download: --download-dir command option; <Prefix>DownloaderConfig class.
Post-Download Filter
Uncompress: --uncompress command option; UncompressConfig class.
Post-Uncompress Filter
Partition: PartitionerConfig class.
Chunk: ChunkerConfig class.
Embed: EmbedderConfig class.
Stage: <Prefix>UploadStagerConfig class.
Upload: <Prefix>UploaderConfig class.

Table and non-Table elements in the same chunk.
Use the basic strategy and also preserve section boundaries. Optionally preserve page boundaries as well.
Use the basic strategy and also preserve page boundaries.
Use the sentence-transformers/multi-qa-mpnet-base-dot-v1 embedding model to identify topically similar sequential elements and combine them into chunks. This strategy is available only when calling Unstructured.

.pdf, .pptx, and .tiff.
.docx files that have page metadata, Unstructured calculates the number of pages based on that metadata.