After reading this section, you should understand the following:

  • How to partition a document and save the resulting elements as JSON or CSV.

  • How to remove unwanted content from document elements using cleaning functions.

  • How to extract content from a document using the extraction functions.

  • How to prepare data for downstream use cases using staging functions.

  • How to chunk partitioned documents for use cases such as Retrieval-Augmented Generation (RAG).
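
To make the five steps above concrete before the details, here is a minimal, library-free sketch of the same workflow using only the Python standard library. Every function name in it (partition_text, clean_text, extract_emails, to_json, to_csv, chunk_elements) is a hypothetical stand-in invented for illustration, not the library's actual API, and the title heuristic and the 80-character chunk limit are assumptions chosen to keep the example small.

```python
import csv
import io
import json
import re


def partition_text(text):
    """Partition: split raw text into elements, one per blank-line-separated block."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    elements = []
    for block in blocks:
        # Assumption: a short single-line block with no trailing period is a title.
        is_title = "\n" not in block and len(block) < 60 and not block.endswith(".")
        elements.append({"type": "Title" if is_title else "NarrativeText", "text": block})
    return elements


def clean_text(text):
    """Clean: drop a leading bullet character and collapse runs of whitespace."""
    text = re.sub(r"^[\u2022\-\*]\s*", "", text)
    return re.sub(r"\s+", " ", text).strip()


def extract_emails(text):
    """Extract: pull email-like strings out of an element's text."""
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)


def to_json(elements):
    """Stage: serialize elements as JSON."""
    return json.dumps(elements, indent=2)


def to_csv(elements):
    """Stage: serialize elements as CSV with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["type", "text"])
    writer.writeheader()
    writer.writerows(elements)
    return buffer.getvalue()


def chunk_elements(elements, max_characters=80):
    """Chunk: start a new chunk at each title or when the size limit would be exceeded."""
    chunks, current = [], ""
    for el in elements:
        starts_section = el["type"] == "Title"
        too_long = len(current) + len(el["text"]) + 1 > max_characters
        if current and (starts_section or too_long):
            chunks.append(current)
            current = ""
        current = f"{current} {el['text']}".strip()
    if current:
        chunks.append(current)
    return chunks


raw = """Access Policy

\u2022  Contact ops@example.com   for access.

Retention

Logs are kept for 30 days."""

elements = [{**el, "text": clean_text(el["text"])} for el in partition_text(raw)]
emails = [e for el in elements for e in extract_emails(el["text"])]
json_out = to_json(elements)
csv_out = to_csv(elements)
chunks = chunk_elements(elements)
```

The sketch keeps each stage as a separate function so the order of operations is visible: partition first, clean and extract per element, then stage or chunk the cleaned elements. A real pipeline would replace each stand-in with the corresponding library function.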