A shared read configuration is the set of parameters common to all source connectors, giving a standard way to access and retrieve documents from any source. It includes settings such as the download directory, which specifies where retrieved documents are stored. Because these parameters behave the same for every connector, downloaded documents can be managed and organized in the same way whichever connector is in use. This keeps behavior consistent and simplifies maintenance and integration when a system works with multiple source connectors.

Configs

  •   download_dir: The location to download files to. When run via the CLI, a default location is used if one is not provided.

  •   download_only: Default: False. If set to True, the process exits right after all the files are downloaded and skips any later steps, such as partitioning and uploading to a destination.

  •   max_docs: The maximum number of documents pulled in by a single process.

  •   preserve_downloads: Default: False. By default, the process deletes the downloaded content at the end if everything finished without error. If set to True, those files are preserved.

  •   re_download: Default: False. By default, the process skips downloads for files that already exist in the download directory. If set to True, files are re-downloaded even if they already exist.
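
To make the interaction between these options concrete, here is a minimal Python sketch of how they could be modeled. It is an illustration only: the class ReadConfigSketch and the helpers should_download, select_docs, and should_cleanup are hypothetical names invented for this example and are not the library's actual API. It also assumes that max_docs acts as an upper bound on the number of documents and that leaving it unset means no limit.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass
class ReadConfigSketch:
    """Hypothetical stand-in for the shared read configuration (not the library's class)."""

    download_dir: Optional[str] = None   # where files are downloaded to
    download_only: bool = False          # stop after downloading
    max_docs: Optional[int] = None       # assumed: cap on documents per process
    preserve_downloads: bool = False     # keep files after a successful run
    re_download: bool = False            # force download even if the file exists


def should_download(config: ReadConfigSketch, filename: str) -> bool:
    """Apply the re_download rule: skip files already present unless forced."""
    if config.re_download or config.download_dir is None:
        return True
    return not (Path(config.download_dir) / filename).exists()


def select_docs(config: ReadConfigSketch, doc_names: List[str]) -> List[str]:
    """Apply the max_docs rule (assumed to be an upper bound; None means no limit)."""
    if config.max_docs is None:
        return list(doc_names)
    return list(doc_names)[: config.max_docs]


def should_cleanup(config: ReadConfigSketch, finished_without_error: bool) -> bool:
    """Apply the preserve_downloads rule: delete only after an error-free run."""
    return finished_without_error and not config.preserve_downloads
```

In this sketch, a pipeline would call should_download before fetching each file, stop after the download phase when download_only is True, and call should_cleanup once at the end of the run.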