Connect Reddit to your preprocessing pipeline, and use unstructured-ingest to batch process your documents
and store the structured outputs locally on your filesystem.
First, install the Reddit dependencies as shown here.
You must provide:
- A client ID and a client secret to authenticate yourself. Learn how to get them here.
- user-agent: The user agent request header to use when calling the Reddit API.
- subreddit-name: The name of a subreddit, without the “r/” prefix, e.g. “machinelearning”.
Optionally you can choose to specify:
- search-query: If set, return posts matching this query. Otherwise, use hot posts.
- num-posts: If set, limits the number of posts to pull in.
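The required and optional settings above can be assembled into a single CLI invocation. The following is a minimal sketch, not the official usage: it assumes each option name maps to a flag of the same name prefixed with `--`, that the credentials are passed as `--client-id` and `--client-secret`, and that the local destination is given with `--output-dir`. The helper `build_reddit_command` and the default directory name are hypothetical, so confirm the exact flags with the help output before running the command.

```python
import shlex
from typing import List, Optional


def build_reddit_command(
    client_id: str,
    client_secret: str,
    user_agent: str,
    subreddit_name: str,
    search_query: Optional[str] = None,
    num_posts: Optional[int] = None,
    output_dir: str = "reddit-ingest-output",  # hypothetical default
) -> List[str]:
    """Build an argv list for `unstructured-ingest reddit`.

    Flag spellings are assumptions derived from the option names on
    this page; check them against `unstructured-ingest reddit --help`.
    """
    # The connector expects the bare subreddit name, so drop a leading "r/".
    if subreddit_name.startswith("r/"):
        subreddit_name = subreddit_name[2:]

    cmd = [
        "unstructured-ingest", "reddit",
        "--client-id", client_id,
        "--client-secret", client_secret,
        "--user-agent", user_agent,
        "--subreddit-name", subreddit_name,
        "--output-dir", output_dir,  # assumed flag for local output
    ]
    # Optional settings are only added when the caller provides them.
    if search_query is not None:
        cmd += ["--search-query", search_query]
    if num_posts is not None:
        cmd += ["--num-posts", str(num_posts)]
    return cmd


if __name__ == "__main__":
    cmd = build_reddit_command(
        client_id="<client id>",
        client_secret="<client secret>",
        user_agent="my-ingest-script/0.1",
        subreddit_name="machinelearning",
        search_query="Unstructured",
        num_posts=10,
    )
    # Print a shell-safe command line to copy into a terminal.
    print(shlex.join(cmd))
```

Building the arguments as a list keeps values such as a multi-word user agent intact, and the list can be passed to `subprocess.run(cmd, check=True)` once the flags have been verified.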
For a full list of the options the Unstructured Ingest CLI accepts, run unstructured-ingest reddit --help.