E&F

InsightRed - Intelligent Marketing On Reddit

Project’s GitHub Repo

photo

About

InsightRed is an LLM-powered tool that extracts the latest Reddit comments from the Subreddits you choose, sorted by “Hot”, and pinpoints users who show potential interest in your project or product. It’s a Reddit marketing tool meant to help you find the first users for your product or project. The project was originally built for the ANARCHY October 2023 Hackathon and was inspired by a post made by casta on Hacker News. Their solution was not open-source, so InsightRed was created as an open-source version of the idea.

Demo

Announcement(s)

October 19, 2023

As a follow-up to this project, I’m excited to announce that we won 1st place at Anarchy’s October 2023 Hackathon!

photo

InsightRed’s Components

🧩 Collector

The Collector uses Reddit’s API to collect the latest Reddit posts, along with each post’s comments, for a given list of Subreddits. It then saves the collected data to a local SQLite database. The Python package praw handles the calls to the Reddit API, and SQLAlchemy performs the CRUD operations against the local SQLite database.
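As a rough illustration of the collect-then-save flow, here is a minimal sketch. It is not the project's actual code: the table layout, the `vectorized` flag, and the function names `fetch_hot_comments` and `save_comments` are all hypothetical, and the standard-library `sqlite3` module stands in for SQLAlchemy to keep the example short. The `reddit` argument is assumed to be a `praw.Reddit` instance.

```python
import sqlite3

# Hypothetical schema; the real project's tables may look different.
SCHEMA = """
CREATE TABLE IF NOT EXISTS comments (
    comment_id TEXT PRIMARY KEY,
    subreddit  TEXT NOT NULL,
    post_title TEXT NOT NULL,
    body       TEXT NOT NULL,
    author     TEXT,
    vectorized INTEGER NOT NULL DEFAULT 0
)
"""


def fetch_hot_comments(reddit, subreddit_name, post_limit=10):
    """Yield one dict per comment on the subreddit's current "Hot" posts.

    `reddit` is assumed to be a praw.Reddit instance.
    """
    for submission in reddit.subreddit(subreddit_name).hot(limit=post_limit):
        # Drop the "load more comments" placeholders so only real comments remain.
        submission.comments.replace_more(limit=0)
        for comment in submission.comments.list():
            yield {
                "comment_id": comment.id,
                "subreddit": subreddit_name,
                "post_title": submission.title,
                "body": comment.body,
                # Deleted accounts have no author object.
                "author": str(comment.author) if comment.author else None,
            }


def save_comments(conn, rows):
    """Insert comments, skipping ones already stored. Returns the number added."""
    conn.execute(SCHEMA)
    before = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO comments "
        "(comment_id, subreddit, post_title, body, author) "
        "VALUES (:comment_id, :subreddit, :post_title, :body, :author)",
        rows,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM comments").fetchone()[0]
    return after - before
```

In a real run, `reddit` would be created with `praw.Reddit(client_id=..., client_secret=..., user_agent=...)` and `conn` with `sqlite3.connect(...)` pointing at a local file. Using the comment ID as the primary key makes repeated collection runs safe, since comments that were already stored are simply skipped.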

🧩 Vectorizer

The Vectorizer checks the local SQLite database to see which comments have not yet been saved to the vector database. For each of those comments, it creates an embedding of the post+comment text using OpenAI’s “text-embedding-ada-002” model. This embedding is used as the index in the vector database, and some metadata, in the form of JSON, is created alongside it. The index and metadata are then uploaded to the vector database, which in this case is Pinecone (cloud-based). After the upload, the local SQLite database is updated to avoid re-uploading the same data to Pinecone. This is all done using Pinecone’s Python client (pinecone-client) for CRUD operations on the vector database and LangChain for handling the embedding process.
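The select, embed, upload, and mark-as-done loop described above could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the `comments` table with a `vectorized` flag, the metadata keys, and the function name `vectorize_pending` are hypothetical. The `embed` and `upsert` arguments are plain callables standing in for LangChain's embedding call and Pinecone's upsert call, so the sketch runs without either service.

```python
import sqlite3


def vectorize_pending(conn, embed, upsert, batch_size=100):
    """Embed and upload every comment not yet marked as vectorized.

    embed(texts)    -> list of vectors, one per text (stand-in for the embedding model)
    upsert(records) -> uploads (id, vector, metadata) tuples (stand-in for the vector DB)
    Returns the number of comments processed.
    """
    rows = conn.execute(
        "SELECT comment_id, subreddit, post_title, body, author "
        "FROM comments WHERE vectorized = 0"
    ).fetchall()

    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        # Embed the post title together with the comment, so the comment keeps its context.
        texts = [f"{title}\n\n{body}" for _, _, title, body, _ in batch]
        vectors = embed(texts)
        records = [
            (
                comment_id,
                vector,
                {
                    "subreddit": subreddit,
                    "post_title": title,
                    "comment": body,
                    # Avoid null metadata values for deleted accounts.
                    "author": author or "[deleted]",
                },
            )
            for (comment_id, subreddit, title, body, author), vector in zip(batch, vectors)
        ]
        upsert(records)
        # Flag the rows only after the upload succeeded, so a failed batch is retried.
        conn.executemany(
            "UPDATE comments SET vectorized = 1 WHERE comment_id = ?",
            [(record[0],) for record in records],
        )
        conn.commit()
    return len(rows)
```

Marking rows only after `upsert` returns means an interrupted run leaves the remaining comments flagged as pending, and the next run picks them up again.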

🧩 Interface

The Interface is what the user interacts with to use the tool. In this case, the Interface is a CLI that implements Retrieval-Augmented Generation (RAG): the user provides a description of their product, a list of Subreddits to check, and some filters. Given this context, the Collector is called first, followed by the Vectorizer. Once those two services finish processing, the product description is used to run a similarity search in the vector database. The top results and the product description are then fed into a prompt template, which produces the final prompt. The final prompt is sent to OpenAI’s GPT-4 model, and the results are presented to the user. These results are a listing of the Reddit comments that strongly suggest their authors would be interested in the provided product, based on its description. This component works by using the Collector and Vectorizer components, together with Anarchy’s LLM-VM to handle querying OpenAI’s GPT-4 model.
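The retrieval and prompting step can be sketched roughly as below. Everything here is illustrative rather than taken from the project: the prompt wording, the metadata keys (`author`, `subreddit`, `comment`), and the names `build_prompt` and `find_leads` are hypothetical. The `embed`, `search`, and `llm` arguments are plain callables standing in for the embedding model, the vector database query, and the GPT-4 call made through LLM-VM.

```python
from string import Template

# Hypothetical prompt; the project's real template is not shown in this document.
PROMPT_TEMPLATE = Template(
    "You are helping find early users for a product.\n\n"
    "Product description:\n$product\n\n"
    "Candidate Reddit comments:\n$comments\n\n"
    "List only the comments whose authors would likely be interested in the "
    "product, and briefly explain why."
)


def build_prompt(product_description, matches):
    """Combine the product description and the retrieved comments into one prompt."""
    lines = [
        f"{number}. u/{match['author']} in r/{match['subreddit']}: {match['comment']}"
        for number, match in enumerate(matches, start=1)
    ]
    return PROMPT_TEMPLATE.substitute(
        product=product_description, comments="\n".join(lines)
    )


def find_leads(product_description, embed, search, llm, top_k=5):
    """Retrieve the comments most similar to the description, then ask the LLM.

    embed(texts)         -> list of vectors (stand-in for the embedding model)
    search(vector, k)    -> list of metadata dicts (stand-in for the vector DB query)
    llm(prompt)          -> str (stand-in for the GPT-4 call)
    """
    query_vector = embed([product_description])[0]
    matches = search(query_vector, top_k)
    if not matches:
        return "No candidate comments found."
    return llm(build_prompt(product_description, matches))
```

Passing the three services in as callables keeps the retrieval logic independent of any particular client library, which also makes it straightforward to exercise with stubs before wiring in real API keys.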