Key Concepts
Database
A database in Lightfeed is the central container that organizes and stores all your extracted web data. Think of it as a structured, dynamic table that Lightfeed fills and keeps up to date with information extracted from websites on a schedule. Each database includes settings that determine what data to extract, how to structure it, and when to extract it. You can also create custom views to filter and search the data within a database.
For more on how a database works and how to set one up, see the Database Guide.
Source
Sources are the websites from which Lightfeed extracts data into a database. When a database has multiple sources, Lightfeed applies the same extraction instructions, data structure, and extraction schedule to all of them.
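To illustrate the point that settings belong to the database rather than to individual sources, here is a minimal sketch. The `DatabaseConfig` class, its field names, and the `extraction_jobs` helper are hypothetical and are not Lightfeed's API; the schedule is reduced to a plain string for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class DatabaseConfig:
    """Hypothetical model of a database's settings (not Lightfeed's API)."""

    prompt: str               # what data to extract
    schema_fields: list[str]  # how to structure it (simplified to field names)
    schedule: str             # when to extract (free-form string for brevity)
    sources: list[str] = field(default_factory=list)

    def extraction_jobs(self) -> list[tuple[str, str, list[str], str]]:
        """Build one job per source; every job reuses the database-level settings."""
        return [(src, self.prompt, self.schema_fields, self.schedule) for src in self.sources]


db = DatabaseConfig(
    prompt="Extract a list of companies in the legal space.",
    schema_fields=["name", "description", "website"],
    schedule="Mon 09:00",
    sources=["https://a.example", "https://b.example"],
)
for job in db.extraction_jobs():
    print(job[0], "uses the shared prompt, schema, and schedule")
```

Adding a source only adds another job; it never introduces a second prompt, schema, or schedule.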
Lightfeed supports various source types including website URLs, Google Search, Reddit, LinkedIn, RSS feeds, and more.
For more details on configuring sources, see the Source Guide.
Prompt
A prompt is the instruction that guides Lightfeed's AI-powered extraction process. It defines what data should be retrieved and how it should be interpreted. A good prompt is clear, specific, and structured, which helps ensure accurate and relevant results. Here is an example:
Extract a list of companies in the legal space, including their name, description, and website.
From their website, find their pricing and contact email.
For detailed guidance on creating effective prompts, see the Prompt Guide.
Schema
A schema defines the structure of your extracted data, ensuring consistency and usability. It consists of a list of fields, each with a name, a data type, and a description. Lightfeed can generate a schema from your prompt automatically using AI, or you can create and edit schema fields manually.
For best practices on defining a schema, see the Schema Guide.
ID Field or Deduplication Key
The ID field (or deduplication key) identifies unique records in your database. It can be either a single field or a combination of multiple fields that together form a unique identifier.
When Lightfeed extracts data, it uses the ID field(s) to detect existing records. If a record with the same ID value(s) already exists, Lightfeed updates it rather than creating a duplicate.
For more on ID fields, see the Data Deduplication Guide.
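The update-or-insert behavior described above is commonly called an upsert. The sketch below shows it in memory, assuming records are plain dictionaries and that newly extracted values overwrite older ones for the same key. The function names are hypothetical; this is not Lightfeed's internal implementation.

```python
from typing import Any

Record = dict[str, Any]


def record_key(record: Record, id_fields: list[str]) -> tuple:
    """Build the deduplication key from one field or a combination of fields."""
    return tuple(record[f] for f in id_fields)


def upsert(table: dict[tuple, Record], incoming: list[Record], id_fields: list[str]) -> None:
    """Insert new records, or update existing ones that share the same ID value(s)."""
    for record in incoming:
        key = record_key(record, id_fields)
        if key in table:
            # Same ID value(s): merge the fresh values into the existing record.
            table[key].update(record)
        else:
            # New ID value(s): store a copy as a new record.
            table[key] = dict(record)


table: dict[tuple, Record] = {}
ids = ["name", "website"]  # composite key (an assumed choice for this example)
upsert(table, [{"name": "Acme", "website": "acme.example", "email": "a@acme.example"}], ids)
upsert(table, [{"name": "Acme", "website": "acme.example", "email": "new@acme.example"}], ids)
print(len(table))  # 1
```

Using a tuple as the key lets a single field and a multi-field combination share the same code path.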
Schedule
A schedule determines when Lightfeed automatically extracts data for your database. It can run on specific days of the week, at one or more hours of the day. During each scheduled run, Lightfeed updates your database with fresh data, and the deduplication rules keep existing records consistent.
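To make the day-and-hour rule concrete, here is a minimal sketch that checks whether a given time falls within a schedule. It assumes a schedule is simply a set of weekdays plus a set of hours in a single time zone; the `Schedule` class and its methods are hypothetical, and Lightfeed's actual time-zone handling is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Schedule:
    # Weekdays follow Python's convention: Monday == 0 ... Sunday == 6.
    days: frozenset[int]
    # Hours of the day in 24-hour time, 0-23.
    hours: frozenset[int]

    def is_due(self, now: datetime) -> bool:
        """True if `now` is on a scheduled day and within a scheduled hour."""
        return now.weekday() in self.days and now.hour in self.hours

    def runs_per_week(self) -> int:
        """Every scheduled day runs once per scheduled hour."""
        return len(self.days) * len(self.hours)


# Monday, Wednesday, and Friday at 09:00 and 17:00.
mwf = Schedule(days=frozenset({0, 2, 4}), hours=frozenset({9, 17}))
print(mwf.is_due(datetime(2024, 1, 1, 9, 30)))  # True (2024-01-01 is a Monday)
print(mwf.runs_per_week())  # 6
```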
For more details on scheduling options, see the Schedule Guide.