Key Concepts

Database

A database in Lightfeed is the central container that organizes and stores all your extracted web data. Think of it as a structured, dynamic table that Lightfeed populates and keeps up to date with information from websites on a schedule. Each database includes settings that determine what data to extract, how to structure it, and when to extract it. You can also create custom views to filter and search the data within a database.

For more on how a database works and how to set one up, see the Database Guide.

Source

Sources are the websites from which Lightfeed extracts data into a database. When you add multiple sources to a database, Lightfeed applies the same extraction instructions, data structure, and extraction schedule to all of them.

Lightfeed offers two extraction modes: List Mode and Detail Mode. List Mode is optimized for pages that contain multiple similar items, while Detail Mode targets pages focused on a single item, or cases where you need comprehensive information about a specific entity. Choose a mode based on both the page structure and your extraction goal. Sources can be configured to extract from several kinds of targets, including website URLs, Google Search results, Reddit communities, LinkedIn pages, and RSS feeds.

For more details on configuring sources, see the Source Guide.

Prompt

A prompt is the instruction that guides Lightfeed's AI-powered extraction process. It defines what data should be retrieved and how it should be interpreted. A good prompt is clear, specific, and structured, which helps ensure accurate and relevant results. Here is an example:

Extract a list of companies in the legal space, including their name, description, website.
From their website, find their pricing and contact email.

For detailed guidance on creating effective prompts, see the Prompt Guide.

Schema

A schema defines the structure of your extracted data, ensuring consistency and usability. It consists of a list of fields, each with a name, a data type, and a description. Lightfeed can automatically generate a schema from your prompt using AI, or you can manually create and edit schema fields.
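To make the idea of "fields with a name, a data type, and a description" concrete, here is a minimal Python sketch that models a schema as a list of fields and checks an extracted record against it. It is a conceptual illustration only: `SchemaField`, `validate_record`, `TYPE_MAP`, and the type names `string`, `number`, and `boolean` are hypothetical and are not Lightfeed's API or its actual set of data types.

```python
from dataclasses import dataclass
from typing import Any

# Hypothetical type names mapped to the Python types that satisfy them.
# Lightfeed's real set of data types may differ.
TYPE_MAP: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "boolean": (bool,),
}


@dataclass(frozen=True)
class SchemaField:
    name: str
    type: str  # a key of TYPE_MAP
    description: str  # tells the extractor what the field should contain


def validate_record(record: dict[str, Any], schema: list[SchemaField]) -> list[str]:
    """Check one extracted record against the schema; an empty list means it conforms."""
    problems: list[str] = []
    for field in schema:
        value = record.get(field.name)
        if value is None:
            problems.append(f"missing field: {field.name}")
        elif not isinstance(value, TYPE_MAP[field.type]):
            problems.append(f"wrong type for {field.name}: expected {field.type}")
    return problems


# A schema matching the example prompt in the Prompt section.
LEGAL_COMPANIES = [
    SchemaField("name", "string", "Company name"),
    SchemaField("description", "string", "One-sentence summary of what the company does"),
    SchemaField("website", "string", "Homepage URL"),
    SchemaField("pricing", "string", "Pricing as stated on the company's website"),
    SchemaField("contact_email", "string", "Contact email found on the website"),
]
```

A record missing `contact_email`, for example, yields `["missing field: contact_email"]`, which is the kind of inconsistency a fixed schema helps you catch.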

For best practices on defining a schema, see the Schema Guide.

ID Field or Deduplication Key

The ID field (or deduplication key) identifies unique records in your database. It can be either a single field or a combination of multiple fields that together form a unique identifier.

When Lightfeed extracts data, it uses the ID field(s) to detect existing records. If a record with the same ID value(s) already exists, Lightfeed updates it instead of creating a duplicate.
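The update-or-insert behavior described above can be sketched in Python as an "upsert" keyed on the ID field values. This is a minimal sketch under stated assumptions, not Lightfeed's implementation: `dedup_key` and `upsert` are hypothetical names, the database is modeled as an in-memory dictionary, and because the source does not say whether an update replaces the whole record or only changed fields, the sketch assumes field-level overwrite.

```python
from typing import Any

Record = dict[str, Any]


def dedup_key(record: Record, id_fields: tuple[str, ...]) -> tuple[Any, ...]:
    """Build the (possibly composite) deduplication key from the ID field values."""
    return tuple(record[f] for f in id_fields)


def upsert(
    database: dict[tuple[Any, ...], Record],
    incoming: list[Record],
    id_fields: tuple[str, ...],
) -> tuple[int, int]:
    """Insert new records and update existing ones; return (inserted, updated)."""
    inserted = updated = 0
    for record in incoming:
        key = dedup_key(record, id_fields)
        if key in database:
            # Assumed behavior: overwrite the fields present in the new extraction.
            database[key].update(record)
            updated += 1
        else:
            # New ID value(s): store a copy as a new record.
            database[key] = dict(record)
            inserted += 1
    return inserted, updated


db: dict[tuple[Any, ...], Record] = {}
id_fields = ("name", "website")  # composite key: both values together identify a record
upsert(db, [{"name": "Acme Legal", "website": "https://acme.example.com", "pricing": "$99"}], id_fields)
upsert(db, [{"name": "Acme Legal", "website": "https://acme.example.com", "pricing": "$129"}], id_fields)
```

After the second run, `db` still holds a single record whose `pricing` is `"$129"`: the matching ID values caused an update rather than a duplicate.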

For more on ID fields, see the Data Deduplication Guide.

Schedule

A schedule determines when Lightfeed automatically extracts data for your database. You can set it to run on specific days of the week and at whichever hours of the day you choose. During scheduled runs, Lightfeed updates your database with fresh data and uses the deduplication rules to keep it consistent.
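A schedule made of selected weekdays and hours can be modeled as two sets, with a run due whenever the current day and hour both match. The Python sketch below illustrates this and computes the next matching run time. The `Schedule` class and its `is_due` and `next_run` methods are hypothetical, not part of Lightfeed, and the sketch ignores time zones and daylight-saving changes by working with naive `datetime` values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Schedule:
    days: frozenset[int]   # 0 = Monday ... 6 = Sunday, as returned by datetime.weekday()
    hours: frozenset[int]  # 0-23

    def is_due(self, moment: datetime) -> bool:
        """True if the moment falls on a selected day and a selected hour."""
        return moment.weekday() in self.days and moment.hour in self.hours

    def next_run(self, after: datetime) -> datetime:
        """Return the first whole hour strictly after `after` that matches the schedule."""
        candidate = after.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)
        # 168 consecutive hours cover every (weekday, hour) combination exactly once.
        for _ in range(7 * 24):
            if self.is_due(candidate):
                return candidate
            candidate += timedelta(hours=1)
        raise ValueError("schedule has no days or no hours selected")


# Hypothetical example: Mondays and Thursdays at 09:00 and 17:00.
twice_weekly = Schedule(days=frozenset({0, 3}), hours=frozenset({9, 17}))
```

For instance, `twice_weekly.next_run(datetime(2024, 1, 1, 10, 30))` (a Monday morning) returns 17:00 the same day, and asking again from 17:00 skips ahead to Thursday at 09:00.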

For more details on scheduling options, see the Schedule Guide.
