Prompt
A prompt is the instruction that guides Lightfeed's extraction process. It defines what data should be retrieved from websites and how it should be interpreted. A well-crafted prompt is essential for accurate, comprehensive data extraction.
What Makes an Effective Prompt?
An effective prompt for web data extraction is clear, specific, structured, and contextually relevant to the websites you're extracting from. It ensures that Lightfeed can accurately identify and extract exactly the data you need.
1. Clarity and Precision
Your prompt should clearly define what data to extract with no ambiguity.
✅ Extract the product name, current price, original price (only if on sale), in-stock status, and product URL from each product listing on the page.
❌ Get product details (too vague to guide extraction effectively)
2. Context Awareness
You can also include relevant context to further filter or refine extraction.
✅ Extract a list of all companies in the legal space, including their name, description, and website. From their website, find their pricing and contact email.
❌ Get all the companies and their contact info (lacks contextual guidance)
3. Structured Format
Structure your prompt to generate or match the desired output schema. Clearly define each field you want to extract along with descriptions. This approach is especially powerful because Lightfeed can automatically generate schema fields directly from well-structured prompts, saving you time and ensuring alignment between your extraction intent and results.
✅ For each property listing on the page, extract:
- property title (the full address as shown)
- postal code (five-digit postal code)
- asking price (current listing price as a number)
- original price (only if price reduced, otherwise leave empty)
- status (text indicating 'For Sale', 'Pending', 'Sold', or 'Coming Soon')
- number of bedrooms
- number of bathrooms
- square footage
- listing URL
❌ Extract all the property info (too general, lacks structure)
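One way to check that a structured prompt lines up with the output you expect is to write the target record down as a type. The sketch below mirrors the property-listing fields above as a Python dataclass. The class name, field names, types, and sample values are illustrative assumptions, not Lightfeed's schema format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PropertyListing:
    """Hypothetical record shape implied by the property-listing prompt."""

    property_title: str   # the full address as shown
    postal_code: str      # five digits, kept as text to preserve leading zeros
    asking_price: float   # current listing price as a number
    status: str           # 'For Sale', 'Pending', 'Sold', or 'Coming Soon'
    listing_url: str
    original_price: Optional[float] = None  # set only if the price was reduced
    bedrooms: Optional[int] = None
    bathrooms: Optional[float] = None
    square_footage: Optional[int] = None


# Sample values are made up for illustration.
example = PropertyListing(
    property_title="12 Example St, Springfield",
    postal_code="01234",
    asking_price=450000.0,
    status="For Sale",
    listing_url="https://example.com/listing/1",
)
print(example)
```

Fields the prompt marks as conditional ("only if price reduced") map naturally to optional values, which makes it easy to see where the prompt needs an explicit "leave empty" instruction.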
4. Handling Variability
If data layout varies or data is not available, the prompt should specify fallback strategies.
✅ ...If the contact email is missing, return 'N/A' in the email field...
✅ ...If a product has multiple colors, extract data for the default selected variation only...
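If a prompt asks for a placeholder such as 'N/A', downstream code usually has to turn that placeholder back into a true missing value. The following is a minimal sketch of that step, assuming extracted records arrive as plain dictionaries of strings. The helper name and the set of placeholder spellings are assumptions chosen for illustration.

```python
from typing import Optional

# Assumed spellings of "missing"; adjust to match the fallback your prompt specifies.
MISSING_MARKERS = {"", "n/a", "na", "none"}


def clean_optional(value: Optional[str]) -> Optional[str]:
    """Map placeholder strings to None and trim surrounding whitespace."""
    if value is None:
        return None
    stripped = value.strip()
    return None if stripped.lower() in MISSING_MARKERS else stripped


# Sample record is made up for illustration.
record = {"name": "Acme Legal", "email": "N/A"}
cleaned = {key: clean_optional(val) for key, val in record.items()}
print(cleaned)
```

Naming a single fallback value in the prompt keeps this normalization step small, since only one placeholder spelling needs to be recognized.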
Real-World Examples
Here are complete prompt examples for common extraction scenarios:
E-commerce Product Extraction
Extract the following information from each product listing:
1. Product Name: The full title of the product
2. Current Price: The current selling price (number only, no currency symbol)
3. Original Price: If the item is on sale, extract the crossed-out original price; otherwise leave blank
4. Discount Percentage: If shown, extract the percentage discount
5. Rating: The numerical rating out of 5 stars
6. Review Count: The number of customer reviews
7. Availability: Whether the item is in stock, out of stock, or pre-order
8. Shipping Info: Any text related to shipping (e.g., "Free Shipping," "2-day delivery")
9. Product URL: The full URL to the product page
If any information is not available, leave that field blank. Ignore sponsored products or advertisements.
News Article Extraction
Extract all news articles with the following details:
1. Headline: The main title of the article
2. Subtitle: The subtitle or summary if available
3. Author: The name(s) of the author(s)
4. Publication Date: The date and time the article was published
5. Category: The category or section the article belongs to (e.g., Politics, Technology)
6. Content Summary: A brief 1-2 sentence summary of the article's content
7. Article URL: The full URL to the complete article
Only extract articles published within the last 7 days. Ignore sponsored content, opinion pieces, and video-only content.
Job Listing Extraction
Extract all job listings from this career page with the following information:
1. Job Title: The exact title of the position
2. Company Name: The name of the hiring company
3. Location: The full location text (city, state or "remote")
4. Salary Range: The salary or compensation range if provided
5. Experience Level: Required experience level (entry, mid, senior)
6. Employment Type: Whether it's full-time, part-time, contract, etc.
7. Posted Date: When the job was posted
8. Job Description Summary: A brief 1-2 sentence summary of the job description
9. Required Skills: List the top 3-5 required skills mentioned
10. Application URL: The direct link to apply for the position
If a field is not available, leave it blank.
Optimizing Your Prompts
As you use Lightfeed, you may need to refine your prompts for better results:
- Start specific: Begin with a detailed prompt that clearly describes the data you need
- Test and refine: Run a test extraction and examine the results
- Adjust as needed: If results aren't as expected, add more context or be more specific
- Consider edge cases: Update your prompt to handle variations or exceptions
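The test-and-refine step can be made more concrete by measuring how often each field comes back filled in a test extraction. The sketch below assumes the results are available as a list of dictionaries. The function name and sample data are hypothetical and not part of Lightfeed.

```python
from typing import Any, Dict, List


def fill_rates(records: List[Dict[str, Any]]) -> Dict[str, float]:
    """Return, per field, the fraction of records with a non-empty value."""
    if not records:
        return {}
    fields = sorted({key for record in records for key in record})
    rates: Dict[str, float] = {}
    for field in fields:
        filled = sum(
            1 for record in records if record.get(field) not in (None, "")
        )
        rates[field] = filled / len(records)
    return rates


# Sample results are made up for illustration.
sample = [
    {"name": "Widget", "price": 9.99, "rating": None},
    {"name": "Gadget", "price": None, "rating": None},
]
rates = fill_rates(sample)
print(rates)
```

A field with a low fill rate is a candidate for a more specific description or an explicit fallback instruction in the prompt.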
Common Prompt Patterns
These patterns can be adapted for various extraction scenarios:
The Structured List
Extract the following information from each [item type]:
- [Field 1]: [Description]
- [Field 2]: [Description]
...
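When the same pattern is reused across many sources, the template can be filled programmatically. This is a minimal sketch, assuming fields are supplied as (name, description) pairs. The function name is hypothetical and the output is just a prompt string to paste or send wherever the prompt is configured.

```python
from typing import List, Tuple


def build_structured_prompt(item_type: str, fields: List[Tuple[str, str]]) -> str:
    """Render the structured-list pattern as a single prompt string."""
    lines = [f"Extract the following information from each {item_type}:"]
    lines += [f"- {name}: {description}" for name, description in fields]
    return "\n".join(lines)


prompt = build_structured_prompt(
    "product listing",
    [
        ("Product Name", "The full title of the product"),
        ("Current Price", "The current selling price (number only, no currency symbol)"),
    ],
)
print(prompt)
```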
The Context-Aware Extraction
Find only items related to [specific topic]. Extract the [fields] from each item.
Deep Extraction
... Extract the [fields] from each item. From the [URL field], find [more fields].
The Conditional Extraction
... If [condition on a field], then [alternative action].
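The context-aware, deep, and conditional patterns are sentence-level clauses, so they can be combined into one prompt. The sketch below assembles them in that order. The function and parameter names are assumptions made for illustration, and the example values are taken from the prompts shown earlier on this page.

```python
from typing import List, Optional, Sequence, Tuple


def compose_prompt(
    fields: str,
    topic: Optional[str] = None,
    deep: Optional[Tuple[str, str]] = None,
    conditions: Sequence[Tuple[str, str]] = (),
) -> str:
    """Join the optional pattern clauses into a single prompt string."""
    parts: List[str] = []
    if topic:
        # Context-aware clause: narrow which items are considered.
        parts.append(f"Find only items related to {topic}.")
    parts.append(f"Extract the {fields} from each item.")
    if deep:
        # Deep extraction clause: follow a URL field for more fields.
        url_field, more_fields = deep
        parts.append(f"From the {url_field}, find {more_fields}.")
    for condition, action in conditions:
        # Conditional clause: one sentence per fallback rule.
        parts.append(f"If {condition}, then {action}.")
    return " ".join(parts)


prompt = compose_prompt(
    fields="name, description, and website",
    topic="the legal space",
    deep=("website", "their pricing and contact email"),
    conditions=[("the contact email is missing", "return 'N/A' in the email field")],
)
print(prompt)
```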