URL Ingestion

URL ingestion lets you import content directly from web pages without downloading files. Simply provide a URL, and Airmailer fetches, processes, and stores the content automatically.

How It Works

Enter URL → Airmailer fetches page → Content extracted → Stored as document
  1. Fetch: Airmailer retrieves the webpage content
  2. Parse: The HTML is analyzed to identify the main content
  3. Clean: Navigation, headers, and footers are removed
  4. Convert: The content is converted to Markdown
  5. Store: The document is saved to your knowledge base
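
The stages above can be pictured as a simple chain of functions. The sketch below is illustrative only: Airmailer's internals are not documented here, and `Document`, `ingest`, and the four stage callables are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    title: str
    source_url: str
    markdown: str


def ingest(
    url: str,
    title: str,
    fetch: Callable[[str], str],
    extract_main: Callable[[str], str],
    to_markdown: Callable[[str], str],
    store: Callable[[Document], None],
) -> Document:
    """Run the stages in order. Every stage is a hypothetical stand-in."""
    html = fetch(url)                   # 1. Fetch
    main_html = extract_main(html)      # 2-3. Parse and clean
    markdown = to_markdown(main_html)   # 4. Convert
    doc = Document(title=title, source_url=url, markdown=markdown)
    store(doc)                          # 5. Store
    return doc


# Demo with fake stages, so nothing touches the network.
saved = []
doc = ingest(
    "https://example.com/faq",
    "FAQ",
    fetch=lambda u: "<nav>Menu</nav><main>Hello</main>",
    extract_main=lambda h: h.split("<main>")[1].split("</main>")[0],
    to_markdown=lambda h: h.strip(),
    store=saved.append,
)
```

Passing the stages in as arguments keeps each step replaceable, which mirrors how the steps are described as separate stages.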

Using URL Ingestion

  1. Navigate to Documents in the sidebar
  2. Click Import from URL
  3. Enter the webpage URL
  4. Provide a title for the document
  5. Select a document type
  6. Click Import

URL Requirements

Supported URLs

  • Public webpages (no login required)
  • HTTPS URLs (HTTP URLs are automatically upgraded to HTTPS)
  • Pages with up to 2 MB of content

Unsupported URLs

  • Password-protected pages
  • Content behind paywalls
  • Dynamic/JavaScript-only content
  • Private IP addresses
  • Localhost URLs
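
If you want to screen URLs before pasting them into the import dialog, the rules above can be approximated locally. This is a minimal sketch using only the Python standard library; `normalize_url` is a hypothetical helper, not part of Airmailer, and it cannot detect login walls, paywalls, or JavaScript-only pages.

```python
import ipaddress
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Return an HTTPS version of url, or raise ValueError if it looks unsupported."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http(s) URLs are supported")
    host = parts.hostname
    if not host:
        raise ValueError("URL has no host")
    if host == "localhost" or host.endswith(".localhost"):
        raise ValueError("localhost URLs are not supported")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name rather than a literal IP address
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        raise ValueError("private IP addresses are not supported")
    # Upgrade http to https; the rest of the URL (including any explicit port)
    # is left unchanged.
    return parts._replace(scheme="https").geturl()
```

For example, `normalize_url("http://example.com/faq")` returns `"https://example.com/faq"`, while `normalize_url("https://10.0.0.5/")` raises `ValueError`. A hostname that resolves to a private address via DNS is not caught by this check.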

Content Extraction

What's Captured

  • Main article/page content
  • Headings and structure
  • Text formatting (bold, italic)
  • Lists and tables
  • Inline links

What's Removed

  • Navigation menus
  • Site headers and footers
  • Sidebar content
  • Cookie notices
  • Advertisement blocks
  • Social sharing buttons
  • Comments sections
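
To give a feel for what this kind of extraction involves, here is a rough sketch built on Python's standard `html.parser`. It is not Airmailer's extractor: the tag lists and the `MarkdownExtractor` and `html_to_markdown` names are assumptions for illustration, and tables, images, and nested lists are ignored.

```python
import re
from html.parser import HTMLParser

# Assumed set of elements to drop; a real extractor is likely more nuanced.
SKIP_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
HEADING_PREFIX = {"h1": "# ", "h2": "## ", "h3": "### "}
INLINE_MARK = {"strong": "**", "b": "**", "em": "*", "i": "*"}


class MarkdownExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a removed element
        self.out = []        # collected output fragments
        self.href = None     # target of the link currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag in HEADING_PREFIX:
            self.out.append("\n\n" + HEADING_PREFIX[tag])
        elif tag in ("p", "ul", "ol"):
            self.out.append("\n\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag in INLINE_MARK:
            self.out.append(INLINE_MARK[tag])
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag in INLINE_MARK:
            self.out.append(INLINE_MARK[tag])
        elif tag == "a":
            self.out.append(f"]({self.href or ''})")
            self.href = None

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse runs of whitespace but keep word boundaries
            self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MarkdownExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.out).splitlines()]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines)).strip()
```

Given a page with a site header, a navigation bar, and a main section containing a heading, a paragraph, and a list, only the main section survives, with bold text, links, and list items rendered as Markdown.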

Security Features

Airmailer includes security measures for URL ingestion:

| Protection | Description |
|------------|-------------|
| HTTPS Required | All URLs upgraded to HTTPS |
| Private IP Blocked | Cannot fetch from internal networks |
| Size Limit | Maximum 2 MB content |
| Timeout | 10-second fetch timeout |
| Redirect Limit | Maximum 3 redirects followed |
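
For readers curious how limits like these are typically enforced, the sketch below applies the same three numbers with Python's standard `urllib`. It is an assumption-laden illustration, not Airmailer's fetcher: `fetch_page`, `read_capped`, and `LimitedRedirects` are hypothetical names, and 2 MB is assumed to mean 2 × 1024 × 1024 bytes.

```python
import urllib.request

MAX_BYTES = 2 * 1024 * 1024  # assumed interpretation of the 2 MB limit
TIMEOUT_SECONDS = 10         # fetch timeout
MAX_REDIRECTS = 3            # redirects followed


class LimitedRedirects(urllib.request.HTTPRedirectHandler):
    # urllib raises HTTPError once a redirect chain grows past this bound
    max_redirections = MAX_REDIRECTS


def read_capped(stream, limit: int = MAX_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Read the stream in chunks; raise ValueError as soon as it exceeds limit."""
    chunks, total = [], 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        if total > limit:
            raise ValueError("content too large")
        chunks.append(chunk)
    return b"".join(chunks)


def fetch_page(url: str) -> bytes:
    """Fetch url with a redirect cap, a socket timeout, and a size cap."""
    opener = urllib.request.build_opener(LimitedRedirects)
    with opener.open(url, timeout=TIMEOUT_SECONDS) as response:
        return read_capped(response)
```

Reading in chunks means an oversized page is rejected without first buffering all of it. Note that the `timeout` argument in `urllib` applies to individual socket operations, so it only approximates an overall 10-second budget.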

Best Practices

Choose the Right Pages

Good candidates for import:

  • FAQ pages
  • Policy pages (returns, privacy, terms)
  • Product description pages
  • Help center articles
  • Blog posts with evergreen content

Avoid importing:

  • Pages with mostly images
  • Pages whose content changes frequently
  • Pages with minimal text
  • User-generated content

Verify After Import

After importing, review the document:

  1. Check content was extracted correctly
  2. Verify formatting looks right
  3. Edit title if needed
  4. Confirm document type is appropriate

Handling Import Issues

Page Not Loading

Possible causes:

  • Page requires authentication
  • Server blocking automated requests
  • Page doesn't exist (404)
  • Network timeout

Solution: Download the page manually and upload it as an HTML file.

Content Missing

If the imported content is incomplete:

  • The page may use JavaScript rendering
  • Content might be in an iframe
  • The main content detection may have missed areas

Solution: Export the page to HTML and upload manually.
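
One rough way to tell whether a page depends on JavaScript rendering is to check how much visible text its raw HTML contains. The heuristic below is an assumption for illustration, not something Airmailer documents; `looks_js_rendered` and the 200-character threshold are arbitrary choices.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_code = 0        # depth inside <script> or <style>
        self.visible_chars = 0  # characters of text outside those elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.in_code += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.in_code:
            self.in_code -= 1

    def handle_data(self, data):
        if not self.in_code:
            self.visible_chars += len(data.strip())


def looks_js_rendered(html: str, min_visible_chars: int = 200) -> bool:
    """Return True when the raw HTML carries very little visible text."""
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.visible_chars < min_visible_chars
```

Run it against the HTML source of the page (what "View Source" shows), not the rendered page. A result of `True` suggests the content is filled in by scripts after load, in which case a manually exported copy is the safer input.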

Wrong Content Extracted

If navigation or sidebar content appears:

  • The page structure may be non-standard
  • Content detection found the wrong area

Solution: Edit the document to remove unwanted content, or re-import from a cleaner source.

Common Use Cases

Import Your FAQ Page

URL: https://yoursite.com/faq
Title: "Frequently Asked Questions"
Type: FAQ

Import Return Policy

URL: https://yoursite.com/returns
Title: "Return and Refund Policy"
Type: Returns

Import Product Info

URL: https://yoursite.com/products/widget
Title: "Widget Product Information"
Type: Other

Refreshing Content

If the source webpage changes, either:

  1. Delete the existing document
  2. Re-import from the same URL

or manually update the document content.

Airmailer doesn't automatically sync with source URLs. Imported content is a snapshot of the page at the time of import.
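
Because imports are snapshots, you may want your own way of noticing when a source page has changed. A minimal sketch, assuming you keep a fingerprint of the text you imported (`fingerprint` and `needs_reimport` are hypothetical helpers, not Airmailer features):

```python
import hashlib


def fingerprint(content: str) -> str:
    """Hash the text with whitespace normalized, so reflowed text is not a change."""
    normalized = " ".join(content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def needs_reimport(stored_fingerprint: str, current_content: str) -> bool:
    """Return True when the current page text differs from the imported snapshot."""
    return fingerprint(current_content) != stored_fingerprint
```

Store the fingerprint when you import, then compare it against the page's current text on whatever schedule suits you and re-import when `needs_reimport` returns `True`.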

Limits

| Limit | Value |
|-------|-------|
| Content size | 2 MB maximum |
| Fetch timeout | 10 seconds |
| Max redirects | 3 |
| Rate limit | No specific limit |

Troubleshooting

"Failed to fetch URL"

  • Verify the URL is correct and accessible
  • Check if the page requires login
  • Try accessing the URL in an incognito browser window

"Content too large"

  • The page exceeds 2 MB
  • Try importing a more focused sub-page
  • Download and manually trim the HTML

"Timeout error"

  • The server took too long to respond
  • Try again later
  • Download the page manually instead

"No content found"

  • The page may be mostly JavaScript-rendered
  • The content structure may not have been recognized
  • Use manual HTML upload instead

Next Steps

  • Upload files manually
  • Learn about formats
  • Configure your AI agent