HtmlTinkerX is a powerful async C# library for HTML, CSS, and JS processing, parsing, formatting, and optimization. It provides web content processing capabilities including browser automation, document parsing with multiple engines, resource optimization, and more. PSParseHTML is the PowerShell module exposing HtmlTinkerX to PowerShell.

EvotecIT/HtmlTinkerX

HtmlTinkerX & PSParseHTML - Modern HTML Processing for .NET and PowerShell

HtmlTinkerX is available as a NuGet package from the NuGet Gallery, and PSParseHTML as a PowerShell module from the PowerShell Gallery.

What it's all about

HtmlTinkerX is a powerful async C# library for HTML, CSS, and JavaScript processing, parsing, formatting, and optimization. It provides comprehensive web content processing capabilities including browser automation with Playwright, document parsing with multiple engines, resource optimization, and much more. PSParseHTML is the PowerShell module that exposes HtmlTinkerX functionality through easy-to-use cmdlets.

Whether you're working in C# or PowerShell, you get access to:

  • 🔍 HTML Parsing - Multiple parsing engines (AngleSharp, HtmlAgilityPack)
  • 🎨 Resource Optimization - Minify and format HTML, CSS, JavaScript
  • 🌐 Browser Automation - Full Playwright integration for screenshots, PDFs, interaction
  • 📊 Data Extraction - Tables, forms, metadata, microdata, Open Graph
  • 📧 Email Processing - CSS inlining for email compatibility
  • 🔧 Network Tools - HAR export, request interception, console logging
  • 🍪 State Management - Cookie handling, session persistence
  • 📱 Multi-Platform - .NET Framework 4.7.2, .NET Standard 2.0, .NET 8.0

🔄 PowerShell to C# Method Mapping

The tables below show how PowerShell cmdlets map to C# methods, which makes it easier to move between the two:

HTML Parsing & Processing

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| ConvertFrom-HTML | HtmlParser.ParseWithAngleSharp() | Parse HTML documents |
| ConvertFrom-HtmlTable | HtmlParser.ParseTablesWithAngleSharp() | Extract tables with rowspan/colspan support |
| ConvertFrom-HTMLAttributes | HtmlParserExtensions.GetElements() | Query elements by tag, class, id, or name |
| ConvertFrom-HtmlForm | HtmlFormExtractor.ExtractForms() | Extract form data and structure |
| ConvertFrom-HtmlList | HtmlListParser.ParseLists() | Parse list elements into structured data |
| ConvertFrom-HtmlMeta | HtmlMetaParser.ExtractMeta() | Extract meta tag name/content pairs |
| ConvertFrom-HtmlMicrodata | HtmlMicrodataParser.ExtractMicrodata() | Extract schema.org structured data |
| ConvertFrom-HtmlOpenGraph | HtmlOpenGraphParser.ExtractOpenGraph() | Extract Open Graph metadata |
| Convert-HTMLToText | HtmlUtilities.ConvertToText() | Convert HTML to plain text |
| Compare-HTML | HtmlDiffer.Compare() | Compare HTML documents |
| Measure-HTMLDocument | HtmlParser.AnalyzeDocument() | Analyze document metrics |
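
The table above mentions rowspan/colspan support for table extraction. This README does not describe how the library handles spans internally, so the following is only a minimal sketch of the general technique: each cell's text is copied into every grid position it covers, so that every output row ends up with one value per column. The `Cell` record and the `SpanGrid` class are hypothetical names invented for this illustration and are not part of HtmlTinkerX.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical input: each row lists its cells in source order.
var rows = new List<List<Cell>>
{
    new List<Cell> { new Cell("Region", RowSpan: 2), new Cell("Sales", ColSpan: 2) },
    new List<Cell> { new Cell("Q1"), new Cell("Q2") },
    new List<Cell> { new Cell("North"), new Cell("10"), new Cell("12") },
};

foreach (var line in SpanGrid.Expand(rows))
{
    Console.WriteLine(string.Join(" | ", line));
}

// A cell with its text and the number of rows/columns it covers (illustrative type).
public record Cell(string Text, int RowSpan = 1, int ColSpan = 1);

public static class SpanGrid
{
    // Copies each cell's text into every grid position it covers,
    // producing a rectangular grid with one value per column in each row.
    public static List<string[]> Expand(IReadOnlyList<IReadOnlyList<Cell>> rows)
    {
        var occupied = new Dictionary<(int Row, int Col), string>();
        for (int r = 0; r < rows.Count; r++)
        {
            int c = 0;
            foreach (var cell in rows[r])
            {
                // Skip positions already claimed by a rowspan from a row above.
                while (occupied.ContainsKey((r, c))) c++;
                for (int dr = 0; dr < cell.RowSpan; dr++)
                    for (int dc = 0; dc < cell.ColSpan; dc++)
                        occupied[(r + dr, c + dc)] = cell.Text;
                c += cell.ColSpan;
            }
        }

        var grid = new List<string[]>();
        if (occupied.Count == 0) return grid;

        int height = occupied.Keys.Max(k => k.Row) + 1;
        int width = occupied.Keys.Max(k => k.Col) + 1;
        for (int r = 0; r < height; r++)
        {
            var line = new string[width];
            for (int c = 0; c < width; c++)
                line[c] = occupied.TryGetValue((r, c), out var text) ? text : "";
            grid.Add(line);
        }
        return grid;
    }
}
```

Repeating the spanned value in each covered position is one possible design choice; leaving the extra positions empty would be equally valid, and the library may do either.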

Resource Formatting & Optimization

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| Format-HTML | HtmlFormatter.FormatHtml() | Pretty-print HTML markup |
| Format-CSS | HtmlFormatter.FormatCss() | Format CSS stylesheets |
| Format-JavaScript | HtmlFormatter.FormatJavaScript() | Beautify JavaScript with options |
| Optimize-HTML | HtmlOptimizer.OptimizeHtml() | Minify HTML content |
| Optimize-CSS | HtmlOptimizer.OptimizeCss() | Minify CSS stylesheets |
| Optimize-JavaScript | HtmlOptimizer.OptimizeJavaScript() | Minify JavaScript code |
| Optimize-Email | PreMailerClient.MoveCssInline() | Inline CSS for email compatibility |

Browser Automation & Sessions

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| Start-HTMLSession | HtmlBrowser.OpenSessionAsync() | Create browser session |
| Close-HTMLSession | session.DisposeAsync() | Close browser session |
| Invoke-HTMLRendering | HtmlBrowser.OpenSessionAsync() | Render pages with authentication |
| Invoke-HTMLNavigation | HtmlBrowser.NavigateAsync() | Navigate to different URLs |
| Invoke-HTMLScript | HtmlBrowser.ExecuteScriptAsync() | Execute JavaScript in browser |
| Invoke-HTMLDomScript | HtmlScriptRunner.ExecuteScript() | Run JavaScript with AngleSharp |
| Export-BrowserState | HtmlBrowser.ExportStateAsync() | Save browser state |
| Import-BrowserState | HtmlBrowser.ImportStateAsync() | Restore browser state |
| Export-HTMLSession | HtmlBrowser.ExportSessionAsync() | Export session data |
| Import-HTMLSession | HtmlBrowser.ImportSessionAsync() | Import session data |

Screenshots & Media

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| Save-HTMLScreenshot | HtmlBrowser.SaveScreenshotAsync() | Capture page screenshots |
| Save-HTMLPdf | HtmlBrowser.SavePdfAsync() | Generate PDFs from pages |
| Start-HTMLVideoRecording | HtmlBrowser.StartVideoRecordingAsync() | Start recording browser session |
| Stop-HTMLVideoRecording | HtmlBrowser.StopVideoRecordingAsync() | Stop video recording |

Element Interaction

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| Invoke-HTMLClick | HtmlBrowser.ClickAsync() | Click elements |
| Set-HTMLInput | HtmlBrowser.SetInputAsync() | Set input field values |
| Set-HTMLSelectOption | HtmlBrowser.SetSelectAsync() | Select dropdown options |
| Set-HTMLChecked | HtmlBrowser.SetCheckedAsync() | Check/uncheck checkboxes |
| Submit-HTMLForm | HtmlBrowser.SubmitFormAsync() | Submit forms |
| Get-HTMLInteractable | HtmlBrowser.GetInteractableElementsAsync() | List clickable elements |
| Get-HTMLFormField | HtmlFormExtractor.GetFormFields() | Extract form field information |
| Get-HTMLLoginForm | HtmlFormExtractor.DetectLoginForms() | Detect login forms |

Network & Debugging

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| Get-HTMLNetworkLog | HtmlBrowser.GetNetworkLog() | View network requests/responses |
| Get-HTMLConsoleLog | HtmlBrowser.GetConsoleLog() | Retrieve console messages |
| Save-HTMLHar | HtmlBrowser.SaveHarAsync() | Export network traffic to HAR |
| Start-HTMLTracing | HtmlBrowser.StartTracingAsync() | Start Playwright tracing |
| Stop-HTMLTracing | HtmlBrowser.StopTracingAsync() | Stop tracing and save |
| Register-HTMLRoute | HtmlBrowser.RegisterRouteAsync() | Intercept requests |
| Unregister-HTMLRoute | HtmlBrowser.UnregisterRouteAsync() | Remove route handler |
| Show-HTMLHar | HtmlHarViewer.ShowHar() | Visualize HAR files |

Browser Testing

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| Test-HtmlBrowser | HtmlBrowserTester.TestUrlAsync() | Comprehensive browser testing |
| Test-HtmlBrowser | HtmlBrowserTester.TestFileAsync() | Test local HTML file (with -Path) |
| Test-HtmlBrowser | HtmlBrowserTester.TestCssResourceAsync() | Test specific CSS resource (with -CssResource) |
| Test-HtmlBrowser | HtmlBrowserTester.TestConsoleErrorsAsync() | Get console errors (with -ErrorsOnly) |
| Test-HtmlBrowser | HtmlBrowserTester.TestPerformanceAsync() | Get performance metrics (with -PerformanceOnly) |
| Clear-HtmlBrowserCache | HtmlBrowserCacheCleaner.CleanAllCache() | Clean Playwright browser downloads |

Cookies & State

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| Get-HTMLCookie | HtmlBrowser.GetCookiesAsync() | Retrieve session cookies |
| Set-HTMLCookie | HtmlBrowser.SetCookieAsync() | Add cookies to session |
| New-HTMLCookie | new HtmlCookie() | Create cookie objects |

Content & Resources

| PowerShell Cmdlet | C# Method | Description |
|---|---|---|
| Get-HTMLResource | HtmlResourceParser.ExtractResources() | Extract scripts and CSS |
| Save-HTMLAttachment | HtmlBrowser.SaveAttachmentsAsync() | Download files from pages |
| Get-HTMLContent | HtmlBrowser.GetContentAsync() | Retrieve page content |
| Export-HTMLOutline | HtmlOutlineBuilder.BuildOutline() | Generate document outlines |
| Set-HTMLHttpClientOption | HttpClientFactory.ConfigureClient() | Configure HTTP client options |

📦 Installation & Packages

📦 NuGet Package (C#/.NET)

dotnet add package HtmlTinkerX

🔧 PowerShell Module

Install-Module -Name PSParseHTML -AllowClobber -Force

📋 Package Information

  • 📦 NuGet Package: HtmlTinkerX - Core .NET library
  • 🔧 PowerShell Module: PSParseHTML - PowerShell cmdlets wrapper
  • 🎯 Target Frameworks: .NET Framework 4.7.2, .NET Standard 2.0, .NET 8.0
  • 💻 PowerShell Compatibility: Windows PowerShell 5.1+ and PowerShell Core 6.0+

🚀 Quick Start

C# Example

using System.IO;
using HtmlTinkerX;

// Parse HTML and extract tables
string html = await File.ReadAllTextAsync("page.html");
var tables = HtmlParser.ParseTablesWithAngleSharp(html);

// Format and optimize resources
string formatted = HtmlFormatter.FormatHtml(html);
string minified = HtmlOptimizer.OptimizeHtml(html);

// Browser automation
await using var session = await HtmlBrowser.OpenSessionAsync("https://example.com");
await HtmlBrowser.SaveScreenshotAsync(session, "screenshot.png");
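
The C# example above reads the file with `File.ReadAllTextAsync`, which exists on .NET 8 but not on .NET Framework 4.7.2 or .NET Standard 2.0, two of the targets listed under Package Information. For projects on those targets, a small helper built only on `FileStream` and `StreamReader` could stand in. This is a sketch, not part of HtmlTinkerX; the `FileCompat` class name is hypothetical, and UTF-8 input is assumed.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class FileCompat
{
    // Reads a whole file asynchronously using APIs that are available on
    // .NET Framework 4.7.2 and .NET Standard 2.0 as well as .NET 8.
    public static async Task<string> ReadAllTextAsync(string path)
    {
        // useAsync: true asks the OS for asynchronous file I/O.
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 4096, useAsync: true))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return await reader.ReadToEndAsync().ConfigureAwait(false);
        }
    }
}
```

With this helper, the first line of the example would become `string html = await FileCompat.ReadAllTextAsync("page.html");`.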

PowerShell Example

# Parse HTML tables from a webpage
$tables = ConvertFrom-HtmlTable -Url 'https://example.com'

# Format and optimize resources
$formatted = Format-HTML -Path 'page.html'
$minified = Optimize-HTML -Path 'page.html'

# Browser automation
$session = Start-HTMLSession -Url 'https://example.com'
Save-HTMLScreenshot -Session $session -OutFile 'screenshot.png'
Close-HTMLSession -Session $session

🔧 PowerShell Cmdlets

HTML/CSS/JavaScript Processing

  • Convert-HTMLToText - Convert markup to plain text
  • ConvertFrom-HtmlTable - Extract table elements into objects (supports rowspan/colspan)
  • ConvertFrom-HTMLAttributes - Extract elements by tag, class, id or name
  • ConvertFrom-HTML - Parse full documents or fragments
  • ConvertFrom-HtmlForm - Extract form data and structure
  • ConvertFrom-HtmlList - Parse list elements into structured data
  • ConvertFrom-HtmlMeta - Extract name/content pairs from meta tags
  • ConvertFrom-HtmlMicrodata - Extract structured data items (schema.org types)
  • ConvertFrom-HtmlOpenGraph - Extract Open Graph metadata
  • Format-CSS - Pretty-print style sheets
  • Format-HTML - Tidy up HTML markup
  • Format-JavaScript - Beautify JavaScript with customizable options
  • Optimize-CSS - Minify style sheets
  • Optimize-Email - Inline CSS for email bodies
  • Optimize-HTML - Minify HTML
  • Optimize-JavaScript - Minify JavaScript

Browser Automation & Interaction

  • Start-HTMLSession / Invoke-HTMLRendering - Create browser sessions with authentication support
  • Close-HTMLSession - Dispose browser sessions
  • Invoke-HTMLNavigation - Navigate to different URLs
  • Invoke-HTMLScript - Execute JavaScript in browser context
  • Invoke-HTMLDomScript - Run JavaScript with AngleSharp (no browser required)
  • Invoke-HTMLClick - Click elements in the browser
  • Get-HTMLInteractable - List clickable elements
  • Set-HTMLInput - Set input field values
  • Set-HTMLSelectOption - Select dropdown options
  • Set-HTMLChecked - Check/uncheck checkboxes and radio buttons
  • Submit-HTMLForm - Submit forms

Screenshots & Media

  • Save-HTMLScreenshot - Capture page screenshots with advanced options
  • Save-HTMLPdf - Generate PDFs from rendered pages
  • Start-HTMLVideoRecording / Stop-HTMLVideoRecording - Record browser sessions

Network & Debugging

  • Get-HTMLNetworkLog - View captured network requests and responses
  • Get-HTMLConsoleLog - Retrieve browser console messages
  • Save-HTMLHar - Export network traffic to HAR files
  • Start-HTMLTracing / Stop-HTMLTracing - Record Playwright traces
  • Register-HTMLRoute / Unregister-HTMLRoute - Intercept and mock requests
  • Test-HtmlBrowser - Comprehensive browser testing for errors, performance, and resources
  • Clear-HtmlBrowserCache - Clean Playwright browser downloads

Cookies & State Management

  • Get-HTMLCookie - Retrieve cookies from sessions
  • Set-HTMLCookie - Add cookies to sessions
  • New-HTMLCookie - Create cookie objects
  • Export-BrowserState / Import-BrowserState - Save/restore browser state
  • Export-HTMLSession / Import-HTMLSession - Session state management

Content & Resources

  • Get-HTMLResource - Extract script and CSS resources
  • Save-HTMLAttachment - Download files from pages
  • Get-HTMLContent - Retrieve page content
  • Get-HTMLFormField - Extract form field information
  • Get-HTMLLoginForm - Detect login forms
  • Export-HTMLOutline - Generate document outlines
  • Show-HTMLHar - Visualize HAR files
  • Compare-HTML - Compare HTML documents
  • Measure-HTMLDocument - Analyze document metrics

🎯 C# API Reference

Core Classes

HtmlParser

// Parse with different engines
var doc = HtmlParser.ParseWithAngleSharp(html);
var doc2 = HtmlParser.ParseWithHtmlAgilityPack(html);

// Extract tables with detailed information
var tables = HtmlParser.ParseTablesWithAngleSharpDetailed(html);
var tables2 = HtmlParser.ParseTablesWithHtmlAgilityPack(html);

// Parse from URLs
var urlDoc = await HtmlParser.ParseFromUrlAsync("https://example.com");

HtmlFormatter

// Format different resource types
string formattedHtml = HtmlFormatter.FormatHtml(html);
string formattedCss = HtmlFormatter.FormatCss(css);
string formattedJs = HtmlFormatter.FormatJavaScript(javascript);

// Custom JavaScript formatting options
var options = new BeautifierOptions {
    IndentSize = 2,
    BraceStyle = BraceStyle.Expand
};
string customJs = HtmlFormatter.FormatJavaScript(javascript, options);

// Async operations
string formatted = await HtmlFormatter.FormatHtmlAsync(html);

HtmlOptimizer

// Minify resources
string minifiedHtml = HtmlOptimizer.OptimizeHtml(html);
string minifiedCss = HtmlOptimizer.OptimizeCss(css);
string minifiedJs = HtmlOptimizer.OptimizeJavaScript(javascript);

// File operations
await HtmlOptimizer.OptimizeHtmlFileAsync("input.html", "output.html");

HtmlBrowser (Browser Automation)

// Create browser sessions
await using var session = await HtmlBrowser.OpenSessionAsync("https://example.com");

// Authentication
var credentials = new NetworkCredential("user", "pass");
await using var authSession = await HtmlBrowser.OpenSessionAsync(
    "https://example.com/protected",
    credential: credentials,
    loginUrl: "https://example.com/login"
);

// Screenshots
await HtmlBrowser.SaveScreenshotAsync(session, "screenshot.png");
await HtmlBrowser.SaveScreenshotAsync(session, "full.png", fullPage: true);

// PDF generation
await HtmlBrowser.SavePdfAsync(session, "document.pdf");

// Navigation
await HtmlBrowser.NavigateAsync(session, "https://example.com/page2");

// JavaScript execution
var result = await HtmlBrowser.ExecuteScriptAsync(session, "return document.title;");

// Element interaction
await HtmlBrowser.ClickAsync(session, "#button");
await HtmlBrowser.SetInputAsync(session, "#username", "user");
await HtmlBrowser.SubmitFormAsync(session, "#loginForm");

HtmlUtilities

// Convert HTML to plain text
string plainText = HtmlUtilities.ConvertToText(html);

// HTTP client operations
using var httpClient = HttpClientFactory.CreateHttpClient();
string content = await httpClient.GetStringAsync("https://example.com");

PreMailerClient

// Email optimization
string inlinedCss = PreMailerClient.MoveCssInline(emailHtml);
string optimized = await PreMailerClient.MoveCssInlineAsync(emailHtml, downloadRemoteCss: true);

Extension Methods

HtmlParserExtensions

// Quick element queries
var elements = doc.GetElements("div.class-name");
var byId = doc.GetElements("#element-id");
var byTag = doc.GetElements("p");

📚 Examples

PowerShell Examples

Table Extraction

# Extract tables from Wikipedia
$tables = ConvertFrom-HtmlTable -Url 'https://en.wikipedia.org/wiki/PowerShell'
$tables[0] | Format-Table -AutoSize

# Parse local HTML file
$tables = ConvertFrom-HtmlTable -Path './data.html'
# Use an explicit counter so each table gets a distinct file name
$index = 0
foreach ($table in $tables) {
    $table | Export-Csv "table_$index.csv" -NoTypeInformation
    $index++
}

Resource Optimization

# Format and minify HTML
$formatted = Format-HTML -Path './messy.html'
$minified = Optimize-HTML -Content $formatted -OutputFile './clean.min.html'

# Optimize JavaScript with custom options
$js = Format-JavaScript -Path './script.js' -IndentSize 2 -BraceStyle Expand
Optimize-JavaScript -Content $js -OutputFile './script.min.js'

# Email optimization
$emailHtml = Get-Content './newsletter.html' -Raw
$optimized = Optimize-Email -Body $emailHtml -UseEmailFormatter -DownloadRemoteCss

Browser Automation

# Authenticated session with form login
$cred = Get-Credential
$session = Start-HTMLSession -Url 'https://example.com/protected' `
    -Credential $cred `
    -LoginUrl 'https://example.com/login' `
    -UsernameSelector 'input[name=username]' `
    -PasswordSelector 'input[name=password]' `
    -SubmitSelector 'button[type=submit]'

# Take screenshots with different options
Save-HTMLScreenshot -Session $session -OutFile 'full-page.png' -Full
Save-HTMLScreenshot -Session $session -OutFile 'element.png' -ElementSelector '#content'
Save-HTMLScreenshot -Session $session -OutFile 'highlighted.png' -HighlightSelector '.important'

# Download files
Save-HTMLAttachment -Session $session -Path './downloads' -Filter '.pdf'

# Network monitoring
Start-HTMLTracing -Session $session
Invoke-HTMLNavigation -Session $session -Url 'https://example.com/api/data'
Stop-HTMLTracing -Session $session -OutFile 'trace.zip'
Save-HTMLHar -Session $session -OutFile 'network.har'

Close-HTMLSession -Session $session

C# Examples

Document Processing

using System;
using System.IO;
using HtmlTinkerX;

// Parse and process HTML
string html = await File.ReadAllTextAsync("document.html");
var document = HtmlParser.ParseWithAngleSharp(html);

// Extract specific elements
var links = document.GetElements("a[href]");
var images = document.GetElements("img[src]");

// Extract tables with detailed information
var tables = HtmlParser.ParseTablesWithAngleSharpDetailed(html);
foreach (var table in tables)
{
    Console.WriteLine($"Table has {table.Rows.Count} rows and {table.Headers.Count} columns");
    foreach (var row in table.Rows)
    {
        Console.WriteLine(string.Join(" | ", row.Values));
    }
}

Resource Optimization

// Format resources
string formattedHtml = HtmlFormatter.FormatHtml(html);
string formattedCss = HtmlFormatter.FormatCss(css);

// Custom JavaScript formatting
var jsOptions = new BeautifierOptions
{
    IndentSize = 4,
    BraceStyle = BraceStyle.Collapse,
    PreserveNewlines = true
};
string formattedJs = HtmlFormatter.FormatJavaScript(javascript, jsOptions);

// Minification
string minifiedHtml = HtmlOptimizer.OptimizeHtml(html);
string minifiedCss = HtmlOptimizer.OptimizeCss(css);
string minifiedJs = HtmlOptimizer.OptimizeJavaScript(javascript);

// Email optimization
string emailBody = await File.ReadAllTextAsync("newsletter.html");
string inlined = await PreMailerClient.MoveCssInlineAsync(emailBody, downloadRemoteCss: true);

Browser Automation

// Basic browser session
await using var session = await HtmlBrowser.OpenSessionAsync("https://example.com");

// Authenticated session
var credentials = new NetworkCredential("username", "password");
await using var authSession = await HtmlBrowser.OpenSessionAsync(
    "https://example.com/protected",
    credential: credentials,
    loginUrl: "https://example.com/login",
    usernameSelector: "#username",
    passwordSelector: "#password",
    submitSelector: "#login-button"
);

// Interact with the page
await HtmlBrowser.SetInputAsync(session, "#search", "query");
await HtmlBrowser.ClickAsync(session, "#search-button");
await Task.Delay(2000); // Wait for results

// Capture results
await HtmlBrowser.SaveScreenshotAsync(session, "results.png");
var consoleMessages = HtmlBrowser.GetConsoleLog(session);
foreach (var message in consoleMessages)
{
    Console.WriteLine($"{message.Type}: {message.Text}");
}

// Download files
var downloads = await HtmlBrowser.SaveAttachmentsAsync(session, "./downloads", ".pdf");
Console.WriteLine($"Downloaded {downloads.Count} files");
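
The fixed `Task.Delay(2000)` in the example above waits two seconds regardless of how quickly the results appear. A common alternative is to poll a condition until it becomes true or a timeout elapses. The helper below is a generic sketch that uses only the .NET base class library. The `Poll` class is a hypothetical name and is not part of HtmlTinkerX.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class Poll
{
    // Calls `condition` repeatedly until it returns true or `timeout` elapses.
    // Returns true if the condition was met, false if the timeout was reached.
    public static async Task<bool> UntilAsync(
        Func<Task<bool>> condition,
        TimeSpan timeout,
        TimeSpan? pollInterval = null,
        CancellationToken cancellationToken = default)
    {
        var interval = pollInterval ?? TimeSpan.FromMilliseconds(200);
        var stopwatch = Stopwatch.StartNew();
        while (true)
        {
            if (await condition().ConfigureAwait(false)) return true;
            if (stopwatch.Elapsed >= timeout) return false;
            await Task.Delay(interval, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

The delegate could wrap a call such as `HtmlBrowser.ExecuteScriptAsync` that checks whether the results element exists. The return type of that call is not documented in this README, so converting its result to `bool` is left to the caller.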

🧪 Browser Testing & Network Monitoring

PSParseHTML now includes comprehensive browser testing capabilities for checking network requests, CSS resources, console errors, and performance metrics. This feature uses strongly-typed classes instead of dictionaries for better IntelliSense and type safety.
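
To illustrate why strongly-typed results help, the sketch below contrasts a dictionary-based result with a typed one. `SampleTestResult` is a hypothetical type written for this comparison and is not the library's actual result class. Its `Passed` rule (no errors and no failed requests) follows the description used in the examples in this section.

```csharp
using System;
using System.Collections.Generic;

// Dictionary-based result: keys are strings and values are objects,
// so a misspelled key or a wrong cast only surfaces at run time.
var loose = new Dictionary<string, object>
{
    ["ErrorCount"] = 2,
    ["FailedRequestCount"] = 0,
};
int looseErrors = (int)loose["ErrorCount"];

// Strongly-typed result: member names and types are checked by the
// compiler and are discoverable through IntelliSense.
var typed = new SampleTestResult(ErrorCount: 2, FailedRequestCount: 0);

Console.WriteLine($"Dictionary errors: {looseErrors}");
Console.WriteLine($"Typed errors: {typed.ErrorCount}, passed: {typed.Passed}");

// Hypothetical type for illustration only.
public record SampleTestResult(int ErrorCount, int FailedRequestCount)
{
    // Passed means no console errors and no failed requests.
    public bool Passed => ErrorCount == 0 && FailedRequestCount == 0;
}
```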

PowerShell Browser Testing

Basic Testing

# Run a comprehensive test on a URL
$result = Test-HtmlBrowser -Url 'https://example.com'

# Test a local HTML file
$result = Test-HtmlBrowser -Path 'C:\MyProject\index.html'

# Check if test passed (no errors or failed requests)
if ($result.Passed) {
    Write-Host "✅ All tests passed!"
} else {
    Write-Host "❌ Issues found: $($result.Summary)"
}

# View detailed results
Write-Host "Total Requests: $($result.TotalRequests)"
Write-Host "Failed Requests: $($result.FailedRequestCount)"
Write-Host "Console Errors: $($result.ErrorCount)"
Write-Host "Console Warnings: $($result.WarningCount)"

Testing Local HTML Files

# Test local HTML files created by HTMLForgeX or other tools
$htmlFile = "C:\Projects\MyReport\report.html"
$result = Test-HtmlBrowser -Path $htmlFile

# Check for JavaScript errors in local file
$errors = Test-HtmlBrowser -Path $htmlFile -ErrorsOnly
if ($errors.Count -gt 0) {
    Write-Host "Found $($errors.Count) JavaScript errors:"
    $errors | ForEach-Object {
        Write-Host "  - $($_.Text) at $($_.FullLocation)"
    }
}

# Test CSS loading in local file
$cssCheck = Test-HtmlBrowser -Path $htmlFile -CssResource 'styles.css'
if ($cssCheck) {
    Write-Host "CSS loaded successfully in $($cssCheck.Duration.TotalMilliseconds)ms"
}

# Test with visible browser (not headless) for debugging
$result = Test-HtmlBrowser -Path $htmlFile -Headless:$false

Testing for Console Errors

# Get only console errors
$errors = Test-HtmlBrowser -Url 'https://example.com' -ErrorsOnly

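# Use a name other than $error, which is a PowerShell automatic variable
foreach ($consoleError in $errors) {
    Write-Host "Error: $($consoleError.Text)"
    Write-Host "  Location: $($consoleError.FullLocation)"
    Write-Host "  Severity: $($consoleError.SeverityLevel)"

    if ($consoleError.StackTrace) {
        Write-Host "  Stack: $($consoleError.StackTrace)"
    }
}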

CSS Resource Testing

# Check if a specific CSS file is loaded
$cssResource = Test-HtmlBrowser -Url 'https://example.com' -CssResource 'styles.css'

if ($cssResource) {
    Write-Host "CSS found: $($cssResource.Url)"
    Write-Host "Load time: $($cssResource.Duration.TotalMilliseconds)ms"
    Write-Host "Size: $($cssResource.TransferSize) bytes"
    Write-Host "From cache: $($cssResource.ServedFromCache)"
}

Performance Testing

# Get performance metrics only
$metrics = Test-HtmlBrowser -Url 'https://example.com' -PerformanceOnly

# Display performance report
Write-Host $metrics.GetReport()

# Access specific metrics
Write-Host "Page Load Time: $($metrics.TotalLoadTime.TotalSeconds)s"
Write-Host "Average Request Duration: $($metrics.AverageRequestDuration.TotalMilliseconds)ms"
Write-Host "Total Bytes: $($metrics.TotalBytesTransferred / 1KB)KB"

# Resource breakdown by type
$metrics.ResourceBreakdown | ForEach-Object {
    Write-Host "$($_.Key): $($_.Value) requests"
}

Advanced Testing with Proxy

# Test through a proxy with authentication
$cred = Get-Credential
$result = Test-HtmlBrowser -Url 'https://example.com' `
    -Proxy 'http://proxy:8080' `
    -ProxyCredential $cred `
    -Timeout 60000

Batch Testing Multiple URLs

# Test multiple URLs and generate report
$urls = @(
    'https://example.com/home',
    'https://example.com/about',
    'https://example.com/contact'
)

$results = $urls | ForEach-Object {
    $result = Test-HtmlBrowser -Url $_
    [PSCustomObject]@{
        Url = $_
        Status = if ($result.Passed) { 'PASS' } else { 'FAIL' }
        LoadTime = $result.PageLoadTime.TotalSeconds
        Requests = $result.TotalRequests
        Failed = $result.FailedRequestCount
        Errors = $result.ErrorCount
        Warnings = $result.WarningCount
    }
}

# Display results in a table
$results | Format-Table -AutoSize

# Export to CSV for further analysis
$results | Export-Csv -Path 'browser-test-results.csv' -NoTypeInformation

# Find pages with issues
$results | Where-Object { $_.Status -eq 'FAIL' } | ForEach-Object {
    Write-Warning "Failed: $($_.Url) - $($_.Failed) failed requests, $($_.Errors) errors"
}

Integration with Pester Tests

# Save as MyWebsite.Tests.ps1
Describe "Website Browser Tests" {

    BeforeAll {
        $baseUrl = 'https://mywebsite.com'
    }

    It "Homepage should load without errors" {
        $result = Test-HtmlBrowser -Url $baseUrl
        $result.Passed | Should -BeTrue
        $result.ConsoleErrors.Count | Should -Be 0
        $result.FailedRequestCount | Should -Be 0
    }

    It "All CSS files should load successfully" {
        $result = Test-HtmlBrowser -Url $baseUrl
        $cssFiles = $result.CssResources

        $cssFiles.Count | Should -BeGreaterThan 0
        $cssFiles | ForEach-Object {
            $_.Status | Should -Be 200
            $_.ErrorType | Should -BeNullOrEmpty
        }
    }

    It "Page should load within 3 seconds" {
        $result = Test-HtmlBrowser -Url $baseUrl
        $result.PageLoadTime.TotalSeconds | Should -BeLessOrEqual 3
    }

    It "Console should not contain JavaScript errors" {
        $errors = Test-HtmlBrowser -Url $baseUrl -ErrorsOnly
        $errors | Should -BeNullOrEmpty
    }

    It "Total page size should be under 5MB" {
        $metrics = Test-HtmlBrowser -Url $baseUrl -PerformanceOnly
        $totalMB = $metrics.TotalBytesTransferred / 1MB
        $totalMB | Should -BeLessOrEqual 5
    }
}

# Run the tests from a console session (not from inside the test file)
Invoke-Pester -Path .\MyWebsite.Tests.ps1 -Output Detailed

Testing Local HTML Reports

# Test HTMLForgeX generated reports
$reportPath = "C:\Reports\MonthlyReport.html"

# Basic test
$result = Test-HtmlBrowser -Path $reportPath
if (-not $result.Passed) {
    Write-Warning "Report has issues:"
    $result.ConsoleErrors | ForEach-Object {
        Write-Warning "  JS Error: $($_.Text)"
    }
    $result.FailedRequests | ForEach-Object {
        Write-Warning "  Failed Resource: $($_.Url)"
    }
}

# Test multiple reports
Get-ChildItem -Path "C:\Reports" -Filter "*.html" | ForEach-Object {
    $result = Test-HtmlBrowser -Path $_.FullName
    [PSCustomObject]@{
        Report = $_.Name
        Status = if ($result.Passed) { '✅' } else { '❌' }
        LoadTime = "$($result.PageLoadTime.TotalSeconds)s"
        Errors = $result.ErrorCount
        MissingResources = $result.FailedRequestCount
    }
} | Format-Table -AutoSize

# Test with visible browser for debugging
$debugResult = Test-HtmlBrowser -Path $reportPath -Headless:$false -Timeout 60000

Monitoring and Alerting

# Monitor website health
function Test-WebsiteHealth {
    param(
        [string]$Url,
        [int]$MaxLoadTime = 5,
        [int]$MaxErrors = 0
    )

    $result = Test-HtmlBrowser -Url $Url

    $issues = @()

    if ($result.PageLoadTime.TotalSeconds -gt $MaxLoadTime) {
        $issues += "Slow load time: $($result.PageLoadTime.TotalSeconds)s"
    }

    if ($result.ErrorCount -gt $MaxErrors) {
        $issues += "Console errors: $($result.ErrorCount)"
    }

    if ($result.FailedRequestCount -gt 0) {
        $issues += "Failed requests: $($result.FailedRequestCount)"
    }

    if ($issues.Count -eq 0) {
        Write-Host "✅ $Url is healthy" -ForegroundColor Green
    } else {
        Write-Host "❌ $Url has issues:" -ForegroundColor Red
        $issues | ForEach-Object { Write-Host "   - $_" -ForegroundColor Yellow }

        # Send alert (example)
        # Send-MailMessage -To '<recipient address>' -Subject "Website Issue" -Body ($issues -join "`n")
    }

    return @{
        Url = $Url
        Healthy = $issues.Count -eq 0
        Issues = $issues
        Timestamp = Get-Date
    }
}

# Test multiple sites
$sites = @('https://site1.com', 'https://site2.com')
$healthChecks = $sites | ForEach-Object { Test-WebsiteHealth -Url $_ }

# Save results
$healthChecks | ConvertTo-Json | Out-File "health-check-$(Get-Date -Format 'yyyyMMdd-HHmmss').json"

C# Browser Testing

Basic Testing

using HtmlTinkerX;

// Run comprehensive test on URL
var result = await HtmlBrowserTester.TestUrlAsync("https://example.com");

// Test a local HTML file
var fileResult = await HtmlBrowserTester.TestFileAsync(@"C:\MyProject\index.html");

if (result.Passed)
{
    Console.WriteLine("✅ All tests passed!");
}
else
{
    Console.WriteLine($"❌ {result.Summary}");
}

// Analyze results
Console.WriteLine($"Total Requests: {result.TotalRequests}");
Console.WriteLine($"Failed: {result.FailedRequestCount}");
Console.WriteLine($"Errors: {result.ErrorCount}");
Console.WriteLine($"Warnings: {result.WarningCount}");

Testing Local HTML Files

// Test local HTML file with full analysis
var testResult = await HtmlBrowserTester.TestFileAsync(
    @"C:\Projects\MyReport\report.html",
    HtmlBrowserEngine.Chromium,
    headless: true,
    timeout: 30000);

// Check specific issues
if (testResult.ConsoleErrors.Any())
{
    Console.WriteLine($"Found {testResult.ErrorCount} JavaScript errors:");
    foreach (var error in testResult.ConsoleErrors)
    {
        Console.WriteLine($"  - {error.Text}");
        Console.WriteLine($"    Location: {error.FullLocation}");
        if (!string.IsNullOrEmpty(error.StackTrace))
        {
            Console.WriteLine($"    Stack: {error.StackTrace}");
        }
    }
}

// Analyze resource loading
var slowResources = testResult.NetworkEntries
    .Where(r => r.Duration > TimeSpan.FromSeconds(1))
    .OrderByDescending(r => r.Duration);

foreach (var resource in slowResources)
{
    Console.WriteLine($"Slow resource: {resource.Url} took {resource.Duration?.TotalSeconds}s");
}

Network Request Analysis

// Test and analyze network requests
var result = await HtmlBrowserTester.TestUrlAsync("https://example.com");

// Check CSS resources
foreach (var css in result.CssResources)
{
    Console.WriteLine($"CSS: {css.Url}");
    Console.WriteLine($"  Duration: {css.Duration?.TotalMilliseconds}ms");
    Console.WriteLine($"  Size: {css.TransferSize} bytes");
    Console.WriteLine($"  Cached: {css.ServedFromCache}");
}

// Check failed requests
foreach (var failed in result.FailedRequests)
{
    Console.WriteLine($"Failed: {failed.Url}");
    Console.WriteLine($"  Error: {failed.ErrorType} - {failed.ErrorMessage}");
}

// Check JavaScript resources
var jsFiles = result.JavaScriptResources;
var totalJsSize = jsFiles.Sum(js => js.TransferSize ?? 0);
// Divide by 1024.0 so the result is not truncated by integer division
Console.WriteLine($"Total JS size: {totalJsSize / 1024.0:F1}KB");

Console Error Detection

// Get only console errors
var errors = await HtmlBrowserTester.TestConsoleErrorsAsync("https://example.com");

foreach (var error in errors)
{
    Console.WriteLine($"Error: {error.Text}");
    Console.WriteLine($"  Type: {error.Type}");
    Console.WriteLine($"  Location: {error.FullLocation}");
    Console.WriteLine($"  Timestamp: {error.Timestamp}");

    if (!string.IsNullOrEmpty(error.StackTrace))
    {
        Console.WriteLine($"  Stack: {error.StackTrace}");
    }
}

Performance Analysis

// Get performance metrics
var metrics = await HtmlBrowserTester.TestPerformanceAsync("https://example.com");

// Display performance report
Console.WriteLine(metrics.GetReport());

// Check specific thresholds
if (metrics.TotalLoadTime > TimeSpan.FromSeconds(5))
{
    Console.WriteLine("⚠️ Page load time exceeds 5 seconds!");
}

if (metrics.LongestRequest?.Duration > TimeSpan.FromSeconds(2))
{
    Console.WriteLine($"⚠️ Slow resource: {metrics.LongestRequest.Url}");
}

Checking Resource Loading in Local HTML Files

// Test a local HTML file created by HTMLForgeX or other tools
var localResult = await HtmlBrowserTester.TestFileAsync(@"C:\Projects\MyReport\report.html");

// Check if all resources loaded correctly
if (localResult.Passed)
{
    Console.WriteLine("✅ Local HTML file passed all tests!");
}
else
{
    // Analyze what went wrong
    foreach (var failed in localResult.FailedRequests)
    {
        Console.WriteLine($"❌ Failed to load: {failed.Url}");
        Console.WriteLine($"   Error: {failed.ErrorType}");
    }
}

// Test with custom timeout for slow local resources
var slowResult = await HtmlBrowserTester.TestFileAsync(
    @"C:\MyProject\index.html",
    timeout: 30000  // 30 seconds
);

Integration Testing Examples

// Example: Testing in xUnit
[Fact]
public async Task Website_Should_Load_Without_Errors()
{
    var result = await HtmlBrowserTester.TestUrlAsync("https://mysite.com");

    Assert.True(result.Passed, $"Test failed: {result.Summary}");
    Assert.Empty(result.ConsoleErrors);
    Assert.Empty(result.FailedRequests);
    Assert.True(result.PageLoadTime < TimeSpan.FromSeconds(3),
        "Page load time exceeded 3 seconds");
}

// Example: Testing specific CSS resources
[Theory]
[InlineData("styles.css")]
[InlineData("theme.css")]
public async Task CSS_Files_Should_Load_Successfully(string cssFile)
{
    var css = await HtmlBrowserTester.TestCssResourceAsync(
        "https://mysite.com", cssFile);

    Assert.NotNull(css);
    Assert.Equal(200, css.Status);
    Assert.True(css.Duration < TimeSpan.FromSeconds(1));
}

// Example: Performance regression test
[Fact]
public async Task Page_Performance_Should_Meet_Thresholds()
{
    var metrics = await HtmlBrowserTester.TestPerformanceAsync("https://mysite.com");

    Assert.True(metrics.TotalLoadTime < TimeSpan.FromSeconds(5));
    Assert.True(metrics.TotalBytesTransferred < 5 * 1024 * 1024); // 5MB
    Assert.True(metrics.TotalRequests < 50);
    Assert.All(metrics.RequestsByResourceType, kvp =>
    {
        if (kvp.Key == HtmlNetworkResourceType.Image)
        {
            Assert.True(kvp.Value.TotalSizeMB < 2,
                $"Images exceed 2MB limit: {kvp.Value.TotalSizeMB:F2}MB");
        }
    });
}

Batch Testing Multiple Pages

// Test multiple pages efficiently
var urls = new[] {
    "https://example.com/home",
    "https://example.com/about",
    "https://example.com/contact"
};

var results = await Task.WhenAll(
    urls.Select(url => HtmlBrowserTester.TestUrlAsync(url))
);

// Generate summary report
foreach (var (url, result) in urls.Zip(results))
{
    Console.WriteLine($"\n{url}:");
    Console.WriteLine($"  Status: {(result.Passed ? "PASS" : "FAIL")}");
    Console.WriteLine($"  Load Time: {result.PageLoadTime.TotalSeconds:F2}s");
    Console.WriteLine($"  Requests: {result.TotalRequests} ({result.FailedRequestCount} failed)");
    Console.WriteLine($"  Console: {result.ErrorCount} errors, {result.WarningCount} warnings");
}

// Find slowest page
var slowest = results.OrderByDescending(r => r.PageLoadTime).First();
Console.WriteLine($"\nSlowest page: {slowest.Url} ({slowest.PageLoadTime.TotalSeconds:F2}s)");
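Task.WhenAll as used above starts every test at once. With a long URL list that can mean many browser pages open at the same time, so it may be worth capping concurrency. The following is a minimal sketch of one way to do that with SemaphoreSlim. RunThrottledAsync and FakeTestAsync are hypothetical helpers written for this illustration and are not part of HtmlTinkerX; the fake test only stands in for a real call so the sketch runs without a browser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: run `test` for every URL with at most `maxConcurrency`
// calls in flight. Results come back in the same order as the input URLs.
static async Task<T[]> RunThrottledAsync<T>(
    IReadOnlyList<string> urls, Func<string, Task<T>> test, int maxConcurrency)
{
    using var gate = new SemaphoreSlim(maxConcurrency);
    var tasks = urls.Select(async url =>
    {
        await gate.WaitAsync();          // wait for a free slot
        try { return await test(url); }
        finally { gate.Release(); }      // free the slot even if the test throws
    }).ToArray();
    return await Task.WhenAll(tasks);
}

// Stand-in for a real browser test (assumption: any Func<string, Task<T>> fits here).
static async Task<int> FakeTestAsync(string url)
{
    await Task.Delay(50);
    return url.Length;
}

var pages = new[]
{
    "https://example.com/home",
    "https://example.com/about",
    "https://example.com/contact"
};

int[] lengths = await RunThrottledAsync<int>(pages, FakeTestAsync, maxConcurrency: 2);
Console.WriteLine(string.Join(", ", lengths));
```

To use it with the real tester, pass url => HtmlBrowserTester.TestUrlAsync(url) as the delegate in place of FakeTestAsync.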

Test Result Properties

HtmlBrowserTestResult

  • Url - The tested URL
  • PageLoadTime - Total page load duration
  • NetworkEntries - All network requests with detailed info
  • ConsoleEntries - All console messages
  • ConsoleErrors - Only error messages
  • ConsoleWarnings - Only warning messages
  • FailedRequests - Failed network requests
  • CssResources - CSS file requests
  • JavaScriptResources - JS file requests
  • ImageResources - Image requests
  • Passed - Whether all tests passed
  • Summary - Human-readable summary

HtmlNetworkEntryDetailed

  • Url - Request URL
  • Method - HTTP method
  • Status - Response status code
  • ProtocolVersion - HTTP protocol version
  • Duration - Request duration
  • ResourceType - Type of resource (Document, Stylesheet, Script, etc.)
  • TransferSize - Total bytes transferred
  • ServedFromCache - Whether served from cache
  • ErrorType - Error type if failed
  • ContentType - Response content type

HtmlConsoleEntryDetailed

  • Text - Console message text
  • Type - Message type (Error, Warning, Info, etc.)
  • Timestamp - When logged
  • SourceUrl - Source file URL
  • LineNumber - Line in source
  • StackTrace - Stack trace for errors
  • SeverityLevel - 1=Info, 2=Warning, 3=Error
  • IsError/IsWarning/IsInfo - Quick type checks

HtmlPerformanceMetrics

  • TotalLoadTime - Total time to load the page
  • TotalRequests - Number of network requests made
  • TotalBytesTransferred - Total bytes downloaded
  • AverageRequestDuration - Average time per request
  • LongestRequest - The slowest network request
  • ResourceBreakdown - Dictionary of requests grouped by type (Document, Stylesheet, Script, Image, Font, etc.)
  • GetReport() - Returns a formatted text report with all metrics
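As a rough illustration of what a per-type breakdown involves, the sketch below groups request records by resource type and totals their transfer sizes. It uses plain tuples as stand-ins for HtmlNetworkEntryDetailed, borrowing only the ResourceType and TransferSize names from the list above; Summarize is a hypothetical helper, and the library's own ResourceBreakdown may be shaped differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: group entries by resource type and total their
// transfer sizes. A null TransferSize is counted as 0 bytes.
static Dictionary<string, (int Count, long Bytes)> Summarize(
    IEnumerable<(string ResourceType, long? TransferSize)> entries) =>
    entries
        .GroupBy(e => e.ResourceType)
        .ToDictionary(
            g => g.Key,
            g => (Count: g.Count(), Bytes: g.Sum(e => e.TransferSize ?? 0)));

// Made-up sample data standing in for real network entries.
var sample = new (string ResourceType, long? TransferSize)[]
{
    ("Script", 120_000),
    ("Script", null),
    ("Stylesheet", 30_000),
    ("Image", 512_000),
};

foreach (var (type, stats) in Summarize(sample))
{
    Console.WriteLine($"{type}: {stats.Count} requests, {stats.Bytes / 1024.0:F1} KB");
}
```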

Playwright Auto-Setup

Playwright browsers are automatically downloaded on first use. No manual setup required! The download happens once per system and is shared across all applications.

How Auto-Download Works

When you first use browser testing, Playwright automatically downloads required components:

  1. Playwright Driver & Node.js:

    • Windows: %LOCALAPPDATA%\ms-playwright-driver
    • macOS: ~/Library/Caches/ms-playwright-driver
    • Linux: ~/.cache/ms-playwright-driver
    • Contains the Playwright driver and embedded Node.js runtime
  2. Browser Installations:

    • Windows: %LOCALAPPDATA%\ms-playwright
    • macOS: ~/Library/Caches/ms-playwright
    • Linux: ~/.cache/ms-playwright
    • Contains Chromium, Firefox, and/or WebKit browsers
  3. Download Process:

    • Shows progress: "Downloading Playwright driver... X% (Y MB/s)"
    • Thread-safe - prevents concurrent downloads
    • Subsequent runs use cached components - no re-download needed
    • You can manually ensure browsers are installed using HtmlBrowser.EnsureInstalledAsync()
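To check how much disk space these downloads occupy without touching them, the default locations listed above can be resolved by hand. This is a minimal sketch using only the .NET base class library; GetCacheRoot and GetDirectorySize are hypothetical helpers, and the sketch assumes the default paths (it does not account for overrides such as Playwright's PLAYWRIGHT_BROWSERS_PATH environment variable).

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper: resolve the default cache root for the current OS,
// following the per-platform locations listed above.
static string GetCacheRoot()
{
    if (RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
        return Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData);

    string home = Environment.GetFolderPath(Environment.SpecialFolder.UserProfile);
    return RuntimeInformation.IsOSPlatform(OSPlatform.OSX)
        ? Path.Combine(home, "Library", "Caches")
        : Path.Combine(home, ".cache");
}

// Hypothetical helper: total size of all files under a directory,
// or 0 when the directory does not exist.
static long GetDirectorySize(string path)
{
    if (!Directory.Exists(path)) return 0;

    long total = 0;
    foreach (string file in Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories))
        total += new FileInfo(file).Length;
    return total;
}

string root = GetCacheRoot();
foreach (string name in new[] { "ms-playwright-driver", "ms-playwright" })
{
    string dir = Path.Combine(root, name);
    Console.WriteLine($"{dir}: {GetDirectorySize(dir) / (1024.0 * 1024.0):F2} MB");
}
```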

Cleaning Playwright Cache

# View cache size and clean if needed
Clear-HtmlBrowserCache -WhatIf

# Force clean without confirmation
Clear-HtmlBrowserCache -Force

# Skip cleaning temporary files (only clean browser downloads)
Clear-HtmlBrowserCache -SkipTemp -Force

# Skip cleaning browser downloads (only clean temp files)
Clear-HtmlBrowserCache -SkipBrowsers -Force

# View detailed information about what will be cleaned
Clear-HtmlBrowserCache -Verbose

The cache cleaner:

  • Cleans multiple Playwright cache locations (LocalAppData and .cache)
  • Removes temporary Playwright files from the temp directory
  • Cleans up trace files left behind by debugging sessions
  • Shows detailed size information for each location
  • Provides granular control over what to clean

C# Cache Cleaning

// Manually ensure browser is installed (usually not needed - happens automatically)
await HtmlBrowser.EnsureInstalledAsync(HtmlBrowserEngine.Chromium);

// Get all cache locations
var locations = HtmlBrowserCacheCleaner.GetCacheLocations();
Console.WriteLine($"Found {locations.Count} locations totaling {locations.Sum(l => l.SizeMB):F2} MB");

// Clean all cache
var result = HtmlBrowserCacheCleaner.CleanAllCache();
if (result.Success)
{
    Console.WriteLine($"Cleaned {result.TotalSizeClearedMB:F2} MB");
}
else
{
    Console.WriteLine($"Failed to clean {result.Failed.Count} locations");
}

// Clean only browser downloads
var browserResult = HtmlBrowserCacheCleaner.CleanAllCache(
    includeBrowsers: true,
    includeTemp: false);

// Get locations without cleaning (for inspection)
var tempOnly = HtmlBrowserCacheCleaner.GetCacheLocations(
    includeBrowsers: false,
    includeTemp: true);
foreach (var location in tempOnly)
{
    Console.WriteLine($"{location.Description}: {location.SizeMB:F2} MB at {location.Path}");
}

Integration with Test Frameworks

xUnit Example

[Fact]
public async Task WebsiteShouldHaveNoErrors()
{
    var result = await HtmlBrowserTester.TestUrlAsync("https://mysite.com");

    Assert.True(result.Passed, result.Summary);
    Assert.Empty(result.ConsoleErrors);
    Assert.Empty(result.FailedRequests);
}

[Fact]
public async Task CssShouldLoadQuickly()
{
    var result = await HtmlBrowserTester.TestUrlAsync("https://mysite.com");

    foreach (var css in result.CssResources)
    {
        Assert.True(css.Duration < TimeSpan.FromSeconds(2),
            $"CSS {css.Url} took {css.Duration?.TotalSeconds}s");
    }
}

Pester Example

Describe "Website Health Check" {
    It "Should have no console errors" {
        $result = Test-HtmlBrowser -Url "https://mysite.com"
        $result.ErrorCount | Should -Be 0
    }

    It "Should load all resources successfully" {
        $result = Test-HtmlBrowser -Url "https://mysite.com"
        $result.FailedRequestCount | Should -Be 0
    }

    It "Should load within 5 seconds" {
        $metrics = Test-HtmlBrowser -Url "https://mysite.com" -PerformanceOnly
        $metrics.TotalLoadTime.TotalSeconds | Should -BeLessThan 5
    }
}

🔧 Advanced Features

Browser Configuration

# Custom browser settings
$session = Start-HTMLSession -Url 'https://example.com' `
    -UserAgent 'Custom Bot 1.0' `
    -ViewportWidth 1920 `
    -ViewportHeight 1080 `
    -DeviceScaleFactor 2 `
    -Visible `
    -SlowMo 1000

Request Interception

# Mock API responses
$handler = Register-HTMLRoute -Session $session -Pattern '**/api/data' -ScriptBlock {
    param($route)
    $route.FulfillAsync([Microsoft.Playwright.RouteFulfillOptions]@{
        Status = 200
        ContentType = 'application/json'
        Body = '{"status": "success", "data": []}'
    }) | Out-Null
}

# Navigate and test
Invoke-HTMLNavigation -Session $session -Url 'https://example.com/app'
Unregister-HTMLRoute -Session $session -Pattern '**/api/data' -Handler $handler

State Management

# Save browser state
Export-BrowserState -Session $session -Path 'session-state.json'

# Restore in new session
$newSession = Import-BrowserState -Path 'session-state.json' -Url 'https://example.com/dashboard'

🏗️ Third-Party Dependencies

HtmlTinkerX utilizes several high-quality open-source libraries:

📦 HTML & DOM Processing

🎨 Resource Optimization

  • NUglify - BSD 2-Clause License - HTML/CSS/JS minification
  • Jsbeautifier - MIT License - JavaScript formatting
  • PreMailer.Net - Apache 2.0 License - Email CSS inlining

🌐 Browser Automation

πŸ”§ System Libraries

All dependencies are distributed under permissive licenses. Refer to each project's repository for complete license information.

📖 Documentation & Support

  • 📚 Examples: Check the Examples folder for comprehensive usage samples
  • 🐛 Issues: Report bugs and request features on GitHub Issues
  • 💬 Discord: Join our Discord community for support and discussions
  • 📝 Blog: Read detailed tutorials on evotec.xyz

🔄 Updates & Versioning

PowerShell Module Updates

Update-Module -Name PSParseHTML

NuGet Package Updates

dotnet add package HtmlTinkerX

⚠️ Important: Always test updates in a development environment before deploying to production. Breaking changes may occur between versions.

πŸ”§ Troubleshooting

Jint Version Conflict Warning

You may see warnings about conflicting Jint versions when building for .NET Framework 4.7.2:

warning MSB3277: Found conflicts between different versions of "Jint" that could not be resolved.
There was a conflict between "Jint, Version=3.1.6.0" and "Jint, Version=4.1.0.0"

Why this happens:

  • HtmlTinkerX references Jint 3.1.6 directly
  • AngleSharp.Js 1.0.0-beta.43 (a dependency) was compiled against a different version
  • This is a known issue with prerelease packages

Impact:

  • The warning only affects .NET Framework 4.7.2 builds
  • .NET 8.0 and .NET Standard 2.0 builds are not affected
  • The library will still work correctly as the older Jint version (3.1.6) is used

Solutions:

  1. Ignore the warning - It doesn't affect functionality
  2. Target only modern frameworks - Use .NET 8.0 or .NET Standard 2.0
  3. Add binding redirect in your app.config (for .NET Framework apps):
    <configuration>
      <runtime>
        <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
          <dependentAssembly>
            <assemblyIdentity name="Jint" publicKeyToken="2e92ba9c8d81157f" />
            <bindingRedirect oldVersion="0.0.0.0-4.1.0.0" newVersion="3.1.6.0" />
          </dependentAssembly>
        </assemblyBinding>
      </runtime>
    </configuration>

Browser Testing Issues

If browser tests fail:

  1. First run downloads browsers automatically - This can take a few minutes (~400MB)

    • You'll see: "Downloading Playwright driver... X% (Y MB/s)"
    • This only happens once per system
  2. Network timeout issues - Some sites may be slow or blocked

    • Try increasing timeout: Test-HtmlBrowser -Url $url -Timeout 60000
    • Test with a simple URL first: Test-HtmlBrowser -Url "http://httpbin.org/html"
  3. Behind a proxy - Set proxy environment variables:

    $env:HTTPS_PROXY = "http://proxy:8080"
    $env:HTTP_PROXY = "http://proxy:8080"

    Or use proxy parameters:

    Test-HtmlBrowser -Url $url -Proxy "http://proxy:8080" -ProxyCredential (Get-Credential)
  4. Clean and retry if you suspect corrupted downloads:

    Clear-HtmlBrowserCache -Force
    # Then run your test again - it will re-download browsers
  5. Manual browser installation (C#):

    // Ensure browser is installed before testing
    await HtmlBrowser.EnsureInstalledAsync(HtmlBrowserEngine.Chromium);
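For step 3, it can help to confirm that the proxy variables are actually visible to your process and contain a well-formed URL before running a browser test. The sketch below does only that; FindProxy is a hypothetical helper, and it makes no claim about how HtmlTinkerX or Playwright consume these variables.

```csharp
using System;

// Hypothetical helper: return the first well-formed absolute URL found in the
// given environment variables, or null when none is set or valid.
static Uri FindProxy(params string[] variableNames)
{
    foreach (string name in variableNames)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (Uri.TryCreate(value, UriKind.Absolute, out Uri uri))
            return uri;
    }
    return null;
}

Uri proxy = FindProxy("HTTPS_PROXY", "HTTP_PROXY");
Console.WriteLine(proxy is null
    ? "No proxy configured in HTTPS_PROXY or HTTP_PROXY"
    : $"Proxy: {proxy.Host}:{proxy.Port}");
```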

📄 License

Copyright (c) 2011 - 2025 Przemyslaw Klys @ Evotec. All rights reserved.

This project and its dependencies are distributed under various permissive licenses. See individual dependency repositories for specific license terms.


Built with ❤️ by Evotec - Making web content processing simple and powerful.
