[Ingest] Route scraping through an optional V2Ray/Xray proxy (Telegram in Iran)

Telegram and some other sources are filtered in Iran. .NET cannot speak vmess/vless/trojan itself, so this adds an Xray sidecar (compose service 'xray', behind the 'proxy' profile) that converts the admin's config into a local SOCKS5 proxy (xray:10808).

The new ScrapeHttpClients provider builds a proxied or direct HttpClient (WebProxy supports socks5/socks4/http), cached per proxy URL. All five ingestion sources (Telegram/Bale/Divar/Medjobs/Websites) now use it.

Admin settings gain IngestProxyEnabled + IngestProxyUrl (migration; UI under sources). Also adds a deploy/xray/config.json template and a README with vmess/vless/trojan examples.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
soroush.asadi
2026-06-04 17:53:17 +03:30
parent 698565c460
commit cea27c8684
17 changed files with 1411 additions and 20 deletions
@@ -0,0 +1,55 @@
using System.Collections.Concurrent;
using System.Net;
using JobsMedical.Web.Models;

namespace JobsMedical.Web.Services.Scraping;

/// <summary>
/// Supplies the HttpClient used by ingestion sources, optionally routed through a proxy.
///
/// Telegram (t.me) and some other sources are filtered in Iran, so the admin can point
/// ingestion at a local proxy that an Xray/V2Ray client sidecar exposes (e.g.
/// <c>socks5://xray:10808</c>). .NET's WebProxy understands <c>socks5://</c>, <c>socks4://</c>
/// and <c>http://</c> schemes, so the same code path covers all of them.
///
/// Clients are cached per proxy descriptor (singleton). Changing the proxy in admin settings
/// makes the next run pick up a new client; the old one is disposed.
/// </summary>
public sealed class ScrapeHttpClients : IDisposable
{
    private readonly ConcurrentDictionary<string, HttpClient> _cache = new();

    /// <summary>The HttpClient for the given settings: proxied when enabled, direct otherwise.</summary>
    public HttpClient For(AppSetting s)
    {
        var key = (s.IngestProxyEnabled && !string.IsNullOrWhiteSpace(s.IngestProxyUrl))
            ? s.IngestProxyUrl.Trim()
            : "direct";

        // Drop stale clients if the proxy URL changed (keep only "direct" + the current proxy).
        // Note: disposing cancels any requests still in flight on the stale client.
        foreach (var k in _cache.Keys)
            if (k != "direct" && k != key && _cache.TryRemove(k, out var stale))
                stale.Dispose();

        // Under contention GetOrAdd may invoke Build more than once; only one result is cached.
        return _cache.GetOrAdd(key, Build);
    }

    private static HttpClient Build(string key)
    {
        var handler = new HttpClientHandler { AutomaticDecompression = DecompressionMethods.All };
        if (key != "direct")
        {
            handler.Proxy = new WebProxy(key); // socks5:// | socks4:// | http://
            handler.UseProxy = true;
        }

        var c = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(20) };
        c.DefaultRequestHeaders.UserAgent.ParseAdd("Mozilla/5.0 (compatible; HamkadrBot/1.0)");
        return c;
    }

    public void Dispose()
    {
        foreach (var c in _cache.Values) c.Dispose();
        _cache.Clear();
    }
}
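
Review note: Build passes the admin-supplied string straight to new WebProxy(key), which throws UriFormatException on malformed input. The following is a minimal standalone sketch of how the URL could be validated first. It is not part of this commit. It assumes .NET 6 or later (where SOCKS proxy schemes are supported), the accepted scheme list is an assumption based on the comment in Build, and the fall-back-to-direct behavior is hypothetical.

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical value, mirroring the sidecar address named in the commit message.
var proxyUrl = "socks5://xray:10808";

// Parse first instead of letting the WebProxy(string) constructor throw.
// The allowed schemes below are an assumption, not taken from the commit.
if (!Uri.TryCreate(proxyUrl, UriKind.Absolute, out var uri)
    || uri.Scheme is not ("socks5" or "socks4" or "http"))
{
    Console.WriteLine("Invalid proxy URL; a caller could fall back to the direct client here.");
    return;
}

var proxy = new WebProxy(uri);
var handler = new HttpClientHandler { Proxy = proxy, UseProxy = true };
using var client = new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(20) };

// Shows which proxy endpoint would be used for a Telegram request (no network call is made).
Console.WriteLine(proxy.GetProxy(new Uri("https://t.me/")));
```

A check like this could live in the admin settings validation or at the top of Build, so a typo in IngestProxyUrl does not surface as an exception in the middle of an ingestion run.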