Ingestion data-quality + map fixes: AI salary, geocode coverage, in-place backfill & purge

- Jobs now keep the AI-extracted salary (d.PayAmount ?? parsed.PayAmount); they
  previously used only the parser figure, so every aggregated opening showed «توافقی»
  ("negotiable").
- Geocoder also scans the ad body, so Tehran ads that name a neighbourhood only in
  free text («… در سهروردی», "… in Sohrevardi") get an approximate map point.
- New BackfillCoordsAsync (+ admin button): fills missing coords on existing aggregated
  listings from their stored text, in place — no ID/URL churn, SEO-safe.
- New PurgeInvalidAggregatedAsync + DedupeJobsAsync (+ admin button): in-place removal of
  out-of-scope (domestic/promo/spam) aggregated jobs/shifts and duplicate job reposts,
  keeping valid listings' IDs.
- Jobs detail page always renders the location card (matches Shifts) instead of hiding it
  when coords are missing.
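
The first two items can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this commit: the `AiDraft` and `ParsedAd` records, the `IngestSketch` class, the `Locate` helper, and the neighbourhood table (with placeholder coordinates) are all hypothetical. Only the `d.PayAmount ?? parsed.PayAmount` expression comes from the message above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for the AI draft and parser result; the real types are not shown here.
public record AiDraft(decimal? PayAmount);
public record ParsedAd(decimal? PayAmount);

public static class IngestSketch
{
    // Prefer the AI-extracted salary and fall back to the parser figure only when it is null.
    public static decimal? ResolvePay(AiDraft d, ParsedAd parsed) =>
        d.PayAmount ?? parsed.PayAmount;

    // Assumed lookup table; names and coordinates are illustrative placeholders only.
    private static readonly Dictionary<string, (double Lat, double Lng)> Neighbourhoods = new()
    {
        ["سهروردی"] = (35.73, 51.43),
        ["ونک"] = (35.76, 51.41),
    };

    // Scan the title first, then the ad body, and return the first neighbourhood match.
    public static (double Lat, double Lng)? Locate(string? title, string? body)
    {
        foreach (var text in new[] { title, body })
        {
            if (string.IsNullOrEmpty(text)) continue;
            foreach (var (name, point) in Neighbourhoods)
            {
                if (text.Contains(name, StringComparison.Ordinal))
                    return point;
            }
        }
        return null; // no known neighbourhood mentioned anywhere in the ad
    }
}
```

With this shape, an ad whose title carries no location but whose body mentions a neighbourhood still resolves to a point, and a null AI salary falls through to the parser value instead of leaving the field empty.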

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
soroush.asadi
2026-06-21 05:09:39 +03:30
parent a16a805869
commit e2011d335e
4 changed files with 173 additions and 10 deletions
@@ -120,6 +120,30 @@ public class IndexModel : PageModel
        return RedirectToPage();
    }

    /// <summary>
    /// Fill missing map coordinates on existing aggregated Tehran listings from their stored ad text
    /// (TehranGeo). In place — no AI calls, no re-fetch, and crucially no delete/recreate, so indexed
    /// shift/job URLs keep their IDs. Fast (pure DB + string matching), so it runs inline.
    /// </summary>
    public async Task<IActionResult> OnPostBackfillCoordsAsync()
    {
        var n = await _ingest.BackfillCoordsAsync();
        IngestMessage = $"مختصات تقریبی برای {n} آگهی جمع‌آوری‌شده از روی متن آگهی تکمیل شد (بدون تغییر شناسه یا آدرس صفحه).";
        return RedirectToPage();
    }

    /// <summary>
    /// In-place cleanup of existing aggregated jobs/shifts: delete only the out-of-scope ones
    /// (domestic-helper / promotional / spam) per the current validator, plus near-duplicate job
    /// reposts. Valid listings keep their IDs/URLs. No re-fetch, no AI — runs inline.
    /// </summary>
    public async Task<IActionResult> OnPostPurgeInvalidAsync()
    {
        var (removed, deduped) = await _ingest.PurgeInvalidAggregatedAsync();
        IngestMessage = $"پاک‌سازیِ درجا: {removed} آگهیِ خارج از حوزه (خدمات منزل/تبلیغاتی/اسپم) و {deduped} استخدامِ تکراری حذف شد. سایر آگهی‌ها و شناسه/آدرسشان دست‌نخورده ماند.";
        return RedirectToPage();
    }

    private async Task LoadAsync()
    {
        Queue = await _db.RawListings