Fix role + contact mislabels seen on a live iranestekhdam ad
CI/CD / CI · dotnet build (push) Successful in 33s
CI/CD / Deploy · hamkadr (push) Successful in 44s

(1) Specialist guard: the AI sometimes labels a clearly specialist ad («پزشک متخصص گوش و
حلق و بینی» (ENT specialist), «فلوشیپ» (fellowship), «فوق تخصص» (subspecialty)) as
«پزشک عمومی» (general practitioner), so an ENT post was published as «استخدام پزشک عمومی»
("Hiring: general practitioner"). When the primary role is GP but the ad text names a
specialist, the role is now swapped to «پزشک متخصص» (specialist physician); the
subspecialty stays as a tag.
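
A minimal sketch of the guard described in (1), for illustration only. The actual change lives in the second changed file, which is not shown here, so the class name `SpecialistGuardSketch`, the method `ApplyGuard`, and the marker list are all hypothetical. It also assumes plain substring matching is enough; real ad text may need character normalization first.

```csharp
using System;
using System.Linq;

public static class SpecialistGuardSketch
{
    public const string GeneralPractitioner = "پزشک عمومی";
    public const string Specialist = "پزشک متخصص";

    // Hypothetical marker list built from the examples in the commit message;
    // the markers used by the real guard are not visible in this diff.
    private static readonly string[] SpecialistMarkers =
    {
        "پزشک متخصص", "فلوشیپ", "فوق تخصص"
    };

    // Returns the role to publish. A GP label is swapped to the specialist label
    // when the ad text contains a specialist marker; any other role passes through.
    public static string ApplyGuard(string primaryRole, string adText)
    {
        if (primaryRole != GeneralPractitioner) return primaryRole;

        bool namesSpecialist = SpecialistMarkers.Any(
            marker => adText.Contains(marker, StringComparison.Ordinal));

        return namesSpecialist ? Specialist : primaryRole;
    }
}
```

In this sketch the subspecialty is left untouched, since the commit message says it is kept as a separate tag rather than folded into the role.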

(2) Phone type: the landline regex prefix 0\d{2,3} also matched 09xx mobile numbers and
labeled them «تلفن ثابت» (landline). Iranian landline area codes start with 0[1-8]
(021, 026, …) and never with 09, so the prefix is now restricted to 0[1-8]\d{1,2} and
mobiles are no longer mislabeled as landlines.
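
To make the difference in (2) concrete, the sketch below runs the old and new patterns from the diff against one mobile and one landline number. The wrapper class `LandlinePatternSketch`, the helper `Matches`, and the sample numbers are made up for illustration; only the two pattern strings come from the commit.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LandlinePatternSketch
{
    // Old pattern: any two or three digits may follow the leading 0, so 09xx fits.
    public const string OldPattern = @"(?<!\d)0\d{2,3}[\s-]?\d{7,8}(?!\d)";

    // New pattern: the digit after the leading 0 must be 1-8, which excludes 09xx.
    public const string NewPattern = @"(?<!\d)0[1-8]\d{1,2}[\s-]?\d{7,8}(?!\d)";

    // Hypothetical helper: true when the pattern finds a match anywhere in the text.
    public static bool Matches(string pattern, string text) =>
        Regex.IsMatch(text, pattern);

    public static void Main()
    {
        string mobile = "09123456789";    // made-up mobile number
        string landline = "021-12345678"; // made-up Tehran landline

        Console.WriteLine(Matches(OldPattern, mobile));   // True  (the bug)
        Console.WriteLine(Matches(NewPattern, mobile));   // False (fixed)
        Console.WriteLine(Matches(OldPattern, landline)); // True
        Console.WriteLine(Matches(NewPattern, landline)); // True
    }
}
```

The `(?<!\d)` lookbehind matters here: it stops the new pattern from matching a digit run that starts in the middle of a mobile number.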

Both fixes apply to new ingests; existing mislabeled rows are corrected when they turn over or are reprocessed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
soroush.asadi
2026-06-21 13:29:43 +03:30
parent b48e7dbc65
commit 1c580e0f7a
2 changed files with 20 additions and 1 deletions
@@ -361,7 +361,9 @@ public class HeuristicListingParser : IListingParser
             if (d.Length == 10 && d[0] == '9') d = "0" + d;
             Add(ContactType.Mobile, d);
         }
-        foreach (Match m in Regex.Matches(latin, @"(?<!\d)0\d{2,3}[\s-]?\d{7,8}(?!\d)"))
+        // Landline area codes start 0[1-8] (021 Tehran, 026 Karaj, …) — never 09, which is a MOBILE.
+        // The old 0\d{2,3} matched 09xx numbers and mislabeled mobiles as «تلفن ثابت».
+        foreach (Match m in Regex.Matches(latin, @"(?<!\d)0[1-8]\d{1,2}[\s-]?\d{7,8}(?!\d)"))
             Add(ContactType.Phone, Regex.Replace(m.Value, @"\D", ""));
         return list.Take(8).ToList();