AI tag/category assignment + phone extraction from web ads
AI (when enabled, now that the server proxy is up):
- AiStructured gains phone, personName, yearsExperience, isLicensed.
- The auditor appends an authoritative output-schema to the admin prompt so
  classification stays correct even with an older stored prompt. It now
  classifies kind as shift|job|talent and extracts the contact phone and
  talent details.
- Ingestion publish prefers the AI's tags (kind/role/city/facility/phone +
  talent fields) over the heuristic parser when present.
- Default prompt updated to describe the three kinds + new fields.

Phone extraction from websites (Medjobs / generic sites), where the number
sits behind a "تماس با این آگهی" ("Contact this ad") reveal:
- HtmlUtil.HarvestPhones scans the full markup for tel: links, JSON-LD
  "telephone", data-*phone* attributes, and inline Iranian mobile/landline
  numbers (Persian digits folded), then normalizes them (mobiles 09…,
  landlines 0…).
- Medjobs + Website sources append harvested numbers to the ad text so the
  parser/AI capture them; manual review then prefills the phone too.
- Parser phone extraction now also captures a landline as a fallback.

Note: if a site loads the number purely via XHR (not in HTML), a per-source
reveal endpoint would be a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
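The harvesting pass described in the commit message could look roughly like the sketch below. It is not the project's HtmlUtil.HarvestPhones, whose source is not shown here. The class name PhoneHarvestSketch, the regular expressions, and the assumption that a normalized national number is 11 digits starting with 0 are all illustrative guesses.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sketch; names and patterns are assumptions, not the real HtmlUtil.
public static class PhoneHarvestSketch
{
    // The three markup sources named in the commit message.
    private static readonly Regex TelLink =
        new(@"href\s*=\s*[""']tel:([^""']+)[""']", RegexOptions.IgnoreCase);
    private static readonly Regex JsonLdTelephone =
        new(@"""telephone""\s*:\s*""([^""]+)""", RegexOptions.IgnoreCase);
    private static readonly Regex DataPhoneAttr =
        new(@"data-[\w-]*phone[\w-]*\s*=\s*[""']([^""']+)[""']", RegexOptions.IgnoreCase);
    // Inline numbers: +98 / 0098 / 0 prefix, then ten digits with optional separators.
    private static readonly Regex Inline =
        new(@"(?<!\d)(?:\+98|0098|0)(?:[\s-]?\d){10}(?!\d)");

    // Map Persian (U+06F0..) and Arabic-Indic (U+0660..) digits to ASCII digits.
    public static string FoldDigits(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (var ch in s)
        {
            if (ch >= '\u06F0' && ch <= '\u06F9') sb.Append((char)('0' + (ch - '\u06F0')));
            else if (ch >= '\u0660' && ch <= '\u0669') sb.Append((char)('0' + (ch - '\u0660')));
            else sb.Append(ch);
        }
        return sb.ToString();
    }

    // Reduce a raw candidate to national format (leading 0, 11 digits), or null if it does not fit.
    public static string? Normalize(string raw)
    {
        var digits = new string(FoldDigits(raw).Where(c => c >= '0' && c <= '9').ToArray());
        if (digits.StartsWith("0098", StringComparison.Ordinal)) digits = "0" + digits[4..];
        else if (digits.StartsWith("98", StringComparison.Ordinal) && digits.Length == 12) digits = "0" + digits[2..];
        else if (digits.Length == 10 && digits[0] != '0') digits = "0" + digits;
        return digits.Length == 11 && digits[0] == '0' ? digits : null;
    }

    // Collect distinct normalized numbers, structured sources first, inline matches last.
    public static IReadOnlyList<string> Harvest(string html)
    {
        var folded = FoldDigits(html);
        var found = new List<string>();
        foreach (var rx in new[] { TelLink, JsonLdTelephone, DataPhoneAttr })
            foreach (Match m in rx.Matches(folded))
                Add(found, m.Groups[1].Value);
        foreach (Match m in Inline.Matches(folded))
            Add(found, m.Value);
        return found;
    }

    private static void Add(List<string> found, string raw)
    {
        var n = Normalize(raw);
        if (n != null && !found.Contains(n)) found.Add(n);
    }
}
```

Folding digits before matching lets one set of ASCII patterns cover both Persian and Latin renderings of the same number, and deduplicating after normalization keeps a number that appears in both a tel: link and the visible text from being reported twice.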
@@ -93,7 +93,14 @@ public class MedjobsListingSource : IListingSource
 
         var parts = new[] { title, body }.Where(p => !string.IsNullOrWhiteSpace(p));
         var text = HtmlUtil.ToPlainText(string.Join("\n", parts));
-        return text.Length > 1800 ? text[..1800] : text;
+        if (text.Length > 1800) text = text[..1800];
+
+        // The contact number is often outside the description (in a tel: link / data attribute the
+        // page reveals on click). Harvest it from the full HTML and append so the parser/AI see it.
+        var phones = HtmlUtil.HarvestPhones(html);
+        if (phones.Count > 0 && !phones.Any(text.Contains))
+            text += "\nشماره تماس: " + string.Join("، ", phones);
+        return text;
     }
 
     private static string? Meta(string html, string prop)
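The ordering in this hunk matters: the description is truncated first and the numbers are appended afterwards, so a harvested phone is never cut off by the 1800-character cap. A minimal standalone sketch of that step follows, assuming a hypothetical helper named AdTextSketch.BuildText that takes the already-extracted plain text and phone list as parameters (the real method works on fields of MedjobsListingSource).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the commit.
public static class AdTextSketch
{
    public static string BuildText(string plainText, IReadOnlyList<string> phones, int maxLength = 1800)
    {
        // Cap the description first so the appended contact line survives intact.
        var text = plainText.Length > maxLength ? plainText[..maxLength] : plainText;

        // Append only when none of the harvested numbers already appears in the text.
        if (phones.Count > 0 && !phones.Any(text.Contains))
            text += "\nشماره تماس: " + string.Join("، ", phones);

        return text;
    }
}
```

The check is all-or-nothing, as in the diff: if any one harvested number is already present in the text, nothing is appended, even if other harvested numbers are missing from it.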