Match crawled listings to existing facilities (fuzzy) before creating new ones
When publishing a scraped listing, we now look for an existing facility whose name is the same or nearly the same, and create a new one only when nothing matches. This avoids duplicates such as «بیمارستان میلاد» ("Milad Hospital") vs «میلاد» ("Milad").

- ListingParser: extract a facility name (keyword + distinctive words) from the post and surface it in the parser notes.
- FacilityMatcher: Persian-aware normalization (Arabic ي/ك variants, ZWNJ, punctuation), type-word stripping to get a "core" name, contains + Levenshtein similarity, and FindBest (same-city exact → any-city exact → same-city fuzzy → any-city fuzzy).
- Review (manual publish): auto-select a matching facility or prefill the new-facility name; resolve-or-create uses the fuzzy match; the dropdown preselects the chosen facility.
- IngestionService (auto-publish): reuse FacilityMatcher against a run-wide facility list (which grows as new facilities are created) instead of matching on exact name only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
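The FacilityMatcher source is not part of the diff shown here, so the following is only a minimal C# sketch of the matching order described in the message, not the project's implementation. The `Facility` record, the `FacilityMatcherSketch` class, the type-word list, the 0.8 threshold, the 0.9 containment score, and the reading of "exact" as "equal core names" are all assumptions made for illustration.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity shape; the real Facility model is not shown in this commit view.
public record Facility(int Id, string Name, int CityId);

public static class FacilityMatcherSketch
{
    // Assumed values: the project's real type-word list and threshold are not shown.
    private static readonly HashSet<string> TypeWords =
        new(new[] { "بیمارستان", "درمانگاه" }.Select(Normalize));
    private const double FuzzyThreshold = 0.8;

    // Map Arabic yeh/kaf to their Persian forms, then turn every character that
    // is not a letter or digit (ZWNJ, punctuation, whitespace) into a single space.
    public static string Normalize(string s)
    {
        s = s.Replace('\u064A', '\u06CC').Replace('\u0643', '\u06A9');
        var chars = s.Select(c => char.IsLetterOrDigit(c) ? c : ' ').ToArray();
        var words = new string(chars).Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", words);
    }

    // "Core" name: the normalized name without type words such as "hospital".
    // Falls back to the full normalized name if nothing else is left.
    public static string Core(string name)
    {
        var normalized = Normalize(name);
        var kept = normalized.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                             .Where(w => !TypeWords.Contains(w));
        var core = string.Join(" ", kept);
        return core.Length > 0 ? core : normalized;
    }

    // Classic two-row dynamic-programming edit distance.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;
        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev);
        }
        return prev[b.Length];
    }

    // Score in [0, 1]: 1 for equal cores, an assumed 0.9 when one core contains
    // the other, otherwise 1 minus the edit distance over the longer length.
    public static double Similarity(string a, string b)
    {
        string x = Core(a), y = Core(b);
        if (x.Length == 0 || y.Length == 0) return 0;
        if (x == y) return 1;
        if (x.Contains(y) || y.Contains(x)) return 0.9;
        return 1.0 - (double)Levenshtein(x, y) / Math.Max(x.Length, y.Length);
    }

    // Preference order: same-city exact, any-city exact, same-city fuzzy, any-city fuzzy.
    // Returns null when nothing matches, in which case the caller creates a new facility.
    public static Facility? FindBest(string name, int? cityId, IReadOnlyList<Facility> facilities)
    {
        var core = Core(name);
        if (core.Length == 0) return null;

        var exact = facilities.Where(f => Core(f.Name) == core).ToList();
        var hit = exact.FirstOrDefault(f => f.CityId == cityId) ?? exact.FirstOrDefault();
        if (hit != null) return hit;

        var fuzzy = facilities
            .Select(f => (Facility: f, Score: Similarity(name, f.Name)))
            .Where(x => x.Score >= FuzzyThreshold)
            .OrderByDescending(x => x.Score)
            .Select(x => x.Facility)
            .ToList();
        return fuzzy.FirstOrDefault(f => f.CityId == cityId) ?? fuzzy.FirstOrDefault();
    }
}
```

Under these assumptions, `FindBest("میلاد", cityId, facilities)` resolves to an existing «بیمارستان میلاد» entry because both names reduce to the same core once the type word is stripped. For the auto-publish path, a caller would append each newly created facility to the same list so that later listings in the run can match it.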
@@ -51,10 +51,10 @@
 <div class="filter-group">
 <label>مرکز درمانی</label>
 <select name="FacilityId">
-<option value="0">— انتخاب نشده —</option>
+<option value="0" selected="@(Model.FacilityId == 0)">— انتخاب نشده —</option>
 @foreach (var f in Model.Facilities)
 {
-<option value="@f.Id">@f.Name — @f.City?.Name</option>
+<option value="@f.Id" selected="@(Model.FacilityId == f.Id)">@f.Name — @f.City?.Name</option>
 }
 </select>
 <input type="text" name="NewFacilityName" placeholder="یا نام مرکز جدید را وارد کن…" style="margin-top:6px;" />