Commit Graph

11 Commits

Author SHA1 Message Date
soroush.asadi 88eca92333 Facility data hygiene: merge duplicates, drop junk-named facilities
CI/CD / CI · dotnet build (push) Successful in 1m51s
CI/CD / Deploy · hamkadr (push) Successful in 2m17s
Cleans up the crawl-generated facility table that surfaced garbage on /Facilities
(«بیمارستان هستم» ["I am a hospital"], «... از مدجابز» ["... from MedJobs"], bare «کلینیک» ["clinic"], «سازمان برنامه جنوبی» ["Sazman-e Barnameh-ye Jonubi"] x3):

- FacilityMatcher.IsJunkName: shared detector for non-names (bare type words, cores
  made only of filler/verb tokens, and leaked crawl-source/placeholder text). Added
  داروخانه ("pharmacy") and آسایشگاه ("care home") to the generic type words, so bare
  occurrences are caught and duplicates match more reliably.
- HeuristicListingParser.ExtractFacilityName now rejects junk candidates (and emoji), so
  new ingests fall back to the shared placeholder instead of forging a fake facility.
- IngestionService.MergeAndCleanFacilitiesAsync (+ admin button): folds junk facilities
  into the placeholder and merges Persian-fuzzy duplicates into one keeper, repointing
  their shifts/jobs first. Hard guard: only purely crawl-generated, unmanaged facilities
  are removed — employer-owned and verified facilities are never touched.
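
A minimal sketch of the junk-name check described in the bullets above, written in Python for illustration only (the project itself is C#). The word lists, the `is_junk_name` name, and the tokenisation are assumptions; the real FacilityMatcher.IsJunkName lists are not shown in this log.

```python
import re

# Hypothetical lists for illustration; only these four type words and the
# "مدجابز" source marker are actually mentioned in the commit message.
GENERIC_TYPE_WORDS = {"بیمارستان", "کلینیک", "داروخانه", "آسایشگاه"}
FILLER_TOKENS = {"هستم", "هستیم", "است", "از", "در", "به"}  # assumed filler/verb tokens
SOURCE_MARKERS = ("مدجابز",)  # leaked crawl-source text


def is_junk_name(name: str) -> bool:
    """Return True when a candidate facility name is not a real name."""
    tokens = re.findall(r"\w+", name)
    if not tokens:
        return True  # empty or punctuation only
    if any(marker in name for marker in SOURCE_MARKERS):
        return True  # leaked source/placeholder text
    # The "core" is whatever remains once generic type words are removed.
    core = [t for t in tokens if t not in GENERIC_TYPE_WORDS]
    if not core:
        return True  # bare type word such as «کلینیک»
    # A core made only of filler/verb tokens is not a name either.
    return all(t in FILLER_TOKENS for t in core)
```

With these assumed lists, a bare «کلینیک» and «بیمارستان هستم» are rejected, while a type word followed by a proper name passes.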

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 05:40:29 +03:30
soroush.asadi 8be275596b Make the listing purge SEO-standard: archive (not delete) + 410 Gone
CI/CD / CI · dotnet build (push) Successful in 49s
CI/CD / Deploy · hamkadr (push) Successful in 2m13s
Per the project archive-not-delete convention, the in-place purge now sets out-of-scope
and duplicate aggregated jobs/shifts to ShiftStatus.Archived instead of hard-deleting:
- The row is retained for analysis and the change is reversible.
- The listing drops out of every public screen and the sitemap (which filter Status == Open).
- Its detail page now returns 410 Gone (the standard permanent-removal signal) so search
  engines deindex it cleanly, instead of leaving the off-topic page live at 200 or hard-404ing.
Dedupe of job reposts archives the older copies the same way. Coordinate backfill now also
skips non-Open rows. Valid listings are untouched, so IDs/URLs stay stable.
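
The archive-then-410 behaviour above can be sketched as follows. This is an illustrative Python model, not the project's Razor Pages code: only the Open and Archived statuses named in the message are modelled, and `detail_status_code` and `sitemap_ids` are hypothetical helper names.

```python
from enum import Enum
from typing import Iterable, List, Optional, Tuple


class ShiftStatus(Enum):
    # Only the two statuses named in the commit message are modelled here.
    OPEN = "Open"
    ARCHIVED = "Archived"


def detail_status_code(status: Optional[ShiftStatus]) -> int:
    """HTTP status for a listing detail page."""
    if status is None:
        return 404  # no such row at all
    if status is ShiftStatus.ARCHIVED:
        return 410  # permanently removed; the row is kept, the page is gone
    return 200


def sitemap_ids(rows: Iterable[Tuple[int, ShiftStatus]]) -> List[int]:
    """Public screens and the sitemap only include Open listings."""
    return [row_id for row_id, status in rows if status is ShiftStatus.OPEN]
```

Because the row is only re-flagged, flipping the status back to Open restores the page at the same ID and URL.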

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 05:25:51 +03:30
soroush.asadi e2011d335e Ingestion data-quality + map fixes: AI salary, geocode coverage, in-place backfill & purge
CI/CD / CI · dotnet build (push) Successful in 30s
CI/CD / Deploy · hamkadr (push) Successful in 1m11s
- Jobs now keep the AI-extracted salary (d.PayAmount ?? parsed.PayAmount); they
  previously used only the parser figure, so every aggregated opening showed «توافقی» ("negotiable").
- Geocoder also scans the ad body, so Tehran ads that name a neighbourhood only in
  free text («… در سهروردی», "… in Sohrevardi") get an approximate map point.
- New BackfillCoordsAsync (+ admin button): fills missing coords on existing aggregated
  listings from their stored text, in place — no ID/URL churn, SEO-safe.
- New PurgeInvalidAggregatedAsync + DedupeJobsAsync (+ admin button): in-place removal of
  out-of-scope (domestic/promo/spam) aggregated jobs/shifts and duplicate job reposts,
  keeping valid listings' IDs.
- Jobs detail page always renders the location card (matches Shifts) instead of hiding it
  when coords are missing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-21 05:09:39 +03:30
soroush.asadi fb7bfad9ce Reprocess: SEO-safe applicants-only default (don't churn indexed shift/job URLs)
CI/CD / CI · dotnet build (push) Successful in 2m11s
CI/CD / Deploy · hamkadr (push) Successful in 2m10s
Reprocess deletes+rebuilds aggregated listings, which changes their IDs. Shift/Job
detail pages are indexed and in the sitemap, so churning them would 404 ranked
URLs. «آماده به کار» ("ready to work") pages are NoIndex + Disallow, so rebuilding them has zero SEO
impact — and that's where all the duplicate/sprawl problems were.

ReprocessAsync(talentOnly: true) now only deletes/rebuilds TalentListings and
skips non-talent raws (leaving shift/job listings + their RawListing links
untouched). Admin button relabelled «پردازش مجددِ آماده به کارها (امن برای SEO)» ("Reprocess ready-to-work listings (SEO-safe)").
Shifts/jobs self-clean via normal ingestion turnover.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 16:08:20 +03:30
soroush.asadi d62929ca0d AI qualify: de-dupe applicants, base roles, closed categories, tag hygiene + reprocess-stored action
CI/CD / CI · dotnet build (push) Successful in 2m35s
CI/CD / Deploy · hamkadr (push) Successful in 1m23s
Qualified live applicants and found three problems, all fixed:
- Duplicate cards: one ad fanned out into «پرستار» ("nurse") + «پرستار کودک» ("child nurse") for the same person.
  Applicants now publish ONE listing (no role fan-out); secondary roles → tags.
- Role sprawl: modifiers became roles. Prompt now returns the BASE profession
  and pushes age-group/ward/seniority to tags; new roles only for a genuinely
  new base profession (تکنسین داروخانه "pharmacy technician" ✓, پرستار کودک "child nurse" ✗).
- Tag/category noise: categories pinned to the 5 fixed groups (+سایر "other", never
  invented); BuildTags drops pay/contact/location/fragment words.
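
To make the "one listing, base role, modifiers to tags" rule concrete, here is a rule-based Python stand-in. The commit does this through the AI prompt, not through string matching, so treat this purely as an illustration of the intended output shape; `BASE_ROLES`, `split_role`, and `build_listing` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical list of base professions; the real role set is not in this log.
BASE_ROLES = ("تکنسین داروخانه", "پرستار")


def split_role(raw_role: str) -> Tuple[str, List[str]]:
    """Split a raw role into (base profession, modifier tags)."""
    text = " ".join(raw_role.split())
    # Try longer base names first so multi-word professions win.
    for base in sorted(BASE_ROLES, key=len, reverse=True):
        if text == base or text.startswith(base + " "):
            modifier = text[len(base):].strip()
            return base, ([modifier] if modifier else [])
    return text, []  # unknown role: keep it whole


def build_listing(raw_roles: List[str]) -> Dict[str, object]:
    """Collapse every role found in one ad into ONE listing."""
    primary = None
    tags: List[str] = []
    for raw in raw_roles:
        base, modifiers = split_role(raw)
        if primary is None:
            primary = base
        elif base != primary and base not in tags:
            tags.append(base)  # secondary role becomes a tag
        for modifier in modifiers:
            if modifier not in tags:
                tags.append(modifier)
    return {"role": primary, "tags": tags}
```

Under these assumptions an ad that mentions both «پرستار» and «پرستار کودک» yields a single listing with role «پرستار» and the tag «کودک».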

Reprocess action: IngestionService.ReprocessAsync re-runs the current pipeline
over every stored RawListing WITHOUT re-fetching (keeps the raw text, so nothing
is lost to sources only exposing recent posts), deleting the old aggregated
posts and republishing cleanly. Admin dashboard button «پردازش مجددِ آیتم‌های
ذخیره‌شده» ("Reprocess stored items") runs it on a background scope; result lands in the run-log.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-20 14:24:20 +03:30
soroush.asadi 380243b669 Divar geo-coords to facility map + medical gate + RawListing FK/geo migrations
CI/CD / CI · dotnet build (push) Successful in 2m6s
CI/CD / Deploy · hamkadr (push) Successful in 2m3s
2026-06-09 21:38:55 +03:30
soroush.asadi da6e86fa7f [Ingest] Full results page (all statuses) + inline quick-reject in queue
CI/CD / CI · dotnet build (push) Successful in 2m13s
CI/CD / Deploy · hamkadr (push) Has been cancelled
New /Admin/Ingested page lists every crawled item with its outcome, filterable by status (همه/در صف/پرچم‌خورده/منتشرشده/ردشده, i.e. all/queued/flagged/published/rejected) with per-status counts and a link to the published shift or the review page. It is linked from the run-history header and the admin panel nav. Each queue/flagged row also gets an inline ✕رد ("✕ reject", quick-discard) button so admins can audit without opening the review page; full accept/reject stays on /Admin/Review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 06:41:17 +03:30
soroush.asadi 487c7ca82f [Ingest] Persistent crawl run-log + per-source breakdown on admin queue
CI/CD / CI · dotnet build (push) Has been cancelled
CI/CD / Deploy · hamkadr (push) Has been cancelled
Each ingestion run now records an IngestionRun row (found/queued/published/flagged/spam/duplicates + a per-source detail string). Admin → صف آگهی‌ها ("listings queue") shows a «تاریخچه جمع‌آوری» ("collection history") table of the last 15 runs (hover a row for the per-source breakdown), so admins can see how much each source found vs added over time. IngestionSummary gains TotalFetched. Migration: IngestionRuns table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 06:23:58 +03:30
soroush.asadi 3c08c1a265 Move ingestion + Telegram/Bale/Divar config to DB-backed admin settings
CI/CD / CI · dotnet build (push) Successful in 6m22s
CI/CD / Deploy · hamkadr (push) Failing after 3s
- AppSetting gains source config: AutoIngestEnabled, IngestIntervalMinutes, Telegram/Bale/Divar enabled+channels/token/queries
- IListingSource.FetchAsync(AppSetting) — sources read config from DB, not IOptions/appsettings; sample source dev-only
- IngestionWorker reads AutoIngest+interval from DB each cycle (toggle at runtime, no redeploy)
- /Admin/Settings gets a 'منابع جمع‌آوری' ("collection sources") section; removed Ingestion env/appsettings + compose env vars
- ENV_FILE shrinks to HOST_PORT + POSTGRES_* + ADMIN_PHONE (AI + sources are all in-admin); migration

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 00:44:11 +03:30
soroush.asadi 931b7b6ffb Add scrape/ingestion engine + validation, and 24h shift hour-range visualization
Scrape engine (Services/Scraping/): pluggable IListingSource (working sample + Telegram/Divar credential-ready stubs) → IngestionService (content-hash dedupe → parse → validate → review queue) → ListingValidator (completeness score + spam screen) → IngestionWorker (config-gated hosted service). RawListing gains ContentHash/Confidence/ValidationNotes; RawListingStatus.Flagged. Admin /Admin gets run-now, source list, confidence + flagged queue.
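
The content-hash dedupe step at the front of that pipeline might look roughly like this in Python. The commit only says that RawListing gains a ContentHash; the choice of SHA-256, the normalisation (NFKC, whitespace collapse, case folding), and the function names are all assumptions.

```python
import hashlib
import unicodedata
from typing import Iterable, List


def content_hash(text: str) -> str:
    """Hash a post after light normalisation (assumed, not from the source)."""
    normalized = " ".join(unicodedata.normalize("NFKC", text).split()).casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def dedupe(posts: Iterable[str]) -> List[str]:
    """Keep the first occurrence of each distinct post, in order."""
    seen = set()
    fresh = []
    for post in posts:
        digest = content_hash(post)
        if digest in seen:
            continue  # same content already ingested
        seen.add(digest)
        fresh.append(post)
    return fresh
```

Hashing before parsing means a repost is dropped cheaply, before any parse or validation work is spent on it.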

Hour-range viz: _HourBar 24h timeline bar (colored by type, overnight wrap) on shift cards, recommendation cards, and detail.
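
The overnight wrap on the 24h bar can be sketched as splitting a shift into segments that each stay within one day. This Python version is an illustration under assumptions: whole-hour inputs in 0..23, an end of 0 meaning midnight, and equal start and end meaning a full 24h shift. `hour_segments` and `segment_percent` are hypothetical names, not the _HourBar partial's API.

```python
from typing import List, Tuple


def hour_segments(start: int, end: int) -> List[Tuple[int, int]]:
    """Split a shift into (from, to) hour segments on a 0..24 timeline."""
    if not (0 <= start <= 23 and 0 <= end <= 23):
        raise ValueError("hours must be in 0..23")
    if start < end:
        return [(start, end)]  # same-day shift
    # Overnight (or full-day) shift: run to midnight, then wrap to the start.
    segments = [(start, 24), (0, end)]
    return [(a, b) for a, b in segments if a < b]  # drop empty pieces


def segment_percent(segment: Tuple[int, int]) -> Tuple[float, float]:
    """Left offset and width of a segment as percentages of the bar."""
    a, b = segment
    return a / 24 * 100, (b - a) / 24 * 100
```

A 22:00 to 06:00 shift therefore renders as two pieces, one at the right edge of the bar and one at the left.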

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 08:18:19 +03:30
soroush.asadi 2fb86a435e Initial commit — Hamkadr (همکادر) healthcare-staffing marketplace
ASP.NET Core 10 Razor Pages + PostgreSQL/EF Core. RTL Persian, Jalali dates, self-hosted Vazirmatn, teal/coral brand.

Features:
- Shift listings: browse/filter (city, district, role, type, pay), weekly Jalali calendar, detail + interest handoff, near-me distance sort
- Hiring (استخدام) listings with employment type + salary range
- Pattern-engine recommendations + anonymous interest tracking (visitor cookie)
- Heuristic Persian listing-parser + admin queue (raw channel post → shift/job)
- Phone-OTP cookie auth + visitor-history linking + profile

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 01:44:24 +03:30