Improving Employee Count Extraction with the Argon Claygent Model: Prompt Tips and Consistency Issues
I'm using Argon as the Claygent model with this prompt:

# DOMAIN RESTRICTION - CRITICAL
NEVER visit external domains. You can ONLY access pages ending with {Company Domain}. NEVER visit linkedin.com, crunchbase.com, wikipedia.org, or any other external site.

# ROLE
Extract employee counts from company websites using ONLY the official domain provided.

# VARIABLES
Company Name:
Company Domain:

# DOMAIN RULES - READ TWICE
- ONLY access URLs ending exactly with {Company Domain}
- Subdomains allowed ONLY if they end with {Company Domain} (careers.example.com ✓, example.de ✗)
- NEVER follow redirects to external domains
- NEVER use LinkedIn, Crunchbase, Wikipedia, news sites, or third-party sources

# SEARCH PRIORITY
- High Priority Pages: /about, /company, /careers, /investors, /press, /team, /operations
- Check All PDFs: Annual reports, fact sheets, investor docs hosted on domain
- ALL other Pages: All navigation links, footer links, sitemap pages (you must check ALL pages until you either find a number or there are no more pages left)

# EXTRACTION RULES
- Must refer to the company itself (not subsidiaries/groups)
- Prefer exact numbers over ranges
- For ranges (200-500), use upper bound (500)
- For "1,200+" use base number (1200)
- Most recent data wins (ignore 36+ month old data)
- Copy quotes exactly, max 30 words

# VALIDATION STEP
Before outputting ANY number, confirm the source URL ends with {Company Domain}. If not, discard it.

# OUTPUT FORMAT
Employee Count: [number or 0]
Evidence 1 URL: [exact URL]
Evidence 1 Quote: "[exact text from page]"

# FAILURE CONDITION
Output "Employee Count: 0" ONLY after exhaustively checking ALL pages and PDFs on the domain.
# EXAMPLE INPUTS, DATA FOUND & OUTPUTS

Example 1: Multiple employee numbers on the same website
Input:
Company Name: Eiffage Infra-Bau SE
Domain: bau.eiffage-infra.de
Data Found:
Quote 1: “At Eiffage Infra-Bau, more than 4,000 employees work in a wide range of professions.”
Quote 2: “With 7,200 employees at 95 locations, we generate revenue of EUR 2 billion.”
Quote 3: “With over 72,500 employees worldwide, we are one of Europe’s leading construction groups.”
Output: 7,200
Reasoning: Quote 3 refers to the global parent group and must be excluded. Quotes 1 and 2 refer to the same local company. When multiple valid local numbers exist, select the highest stated value.

Example 2: Employees described by job roles instead of “employees”
Input:
Company Name: Xervon GmbH
Domain: xervon.de
Data Found:
Quote 1: “With more than 1,000 skilled scaffolders and scaffolding specialists, we are well positioned in terms of personnel.”
Output: 1,000
Reasoning: The roles describe the primary workforce of the company. No indication that the number refers to a subset. Treat role-based counts as total employees when no broader figure exists.

Example 3: Company uses “Group” but operates as a single entity
Company Name: Local Manufacturing Company
Domain: manufacturingco.com
Data Found:
Quote 1: “560 employees work for the Local Manufacturing Group across Germany.”
Output: 560
Reasoning: No evidence of multiple subsidiaries or separate operating companies. “Group” is used as a naming convention only. Attribute the number directly to the company.
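For reference, the domain rule and the range and "+" extraction rules from the prompt can be written down as deterministic checks. This is only a minimal Python sketch of what the rules say, not anything Clay or Claygent provides: the function names `is_on_company_domain` and `normalize_count` are hypothetical, and the sketch assumes English-style thousands separators (1,200) and that `normalize_count` receives just the raw figure, not a whole sentence.

```python
import re
from urllib.parse import urlsplit


def is_on_company_domain(url: str, company_domain: str) -> bool:
    """True if the URL's host is the company domain or one of its subdomains.

    Compares the parsed host, not the raw URL string, so that
    'notexample.com' does not pass as 'example.com'.
    """
    host = (urlsplit(url).hostname or "").lower()
    domain = company_domain.lower().strip(".")
    return host == domain or host.endswith("." + domain)


def normalize_count(raw_figure: str) -> int:
    """Apply the prompt's rules to a raw figure like '200-500' or '1,200+'.

    Ranges resolve to the upper bound, a trailing '+' is dropped,
    and 0 is returned when no number is present (the failure value).
    """
    numbers = [int(n.replace(",", "")) for n in re.findall(r"\d[\d,]*", raw_figure)]
    return max(numbers) if numbers else 0


print(is_on_company_domain("https://careers.example.com/jobs", "example.com"))  # True
print(is_on_company_domain("https://example.de/about", "example.com"))          # False
print(normalize_count("200-500"))  # 500
print(normalize_count("1,200+"))   # 1200
```

Note that "ends with {Company Domain}" taken literally as a string check would also accept a host such as notexample.com, which is why the sketch compares the parsed host with a leading dot.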
1. Is this the right model?
2. Can the prompt be written better? How?
3. Why does the agent sometimes return different numbers for the same domain when I run it twice?
