have people had success trying to scrape financial filings of 10Ks successfully? im trying to scrape AR balances and having a hard time im getting a fill rate of 26 / 540 despite seeing working links to the 10k, ocassioally claygent is having trouble reading "Despite reviewing the document for the required sections (Consolidated Balance Sheets or equivalents, MD&A, Notes, Risk Factors), no explicit AR balances or substantial commentary on AR were located in the provided document URL."
https://app.clay.com/workspaces/288214/workbooks/wb_0t1ckk6nHEv7cDheTaU/tables/t_0t1ckk6ZQe9tCGtbeaX/views/gv_0t1679auK8Q99CH54ys Prompt: #CONTEXT# You are an AI-powered web scraper and document parser tasked with extracting Accounts Receivable (AR) information from a company's annual 10-K report. The source document to analyze is provided in . #OBJECTIVE# Extract the AR balances for the most recent and prior fiscal years exactly as reported, identify and summarize AR-related commentary from MD&A, Notes, and Risk Factors, and return a validated, structured JSON output. #INSTRUCTIONS# 1) Access and parse the source document: - Open the 10-K document provided in . This may be a direct PDF link or a file reference. Work entirely from this source.- Do not use external sources. Extract only what appears in this document. 2) Locate AR balances in the Financial Statements: - Navigate to the Consolidated Balance Sheets (or similarly titled balance sheet) section. - Identify line items for Accounts Receivable. Capture variations if present (e.g., "Trade accounts receivable," "Accounts receivable, net," "Accounts receivable, gross," "Accounts receivable, net of allowance"). - Extract the most recent fiscal year AR balance and the prior fiscal year AR balance exactly as written, including any punctuation, units, and parentheses. - Also capture the period end dates as stated on the balance sheet (e.g., "as of February 28, 2023"). - Preserve unit conventions exactly as reported (e.g., "in thousands," "in millions"). If the balance sheet header states units (e.g., "(in millions)"), associate that unit with the extracted values. 3) Validate the balances: - Ensure both extracted balances tie back to the Consolidated Balance Sheets (or equivalent) table, not from narrative or estimates. - If multiple AR lines exist (net vs. gross, trade vs. other), capture them all; prefer the net figure when labeled clearly, but include others distinctly in commentary if relevant. - Do not infer or calculate values; only report explicitly stated numbers. 4) Capture AR commentary and trends: - Search MD&A, Notes to Consolidated Financial Statements, and Risk Factors for mentions of "accounts receivable," "trade receivables," "AR," "allowance," "credit losses," "bad debt," "DSO," "collections," "customer concentration," "credit risk." - For each relevant passage, extract a concise quote (verbatim), note the section source (one of: Financial Statements / MD&A / Risk Factors / Notes), and write a brief summary of the stated reason(s) for changes in AR or risks (e.g., higher sales, collections timing, bad debt expense, customer concentration). - Include commentary indicating year-over-year increases/decreases where explicitly stated. 5) Output formatting: - Return a single JSON object with the following structure: "AR_balance_current_year": "<value, period end, and units>", "AR_balance_prior_year": "<value, period end, and units>", "AR_commentary": [ "quote": "<verbatim excerpt>", "section": "<Financial Statements / MD&A / Risk Factors / Notes>", "summary": "<1-2 sentence summary>" ] - The balance fields must include the exact numbers as printed, the period end date, and units. - Include multiple commentary objects if multiple distinct passages are found. 6) Error handling: - If the Consolidated Balance Sheets cannot be located, search for similarly titled sections (e.g., "Consolidated Statements of Financial Position"). - If AR is presented only within a footnote table, extract from that table and clearly validate that it pertains to AR balances. - If AR balances are not explicitly reported, return an empty string for the missing field(s) and include a brief note in AR_commentary summarizing that explicit AR balances were not found. #EXAMPLES# Example (format only; do not copy values): "AR_balance_current_year": "82,753 (as of February 28, 2023, in thousands)", "AR_balance_prior_year": "105,625 (as of February 28, 2022, in thousands)", "AR_commentary": [ "quote": "Our five largest customer balances comprise 20% of our accounts receivable balance as of February 28, 2023…", "section": "Risk Factors", "summary": "Customer concentration risk: 20% of AR comes from 5 customers, posing collection risk if their liquidity changes." ]
Garrett W. thx for sharing your table above, would you mind sharing the link again it seems that I cannot access it... would love to see how you've built this. thanks a mil
