Optimizing AI Job Scraping: Reducing Unnecessary Data Costs

It's me again! I have a workflow where I scan LinkedIn job offers with AI to detect projects that match the solution I'm selling. The problem is that I'm scraping the full DOM of the LinkedIn job page, which includes:

Duplicates of the same job description
Accessibility tags
JS-based filler content
Company marketing, policies, and benefits
Footer / menu text ...

It costs me too many tokens. I was thinking about pre-scraping the text and targeting only the main job description div, but it often fails to scrape it. What would you recommend?
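For illustration, here is a minimal sketch of that pre-cleaning step using only Python's standard library. It is one possible approach, not a tested LinkedIn scraper. It assumes the page HTML has already been fetched, and it assumes the description sits in a div whose class contains the substring "description"; that substring, the list of skipped tags, and the names `TextExtractor`, `dedupe`, and `clean_job_html` are all hypothetical. When the target div is missing or too short, it falls back to the whole page with scripts, menus, and footers stripped and repeated lines removed, so a failed match still costs fewer tokens than the raw DOM.

```python
from html.parser import HTMLParser

# Tags whose text is assumed never to belong to the job description.
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer", "svg", "button"}
# Hypothetical hint: the description div's class is assumed to contain this.
TARGET_CLASS_HINT = "description"


class TextExtractor(HTMLParser):
    """Collects visible text, and separately the text inside the target div."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0      # > 0 while inside a skipped tag
        self.capture_depth = 0   # > 0 while inside the target div
        self.chunks = []         # all visible text
        self.target_chunks = []  # text found inside the target div

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "div":
            if self.capture_depth > 0:
                self.capture_depth += 1  # nested div inside the target
            elif TARGET_CLASS_HINT in (dict(attrs).get("class") or ""):
                self.capture_depth = 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "div" and self.capture_depth > 0:
            self.capture_depth -= 1

    def handle_data(self, data):
        if self.skip_depth > 0:
            return
        text = " ".join(data.split())  # collapse whitespace
        if not text:
            return
        self.chunks.append(text)
        if self.capture_depth > 0:
            self.target_chunks.append(text)


def dedupe(lines):
    """Drop exact repeats while keeping the original order."""
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def clean_job_html(html, min_chars=200):
    """Return the target div's text, or the cleaned full page as a fallback."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    target_text = "\n".join(dedupe(parser.target_chunks))
    if len(target_text) >= min_chars:
        return target_text
    return "\n".join(dedupe(parser.chunks))
```

The `min_chars` threshold is an arbitrary guard against matching an empty or placeholder div; it would need tuning against real pages. Only the returned text would then be sent to the model.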
