I'm importing data from an Apify actor that queries Indeed for job postings. I want to fully automate the process so that it extracts the latest data after the Apify actor runs, and I want to make sure I don't run HTTP API calls for the same job posting. What would be the best way to go about doing this?
You can set up an automated workflow using our Apify source integration by following these steps:
1. When creating a table, select "Import data from Apify actor" as your source.
2. Make sure to leave the "Get Data from Specific Run" option turned off. This way, the table will automatically update with data from the latest actor run and all subsequent runs (updated daily).
3. Enable the auto-update feature, which will automatically enrich new rows when they're added to the table. This helps maintain data freshness.
This setup will ensure you're always getting the latest job posting data while avoiding duplicate calls, as Clay will automatically handle the updates from your Apify actor's most recent runs.
Sources: New at Clay | Clay.com; Pull Data from Apify Actor with Apify; Apify integration overview | Clay University
How can I ensure that I don't run against a record (an Indeed job posting) from the Apify actor that has already been processed?
You can use conditional runs to prevent re-running against previously processed job postings. Here's how:
1. Create a column to track whether a record has been processed.
2. In the Apify enrichment settings, set up a conditional run formula that only runs when this tracking column is empty or false.
3. After the Apify actor runs, it will populate data and you can set your tracking column to true.
This way, the enrichment will only run on new, unprocessed records. Just remember that with auto-update enabled, new rows will automatically be enriched when added to the table.
Sources: Apify - Clay University; HTTP API - Clay University
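The conditional-run steps above can be sketched in plain Python to show the logic. This is only an illustration, not Clay's formula syntax: the `processed` field name and the `enrich` callable are hypothetical stand-ins for the tracking column and the enrichment step.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def run_conditionally(rows: List[Row], enrich: Callable[[Row], None]) -> int:
    """Enrich only rows whose hypothetical `processed` flag is empty or false.

    Returns the number of rows that were actually enriched.
    """
    ran = 0
    for row in rows:
        if row.get("processed"):
            # Already handled on an earlier pass: skip it, so no API call is made.
            continue
        enrich(row)              # the expensive step, e.g. an HTTP API call
        row["processed"] = True  # mark the row so later passes skip it
        ran += 1
    return ran
```

Running the function twice over the same rows enriches each row at most once, which is the behavior the tracking column is meant to give.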
So essentially, if you have the table set up to update automatically from the Apify actor that scrapes Indeed job postings, it will only get the new data and not the same data that the actor continuously scrapes?
Also, how do you ensure that you are only getting new data from the Apify actor runs?
When using auto-update with Apify, the table will automatically enrich new rows when they are added, and the import refreshes every 24 hours.
If you want to avoid duplicates and only get new data, you should not specify a particular run as an input. Instead, the integration will pull data from the latest run and all subsequent runs, updating daily.
Keep in mind that if you don't see new data immediately, you can check when the next update will occur or manually trigger an update by clicking the Apify column header and selecting sources.
Sources: Apify integration overview | Clay University; Pull Data from Apify Actor with Apify
Looks like you're still looking for help. We're escalating this over to our support team and someone should be in touch soon. In the meantime, feel free to include any other information that may be helpful in solving your issue!
Thank you
Could you also set up an import to refresh hourly instead of every 24 hours?
Hey there! 😊 Let me help you with automating those Indeed job postings!
For avoiding duplicate calls, Auto-Dedupe is your best friend here! It'll make sure you're not running the same data twice and keep your table clean.
But here's the thing: for getting fresh Apify data, webhooks would actually work better than imports. You can:
1. Connect Apify directly to Clay using webhooks
2. Have it automatically send new data over when your actor runs
3. Let Auto-Dedupe handle any duplicates
Check out these guides for setting it up:
- Clay webhooks: https://www.clay.com/university/lesson/intro-to-typeforms-webhooks-and-inbound-lead-sources-automated-inbound
- Apify webhooks: https://docs.apify.com/platform/integrations/webhooks
This way you won't be limited by the 24-hour refresh and can get updates as often as you need! Need any help setting this up? 🙌
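For anyone wiring up the webhook flow described above by hand, here is a rough Python sketch of the sending side: it forwards only items that have not been sent before to a webhook URL. Everything specific here is an assumption: the webhook URL is a placeholder you would copy from your own Clay table, the `id` field depends on what your actor outputs, and the `seen` set would need to be persisted between runs. Auto-Dedupe in Clay can still act as the safety net on the receiving side.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, List, Set

Item = Dict[str, object]


def post_json(url: str, payload: Item) -> None:
    """POST one item as JSON to the given webhook URL."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        resp.read()  # drain the response; raises on HTTP errors


def forward_new_items(
    items: Iterable[Item],
    seen: Set[str],
    webhook_url: str,
    key: str = "id",  # hypothetical field name; use whatever your actor emits
    send: Callable[[str, Item], None] = post_json,
) -> List[Item]:
    """Send each item whose key is not in `seen`, then record the key."""
    sent: List[Item] = []
    for item in items:
        value = item.get(key)
        if value is None or str(value) in seen:
            # No usable key, or already forwarded on this or an earlier run.
            continue
        send(webhook_url, item)
        seen.add(str(value))
        sent.append(item)
    return sent
```

Items without a key are skipped rather than sent, since they cannot be de-duplicated later; whether that is the right trade-off depends on your data.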
Thanks, this is useful information. How do I go about de-duping the data? Should I do it based on the Indeed URL or Indeed ID?
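One way to think about the ID-versus-URL choice is to derive a single key per posting. The Python sketch below prefers an ID when the record has one and falls back to the URL otherwise. It rests on assumptions worth checking against your own data: the `id` and `url` field names are hypothetical (they depend on the actor's output), and the fallback assumes Indeed job links carry the job key in a `jk` query parameter. An ID tends to be the more stable key, because the same posting can appear under URLs that differ only in tracking parameters.

```python
from typing import Mapping, Optional
from urllib.parse import parse_qs, urlparse


def dedupe_key(job: Mapping[str, object]) -> Optional[str]:
    """Return a stable key for a job posting, or None if none can be derived."""
    job_id = job.get("id")  # hypothetical field name
    if job_id:
        return str(job_id).strip()

    url = job.get("url")  # hypothetical field name
    if not url:
        return None

    parsed = urlparse(str(url))
    # Assumption: the job key travels in the `jk` query parameter.
    jk = parse_qs(parsed.query).get("jk")
    if jk:
        return jk[0].strip()

    # Last resort: host plus path, ignoring query string and fragment.
    return f"{parsed.netloc.lower()}{parsed.path.rstrip('/')}"
```

Two records that yield the same key can then be treated as the same posting, whichever column you point de-duplication at.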
Also, I'm not seeing where to go to add a new source and add a webhook
They appear to be on the Graph page, and it doesn't have the buttons in the video on my webpage
Never mind, I was able to find it.
Great to hear, let me know if we can do anything else!