Key Responsibilities
- Develop and maintain Python-based scraping scripts (e.g., using requests, BeautifulSoup, Selenium, Playwright, Scrapy).
- Implement proxy rotation, CAPTCHA bypass, and user-agent randomization to maintain a high scraping success rate.
- Handle structured and unstructured data from APIs, HTML, JSON, and XML.
- Schedule and orchestrate scraping jobs (via cron, Airflow, n8n, or Prefect).
- Integrate pipelines with Snowflake, Google Sheets, or cloud storage (S3, GCS, SharePoint).
- Monitor, log, and troubleshoot scraping workflows to ensure reliability.
- Suggest and prototype new scraping targets or data enrichment sources.
- Track changes to website structures and APIs, and adapt scripts accordingly.
Requirements
- Bachelor's degree in Computer Science, Information Systems, or related field.
- 2+ years of experience in web scraping, crawling, or automation scripting.
- Proficiency in Python and libraries such as requests, BeautifulSoup, Selenium, Playwright, or Scrapy.
- Experience with headless browser automation (e.g., Puppeteer, Playwright).
- Experience handling proxies, headers, and rate-limiting strategies.
- Knowledge of containerization (Docker) and Git-based CI/CD.
- Experience scraping social media or e-commerce platforms.