🔥 Open Source No Code Web Data Extraction Platform • Turn Websites To APIs & Spreadsheets With No-Code Robots In Minutes 🔥
Updated May 9, 2025 - TypeScript
Extract keywords from sentences or replace keywords in sentences.
🕷️ An undetectable, powerful, flexible, high-performance Python library to make Web Scraping Easy and Effortless as it should be!
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. This is a subclass of the PDFTextStripper class (from the Apache PDFBox library).
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Lightweight library for scraping websites with LLMs
A beginner-friendly yet powerful Python toolkit for financial analysis and automation — built to make modern investing accessible to everyone
📰 Let ChatGPT Summarize Hacker News for You
🚜 Parse text and tables from PDF files.
Pure Python, lightweight, Pillow-based solver for Amazon's text captcha.
ContextGem: Effortless LLM extraction from documents
Benchmarking PDF libraries
Undetected Web-Scraping & Seamless HTML Parsing in Python!
A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.
Wikipedia information extraction library
A Python client for the Sypht API
MiniAiLive Intelligent ID OCR for Reliable Identity Verification. From document verification to data entry, our MiniAiLive OCR solution can help transform your identity verification process.
This repository provides usage examples for the Python module Newspaper3k.
A Python utility to digitize plots.
Accurate, private and configurable document retrieval LLM