The Wayback Machine - https://web.archive.org/web/20200728080351/https://github.com/topics/extract
Here are 501 public repositories matching this topic...
Easily create & extract archives, and compress & decompress files of various formats
Camelot: PDF Table Extraction for Humans
Updated Jul 21, 2020
Python
SwiftSoup: Pure Swift HTML Parser, with the best of DOM, CSS, and jQuery (supports Linux, iOS, Mac, tvOS, watchOS)
Updated Jun 30, 2020
Swift
This extension is now maintained in the Microsoft fork.
Updated Jul 28, 2020
TypeScript
Reversing Google's 3D satellite mode
Converts a PDF file into a text file while keeping the layout of the original PDF. Useful, for instance, for extracting the content of a table in a PDF file. Implemented as a subclass of the PDFTextStripper class (from the Apache PDFBox library).
Updated Feb 27, 2020
Java
A library to read, parse, export and make subsets of different types of font files.
GUI and API library for working with Engine assets, serialized and bundle files
PDFsam, a desktop application to extract pages, split, merge, mix and rotate PDF files
Updated Jul 20, 2020
Java
Download and extract files
Updated Jun 13, 2020
JavaScript
The extension provides refactoring tools for your React codebase
Updated Jul 18, 2020
TypeScript
💬 Python scripts to parse Messenger, Hangouts, WhatsApp and Telegram chat logs into DataFrames.
Updated May 28, 2020
Python
A web interface to extract tabular data from PDFs
Updated Jul 17, 2020
HTML
Preview extractor for news, articles and full-texts in Swift
Updated Dec 30, 2019
Swift
Parse and output TODOs and FIXMEs from comments in your files
Updated Jul 20, 2020
TypeScript
A tool to view and extract the contents of a Windows Installer (.msi) file.
Database Subsetting and Relational Data Browsing Tool.
Updated Jul 19, 2020
Java
Reversing Apple's 3D satellite mode
💎 Detect, track and extract the optimal face among multiple target faces (excludes side faces and selects the optimal face).
Updated Jun 1, 2019
Python
Rip, extract and convert subtitles from .xml/dfxp/ttml and .vtt/WebVTT to .srt closed captions (e.g. Netflix)
Updated Dec 29, 2019
Python
Extracting archives made easy
Updated Apr 1, 2020
JavaScript
webpack loader to extract HTML and CSS from the bundle
Updated Jul 13, 2020
JavaScript
Read basic info about an application from an .apk file.
PhpZip is a PHP library for extended work with ZIP archives.
Data interchange, editor suite, and runtime re-implementations for games by Retro Studios | Mirror
A serverless architecture for orchestrating ETL jobs in arbitrarily-complex workflows using AWS Step Functions and AWS Lambda.
Updated Dec 28, 2019
Python
A simple resume parser used for extracting information from resumes
Updated Jul 5, 2020
Python
Better analyze information, in all its forms
Updated Jul 27, 2020
Java
Extract structured variables from Sass files
Updated Dec 20, 2019
JavaScript
Parse and extract URL meta information (images, description, title, etc.)
Updated Jul 27, 2020
JavaScript