Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
WInte.r is a Java framework for end-to-end data integration. The WInte.r framework implements well-known methods for data pre-processing, schema matching, identity resolution, data fusion, and result evaluation.
Development of the Gellish Communicator reference application and tools for universal data exchange and data integration supporting Formal English and other Gellish formalized natural languages.
Repo for Data Warehouse Concepts, Design, and Data Integration by University of Colorado System (coursera)(Notes,Assignments, quiz and research papers)
Pipes for MarkLogic DataHub is visual programming tool for MarkLogic Data Hub. It integrates with MarkLogic's Datahub and produces custom code step(s) using a no-code UI environment. *Documentation*: https://marklogic-community.github.io/pipes/
Match schema attributes of relational databases by value similarity. As a study assignment, this isn't well documented, but you can contact me for questions and I may even add docs, if I sense enough interest.