The Wayback Machine - https://web.archive.org/web/20200717041950/https://github.com/topics/scraping
#

scraping

Here are 1,917 public repositories matching this topic...

teodoroanca commented Apr 16, 2020

Description

When I scrape without a proxy, both https and http URLs work.
Through a proxy, https URLs work just fine. My problem is with http URLs through the proxy.
In that case I get the error twisted.web.error.SchemeNotSupported: Unsupported scheme: b''

From what I can see, most people have this issue the other way around.

Steps to Reproduce

  1. Scrape an http link through a proxy
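
The report does not say how the proxy address was configured. The empty scheme in the error message (b'') is often associated with a proxy address written without a leading http:// or https://, but that is an assumption here, not a diagnosis of this issue. A minimal sketch of a guard against that case follows; the helper name `ensure_proxy_scheme` and the example addresses are hypothetical and are not part of Scrapy or Twisted.

```python
def ensure_proxy_scheme(proxy_url: str, default: str = "http") -> str:
    """Return proxy_url unchanged if it already has a scheme,
    otherwise prefix it with the default scheme.

    Hypothetical helper for illustration only.
    """
    if "://" in proxy_url:
        # A scheme such as http:// or https:// is already present.
        return proxy_url
    # A bare host:port address: prepend the assumed default scheme.
    return f"{default}://{proxy_url}"


# Example addresses are made up for illustration.
bare = ensure_proxy_scheme("127.0.0.1:8080")
full = ensure_proxy_scheme("https://proxy.example:3128")
```

If the proxy address in the failing setup already carries a scheme, this guard changes nothing and the cause lies elsewhere.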

**Expected

jlvdh commented Nov 27, 2018

What is the current behavior?

When crawling a website that uses # (hash fragments) for URL navigation, the pages reached through # URLs are not crawled.

URLs containing # are not followed.
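
One plausible reason, stated here as an assumption since the excerpt does not describe the crawler's internals, is that the fragment (the part after #) is never sent to the server, so a crawler that normalizes URLs before deduplicating them sees every hash route as the same page. The sketch below uses Python's standard urllib.parse.urldefrag to show that collapse; the example.com URLs are placeholders.

```python
from urllib.parse import urldefrag

# Placeholder URLs that differ only in their fragment.
urls = [
    "https://example.com/en/#shop",
    "https://example.com/en/#about",
    "https://example.com/en/",
]

# urldefrag splits a URL into the part sent to the server and the fragment.
first = urldefrag(urls[0])

# After stripping fragments, all three URLs collapse into one entry,
# so a deduplicating crawler would visit the page only once.
unique_pages = {urldefrag(u).url for u in urls}
```

Reaching hash-routed content generally requires executing the page's JavaScript and treating each fragment as a distinct state, which is a different mechanism from plain link following.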

If the current behavior is a bug, please provide the steps to reproduce

Try crawling a website like mykita.com/en/

What is the motivation / use case for changing the behavior?

Though hashes are not meant to chan

ferret
