RE: Learn Python Series (#13) - Mini Project - Developing a Web Crawler Part 1


in utopian-io • 7 years ago

I haven't (deliberately!) explained how to use all of the Requests attributes and methods, including setting a user agent. But please know that I'm well aware of the (temporary) web crawler blocking that some web servers perform when they detect repeated crawls from a single IP. Please also know that in those types of cases, merely setting a self-defined user agent adds zero value: you will get blocked nonetheless.
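For readers wondering how setting a user agent with Requests works in the first place, here is a minimal sketch. The header value and URL are placeholders of my own, not something from the original tutorial; preparing the request (rather than sending it) lets you inspect the headers without hitting any server:

```python
import requests

# A self-defined user agent string (placeholder value for illustration)
headers = {"User-Agent": "my-learning-crawler/0.1 (contact: me@example.com)"}

# Prepare the request without sending it, so we can inspect what would go out
req = requests.Request("GET", "https://example.com", headers=headers)
prepared = req.prepare()

# The outgoing request now carries our custom User-Agent instead of
# the default "python-requests/x.y.z"
print(prepared.headers["User-Agent"])
```

To actually send it you would pass `headers=headers` to `requests.get()` — but as noted above, a custom user agent alone won't help against IP-based blocking.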

There are several workarounds for that (e.g. block detection combined with using a multitude of VPNs and/or Onion IPs, and/or even IP spoofing). But since I'm an ethical person, in situations such as those, I ask myself "would it be OK if I used those techniques on this webserver?" And my answer to that is: "No, let's look somewhere else for the data I need, or contact the web admin of that webserver to discuss if they are willing to voluntarily provide me with the data I'm looking for."

As a rule of thumb I always try to treat people like I'd like them to treat me. That's not always possible if the other party thinks and feels differently, but in such cases I prefer to interact with people that do feel the same as I do.

Google's motto is (was?) "Don't be evil." For me personally, I'm taking it one step further: "Be Me, always, which equals: Be a Good Person."

@scipio
