Google captcha issue
bill at herrin.us
Fri Jun 19 16:40:16 UTC 2020
On Fri, Jun 19, 2020 at 9:15 AM Christopher Tyler
<chris at totalhighspeed.net> wrote:
> We run a smaller ISP of about 7.5k customers and the other day we got an email (excerpt below) from one of Google's automated tools.
> We are seeing automated scraping of Google Web Search from a large
> number of your IPs. Automated scraping violates our /robots.txt file
> and also our Terms of Service. We request that you terminate this
> traffic immediately. Failure to do so may cause your network to be
> blocked by our abuse systems.
> To allow you to identify the traffic, we are providing a list of
> your IPs they used today (Source field), as well as the most common
> destination (Google) IP and port and a timestamp of a recent request
> (in UTC) to aid in your identification. Note that this list may not
> be exhaustive, and we request that you terminate all such traffic, not
> just traffic from IPs in this list.
> All of the destination ports are either 80 or 443, so they at least appear to be legit web traffic on the surface. They are obviously spoofed IP addresses, as there are network addresses in the list and one IP belongs to a router that doesn't appear to be compromised in any way. The initial letter included 700+ IP addresses from our network.
Presumably Google is smart enough to know the difference between
spoofed port scanning and completed TCP connections performing a web
search. If you take Google's report at face value, the addresses
aren't spoofed; something else is happening. The question is how.
There was a company revealed on NANOG earlier this year (or maybe
last year, I'm not great with dates) which contracts small ISPs and
virtual server providers to use their "spare bandwidth" to
pseudonymously originate web requests. They don't require you to
assign them IP addresses because they overload their activity on all
of your IP addresses. In theory they do this without disturbing your
customers and only access web sites whose owners have contracted them
to do so, generally to test connectivity. In practice, there's a
device inline with your traffic flow that injects TCP connections and
captures the associated return packets across your entire address
space. Including, for example, your routers' IP addresses.
Do you, or perhaps your upstream, have such a contract?
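If it helps narrow things down, here is a rough sketch of how you might sort the Source addresses from Google's list into "could be a customer host" versus "no customer could be sourcing this". It is only an illustration: the function name, the bucket names, and the sample prefixes (RFC 5737 documentation ranges) are all made up, and it assumes you feed it your subnets as actually configured on interfaces rather than your aggregates. A large share of network, broadcast, or router addresses would be consistent with something injecting connections across the whole address space instead of individual infected customers.

```python
import ipaddress


def classify_reported_sources(reported, subnets, router_ips):
    """Sort reported source IPs into rough buckets (hypothetical helper).

    reported   -- iterable of IP strings from the abuse report
    subnets    -- iterable of CIDR strings as configured on your interfaces
    router_ips -- iterable of IP strings assigned to your own routers
    """
    nets = [ipaddress.ip_network(s) for s in subnets]
    routers = {ipaddress.ip_address(r) for r in router_ips}
    buckets = {
        "network_or_broadcast": [],
        "router": [],
        "host": [],
        "not_ours": [],
    }
    for text in reported:
        addr = ipaddress.ip_address(text)
        # First configured subnet containing the address, if any.
        net = next((n for n in nets if addr in n), None)
        if net is None:
            buckets["not_ours"].append(text)
        elif addr in routers:
            buckets["router"].append(text)
        elif net.prefixlen < 31 and addr in (
            net.network_address,
            net.broadcast_address,
        ):
            # /31 and /32 have no separate network/broadcast address.
            buckets["network_or_broadcast"].append(text)
        else:
            buckets["host"].append(text)
    return buckets


# Made-up sample data using documentation prefixes only.
sample = classify_reported_sources(
    reported=["192.0.2.0", "192.0.2.1", "192.0.2.57",
              "198.51.100.127", "203.0.113.9"],
    subnets=["192.0.2.0/24", "198.51.100.0/25"],
    router_ips=["192.0.2.1"],
)
for bucket, addrs in sample.items():
    print(bucket, addrs)
```

Anything that lands in the "host" bucket still needs to be checked against flow records or customer assignments; this only separates out the addresses that cannot belong to an end host.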
bill at herrin.us