How about a system that checks the top 25 results from a Google search and flags the information if it is nearly identical across different domains? For many searches, 24 out of 25 results are seemingly exactly the same, each returned from a different domain. A single blog post won't get indexed quickly, and it won't propagate fast enough to fill up the search results either.
So, that is one aspect of ML I would incorporate.
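To make the idea concrete, here is a rough sketch of the cross-domain check. It assumes the top results have already been fetched as `(url, text)` pairs (no search API is shown), and the names `shingles`, `jaccard`, and `flag_duplicated_results`, along with the `threshold` and `min_domains` defaults, are made up for illustration. It uses plain word-shingle Jaccard similarity rather than a trained model, so treat it as a starting heuristic, not a finished detector.

```python
from urllib.parse import urlparse


def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 if either is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_duplicated_results(results, threshold=0.8, min_domains=3):
    """Group near-identical results and flag groups spanning many domains.

    results: list of (url, text) pairs, e.g. the top 25 search results.
    Returns a list of domain sets, one per flagged group.
    Both threshold and min_domains are illustrative guesses.
    """
    clusters = []  # each cluster: representative shingles + domains seen
    for url, text in results:
        domain = urlparse(url).netloc.lower()
        sh = shingles(text)
        for cluster in clusters:
            # Compare against the first member of each cluster only.
            if jaccard(sh, cluster["rep"]) >= threshold:
                cluster["domains"].add(domain)
                break
        else:
            clusters.append({"rep": sh, "domains": {domain}})
    return [c["domains"] for c in clusters if len(c["domains"]) >= min_domains]
```

Calling `flag_duplicated_results` on the 25 fetched results returns one set of domains per group of near-identical pages. A search where 24 of 25 results are copies would come back as a single group listing 24 domains.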