[BOUNTY] WE NEED THIS NOW. ALL STEEMDOLLARS WILL GO TO THE BEST SOLUTION.

in steem •  8 years ago  (edited)


by @cryptoctopus

Hello Everyone!

I believe I am not the only one in this situation. I have acquired quite a bit of STEEMPOWER through author rewards and I am now doing my best as a content curator. But I am slowly falling into despair.

People looking to make a quick buck, who don't want to put any effort into creating unique and creative content, simply copy/paste articles by other authors, either from within steemit or from 3rd party websites, and post them here. Most readers don't bother to check by copy/pasting a passage into Google to see if it's an original piece of content.

The sheer volume of new articles is overwhelming, and there are not enough dolphins and whales to police it all.

This is a big problem

The value of the platform depends largely on Google giving us rankings for articles. When people post copied content, it hurts all of us...especially people like me who have a bigger stake in the platform.

Solution - Automation

I am putting up a bounty of all the SBD this article will earn for anyone who has the technical know-how to create a bot that searches new articles, checks whether each one has been copied from elsewhere on the internet (or steemit), flags the article if that is the case, and adds a comment with a link to the original article to put everyone on notice.

Over time, this should get the message across that this isn't the right thing to do.

Claiming your bounty

Demonstrate its capability and then contact me directly via the slack channel. Also, I will be happy to upvote the comment from this bot anytime I see it, since it is a major help in keeping this platform clean of copy/pasted content.

EDITS

EDIT [1] IMPORTANT: The solution must be scalable...I am not convinced yet that mechanical turks are the way to go for automation and scalability.
EDIT [2]: The article shouldn't be penalized for duplicate content if it uses the quoting function

> like this

EDIT [3]: The bot shouldn't flag links and pictures
EDIT [4]: The bot should flag a post only above a certain threshold of plagiarism, so that it doesn't flag anyone who just happened to write a sentence that matches one somewhere else on the net.
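A minimal sketch of how EDITS [2] to [4] might fit together, assuming the bot already has the post's markdown and the text of a candidate source in hand. The function names, the 5-word window and the 0.5 threshold are all hypothetical choices for illustration, not part of piston or any existing bot. Quoted lines, images and links are dropped first, then the post is flagged only if enough of what remains also appears in the source.

```python
import re

# Hypothetical helpers for illustration; not part of piston or steemit.
IMAGE_RE = re.compile(r"!\[[^\]]*\]\([^)]*\)")   # ![alt](url)
LINK_RE = re.compile(r"\[([^\]]*)\]\([^)]*\)")   # [text](url)
URL_RE = re.compile(r"https?://\S+")             # bare URLs


def strip_exempt(markdown):
    """Drop quoted lines, images and link targets (EDITS [2] and [3])."""
    kept = []
    for line in markdown.splitlines():
        if line.lstrip().startswith(">"):
            continue  # quoted with the markdown quote function
        line = IMAGE_RE.sub(" ", line)
        line = LINK_RE.sub(r"\1", line)  # keep the link text, drop the URL
        line = URL_RE.sub(" ", line)
        kept.append(line)
    return "\n".join(kept)


def shingles(text, k=5):
    """Return the set of k-word sequences in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def overlap_ratio(post_markdown, source_text, k=5):
    """Fraction of the post's k-word sequences that also occur in the source."""
    post = shingles(strip_exempt(post_markdown), k)
    if not post:
        return 0.0  # too little unquoted text to judge
    return len(post & shingles(source_text, k)) / len(post)


def should_flag(post_markdown, source_text, threshold=0.5):
    """Flag only above the threshold (EDIT [4]); 0.5 is an assumed value."""
    return overlap_ratio(post_markdown, source_text) >= threshold
```

With this approach a single matching sentence inside a long original post stays well under the threshold, while a wholesale copy scores close to 1.0. Finding the candidate source in the first place (a web search, or a scan of earlier posts) is left out here.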


This post should easily cover $500 and I would recommend donating ALL income from this post as a bounty .. but it's up to you.

For those looking to code this .. you may want to look into http://piston.rocks if you intend to code this using python ..

If I didn't have some other projects that could be beneficial for steem, I would do this myself. I'll let others try it first!

Go

yes, your piston.rocks is definitely the tool to use. I am taking your suggestion into account. Changing the title now.
I am looking for a solution that can scale and I doubt amazon mechanical turks can perform this task.

You gotta love how easy steemit makes crowdfunding, this is insane on so many levels. :D

I think the problem lies in the distribution itself.

you mean?

piston is nice for bots! cool more voting bots here soon

I upvoted in support of this idea.

And/or, Steemit could just make it easier to embed content than it is to steal it.

There is an open source project that does this with almost all content, from amazon to soundcloud to twitter to vimeo. It displays the content, credits the author, and provides a link to their domain, and it works just like steemit shares youtube videos right now.
http://embed.ly
http://oembed.com/
https://noembed.com/

These projects are open source, and just waiting on GitHub.
https://github.com/iamcal/oembed/tree/master/providers
https://github.com/leedo/noembed

The reason I keep posting this, is because it is not a winnable war and wasting energy on policing when you could be sharing 100x more cool stuff ethically is just dumb.

I agree. It's a lot of wasted energy and effort, but something has to be done. A solution like the above would probably take a long time to integrate right now...but in the meantime we need something to deter people from copy/pasting

I cautiously agree that a downvote brigade would work fine temporarily.

This is a disruptive platform; it needs more disruptive services like embedding youtube videos. Embed.ly has built an enterprise-level business that turns "plagiarism" into sharing, by simply making it easier to do the right thing and credit the author. Steemit needs this concept more than police and content bots.

Just keep in mind. Once a police force is created and given power, they never get disbanded. And once a bot coded to detect and censor stuff is created, it's a tiny step away from being used to censor ideas. Those 2 things are how you get reddit and facebook in their current states. Building them the first month on a decentralized platform is just plain scary.


Yep. And this is my fear. I think a bot should only post a comment linking the original source. That comment would then be subject to flagging if the plagiarism claim is wrong.

This bot would be golden. I've noticed it happening so many times lately, and I believe I saw a post (maybe it was yours) warning people to google search the content, not just the title, to see who steals content without even giving any credit to the creator.

As you can see from my post history I try to tell people in the #introduceyourself posts to verify in some way that they are the owner of the account and the ones they claim to be in the pictures.

Remember, creating accounts is anonymous, and people with many facebook or reddit accounts can create a lot of them. It's easy for them to just take pictures from facebook or other social media platforms and claim to be that person here for a quick profit. How much would it suck in the future for someone to come on here and realize someone else has been pretending to be them for years? I know I would be creeped out.

Excellent idea! Anonymity seems to mean "no morals". This robs not only the other person, but also the whole Steemit community. In real society, such theft could be taken to court, backed by the justice system.

It'd also be nice if occasionally one of you whales would give a dolphin a boost.

I write funny shit like this, and it gets drowned in spam before anyone notices it.

https://steemit.com/satire/@business/what-2026-will-be-like-when-your-steem-power-reaches-infinity

It's really tough, dude. I'm actually a dolphin and I get drowned trying to figure out which posts are authentic and which ones are copy/paste. That's where my despair comes from.

This problem for me is more about stealing than actually copy/pasting. I might be wrong, but I don't see any problem with anyone reposting here their own content that they already worked very hard to create. However, the problem becomes: how can we verify that the original author of both entries is the same person? Maybe a platform like keybase.io could be leveraged to automate author verification?

I have the same concern. I have posted my own original content all over, including here.

For mass plagiarizers it might be a good idea to downvote everything that they post so that their accounts will become useless.

In the blockchain reputation is everything. When you lose the reputation of your account, you basically destroy the account. Nobody will trust it anymore.

If I could remember the usernames of all the spammers I would. But I can't 😤

That's why we need tools for it. I'd like to be able to mark users so that I'll see immediately if they have done something unwanted earlier.

But of course this is mostly a job for bots. Once a spammer is in the automatic downvote list, his account will be useless.

I agree there are too many reposts, and we need to clean up these posts that are just reposts of already posted content.

We have @cheetah!

Gotta catch them all!

Ok, let's do it. I think this must be done.

also obvious "give me blogging rewards" for my comment.

I don't think so. The author really will have to give the reward. Or he will stop getting high upvotes.

I totally agree.
Yesterday (without having seen this post) I uploaded a post about recognizing bots with a realtime network graph:
https://steemit.com/steemit/@maximiliano711/steemit-botbusters-realtime-canvas-network-graph-diagram
With it, it is possible to tell which accounts are bots.
I think we need a way to report possible bots.

Cannot agree more to that! They could be a big problem!

I find it fascinating and kind of ironic that, in the search for openness and decentralization of trust, we end up looking for control.

The platform needs to evolve if we expect any kind of usability down the line. Whatever happens, I think moderation issues are bound to keep on coming as we onboard more users.

Right now there is no usable search, so I think the state of things is worse than it could be with more filtering.

Can't wait to see what the community and the steemit team come up with to help keep the "website" part of steem usable with a large user base.

I wish I could help, but I haven't coded anything in years! This new stuff is all Greek to me!

Bots are only so good, until their algorithm is figured out. What is really needed for Steem is the ability to "delegate" voting. Where you could manage a team that performs quality voting (but you retain the account balance).

There are a lot of great posts in tag-name/new that go missing because mostly everyone is looking at the trending section. A bot will indeed make our lives easier. Hopefully it can enable better discovery of posts from lesser known authors and speed up the curation process. The front end could be more dynamic in displaying posts as well perhaps?

I simply can't keep up...and I end up wasting all my votes on posts, which ultimately hurts my earnings. (There is no compensation for downvotes...which is a good thing in a way.)

I see. Well let's hope someone comes up with a solution quickly. :)

I don't know if that's the solution. I was on spymac, where there was zero tolerance for non-original content, but what happened? All the original content was crap, just pretty girls doing exercise or make-up, like here. I think we need to separate original content from link content. Link content is not bad if we also have quality original content: people who arrive at steem would find interesting link content and great original content. But if they only find crap original content, plus bots and people pouncing on whatever looks a bit like plagiarism, I don't see a good future.

If someone uses

> quote from this article

they shouldn't be penalized or flagged.

Also, before the flood of people coming in, the original content here was stellar. I wouldn't give my vote to crappy original content either.

Also note, there are bloggers like Mummyimperfect who duplicate their real blog on steemit. They should not be penalized.

Sounds like a great idea, but does this idea not defeat the purpose of the whole free speech thing? Just because you would lose money, you want to start censoring people over what they post??

Steemit needs to become more categorized. Right now it's a mess and you can't find anything, but having a bot checking what you post is wrong.

Who says this bot will not get programmed to filter out words, statements even political views??

With the blockchain...nobody can be stopped from programming bots. Some dude this morning copy/pasted an article I wrote a month ago, trying to make money out of it. Google has bots checking for duplicate content. As long as the bot remains on task, I am fine with it.

When it comes to flagging opinions, that would really be an issue...you have a point there, and I'm brainstorming right now how this could be prevented if governments started putting massive amounts of SP in their portfolios...

EDIT: Looking into this, any account has a certain amount of voting power. After 20 votes per day, it drops significantly. It would be hard to implement a bot that can censor the whole place.

There will never be a consensus on this topic because the variables are too different from one another. On the one hand you have people who actually curate content and do it "right" and those people should not take a hit for doing what practically any major source of "news" does... curate content. Look at any platform like MSN, Yahoo, Bing, Google News/Trends, AOL and the list goes on and on. They all curate content as a primary business model- at least for those specific pieces of their properties.

There is a fine line between curating and blatant theft, and that line has only gotten more blurred with the rise of social media. I think it will ultimately end up policing itself, simply because most of the people who just rip stuff off do a piss-poor job of it anyway.

I mean, if a bot could be developed in such a way that it would benefit steemit, then great. I don't think that'll wind up being the case though. We'll wind up with something in place that just does what YouTube's copyright strikes do, and before you know it we'll see posts getting flagged that simply do not deserve it.

I'm down with people using the quote function ">" whenever they use content. The bot could filter that.

Yes. At times the "new" section looks like one of those "article" websites where spammers and "internet marketers" just throw a load of pasted articles up. Please also keep in mind that some articles will be "respun" to evade Google's duplicate content penalties, so a bot might classify them as original by accident. Anyone familiar with Warrior Forum will have heard of article spinner software, and even Fiverr gigs where people will respin content to make it appear original.
Ultimately I don't think we should rely solely on automated processes to vet the content. The Steemit system should be refined to encourage "spam vigilance" by all users.
There should be so much more to Steemit than entertaining or informative articles. I look forward to the emergence of communities of hobbyists, ideologues, educators, hackers, philanthropists, etc. Users within these communities will build relationships, create "thought leaders", and spam will be weeded out more efficiently.

Imagine I write an article quoting some author. The bot copies the text and searches for a source. The text matches the source, so it then flags my post.

There is no definitive way to prevent unfair flagging, since there are countless ways to reference a source.

So you'd also be flagging a style of referencing you don't approve of.

Another point is how the bot proves uniqueness. Let's say I copied some text from an article I wrote, but my name on STEEM isn't revealed. Do I have to get flagged or dox myself?

I think the bot should only look for text on google, then comment a potential plagiarised source.

Downvotes will dictate whether or not the bot was accurate.
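A rough sketch of that comment-only behaviour, assuming the bot has already found a candidate URL and computed how much of the post's unquoted text matches it. `build_notice`, its wording and the 0.5 threshold are hypothetical; actually posting the text through piston is not shown.

```python
def build_notice(source_url, overlap, threshold=0.5):
    """Return the text of a notice comment, or None if the match is too weak.

    `overlap` is the assumed fraction (0.0 to 1.0) of the post's unquoted
    text that also appears at `source_url`. The bot only comments; it never
    flags, so readers' votes decide whether the notice was accurate.
    """
    if overlap < threshold:
        return None
    percent = round(overlap * 100)
    return (
        "Possible source found: {url}\n"
        "About {pct}% of the unquoted text appears there. "
        "If this is your own work or the match is wrong, reply here "
        "and flag this comment."
    ).format(url=source_url, pct=percent)
```

Keeping the decision in one small function makes the threshold easy to tune, and a wrong notice costs the bot reputation rather than costing the author a flag.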

There is quote markdown. The bot could skip quoted text rather than count it as plagiarized content...just like google bots do.

Then formatting will have to be specified.

Awesome! It's a great idea.

https://steemit.com/stolen/@knozaki2015/please-implement-a-reporting-function

it would be great if steemit could implement a report function too.

Regarding images used in posts: it would be useful if credit could be given to the source of the image if it's not originally created by the post's author

I would like to be able to upload directly to Steemit because my photographs I am posting are not found online. Instead, I am posting them on Facebook (which I dislike) and linking from there. Recently, to illuminate points, I have used easily found images that I am not claiming as my own but will begin to note their source for the audience.

I AM SURE the value of the platform depends largely on Google giving us rankings for articles. When people do that, it hurts all of us...especially people like me who have a bigger stake in the platform.

I AM SURE the value of the platform depends largely on Google giving us rankings for articles.

Please explain why you are sure of this. You might be right, but I have never heard the developers mention such a thing. I have heard them and many in the community say they will be quite skeptical and cautious of introducing advertising.

Good idea - this has been annoying me too - this site will be better for it (original content) in the long term

hello

Great idea

I think this is a fantastic idea! As others have said it looks like http://piston.rocks might be the best solution assuming one can write in python ;-)

Here are my pennies towards the bounty!

I have completed this task, in an unconventional way, by utilizing cyborgs. I take Amazon Mechanical Turks and have them do it for me. I am awaiting my $500 bounty, thank you.

~Your Future Robot Overlord

Can it scale to 100,000 articles+ per day?

I was unable to build an environment for this. The error I got was:

    
Installing collected packages: steem-piston, diff-match-patch, python-frontmatter, appdirs, colorama, steem, graphenelib, prettytable, PyYAML, asyncio, websockets, ecdsa, txaio, requests, Unidecode, autobahn, scrypt, websocket-client
  Running setup.py install for diff-match-patch
    
  Running setup.py install for python-frontmatter
    
    warning: no previously-included files found matching '*.py[co]'
*** Error compiling '/tmp/pip_build_alumno/graphenelib/grapheneapi/graphenewsrpc_BACKUP_13829.py'...
  File "/tmp/pip_build_alumno/graphenelib/grapheneapi/graphenewsrpc_BACKUP_13829.py", line 80
    <<<<<<< HEAD
     ^
SyntaxError: invalid syntax

This looks like a problem caused by a merge that didn't go so well. The traceback points at a leftover BACKUP file in graphenelib rather than at websocket-client. I am not sure how I can help this along. When the error happens, the pip3 script clobbers the tmp directories.
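One way to confirm this kind of leftover, sketched here under the assumption that the package source has been downloaded and unpacked somewhere pip won't delete it: walk the tree and report every line that starts with a git conflict marker. `find_conflict_markers` is a hypothetical helper, not part of pip or graphenelib.

```python
import os

# Git writes these at the start of a line when a merge conflicts.
# "=======" is skipped because it also appears in ordinary files.
CONFLICT_PREFIXES = ("<<<<<<< ", ">>>>>>> ")


def find_conflict_markers(root):
    """Return (path, line number, line) for each marker in .py files under root."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if line.startswith(CONFLICT_PREFIXES):
                        hits.append((path, lineno, line.rstrip()))
    return hits
```

Any file it reports, such as a `*_BACKUP_*.py` left behind by a merge tool, would fail to compile in the same way as the traceback above.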

Not sure if this will help but try this?
git reset --hard origin

This error comes from the easy pip3 method. Any git checkouts were automated and I never modified any source files. I think if I can download graphenelib separately I can find a version that works, however.

I installed some debian packages mentioned in the graphene install instructions. Then I installed six and graphene manually (i.e. git clone, then setup.py). After that I was able to install piston manually.

threw in my vote to support the cause.

really interesting, framed like that - a post being used not just as a post to earn rewards, but as an idea proposal, job-posting, and fund-raiser all in one. very cool.

First of all we need to verify people's identity if they want to duplicate their blog posts or if they are public figures. We also need to ask the poster if this content is original content or not. If they lie and they get flagged by the bot or other users it will hurt their reputation. If they keep doing it, their reputation will hide all their future posts.

Let's do it?!?!