Something that I really like about the Brave Browser is that the built-in AI in the sidebar/toolbar can read the content of the web sites that I'm viewing in the browser, and it can answer questions about the site. As a result of that capability, I've been experimenting with prompts that could help with curation decisions.
Certainly, the idea is not to replace my own decision-making with an AI, but rather I want to get an independent perspective - especially with regard to things like reach, novelty, and relevance. It's an adaptation of the Four eyes principle.
So, today, I thought I'd compare how the AI evaluated a series of posts and report on the results. I would have preferred to use the "Claude 3 Haiku" LLM in the browser settings, but I ran out of free AI time too fast, so most of the results are with "Llama 3.1 8B". I also did one evaluation with Mixtral, just for comparison purposes (but again, I ran out of free AI time).
Here's today's version of a curation prompt:
Evaluate this article in four dimensions and score each dimension from 1-10. The dimensions are:
- Novelty/surprise: Does the article provide unique information or perspective?
- Organization: Is the article structured in a way that motivates the reader to read the whole thing?
- Grammar/spelling: Are there any mistakes?
- Relevance: Is the topic relevant to a wide audience?
Also, provide an overall score from 0-10 that considers those factors and any others that you think are relevant.
Finally, I'm going to vote on 20 articles today with an average percentage of 50%. From 0-100, what percentage should I allocate to this article?
Explain your scoring.
Note that this prompt doesn't consider curation rewards at all. An improved version might explain the curation rules and have it include the potential rewards in its response.
Here's what Brave's AIs think of some posts on the Steem blockchain. Five of the six posts are mine, and the sixth, an anonymous trending post, was a late addition.
First, a summary table
| Model | Link | Novelty | Organization | Grammar & Spelling | Relevance | Overall | Suggested Vote |
|---|---|---|---|---|---|---|---|
| Llama | Make Downvotes Great Again? | 6 | 8 | 9 | 7 | 7.5 | 60 |
| Llama | My selections for Developer Delegation Day (DDD) | 6 | 8 | 9 | 7 | 7.5 | 60 |
| Llama | True crime in the Delaware Valley: A cover-up in the case of Ellen Greenberg? | 8 | 9 | 9 | 6 | 7.8 | 60 |
| Llama | New technique makes skin invisible - yes really | 8 | 9 | 10 | 8 | 8.3 | 55 |
| Llama | Community Service Announcement: Pinned posts by mod-bot | 6 | 8 | 10 | 8 | 7.4 | 60 |
| Mixtral | Community Service Announcement: Pinned posts by mod-bot | 7 | 8 | 9 | 7 | 31/40 | 55-60 |
| Llama | Anonymous trending article | 2 | 4 | 9 | 6 | 4.5 | 20-25 |
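One way to sanity-check the table is to compare each reported overall score with the plain average of its four dimension scores. The minimal Python sketch below does that for the Llama rows. It is only an illustration: the prompt explicitly lets the model weigh other factors, so the overall score is not required to equal the mean, and the shortened labels and the helper names (`dimension_mean`, `compare`) are made up for this example.

```python
# Scores copied from the Llama rows of the summary table.
# The Mixtral row is left out because its overall was reported as 31/40
# rather than on a 10-point scale.
rows = [
    # (label, novelty, organization, grammar, relevance, reported_overall)
    ("Make Downvotes Great Again?", 6, 8, 9, 7, 7.5),
    ("Developer Delegation Day", 6, 8, 9, 7, 7.5),
    ("Ellen Greenberg true crime", 8, 9, 9, 6, 7.8),
    ("Skin made invisible", 8, 9, 10, 8, 8.3),
    ("Pinned posts by mod-bot", 6, 8, 10, 8, 7.4),
    ("Anonymous trending article", 2, 4, 9, 6, 4.5),
]


def dimension_mean(scores):
    """Unweighted average of the dimension scores."""
    return sum(scores) / len(scores)


def compare(rows):
    """Return (label, mean, reported, reported - mean) for each row."""
    out = []
    for label, *dims, reported in rows:
        mean = dimension_mean(dims)
        out.append((label, mean, reported, round(reported - mean, 2)))
    return out


for label, mean, reported, diff in compare(rows):
    print(f"{label:30s} mean={mean:.2f} reported={reported:.1f} diff={diff:+.2f}")
```

For the first two posts the reported overall matches the simple average (7.5). For the other four Llama rows the reported overall is below the average, so either the model applied some unstated weighting or its arithmetic drifted.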
Here's what those scores look like:
Evaluations across dimensions
Suggested votes
Here's what it looks like in practice, with both the article and the AI interaction in the same browser environment.
Here are the details of the responses, followed by a short wrap-up.
Article 1 response:
Dimension 1: Novelty/surprise Score: 6/10 The article presents a unique perspective on downvoting on Steem, suggesting a new approach to using the tag "#thegauntlet" to encourage accurate appraisal of content. However, the idea is not entirely new, as it is based on a concept mentioned in the past.
Dimension 2: Organization Score: 8/10 The article is well-structured, with a clear introduction, explanation of the problem, presentation of a solution, and discussion of potential objections. The author effectively uses headings and paragraphs to break up the content and make it easier to follow.
Dimension 3: Grammar/spelling Score: 9/10 The article appears to be well-written, with no major grammatical errors or spelling mistakes. However, there are a few minor errors, such as the use of American spell-checker corrections.
Dimension 4: Relevance Score: 7/10 The topic of downvoting on Steem is relevant to a specific audience, namely Steem users and content creators. However, the article may not be of interest to a wider audience.
Overall score Score: 7.5/10 The article presents a unique perspective on downvoting, is well-structured, and is free of major errors. However, the topic may not be of interest to a wide audience, and the idea is not entirely new.
Percentage allocation Score: 60% Based on the article's content and relevance, I would recommend allocating 60% of your voting percentage to this article.
Article 2 response:
Dimension 1: Novelty/Surprise Score: 6/10 The article provides a unique perspective on the Steem ecosystem and the concept of Developer Delegation Day. However, the idea of delegating STEEM Power to developers is not entirely new.
Dimension 2: Organization Score: 8/10 The article is well-structured, with a clear introduction, explanation of the concept, and a list of the author's delegations. The use of brief highlights for each developer is effective in conveying the author's reasoning.
Dimension 3: Grammar/Spelling Score: 9/10 The article appears to be well-written, with no major grammatical or spelling errors. However, there are a few minor errors, such as the use of a smiley face in the text.
Dimension 4: Relevance Score: 7/10 The topic of the article is relevant to a niche audience interested in the Steem ecosystem and blockchain development. However, the article may not be of interest to a wider audience.
Overall Score Score: 7.5/10 The article is well-written and provides a unique perspective on the Steem ecosystem. However, its relevance and novelty are limited to a specific audience.
Percentage Allocation Score: 60% Based on the article's quality and relevance, I would recommend allocating 60% of your vote to this article. This is a moderate score, reflecting the article's strengths and weaknesses.
Article 3 response:
Dimension 1: Novelty/Surprise Score: 8/10 The article provides a detailed and unique perspective on a specific true crime case, which is not commonly discussed. The author's personal involvement and updates on the case make it engaging and informative.
Dimension 2: Organization Score: 9/10 The article is well-structured, with a clear introduction, background information, and updates on the case. The author provides links to previous articles and sources, making it easy to follow the story.
Dimension 3: Grammar/Spelling Score: 9/10 The article appears to be well-written, with no noticeable grammatical or spelling errors.
Dimension 4: Relevance Score: 6/10 While the topic is specific, it may not be relevant to a wide audience. However, true crime enthusiasts and those interested in justice and law may find it engaging.
Overall Score Score: 7.8/10 The article's unique perspective, clear organization, and well-written content make it a strong contender. However, its limited relevance to a wide audience and some potential biases (e.g., the author's personal involvement) prevent it from scoring higher.
Percentage Allocation Score: 60% Considering the article's strengths and weaknesses, I would recommend allocating 60% of your vote to this article. This score reflects its unique perspective, clear organization, and well-written content, while also acknowledging its limited relevance to a wide audience.
Article 4 response:
Dimension 1: Novelty/surprise Score: 8/10 The article presents a unique and surprising breakthrough in making mouse skin transparent using Yellow Dye #5, which has the potential to revolutionize medical diagnostics and procedures.
Dimension 2: Organization Score: 9/10 The article is well-structured, with a clear introduction, explanation of the research, and potential applications. The use of subheadings and quotes from experts adds to the clarity and readability.
Dimension 3: Grammar/spelling Score: 10/10 The article appears to be free of grammatical and spelling errors.
Dimension 4: Relevance Score: 8/10 The topic is relevant to a wide audience interested in science, technology, and medical advancements. However, it may not be of interest to readers who are not familiar with scientific research or medical applications.
Overall score Score: 8.3/10 The article's unique information, clear organization, and lack of errors make it a well-written and engaging piece. However, its relevance to a wider audience is somewhat limited.
Percentage allocation Percentage: 55% Considering the article's novelty, organization, and relevance, I would recommend allocating a higher percentage to this article compared to the average. However, it's essential to balance this with the need to evaluate other articles as well.
Article 5 response:
Llama 3.1 8B
Novelty/Surprise: 6 The article provides a unique perspective on how a moderation bot is used to promote posts in the Popular STEM community, but the concept of a moderation bot is not entirely new.
Organization: 8 The article is well-structured and easy to follow, with a clear explanation of how the mod-bot system works and its benefits.
Grammar/Spelling: 10 There are no noticeable mistakes in grammar or spelling.
Relevance: 8 The topic is relevant to a niche audience interested in Steem and blockchain technology, but may not be of interest to a wider audience.
Overall Score: 7.4 The article provides a clear and concise explanation of a unique concept, but its relevance is limited to a specific niche audience.
Percentage Allocation: 60 Considering the article's unique perspective and clear explanation, I would allocate a higher percentage than the average 50%. However, its limited relevance to a wider audience prevents me from allocating a higher percentage.
Mixtral
1. Novelty/surprise: The article provides a unique perspective on community moderation and visibility through the use of a moderation bot. I would rate this a 7/10.
2. Organization: The article is well-structured and easy to follow. The information is presented in a clear and logical manner. I would rate this a 8/10.
3. Grammar/spelling: The article appears to be free of grammatical and spelling errors. I would rate this a 9/10.
4. Relevance: The topic is relevant to the STEM community and could be of interest to anyone involved in online communities or moderation. I would rate this a 7/10.
Overall, I would rate this article a 31/40 (out of 10).
For your voting, since you plan to vote on 20 articles with an average percentage of 50%, and considering this article's overall score, I would suggest allocating around 55-60% to this article. This is slightly above the average due to its good organization and error-free text. However, the novelty and relevance scores are somewhat moderate, which is reflected in the suggested percentage.
Response to an anonymous trending article
Dimension 1: Novelty/surprise Score: 2/10 The article appears to be a collection of user-generated content from a web page, with various posts on different topics such as weather, personal experiences, and news. While some of the content may be interesting or surprising to individual readers, the overall collection lacks a unique perspective or information that would make it stand out.
Dimension 2: Organization Score: 4/10 The article is structured as a collection of posts, but it lacks a clear organization or flow. The posts are not grouped by topic or theme, and there is no clear introduction or conclusion to tie the content together. This makes it difficult for the reader to follow or engage with the content.
Dimension 3: Grammar/spelling Score: 9/10 Upon reviewing the article, I did not notice any significant grammar or spelling mistakes. The text appears to be well-written and free of errors.
Dimension 4: Relevance Score: 6/10 The article appears to be a collection of user-generated content, and the topics covered are diverse and may not be relevant to a wide audience. However, some of the posts may be of interest to readers who are interested in personal experiences, news, or current events.
Overall score Score: 4.5/10 Considering the dimensions above, the overall score reflects the article's lack of novelty, poor organization, and limited relevance to a wide audience.
Percentage allocation
I would recommend allocating 20-25% to this article. This is a relatively low percentage, reflecting the article's limited relevance and lack of unique information or perspective.
I hadn't planned to add the anonymous trending article or the Mixtral output, but Llama's results were so consistent on my posts that I was afraid it was stuck. I was still concerned when Mixtral also came in close to Llama. I finally saw a substantial change when I switched to completely independent content by a different author who is represented on the trending page. So, I guess we know what the AIs think of me as a writer. 😉
Conclusion
This was an interesting experiment, and I think the technique could provide useful information to a curator. I've played with curation prompting before, but this is the first time that I logged and compared the answers. Overall, it seems promising, but there were a couple of quirks.
- I don't trust its voting percentage recommendations. LLMs are notoriously bad at math; the results on my posts were too consistent; and the result on the anonymous trending post was far too generous (IMO).
- Despite the general consistency of the results, I noticed some inconsistencies in the details. For example, the overall score for post 3 was lower than the overall score for post 4, but the voting percentages were ordered in reverse.
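The ordering inconsistency described above can be checked mechanically. The sketch below is a hypothetical helper, not part of the original experiment: it takes the Llama overall scores and suggested votes from the summary table and lists every pair of posts where the higher overall score received the lower vote. The 20-25 range for the trending article is entered as its midpoint (22.5), which is an assumption made only for this illustration.

```python
from itertools import combinations

# (label, reported overall score, suggested vote %) for the Llama evaluations.
# 22.5 is the assumed midpoint of the 20-25 range given for the trending post.
posts = [
    ("post 1", 7.5, 60),
    ("post 2", 7.5, 60),
    ("post 3", 7.8, 60),
    ("post 4", 8.3, 55),
    ("post 5", 7.4, 60),
    ("trending", 4.5, 22.5),
]


def discordant_pairs(posts):
    """Return pairs where the higher overall score got the lower vote."""
    found = []
    for (a, score_a, vote_a), (b, score_b, vote_b) in combinations(posts, 2):
        # A negative product means score and vote move in opposite directions.
        # Ties in either score or vote give zero and are not flagged.
        if (score_a - score_b) * (vote_a - vote_b) < 0:
            found.append((a, b))
    return found


print(discordant_pairs(posts))
```

Every flagged pair involves post 4: it has the highest overall score of the six evaluations but a lower suggested vote than any of the other four posts of mine.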
Clearly, this is not ready for primetime as a standalone curation tool, but I think it can be a useful virtual partner for "pair curation".
An interesting note is that it even works in languages that I can't read.
Question: How do you think the prompt could be improved to provide evaluations that are more useful?
Addendum 1: After proof-reading and before posting, I had the AI evaluate this post, too. You'll never guess what voting percentage it suggested...😉
Addendum 2: Not sure how I thought this was going to be a "-lite" article when I decided to post from this account.😂
Thank you for your time and attention.
As a general rule, I up-vote comments that demonstrate "proof of reading".
Steve Palmer is an IT professional with three decades of professional experience in data communications and information systems. He holds a bachelor's degree in mathematics, a master's degree in computer science, and a master's degree in information systems and technology management. He has been awarded 3 US patents.
Pixabay license, source
Reminder
Visit the /promoted page and #burnsteem25 to support the inflation-fighters who are helping to enable decentralized regulation of Steem token supply growth.
Very interesting experiment.
I wonder what you would get if you added in two more dimensions...
a) Assess whether the article has been written by a human, or been generated by an AI system.
b) Check if the article has been plagiarised from another article on the web.
This might be too much for it though and cause the AI to blow a valve 😊
On the summary table is it possible to shorten the headings so the 4th dimension scores are visible?
I have tried asking it about plagiarism and AI authorship in the past. It gives answers, but I didn't think they were very believable. Maybe it will improve as time passes, though. I'll probably keep checking it from time to time.
I just updated the table headers so that hopefully the whole table is visible. I hadn't realized that there's no way to scroll tables to the right in condenser. I don't think that used to be the case (except on mobile). Wondering if that happened as part of the changes from @the-gorilla? On upvu.org, I can scroll the tables right and left when they're longer than the web container.
It would be an interesting little paradox if AI could spot AI !
Thanks for fixing the table. That would be good if @the-gorilla could make that amend for horizontal scrolling on tables as that is a common problem on steemit.com
In theory (and the theory should hold true), this should be a single line of CSS.
I'm coding today and have just finished a chunk of work so can have a look at it this afternoon 🙂
That's great.
The prompt is ingenious. I was going to evaluate this post first and then a couple of mine. But this browser doesn't work in Pakistan. 😒
Wow. I hadn't realized that Brave can't be used in some places. Maybe because it has Tor browser and VPN capability built into it... That's a shame, though. It's far and away my favorite browser.
You could probably do the same thing in ChatGPT or Claude.ai, but that would involve copy/pasting the entire post. Way too much effort to be practical for regular use.
We can't access it through vpn?
Did you put manual line breaks in your table titles? I'm wondering why my CSS isn't doing anything and I see many line breaks (which I'm hoping aren't generated in the code!)
I put the full headings back in. It's 9 days old now, so probably won't be viewed much between now and whenever you get the new version published. The scrolling looks nice on your test site. Thanks for updating that.
Great, thank you. That saves me trying to find another table!
It works well on a mobile - definitely a big improvement for a few lines of CSS!
Yeah, I did that in an edit to reduce the width of the columns so that the whole table would be viewable.
Upvoted. Thank You for sending some of your rewards to @null. It will make Steem stronger.