[Python Tutorials] Making a word counter with an included dictionary

in utopian-io •  6 years ago 



Software Requirements:
PyCharm (or any preferred code editor)
Python 3 with the requests, beautifulsoup4 and lxml packages installed (for example with pip)

Repository: Python Open Source Repository

What you will learn
In this tutorial you will learn how to make your own word counter in Python that stores its counts in a dictionary, using:
-Modules
-Functions
-Lists and dictionaries

Difficulty: Intermediate

Tutorial
In this tutorial you will learn how to make a word counter backed by a dictionary. It fetches a web page of your choice and shows how many times each word appears by printing the count beside the word. The code as written reads a web page; other sources, such as PDF documents, would first have to be converted to text. This kind of counter can be useful in research work.

The modules we will be using are requests, BeautifulSoup and operator, which are imported below:

import requests
from bs4 import BeautifulSoup
import operator

The requests module is a third-party package (not part of the standard library) used to make HTTP calls; here it downloads the page.
The BeautifulSoup module parses HTML content into a structure that is easy to search, read and edit.
The operator module is a built-in module that provides function versions of Python's operators, such as addition and subtraction; we will use its itemgetter function to sort the word counts.
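To make the role of the operator module more concrete, here is a minimal sketch of sorting (word, count) pairs with operator.itemgetter. The sample counts are invented for illustration and do not come from any real page.

```python
import operator

# Hypothetical sample counts, invented for illustration.
counts = {"python": 3, "code": 1, "word": 2}

# itemgetter(1) builds a function that returns element 1 of whatever it is given.
# For a (word, count) pair, that element is the count.
get_count = operator.itemgetter(1)

# sorted() calls get_count on each pair and orders the pairs by the result,
# from the smallest count to the largest.
pairs_by_count = sorted(counts.items(), key=get_count)

for word, count in pairs_by_count:
    print(word, count)
```

Passing reverse=True to sorted() would list the most frequent words first instead.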

Necessary Functions To Be Used And How To Make Them

We will then make the functions (the most vital part of our code).
First is def search(url):
In this function we make a list to store the words we get from the website (this is list = [] in this code). NOTE: You can use any name of your choice; naming it list hides Python's built-in list type inside the function, so a different name is safer.

Then we request the page text and assign it to a variable (data = requests.get(url).text), and pass it to BeautifulSoup to make it easier to search. The url parameter specifies the site the code will read.
Then we specify the tag and the class our code should search for (for postedtext in soup.findAll('a', {'class': 'news-info'}):).

Then we take the text of each matched tag as a plain string, which leaves the HTML markup behind. I also recommend converting the text to lowercase and splitting it into words (words = plaintext.lower().split()), so that each word is counted separately and regardless of case. Then we add each word to the list we created at the start of the function using append (list.append(eachtext)).
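The lowercase-and-split step can be tried on its own without any network access. The sketch below uses a made-up headline in place of the text of one matched link, and the name collected is a hypothetical stand-in for the list created at the top of search().

```python
# Hypothetical headline text, standing in for the text of one matched link.
plaintext = "Breaking News, Python Tops the Charts!"

# lower() makes the count case-insensitive; split() breaks on any whitespace.
words = plaintext.lower().split()

collected = []  # stand-in for the list created at the top of search()
for eachtext in words:
    collected.append(eachtext)

print(collected)
# ['breaking', 'news,', 'python', 'tops', 'the', 'charts!']
```

Note that punctuation is still attached to some words at this point, which is why a clean-up step is needed afterwards.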

We then create another function, def clean_up_words(list):, which as the name implies cleans the words we already have by removing symbols that aren't needed, and then adds them to another list (cleaned_up_list.append(word)).

In this function each symbol is replaced by an empty string (word = word.replace(symbols[w], "")). So if a word consists only of symbols, nothing is left of it. To avoid adding such empty words, we use

        if len(word) > 0:
            cleaned_up_list.append(word)

so a word that is empty after cleaning isn't added to the list.
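The clean-up idea can be sketched in isolation as follows. The helper name strip_symbols and the sample words are hypothetical; the symbol set is the one used in this tutorial, without the repeated characters.

```python
def strip_symbols(word, symbols="!@#$^&*()_+=][';/.,><?"):
    """Return word with every character found in symbols removed."""
    for symbol in symbols:
        word = word.replace(symbol, "")
    return word


# Hypothetical raw words, as they might look right after splitting.
raw_words = ["news,", "charts!", "...", "python"]

cleaned = []
for word in raw_words:
    word = strip_symbols(word)
    if len(word) > 0:  # skip words that were made of symbols only
        cleaned.append(word)

print(cleaned)
# ['news', 'charts', 'python']
```

Only characters listed in the symbol set are removed. A colon, for example, is not in the set, so "news:" would stay unchanged unless ":" is added.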

Then the last function we create is the
def dictionary(cleaned_up_list):
function, which holds a dictionary that maps each word to the number of times it appears; this is what gets printed when the function is called. The dictionary used in this code is
words_list = {}

(NOTE: You can use any name of your choice.) We also create a "for" loop to count the number of times each word appears in the text. This is shown below:
for word in cleaned_up_list:
    if word in words_list:
        words_list[word] += 1
    else:
        words_list[word] = 1

What this loop does is simple:
If the word is already in the dictionary (words_list), it increases the value stored for that word by 1. If the word isn't in the dictionary yet, the loop adds it with a value of 1. Lastly, each word is printed with its value (in this case the number of times the word appears), sorted from the least to the most frequent:
print(key,value)
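As an alternative to the hand-written counting loop, the standard library's collections.Counter does the same bookkeeping. This is a different technique from the one used in this tutorial, shown here only as a sketch; the sample words are invented.

```python
from collections import Counter

# Hypothetical cleaned words, invented for illustration.
cleaned_up_list = ["python", "code", "python", "word", "code", "python"]

# Counter builds the word -> count mapping in one step.
words_list = Counter(cleaned_up_list)

# Sort by count, smallest first, as the tutorial's print loop does.
for key, value in sorted(words_list.items(), key=lambda pair: pair[1]):
    print(key, value)
```

A Counter returns 0 for a word it has never seen, so no "if word in ..." check is needed when reading a count.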
As an example, I pass a sample "url" to the search function at the end of the script.
That completes our word counter.
Below is the whole code:

import requests
from bs4 import BeautifulSoup
import operator


def search(url):
    list = []
    data = requests.get(url).text
    # Parse with lxml (needs the lxml package installed); 'html.parser' is the built-in alternative
    soup = BeautifulSoup(data, 'lxml')
    for postedtext in soup.findAll('a', {'class': 'news-info'}):
        # get_text() always returns a str; .string is None when the tag contains nested tags
        plaintext = postedtext.get_text()
        words = plaintext.lower().split()
        for eachtext in words:
            list.append(eachtext)
    clean_up_words(list)
    

def clean_up_words(list):
    cleaned_up_list = []
    for word in list:
        symbols = "!@#$^&*())_+=][\';/.,><?'"
        for w in range(0, len(symbols)):
            word = word.replace(symbols[w], "")
        if len(word) > 0:
            cleaned_up_list.append(word)
    dictionary(cleaned_up_list)


def dictionary(cleaned_up_list):
    words_list = {}
    for word in cleaned_up_list:
        if word in words_list:
            words_list[word] += 1  # seen before: increase its count
        else:
            words_list[word] = 1  # first occurrence: start at 1
    for key, value in sorted(words_list.items(), key=operator.itemgetter(1)):
        print(key,value)


search('https://www.thenetnaija.com/news')
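The script above only works on a live page and prints its result. As a hedged sketch of one possible variation, the same cleaning and counting steps can be applied to any plain string and returned instead of printed, which makes them easy to try offline. The names count_words, most_common_first and SYMBOLS, and the sample sentence, are hypothetical and not part of the original code.

```python
import operator

# Same symbol set as the tutorial, without the repeated characters.
SYMBOLS = "!@#$^&*()_+=][';/.,><?"


def count_words(text):
    """Return a dict mapping each cleaned, lowercased word in text to its count."""
    counts = {}
    for word in text.lower().split():
        for symbol in SYMBOLS:
            word = word.replace(symbol, "")
        if word:  # ignore words that were symbols only
            counts[word] = counts.get(word, 0) + 1
    return counts


def most_common_first(counts):
    """Return (word, count) pairs sorted from the most to the least frequent."""
    return sorted(counts.items(), key=operator.itemgetter(1), reverse=True)


sample = "The cat sat. The cat ran! The end."
for word, count in most_common_first(count_words(sample)):
    print(word, count)
```

Returning the dictionary rather than printing inside the function lets the same code be reused for text from other sources.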

My Github Repository


Thank you for your contribution.

  • Overall, the tutorial does not properly describe what the real target is. It says this is a word counter, yet it seems to work only for reading and counting words from a website URL that you pass in.
  • Your GitHub username is completely different from your Steemit username, so we cannot verify authenticity.
  • The concept used is already well documented across the web; try to do something more innovative.
  • Your overall language has many problems, and the explanations are not easy for the reader to follow and risk losing focus at many points.
    For any future tutorials, we advise working on all of the above to produce higher-quality work.

Your contribution has been evaluated according to Utopian policies and guidelines, as well as a predefined set of questions pertaining to the category.

Need help? Write a ticket on https://support.utopian.io/.
Chat with us on Discord.
[utopian-moderator]
