Hey Steemians!
How are y'all doing? Today I am going to show you how to scrape a website using Python in a few steps. First, let's have a quick overview of what web scraping actually is and what it is used for.
What is web scraping?
Suppose you want a particular set of information from a website. Copy-pasting all that data by hand can take hours or even days. That's where web scraping comes in.
Web scraping lets you extract data from a website by writing a few blocks of code. I personally prefer Python for web scraping because its syntax makes the process much easier.
Is Web scraping legal?
Yes and no. You need to understand the difference between copying data and stealing it. Scraping publicly available data for your own use is generally fine, but republishing it or using it commercially can get you into trouble, so check the website's terms of use first. Just make sure you are not using it to steal confidential data for profit.
Now that you know what web scraping is, I am going to show you how to scrape a website in just a few steps.
Scraping a website using Python
Step1:
First of all, select a website and the data you want to scrape from it. As I mentioned above, make sure you are not scraping confidential data. Here, I am going to scrape Steam's website, a platform to buy, play, create and discuss PC games. I am scraping it to fetch a list of all the new and trending games and their discounted prices.
Step2:
Now open your Python IDE. The first thing you have to do is import two libraries.
The Requests library is a widely used way of making HTTP requests from Python. Its simple API hides the complexities of making a request, so you can focus on fetching the data.
Beautiful Soup is a Python library used for web scraping and for pulling data out of HTML and XML files. It builds a parse tree from the page source, which makes the data easier to navigate and extract.
P.S. If you don't have these libraries installed, open Windows PowerShell and type pip install (library's name), for example pip install requests or pip install beautifulsoup4.
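For reference, the two imports would look something like this. This is a small sketch that assumes the "Request Library" mentioned above is the third-party requests package and that Beautiful Soup was installed as beautifulsoup4:

```python
# Assumption: "requests" is the HTTP library meant in the text,
# and Beautiful Soup is installed from the "beautifulsoup4" package.
import requests
from bs4 import BeautifulSoup
```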
Step3:
Copy the link of the website that you want to scrape and save it in a variable.
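For example, for the Steam store it could look like this. The variable name url is just my choice here, and the address is assumed to be the store's front page:

```python
# Assumption: the Steam store front page is the page being scraped.
url = "https://store.steampowered.com/"
```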
Step4:
Write a block of code that does the following:
-Opens the connection to the web page
-Reads all the data of the web page and stores it in a variable
-Closes the connection
-Parses the HTML file
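The four steps above can be sketched roughly as follows. This assumes the requests package and Python's built-in html.parser; the function name fetch_soup and the 10 second timeout are my own choices, not something fixed by either library. The last lines parse a tiny made-up page so you can try the parsing step without going online.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url):
    """Download a page and return it as a parsed BeautifulSoup tree."""
    # Open the connection with the web page.
    response = requests.get(url, timeout=10)
    # Stop early if the server answered with an error status.
    response.raise_for_status()
    # Read all the data of the web page and store it in a variable.
    page_html = response.text
    # Close the connection.
    response.close()
    # Parse the HTML with Python's built-in parser.
    return BeautifulSoup(page_html, "html.parser")


# Offline demonstration of the parsing step on a tiny made-up page.
sample_soup = BeautifulSoup("<html><title>Demo</title></html>", "html.parser")
print(sample_soup.title.text)
```

With the link saved in a variable called url, you would then call page_soup = fetch_soup(url) to get the parsed page.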
Step5:
After successfully parsing the HTML file, you need to look at the HTML code of the web page. To do this, go to the web page, right-click and choose Inspect.
Step6:
Since I wanted to fetch the names and prices of all the games in the New & Trending section, I hovered my cursor over the first container, i.e. FIFA 22, to get the HTML code related to it.
Inspecting it makes it clear that the first container is wrapped in an anchor tag with the class name tab_item, which means that all the other containers have the same class name. So, to scrape all the containers, we select every anchor tag with that class.
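Here is a minimal sketch of that selection using Beautiful Soup's find_all. The HTML below is a made-up sample that only imitates the structure described above (anchor tags with the class tab_item), so the game names are placeholders and the real page will contain much more markup.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML that imitates the structure described above.
sample_html = """
<div>
  <a class="tab_item" href="#"><div class="tab_item_name">Game A</div></a>
  <a class="tab_item" href="#"><div class="tab_item_name">Game B</div></a>
  <a class="other" href="#">Not a game</a>
</div>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")

# Collect every anchor tag whose class is tab_item.
containers = page_soup.find_all("a", class_="tab_item")
print(len(containers))
```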
Step7:
Similarly, we will find the tag and class name of the title of the game and its discounted price.
Inspecting those elements shows that the class name for all the titles is tab_item_name, the class name for all the discounted prices is discount_final_price, and both are div tags.
Step8:
We will then create a for loop to access the titles and discounted prices of all the containers using the class names and tags, and save them in variables. We will then print the variables.
Finally, run this program and voilà, you will see a list of the game titles and their discounted prices in the output.
That's it for today. If you like my post, do upvote it.
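The loop could look something like this. Again, the HTML is a made-up sample with placeholder names and prices, and the fallback texts for a missing title or price are my own addition, in case a container on the real page does not have a discounted price.

```python
from bs4 import BeautifulSoup

# Made-up sample HTML using the class names found with Inspect.
sample_html = """
<a class="tab_item">
  <div class="tab_item_name">Game A</div>
  <div class="discount_final_price">$9.99</div>
</a>
<a class="tab_item">
  <div class="tab_item_name">Game B</div>
</a>
"""

page_soup = BeautifulSoup(sample_html, "html.parser")
containers = page_soup.find_all("a", class_="tab_item")

results = []
for container in containers:
    # Look up the title and price div tags inside this container.
    title_tag = container.find("div", class_="tab_item_name")
    price_tag = container.find("div", class_="discount_final_price")
    # Fall back to a placeholder text if a tag is missing.
    title = title_tag.get_text(strip=True) if title_tag else "Unknown title"
    price = price_tag.get_text(strip=True) if price_tag else "No price listed"
    results.append((title, price))
    print(title, "-", price)
```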
#club5050
@cryptokraze | @arie.steem | @qasimwaqar | @vvarishayy | @suboohi
Cool, will try for sure 👍
Please read these guidelines.
You cannot post Tutorials.
You can apply for course and after approval you can post such tutorials in form of a course.
https://steemit.com/hive-181430/@siz-official/siz-community-guidelines-on-daily-content-creation-categories