What Will I Learn?
Greetings! This tutorial covers the use of the Jsoup library in Java. With this library you can pull information from websites, process it, and present it as user-friendly output. Jsoup is useful when you need to compare data across multiple pages, fetch rapidly changing information, analyze the values on a web page, and much more. To use the library, you must first download its .jar file and place it where your project can find it, for example next to your Java class, and add it to the classpath. After that, you can call the library from your Java code.
Requirements
- An IDE to test the code (preferably Eclipse IDE for Java Developers)
- Basic knowledge of Java.
- Basic knowledge of the Jsoup library.
Difficulty
This tutorial is intended for individuals who have prior knowledge of Java classes, libraries, and programming languages.
- Intermediate
Tutorial Contents
In this tutorial we will pull our data from Wikipedia (or wiki-zero) and process it according to our needs. There are many ways to extract data from a web page in Java; when the site offers an API, using it is usually the fastest and most accurate option, and otherwise the HTML can be parsed with a library such as Jsoup. First we go to the page that holds the data we want. Then we find the element that contains the data and, after processing it, we will be able to get the output below.
Before writing our method, we must import the classes that we are going to use in the project.
The first class we need is java.io.IOException, a checked exception that is thrown when an input/output (I/O) operation fails, for example when Jsoup cannot connect to the website. Importing it lets our main method declare that it may throw this exception:
import java.io.IOException;
Then we can add the Jsoup classes, which provide functions for fetching the data on a website and turning it into Java-friendly documents and strings:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
Finally, we add one last class that will help us read the values entered by the user:
import java.util.Scanner;
Then we can declare our class:
public class listofcount {
You may name the class as you wish, but the name must match the file name. Next we define our method. By writing public static void we mean that the method is visible to other classes (public), belongs to the class itself rather than to an instance (static), and returns no value (void).
public static void main(String[] args) throws IOException
Then we can call the connect or parse function from the Jsoup library. To do that, the command Jsoup.connect("yourwebsite") is enough. For this tutorial we will pull the data from Wikipedia's list of top international rankings by country, so we need to fetch the corresponding page. If Wikipedia is somehow not available in your country, you can also use wiki-zero. We can use the code below to get data from almost any website:
Jsoup.connect(yourwebsite).get();
Now since we want to get data from wikipedia we have to change the yourwebsite part into our desired data source,
String link = "http://www.wiki-zero.com/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvTGlzdF9vZl90b3BfaW50ZXJuYXRpb25hbF9yYW5raW5nc19ieV9jb3VudHJ5" ;
Document doc = Jsoup.connect(link).get();
In your design, instead of defining a link string you may prefer to write the URL directly inside the connect statement. The separate variable is used here because the link is long and the code is more readable this way. After defining the site and the connection method, we can move on to locating the element where the data is displayed. Here the table we want is matched by the CSS selector table.wikitable.sortable (a table element with the classes wikitable and sortable); inside the table we select the tr elements so that we work with the rows rather than with table properties such as size or color.
The picture below shows why this table was picked.
Now we can proceed to select the rows of the table:
Elements initialtable = doc.select("table.wikitable.sortable tr");
Then we remove the first row, which holds the headings above the data, such as the country, statistic, or date:
initialtable.remove(0);
The code above is useful when you want only the body rows of a table. It removes the top row and leaves a list-like representation of the table. Then, to enable multiple searches, a while loop is added that keeps running until the user wants to stop searching. To make this possible a variable is defined:
int i3 = 1;
and a while loop whose condition is always true is added:
while (i3 > 0) {
This is useful when you want to make multiple searches without running the program over and over again. Now we can continue by getting the user input. In this task we need to get the country name from the user and then look up its data in the Wikipedia table.
Scanner keyboard =new Scanner (System.in);
System.out.println("Enter the country you want to search?");
String string=keyboard.next();
The code above reads the country that the user wants to search for. In your design you may read another value instead, such as the name of a celebrity, a company, or an altcoin that you want to track. Then we can move on to displaying the data that matches the user's search. First we write a for loop that goes through all the rows in the table, one row per iteration.
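One caveat about reading input with next(): it returns only the next whitespace-delimited token, so an input such as "United States" would be read as "United". The sketch below shows one possible alternative that reads the whole line with nextLine() and trims it. It is not part of the original program; the class name CountryInput and the method readCountry are hypothetical names introduced for illustration.

```java
import java.util.Scanner;

// Hypothetical helper, not part of the original program.
final class CountryInput {

    // Reads one full line from the scanner and trims surrounding whitespace.
    // Returns an empty string when no more input is available.
    static String readCountry(Scanner in) {
        if (!in.hasNextLine()) {
            return "";
        }
        return in.nextLine().trim();
    }

    public static void main(String[] args) {
        Scanner keyboard = new Scanner(System.in);
        System.out.println("Enter the country you want to search?");
        String country = readCountry(keyboard);
        System.out.println("You entered: " + country);
    }
}
```

Keep in mind that the matching step later in this tutorial compares only the first word of each row, so multi-word names would need a different comparison there as well.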
for (Element d : initialtable) {
Then we can turn the element d into text that we can analyze, split, or search for a specific word.
dr = d.text();
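To find the country name in the text, the indexOf method is used to locate the first space; the text before that space is the first word of the row. In your design you can use the same approach to find any other marker in the text.
int i = dr.indexOf(" ");
String country = dr.substring(0,i);
Now we have the country name from the start of each row.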
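The extraction step can be sketched as a small helper. Note that indexOf returns -1 when the text contains no space, and substring(0, -1) would then throw a StringIndexOutOfBoundsException, so the sketch guards against that case. RowText and firstWord are hypothetical names, and the sample row strings are made up for illustration rather than taken from the Wikipedia table.

```java
// Hypothetical helper, not part of the original program.
final class RowText {

    // Returns the text before the first space, or the whole text
    // when it contains no space at all.
    static String firstWord(String text) {
        int i = text.indexOf(" ");
        return (i == -1) ? text : text.substring(0, i);
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration only.
        System.out.println(firstWord("Japan Highest life expectancy"));  // prints "Japan"
        System.out.println(firstWord("Japan"));                          // prints "Japan"
        // Only the first word is returned for multi-word names.
        System.out.println(firstWord("United States Largest economy"));  // prints "United"
    }
}
```

As the last call shows, this approach keeps only the first word, so countries with multi-word names cannot be told apart by it.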
Since we have the country names, we can check whether they match the user's input:
if (country.equals(string))
{
    // Remove each bracketed reference marker such as [1]
    dr = dr.replaceAll("\\[[^\\]]*\\]", "");
    System.out.println(i2 +". " + dr);
    i2++;
}
The code above prints all the information about the searched country in the table. Moreover, it removes the Wikipedia reference markers shown in [brackets]. Finally, we can ask the user whether they want to continue searching for another country:
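When stripping bracketed markers with a regular expression, greediness matters. A pattern such as "\\[.*\\]" matches from the first [ to the last ] in the row, so any text between two markers disappears as well, whereas "\\[[^\\]]*\\]" matches one marker at a time. The sketch below compares the two; the class ReferenceStripper, the method stripReferences, and the sample string are illustrative assumptions.

```java
// Hypothetical helper, not part of the original program.
final class ReferenceStripper {

    // Removes every bracketed marker such as [1] or [note 2].
    // The pattern reads: "[", then any characters except "]", then "]".
    static String stripReferences(String text) {
        return text.replaceAll("\\[[^\\]]*\\]", "");
    }

    public static void main(String[] args) {
        // Made-up sample row with two reference markers.
        String row = "Japan Longest life expectancy[1] and lowest homicide rate[2]";

        // Greedy pattern: removes everything from the first "[" to the last "]".
        System.out.println(row.replaceAll("\\[.*\\]", ""));
        // prints "Japan Longest life expectancy"

        // Narrow pattern: removes only the markers themselves.
        System.out.println(stripReferences(row));
        // prints "Japan Longest life expectancy and lowest homicide rate"
    }
}
```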
Scanner keyboard3 =new Scanner (System.in);
System.out.println("");
System.out.println("Do you want to continue searching? [y/n]");
String string3=keyboard3.next();
if (string3.equals("n"))
{
break;
}
Once the user types 'n', the break statement ends the while loop and the program terminates. Below are the overall code and the sample outputs of the program.
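The continue-or-stop logic can also be written with a single Scanner that is reused across iterations, which avoids creating a new one on every pass. The sketch below is a simplified stand-in, assuming the table lookup is replaced by merely recording each query; SearchLoop and collectQueries are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper, not part of the original program.
final class SearchLoop {

    // Repeatedly asks for a country and then for a y/n answer.
    // Stops when the answer is "n" or when the input runs out,
    // and returns the queries in the order they were entered.
    static List<String> collectQueries(Scanner in) {
        List<String> queries = new ArrayList<>();
        while (true) {
            System.out.println("Enter the country you want to search?");
            if (!in.hasNext()) {
                break;
            }
            queries.add(in.next());

            System.out.println("Do you want to continue searching? [y/n]");
            if (!in.hasNext() || in.next().equalsIgnoreCase("n")) {
                break;
            }
        }
        return queries;
    }

    public static void main(String[] args) {
        // One Scanner is shared by both prompts.
        Scanner keyboard = new Scanner(System.in);
        System.out.println("Searched: " + collectQueries(keyboard));
    }
}
```

Passing the Scanner in as a parameter also makes the loop easy to try out with a fixed input string instead of the keyboard.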
Overall code,
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import java.util.Scanner;

public class listofcount {
    public static void main(String[] args) throws IOException {
        String link = "http://www.wiki-zero.com/index.php?q=aHR0cHM6Ly9lbi53aWtpcGVkaWEub3JnL3dpa2kvTGlzdF9vZl90b3BfaW50ZXJuYXRpb25hbF9yYW5raW5nc19ieV9jb3VudHJ5" ;
        // Fetch and parse the page
        Document doc = Jsoup.connect(link).get();
        // Select every row of the rankings table, then drop the header row
        Elements initialtable = doc.select("table.wikitable.sortable tr");
        initialtable.remove(0);
        String dr = "";
        int i3 = 1;
        // Keep searching until the user answers "n"
        while (i3 > 0) {
            Scanner keyboard = new Scanner(System.in);
            System.out.println("Enter the country you want to search?");
            String string = keyboard.next();
            int i2 = 1;
            System.out.println("Listing top international rankings of " + string + "\n-------------------------------------------------");
            for (Element d : initialtable) {
                dr = d.text();
                // The first word of the row is treated as the country name
                int i = dr.indexOf(" ");
                if (i == -1) {
                    continue; // no space in this row, nothing to compare
                }
                String country = dr.substring(0, i);
                if (country.equals(string)) {
                    // Remove each bracketed reference marker such as [1]
                    dr = dr.replaceAll("\\[[^\\]]*\\]", "");
                    System.out.println(i2 + ". " + dr);
                    i2++;
                }
            }
            Scanner keyboard3 = new Scanner(System.in);
            System.out.println("");
            System.out.println("Do you want to continue searching? [y/n]");
            String string3 = keyboard3.next();
            if (string3.equals("n")) {
                break;
            }
        }
    }
}
And the sample outputs,
Curriculum
- Making and designing a translator with Jsoup
- Making your own currency tracker with Jsoup!
- Extracting data by using Jsoup
- Improving translators performance by using Jsoup
Posted on Utopian.io - Rewarding Open Source Contributors