Hi! I continue my tour of text summarization techniques. This time the topic is keyword extraction, with this survey written by Santosh Kumar Bharti and Korra Sathya Babu.
Source: https://arxiv.org/abs/1704.03242
Automatic keyword extraction is fundamental to building a good representation of a document for summarization.
It is not a descriptive survey that details and explains each technique. It is more of a map of the current state of the research, citing several papers. I also like this approach: it is very neutral, and you can quickly pick what interests you and dive in. This post will be like the survey, minimalist on the descriptive side.
If you want a more descriptive post, you can have a look at this one: https://steemit.com/text/@boucaron/text-summarization-techniques-a-brief-survey
The previous figure shows the main categories of automatic keyword extraction. The statistical approaches use non-linguistic features of the document, word occurrences for instance. The linguistic approaches extend the statistical ones with lexical, syntactic and discourse analysis. Machine learning can also be used: keyword extraction is then treated as a learning problem, which needs a dataset for training and can be tuned for a specific field or task.
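As a rough illustration of the statistical approach, here is a minimal Python sketch that ranks words by raw occurrence count. It is not taken from the survey: the function name `top_keywords`, the tiny stopword list and the regex tokenizer are all assumptions made for the example.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not from the survey).
STOPWORDS = {"the", "a", "an", "of", "is", "and", "to", "in", "for", "it", "on"}


def top_keywords(text, k=3):
    """Return the k most frequent non-stopword tokens in text."""
    # Lowercase the text and keep only alphabetic runs as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count occurrences, ignoring stopwords.
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(k)]


if __name__ == "__main__":
    print(top_keywords("The cat sat on the mat. The cat ate the fish.", k=1))
```

Counting occurrences uses no linguistic knowledge at all, which is why it falls under the statistical category; a real system would at least use a proper stopword list and some normalization.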
This figure shows an overview of the different kinds of text summarization: single-document or multi-document; query-based, where only the subset of interest is extracted; extractive, which builds a summary from parts of the source text; abstractive, where the "idea" is extracted using linguistic methods to identify concepts and generate a short abstract; and supervised, which relies on training with datasets.
The previous figures show a taxonomy of the different summarization approaches. This previous post describes in more detail the statistical (TF-IDF), graph-based (GPR) and Bayesian machine learning (NB) approaches: https://steemit.com/text/@boucaron/text-summarization-techniques-a-brief-survey. The coherent-based approach uses words, lexical relations and grammar to establish the meaning. The algebraic approach covers all the matrix-related techniques that try to produce a set of concepts from the text: indexing, clustering, classification...
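To make the TF-IDF entry of the taxonomy a bit more concrete, here is a minimal sketch in Python. It assumes one common variant (term frequency normalized by document length, inverse document frequency as log(N / df)); the survey does not prescribe a formula, and the names `tokenize` and `tf_idf` are only illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = occurrences of the term / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


if __name__ == "__main__":
    docs = ["the cat sat", "the dog barked", "the cat purred"]
    for doc, s in zip(docs, tf_idf(docs)):
        print(doc, "->", max(s, key=s.get))
```

A word that appears in every document gets a score of zero, so the highest-scoring terms of a document are the ones that distinguish it from the rest of the collection, which makes them natural keyword candidates.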
This last figure shows the different metrics that can be used to evaluate the performance of text summarization.
Even if this survey is not descriptive enough and lacks many definitions, it provides a good big picture of the domain and a starting point for digging further.
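The figure is not reproduced here, so the choice of metric below is an assumption on my side: precision, recall and F1 are a common way to compare extracted keywords against a reference set. A minimal sketch, with the hypothetical function name `precision_recall_f1`:

```python
def precision_recall_f1(predicted, reference):
    """Compare a predicted keyword list against a reference keyword list."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)  # keywords found in both sets

    # Guard against empty inputs to avoid dividing by zero.
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    print(precision_recall_f1(["cat", "dog", "fish"], ["cat", "dog"]))
```

Precision tells how many of the extracted keywords are correct, recall how many of the expected keywords were found, and F1 is their harmonic mean.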
I find your posts are very helpful and honest, thank you for sharing your information boucaron, following you.
Thank you, it means a lot to me.