Text summarization is one of the research areas in Natural Language Processing (NLP) which provides a meaningful summary using various NLP tools and techniques.
Since a huge amount of information is used across the digital world, which is very difficult for human beings to summarize manually, it is essential to have automatic summarization techniques. Summarization techniques are broadly divided into Extractive and Abstractive. Most automatic summarization approaches are extractive, leveraging only literal or syntactic information in documents. Sentences are extracted from the original documents directly by ranking or scoring. On the other hand, abstractive summarization is a challenging area, because it requires deeper analysis of the text and has the capability to synthesize a compressed version of the original sentence, or may compose a novel sentence not present in the original source. The goal of abstractive summarization is to improve the focus of the summary, reduce its redundancy and keep a good compression rate. This paper is a study of various methods used for abstractive summarization. The main idea behind these methods is discussed along with their strengths and weaknesses.
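As a minimal illustration of the sentence ranking and scoring that extractive systems perform, the following sketch scores each sentence by the corpus frequency of its content words and keeps the top-ranked ones in original order. The scoring scheme and stop-word list are simplifying assumptions for illustration, not the method of any system surveyed here.

```python
from collections import Counter
import re

# Assumed toy stop-word list; real systems use a full list or TF-IDF weighting.
STOP = {"the", "a", "an", "is", "of", "and", "to", "in", "it", "at"}

def extractive_summary(text, k=2):
    """Rank sentences by summed content-word frequency; keep the top k in order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP]
    freq = Counter(words)

    def score(sent):
        return sum(freq[w] for w in re.findall(r"[a-z']+", sent.lower())
                   if w not in STOP)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(ranked))
```

Because the output is assembled verbatim from source sentences, such a summarizer can never paraphrase or fuse information, which is exactly the limitation abstractive methods address.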
1 Introduction

Today, information is growing very rapidly over the internet. People use the internet to find information through information retrieval (IR) tools such as Google, Yahoo, Bing and so on. However, with the exponential growth of information on the internet, information abstraction or a summary of the retrieved results has become necessary for users. This brings text summarization into the picture. Text summarization helps users to quickly understand a large volume of information. A document summary keeps the document's main content, helps the user to understand and interpret the large volume of text in the document, and reduces the user's time for finding the key information in the document.
Summarization done by a human takes a lot of effort: first it is required to read the whole article or document, then to find the key concepts or main ideas in it, and finally to generate a new summary using those key concepts and ideas. For humans, generating a summary is a straightforward process, but it is time consuming. Therefore, the need for automated techniques to generate summaries has become more and more apparent.

Text summarization is the process of extracting salient information from the source text and presenting that information to the user in the form of a summary. It can analyze a massive volume of data and represent it in a concise way. The text summarization process can be broadly classified into two categories, extractive summarization and abstractive summarization. The goal of extractive summarization is to extract the most significant representative sentences from the text documents and group them to produce a summary. Abstractive summarization, however, requires natural language processing techniques such as semantic representation, natural language generation, and compression techniques.
Abstractive summarization aims to interpret and examine the source text and create a concise summary. Extractive summarizers lose a lot of information from the input, as they only extract a few important sentences from the documents to create the final summary. To prevent this information loss, it is required to aggregate information from multiple sentences.
Abstractive summarization has gained popularity due to its ability to generate new sentences to convey the important information from text documents. An abstractive summarizer should present the summarized information in a coherent form that is easily readable and grammatically correct. Readability or linguistic quality is an important indicator of the quality of a summary.

2 Related Works

Concept fusion and generalization [1] focuses on detecting and reducing generalizable sentences, i.e.
different concepts appearing in a sentence are replaced by one concept which covers the meanings of all of them. This method improves the quality of the generated text summaries by mentioning key (general) concepts and shortens sentences by producing more general ones, which is useful in intelligent systems. This is achieved through (1) the detection and extraction of generalizable sentences and (2) the generation and reduction of the space of generalization versions. A system which implements this approach should be able to generate an output sentence like "Sue ate fruits and vegetables" or "Sue ate some food" from an input sentence like "Sue ate bananas, apples and potatoes". To automatically generate the generalization and fusion of the concepts of a given sentence, the following steps are required.
The first step is to segment the given text into a set of sentences and perform syntactic parsing on them. The next step is to decide whether a given sentence is generalizable or not. If it is, different hypernymy paths for the generalizable concepts of the sentence are generated. This step leads to the generation of different possible generalizations (versions) of the sentence. In the next step, the space of generalization versions is reduced in order to get a set of generalization versions that are acceptable in natural language. A heuristic-based and a machine-learning-based model are proposed to achieve this. Finally, once the best generalization version is found, the compressed sentence is generated.
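The hypernymy-path step above can be sketched as follows. A real system would walk hypernym paths in a lexical resource such as WordNet; the hand-coded toy taxonomy here is purely an assumption for illustration. Each concept is traced up to the root, and the most specific ancestor shared by all concepts becomes the generalization.

```python
# Toy hypernym taxonomy (child -> parent); an actual system would use WordNet.
TAXONOMY = {
    "banana": "fruit", "apple": "fruit", "potato": "vegetable",
    "fruit": "food", "vegetable": "food", "food": "entity",
}

def hypernym_path(word):
    """Follow child -> parent links up to the taxonomy root."""
    path = [word]
    while path[-1] in TAXONOMY:
        path.append(TAXONOMY[path[-1]])
    return path

def generalize(concepts):
    """Return the most specific hypernym shared by all the given concepts."""
    paths = [hypernym_path(c) for c in concepts]
    common = set(paths[0])
    for p in paths[1:]:
        common &= set(p)
    # The shared ancestor appearing earliest on a path is the most specific.
    return min(common, key=paths[0].index)
```

On the paper's own example, `generalize(["banana", "apple", "potato"])` yields "food", matching the "Sue ate some food" output, while generalizing only the two fruits stops at the more specific "fruit".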
The methodology used in this approach is limited to generalizing concepts from a single sentence only, rather than generating a summary from multiple sentences.

The semantic graph method used in [2] summarizes a text document by generating a Rich Semantic Graph (RSG) for the document, reducing the generated semantic graph, and finally producing an abstractive summary from the reduced graph. The approach consists of three phases, as shown in Figure 1. The first phase represents the input document by a Rich Semantic Graph (RSG). In an RSG, the verbs and nouns of the input document are represented as graph nodes, along with edges corresponding to the semantic and topological relations between them. The graph nodes are instances of the corresponding verb and noun classes in the domain ontology. In this step, all the pre-processing procedures are done, such as named entity recognition, morphological and syntactic analysis, co-reference and pronominal resolution, etc. The next step in the RSG creation phase is to construct rich semantic sub-graphs for each pre-processed sentence.
For this, it is required to identify the different word concepts for the nouns and verbs in the sentence based on the domain ontology. By analysing the dependency relations between the word concepts, multiple rich semantic sub-graphs are generated. The highest-ranked sub-graph for each pre-processed sentence is taken, and these are merged to form the final Rich Semantic Graph. The second phase reduces the Rich Semantic Graph to a more compact graph by applying some heuristic rules. Finally, in the third phase, the abstractive summary is generated from the reduced graph.
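The graph construction and heuristic reduction phases can be sketched with a toy data structure. The synonym table and the single merge rule below are illustrative assumptions; the actual RSG method relies on a full domain ontology and a richer set of reduction heuristics.

```python
# Toy RSG-style structure: nodes are word concepts, edges are labelled
# semantic relations stored as (relation, object) pairs per subject node.
SYNONYMS = {"car": "vehicle", "automobile": "vehicle"}  # assumed toy ontology

def build_graph(triples):
    """Store (subject, relation, object) triples as an adjacency dict."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph

def reduce_graph(graph):
    """Heuristic rule: map nodes to their ontology class, merging duplicates."""
    canon = lambda node: SYNONYMS.get(node, node)
    reduced = {}
    for subj, edges in graph.items():
        for rel, obj in edges:
            edge = (rel, canon(obj))
            if edge not in reduced.setdefault(canon(subj), []):
                reduced[canon(subj)].append(edge)
    return reduced
```

Merging "car" and "automobile" into a single "vehicle" node collapses two sentences' worth of structure into one, which is what lets the final generation phase emit a shorter, abstractive rendering.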
Fig. 1. Semantic Graph Reduction for Abstractive text summarization

The methodology used in [3] creates a chunk graph for the input sentences, and from this chunk graph a summary is generated.
In this approach, sentences are initially clustered based on the distance between them. In the next step, chunk graphs are created for each cluster of sentences, where each chunk represents a node in the graph. Chunk graphs are used instead of word graphs in order to include more semantic information in the summary. Traversing these graphs generates a large number of different paths through the nodes.
Selecting the most informative path from these paths creates the best summary for the cluster of sentences. Informative paths are selected by applying a heuristic search method called beam search to the set of paths. To ensure the linguistic quality of the summary, a pre-trained Recurrent Neural Network is used. After finding the summaries for all clusters, the final summary is generated with the help of the LexRank algorithm, which arranges the summaries in an ordered and efficient manner.

The Integer Linear Programming (ILP) based approach [4] consists of two steps: (1) aligning similar sentences. The methodology used in this approach generates a more informative summary with high linguistic quality, determined using an N-gram language model.
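The beam-search path selection described for the chunk-graph method can be illustrated with a minimal sketch. The word-level graph, node scores, and beam width below are illustrative assumptions; the actual system operates over chunk nodes and scores candidates with an RNN-based quality model.

```python
import heapq

def beam_search(graph, scores, start, end, beam_width=2):
    """Keep the `beam_width` highest-scoring partial paths at each expansion.
    Assumes the graph is acyclic, so every path eventually reaches `end`."""
    beam = [([start], scores.get(start, 0.0))]
    finished = []
    while beam:
        candidates = []
        for path, score in beam:
            if path[-1] == end:
                finished.append((path, score))  # complete path; stop extending
                continue
            for nxt in graph.get(path[-1], []):
                candidates.append((path + [nxt], score + scores.get(nxt, 0.0)))
        beam = heapq.nlargest(beam_width, candidates, key=lambda c: c[1])
    return max(finished, key=lambda c: c[1])[0] if finished else None
```

Unlike exhaustive search over every graph traversal, the beam prunes low-scoring partial paths early, which is what makes path selection tractable when the chunk graph yields a large number of candidate paths.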
It also prevents redundant information from being included in the summary by using an inter-sentence redundancy constraint.
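The effect of such an inter-sentence redundancy constraint can be sketched as follows. For clarity this sketch replaces the ILP solver with exhaustive search over small candidate sets, and the Jaccard overlap measure and threshold are assumptions; the actual approach encodes the objective and constraints as an integer linear program.

```python
from itertools import combinations

def word_overlap(a, b):
    """Jaccard word overlap between two sentences."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def select_summary(sentences, scores, k=2, max_overlap=0.5):
    """Pick the k sentences maximising total score, subject to a pairwise
    redundancy constraint: no selected pair may overlap above the threshold.
    An ILP solver scales to large inputs; brute force suffices for a sketch."""
    best, best_score = None, float("-inf")
    for combo in combinations(range(len(sentences)), k):
        if any(word_overlap(sentences[i], sentences[j]) > max_overlap
               for i, j in combinations(combo, 2)):
            continue  # violates the redundancy constraint
        total = sum(scores[i] for i in combo)
        if total > best_score:
            best, best_score = combo, total
    return [sentences[i] for i in best] if best else []
```

Without the constraint, the selector would simply take the two highest-scoring sentences even if they say nearly the same thing; with it, a near-duplicate is skipped in favour of a lower-scoring but novel sentence.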