There is an overwhelming amount of digital text on the internet, and reading even a selection of it takes a lot of time. Imagine having a tool that summarizes text so you can learn faster. Fortunately, Python can help condense a long text into a short one. There are several summarization methods: Luhn, LexRank, Edmundson, Latent Semantic Analysis (LSA), TextRank, SumBasic, and KL-Sum. Each uses a different approach to pick out the most important sentences in a text.
One of them is LexRank, a graph-based algorithm for identifying the most important sentences. The algorithm computes a similarity score between every pair of sentences. It uses a predefined threshold to build a graph of the document, creating an edge between two sentences (nodes) whenever their similarity is above the threshold. It then uses a PageRank-like scheme to rank the sentences (nodes).
In this article, we're going to look at three steps for using this tool to get a summary of a text. Even if you have no experience with Python, you can do it.
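To make the idea concrete, here is a minimal sketch of a LexRank-style ranking in plain Python. It is not Sumy's implementation: it assumes a simple word-count cosine similarity (the published algorithm uses a TF-IDF weighted one), and the function names, the threshold of 0.1, and the damping value of 0.85 are illustrative choices only.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two word lists, using raw word counts."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexrank_scores(sentences, threshold=0.1, damping=0.85, iterations=50):
    """Score sentences with a PageRank-like iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Build the graph: an edge joins two sentences whose similarity
    # is above the threshold. The graph is undirected.
    neighbors = [
        [j for j in range(n)
         if j != i and cosine_similarity(tokens[i], tokens[j]) > threshold]
        for i in range(n)
    ]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbor shares its score equally among its own neighbors.
            incoming = sum(scores[j] / len(neighbors[j]) for j in neighbors[i])
            new_scores.append((1 - damping) / n + damping * incoming)
        scores = new_scores
    return scores


def top_sentences(sentences, count, threshold=0.1):
    """Return the `count` best-scoring sentences in their original order."""
    scores = lexrank_scores(sentences, threshold=threshold)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:count])]
```

A sentence that shares words with many other sentences collects score from all of them, while a sentence that shares nothing with the rest keeps only the small base score. That is why the highest-ranked sentences tend to be the ones that best represent the whole text.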
First, we install Python.
Next, we install the Sumy library.
Finally, we use a script and a text file to make Python and Sumy work together.
Install Python Programming Language
First, let's talk about how to install Python.
To install Python, we download the Python installer and then run it on a desktop or personal computer.
Where is the Python installer? Go to https://www.python.org/.
Then find Downloads in the menu, where you will see the download links.
Click the download link that suits your computer. If you have an Apple computer, choose the Python installer for Mac. If your computer runs Windows, choose Python for Windows. If your computer runs Linux, Python is usually already installed, because most Linux systems come with it.
After the download finishes, open the installer and install Python by clicking agree, next, and yes. Once the installation is done, you can test it from the command prompt. Open the command prompt and type python. If it shows the Python version number and a short introduction, the installation was successful.
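The version can also be checked from inside Python itself. The small sketch below uses only the standard sys module; the function name python_version_string is just an illustrative choice.

```python
import sys


def python_version_string():
    """Return the running interpreter's version as 'major.minor.micro'."""
    info = sys.version_info
    return f"{info.major}.{info.minor}.{info.micro}"


# Print the version of the Python that is running this code.
print("Python version:", python_version_string())
```

If the number printed starts with 3, you are running Python 3, which is the version the python.org installer provides.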
Install Sumy, a Python Library
The next step, after installing Python, is to install Sumy. Sumy is a Python library for summarizing long text. To install it, open the command prompt (Windows) or the terminal (Mac and Linux), type pip install sumy, and press Enter on the keyboard. Wait until it finishes. To test the installation, type python in the command prompt or terminal, then type import sumy. If there is no error message and you simply get a new empty prompt line, the installation was successful.
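The same check can be done without triggering an error message. This is a small sketch using the standard importlib module; the helper name is_installed is hypothetical and not part of Sumy.

```python
import importlib.util


def is_installed(package_name):
    """Return True if Python can find a package with this name."""
    return importlib.util.find_spec(package_name) is not None


# Prints True once "pip install sumy" has finished successfully.
print("sumy installed:", is_installed("sumy"))
```

If it prints False, run the pip command again and make sure it belongs to the same Python installation you are testing with.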
Write Code to Run Sumy
Finally, we write the code that makes Sumy work. Copy the code below into Notepad
and save it as lexrank.py in a folder. Don't save it as a .txt file; save it with the .py extension. The code comes from http://dataexperiments.net/tag/summarization/
# Import the library essentials
from sumy.parsers.plaintext import PlaintextParser  # plain-text parser; other parsers are available for HTML etc.
from sumy.nlp.tokenizers import Tokenizer
from sumy.summarizers.lex_rank import LexRankSummarizer  # LexRank; other algorithms are also built in

file = "source.txt"  # name of the plain-text file
parser = PlaintextParser.from_file(file, Tokenizer("english"))
summarizer = LexRankSummarizer()
summary = summarizer(parser.document, 5)  # summarize the document in 5 sentences
for sentence in summary:
    print(sentence)
Create a file in Notepad and name it source.txt. Copy the text that you want to summarize, paste it into that file, and save the file in the same folder as lexrank.py.
Open the IDLE program and use it to open lexrank.py. Run the program in IDLE, and you will get the 5 most important sentences. IDLE is the simple editor bundled with the Python installer.
If you want to change the number of sentences in the summary, change the number in the line below in lexrank.py. Change 5 to whatever number you want.
summary = summarizer(parser.document, 5)
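Instead of editing the file each time, the number could also be passed in when the script is started. The sketch below is one possible way to do that and is not part of the original script: parse_sentence_count and summarize_file are hypothetical helper names, while the Sumy calls are the same ones used in lexrank.py.

```python
import sys


def parse_sentence_count(args, default=5):
    """Read the sentence count from command-line arguments, or use the default."""
    if not args:
        return default
    count = int(args[0])
    if count < 1:
        raise ValueError("sentence count must be at least 1")
    return count


def summarize_file(path, sentence_count):
    """Summarize a plain-text file with LexRank and return the sentences as text."""
    # Imported here so parse_sentence_count works even without Sumy installed.
    from sumy.parsers.plaintext import PlaintextParser
    from sumy.nlp.tokenizers import Tokenizer
    from sumy.summarizers.lex_rank import LexRankSummarizer

    parser = PlaintextParser.from_file(path, Tokenizer("english"))
    summarizer = LexRankSummarizer()
    return [str(sentence) for sentence in summarizer(parser.document, sentence_count)]


# Example use at the bottom of lexrank.py:
# for sentence in summarize_file("source.txt", parse_sentence_count(sys.argv[1:])):
#     print(sentence)
```

With those last lines uncommented, running python lexrank.py 3 from the command prompt would print a 3-sentence summary, and running it without a number would fall back to 5.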
Now you have a tool to summarize long text. You have completed the three steps to create it: installing Python, installing Sumy, and writing and running the code. You are now able to summarize any text you like.
There are other computerized methods of summarizing text as well, so check them out.