This article walks you through building a news article summarizer web application using newspaper3k, Hugging Face transformers, and Streamlit. We will build a web app where we submit the URL of a news article and the app displays its summary. This article is inspired by the one linked here.

Final news summarizer app



The application works as a three-step process, as shown below.

3 modules in the app

newspaper3k - for extracting news articles

Newspaper3k is a popular Python 3 library for extracting articles and their metadata from web pages. It has good community support and 11k GitHub stars. We use this module to extract the article text from the link. The following function takes the URL and returns the extracted text.

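from newspaper import Article

def download_and_parse_article(url):
    """Download the page at `url` and return the extracted article text."""
    article = Article(url)
    article.download()  # fetch the raw HTML
    article.parse()     # extract the text and metadata from the HTML
    return article.text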
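
The function above assumes that the link is well formed and that the download succeeds. As a minimal sketch of a more defensive variant, the helpers below check the URL first and return `None` when the article cannot be fetched. The names `is_valid_url` and `safe_download_and_parse` are hypothetical and not part of newspaper3k; the sketch also assumes that a failed download surfaces as an `ArticleException` when `parse()` is called.

```python
from urllib.parse import urlparse


def is_valid_url(url):
    """Return True if `url` looks like an http(s) link with a host name."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def safe_download_and_parse(url):
    """Return the article text, or None if the URL is invalid or the fetch fails."""
    if not is_valid_url(url):
        return None
    # Imported lazily so the URL check above works without newspaper3k installed.
    from newspaper import Article
    from newspaper.article import ArticleException
    try:
        article = Article(url.strip())
        article.download()
        article.parse()
    except ArticleException:
        # Assumption: newspaper3k reports a failed download this way.
        return None
    # Treat an empty extraction as a failure too.
    return article.text or None
```

In the app, a `None` result could be turned into a friendly message instead of a stack trace.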

Hugging Face transformers - for summarization

Hugging Face is a large community platform with thousands of NLP models ready to use. Here we use ‘sshleifer/distilbart-cnn-12-6’, which is the default model for summarization in the transformers library. We use the transformers pipeline to load the summarization model into a variable. Calling the function below returns the model.

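from transformers import pipeline

def load_summarize_model():
    """Load and return the summarization pipeline."""
    # Without a model argument the pipeline uses its default summarization model.
    # To pin the model explicitly, use this line instead:
    # model = pipeline("summarization", model='sshleifer/distilbart-cnn-12-6')
    model = pipeline("summarization")
    return model

# load the model into a variable
summ = load_summarize_model()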
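
To make the pipeline's return value concrete, here is a small sketch of a wrapper around it. A summarization pipeline returns a list with one dict per input text, and the summary sits under the `summary_text` key. The helper name `summarize_text` and its input checks are our own assumptions, not part of transformers.

```python
def summarize_text(summarizer, text, max_length=100, min_length=50):
    """Run a summarization pipeline on `text` and return the summary string."""
    if not text or not text.strip():
        raise ValueError("Nothing to summarize: the text is empty.")
    if min_length > max_length:
        # Assumption: we reject inconsistent limits up front.
        raise ValueError("min_length must not exceed max_length.")
    # The pipeline returns a list with one dict per input text;
    # each dict stores the summary under the 'summary_text' key.
    result = summarizer(text,
                        truncation=True,
                        max_length=max_length,
                        min_length=min_length)
    return result[0]["summary_text"]
```

With the `summ` pipeline loaded above, `summarize_text(summ, text)` returns the summary as a plain string. Passing the pipeline in as an argument also makes the helper easy to try out with a stand-in function.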

streamlit - for interactive web UI

Streamlit is a tool used by ML practitioners for building data apps quickly. It turns Python scripts into a web UI that can serve as an interface for the model, without any front-end coding experience. The code below creates a UI to submit the link and view the summary text. It also has sliders to vary the maximum and minimum output length. (source)

import streamlit as st

st.title("Article Summarizer")
st.markdown("Paste any article link below and click on the 'Summarize' button.")
st.markdown("*Note:* Article will be truncated if it is lengthy!")
# a single-line text input is enough for a URL
link = st.text_input('Paste your link here...', "https://towardsdatascience.com/a-guide-to-the-knowledge-graphs-bfb5c40272f1")
button = st.button("Summarize")
# sidebar sliders control the length of the generated summary
min_length = st.sidebar.slider('Min summary length', min_value=10, max_value=100, value=50, step=10)
max_length = st.sidebar.slider('Max summary length', min_value=30, max_value=700, value=100, step=10)

if button and link:
    # show the spinner only while the work is actually running
    with st.spinner("Parsing article and Summarizing..."):
        # get the text
        text = download_and_parse_article(link)
        # summarize the text; the pipeline returns a list, so take the first item
        summary = summ(text,
                       truncation=True,
                       max_length=max_length,
                       min_length=min_length,
                       #num_beams=num_beams,
                       do_sample=True,
                       early_stopping=True,
                       repetition_penalty=1.5,
                       length_penalty=1.5)[0]
    # display the summary
    st.markdown("**Summary:**")
    st.write(summary['summary_text'])

Integrating the model with the Streamlit UI
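
As the note in the UI says, lengthy articles are truncated: with `truncation=True`, the input is cut to the model's maximum input length and the rest is ignored. One possible workaround, which the app above does not use, is to split the text into chunks, summarize each chunk, and join the partial summaries. The sketch below only illustrates the idea: `split_into_chunks`, `summarize_long_text`, and the 400-word chunk size are hypothetical choices, and a word count only approximates the model's token limit.

```python
def split_into_chunks(text, max_words=400):
    """Split `text` into pieces of at most `max_words` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def summarize_long_text(summarizer, text, max_words=400, **generate_kwargs):
    """Summarize each chunk separately and join the partial summaries."""
    partial_summaries = []
    for chunk in split_into_chunks(text, max_words):
        # truncation=True stays on as a safety net for unusually dense chunks
        result = summarizer(chunk, truncation=True, **generate_kwargs)
        partial_summaries.append(result[0]["summary_text"].strip())
    return " ".join(partial_summaries)
```

With the `summ` pipeline from earlier, this could be called as `summarize_long_text(summ, text, max_length=max_length, min_length=min_length)`. Note that the length limits then apply to each chunk, so the combined summary grows with the article.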


Conclusion

To sum up, we used three existing modules to do the job. This made it so easy that we could build our app in a few lines of Python code.

Thank you for reading 😊