
Scrape an API-based site with Python

Welcome to another tutorial in our web scraping with Python series. Today we are going to scrape an API-based website with Python. I think scraping an API-based website is the easiest kind of scraping you will ever do, because the data comes back already structured. We will be dealing with tokens, sessions, headers, and form data.

The website we are going to scrape is an API-based data-sharing website called nepse-data. We use it here purely for education and learning.

API web scraping with Python

I hope you know how to create a virtual environment and install the required packages for this task (requests, beautifulsoup4, and html5lib); if not, you can check the given links. So, let's start hacking.

Analysing the website

Analysing a website is the most important part of web scraping, so I'll walk through it step by step.

Inspect Element

We will be using Inspect Element for this task. If you don't know how to open it, press CTRL+SHIFT+I, or right-click the page and choose Inspect.

Accessing Form Data of API

To learn more about how the API works, we need to check the Form Data that the page sends, which is visible in the Network tab. It also tells us how to structure the body of our POST request.

Getting the API form fields from Form Data (Network tab)

From the Form Data information, we get the fields that the POST request has to send (these are form fields in the request body, not HTTP headers):

{
    'symbol': 'NEPSE',
    'specific_date': '2021-10-05',
    'start_date': '2021-09-15',
    'end_date': '2021-10-05',
    'filter_type': 'date-range',
    '_token': token
}

We will fill in the token dynamically later, so I left a variable name there. We also learned that https://nepsealpha.com/nepse-data is the URL that handles both the GET and POST requests.
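Since only the token and the dates change from run to run, the form fields can be assembled by a small helper. This is a minimal sketch: `build_payload` and its parameters are hypothetical names, and only the field names and the `date-range` value come from the Form Data listing above.

```python
from datetime import date


def build_payload(token, symbol, start_date, end_date):
    """Assemble the form fields seen in the Network tab (hypothetical helper)."""
    if start_date > end_date:
        raise ValueError("start_date must not be after end_date")
    return {
        "symbol": symbol,
        # The captured request sent specific_date equal to end_date;
        # whether the server requires that is an assumption.
        "specific_date": end_date.isoformat(),
        "start_date": start_date.isoformat(),
        "end_date": end_date.isoformat(),
        "filter_type": "date-range",
        "_token": token,
    }


# "dummy-token" is a placeholder; the real value is scraped from the page.
payload = build_payload("dummy-token", "NEPSE", date(2021, 9, 15), date(2021, 10, 5))
print(payload)
```

Passing `date` objects instead of strings lets the helper catch a reversed range before any request is sent.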

Using Request Sessions for API response

A session stores state, such as cookies, across all the requests made through it. Sending our requests through one session makes them look like they come from the same genuine visitor.

import requests
import bs4

url = 'https://nepsealpha.com/nepse-data'
session = requests.Session()
response = session.get(url)

So, here we started a session with requests.Session() and made a GET request within that session.

We will also be using BeautifulSoup later.
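To see what the session contributes, a request can be prepared without being sent. The sketch below runs offline; the cookie name `demo_session` and both values are made up for illustration and are not what the site actually sets.

```python
import requests

url = "https://nepsealpha.com/nepse-data"
session = requests.Session()

# Pretend an earlier response set this cookie (name and value are made up).
session.cookies.set("demo_session", "abc123", domain="nepsealpha.com", path="/")

# Build a POST request but only prepare it; nothing goes over the network.
request = requests.Request(
    "POST", url, data={"symbol": "NEPSE", "_token": "dummy-token"}
)
prepared = session.prepare_request(request)

print(prepared.headers["Cookie"])        # cookie attached by the session
print(prepared.headers["Content-Type"])  # encoding chosen for a dict body
print(prepared.body)                     # URL-encoded form fields
```

The session attaches the stored cookie on its own, and a dict passed as `data` is sent as URL-encoded form fields, which matches what the browser shows under Form Data.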

Scraping token information

Now we need the token created within the current session. To get it, we scrape the page with BeautifulSoup4 and a little Python magic.

soup = bs4.BeautifulSoup(response.text, 'html5lib')
form = soup.find('form', {'id': 'logout-form'})
# this will return the token we wanted
token = form.find('input', {'name':'_token'})['value']
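If the form or the input is missing, the lookup above fails with an unhelpful error about None. As a hedged alternative, here is a sketch that uses the standard library's `html.parser` instead of BeautifulSoup and raises a clear error. Note that it takes the first `_token` input anywhere on the page rather than the one inside `logout-form`. `TokenFinder`, `extract_token`, and the sample markup are all invented for illustration.

```python
from html.parser import HTMLParser


class TokenFinder(HTMLParser):
    """Remember the value of the first <input name="_token"> encountered."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        if tag != "input" or self.token is not None:
            return
        attributes = dict(attrs)
        if attributes.get("name") == "_token":
            self.token = attributes.get("value")


def extract_token(html):
    """Return the token, or raise a clear error if the page has none."""
    finder = TokenFinder()
    finder.feed(html)
    if not finder.token:
        raise ValueError("no _token input found in the page")
    return finder.token


# Invented markup standing in for response.text.
sample_html = """
<form id="logout-form" action="/logout" method="POST">
  <input type="hidden" name="_token" value="dummy-token-123">
</form>
"""
print(extract_token(sample_html))
```

In the real script the call would be `extract_token(response.text)`.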

POST request to API

Now that we know the form fields, we can easily send a POST request within the same session. Because it goes through the same session and carries the genuine token, our POST request will be recognized as valid, and that's why we are awesome.

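data_response = session.post(
    url,
    # a dict passed as data= is sent as form fields
    data={
        'symbol': 'NEPSE',
        'specific_date': '2021-10-05',
        'start_date': '2021-09-15',
        'end_date': '2021-10-05',
        'filter_type': 'date-range',
        '_token': token
    }
)
# fail early on an HTTP error instead of trying to parse an error page
data_response.raise_for_status()
print(data_response.json())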

Here's the clean JSON data:

{
    "data": [
        {
            "symbol": "NEPSE",
            "open": "2569.79",
            "high": "2620.32",
            "low": "2569.78",
            "close": "2612.65",
            "f_date": "2021-10-05",
            "percent_change": "1.43",
            "volume": "2996819683.01"
        },
        {
            "symbol": "NEPSE",
            "open": "2577.36",
            "high": "2596.67",
            "low": "2542.09",
            "close": "2570.81",
            "f_date": "2021-10-04",
            "percent_change": "-0.47",
            "volume": "3627799935.4"
        },
        ...,
        {
            "symbol": "NEPSE",
            "open": "2933.23",
            "high": "2936.89",
            "low": "2881.53",
            "close": "2889.03",
            "f_date": "2021-09-15",
            "percent_change": "-1.44",
            "volume": "5358097250.54"
        }
    ]
}
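Every value in the response is a string, including the prices and the volume, so they need converting before any arithmetic or sorting. A minimal sketch using the three records printed above follows; `parse_record` and `NUMERIC_FIELDS` are hypothetical names, and in the real script `records` would be `data_response.json()["data"]`.

```python
from datetime import date

# The three records printed above; the rows elided by "..." are not reproduced.
records = [
    {"symbol": "NEPSE", "open": "2569.79", "high": "2620.32", "low": "2569.78",
     "close": "2612.65", "f_date": "2021-10-05", "percent_change": "1.43",
     "volume": "2996819683.01"},
    {"symbol": "NEPSE", "open": "2577.36", "high": "2596.67", "low": "2542.09",
     "close": "2570.81", "f_date": "2021-10-04", "percent_change": "-0.47",
     "volume": "3627799935.4"},
    {"symbol": "NEPSE", "open": "2933.23", "high": "2936.89", "low": "2881.53",
     "close": "2889.03", "f_date": "2021-09-15", "percent_change": "-1.44",
     "volume": "5358097250.54"},
]

NUMERIC_FIELDS = ("open", "high", "low", "close", "percent_change", "volume")


def parse_record(record):
    """Return a copy with numeric strings as floats and f_date as a date."""
    parsed = dict(record)
    for field in NUMERIC_FIELDS:
        parsed[field] = float(record[field])
    parsed["f_date"] = date.fromisoformat(record["f_date"])
    return parsed


# The API lists the newest day first; sort oldest first for time-series work.
rows = sorted((parse_record(r) for r in records), key=lambda row: row["f_date"])
for row in rows:
    print(row["f_date"], row["close"])
```

Floats are fine for quick analysis. If exact decimal arithmetic matters, `decimal.Decimal` could be used in place of `float`.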

That's all; we did it. If you need the source code for this project, check the link.
