Welcome to another part of our web scraping with Python tutorial series. If you are reading this and have been following Part 1 and Part 2, then congratulations! Today, we are going to do API web scraping with Python. I think scraping an API-based website is one of the easiest things we’ll ever do. We will be dealing with tokens, sessions, headers and form data. In short, today’s topic is API web scraping with Python.
The website we are going to scrape is an API-based data-sharing website called nepse-data . We are using this website purely for education and learning.
API web scraping with Python
I hope you know how to create a virtual environment and install the required packages for this task; if not, you can check the given links. So, let’s start hacking away at the system.
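If you need a refresher, a typical setup looks something like this, run from your terminal (the environment name venv is just an example):

python -m venv venv
source venv/bin/activate  # on Windows: venv\Scripts\activate
pip install requests beautifulsoup4 html5lib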
Analysing the website
Analysing a website is the most important part of web scraping. That’s why I’ll explain it all the way through.
Inspect Element
We will be using Inspect Element for this task; I hope you know how to do it. If not, press CTRL+SHIFT+I or right-click on the page and choose Inspect.
Accessing the Form Data of the API
In order to learn more about how the API works, we need to check the Form Data. It also helps us structure the payload of our POST request.

From the Form Data shown above, we get an idea of the request payload:
{
'symbol': 'NEPSE',
'specific_date': '2021-10-05',
'start_date': '2021-09-15',
'end_date': '2021-10-05',
'filter_type': 'date-range',
'_token': token
}
Later, we will be using a dynamic token, so I left a variable name there. We also learned that https://nepsealpha.com/nepse-data is the URL that handles both GET and POST requests.
Using Request Sessions for the API response
A session is a store of information where all the web activity done through it gets recorded (cookies, for example). So, making our requests through a session makes them look like genuine browser traffic.
import requests
import bs4

url = 'https://nepsealpha.com/nepse-data'

# Start a session so cookies and other state persist across requests
session = requests.Session()
response = session.get(url)
So, here we started a session with requests.Session() and made a GET request under that session. We will also be using BeautifulSoup later.
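By the way, if you are curious what the session is actually holding, you can peek at its cookie jar right after the GET request; this is just an optional sanity check:

# The cookies the server set on our first GET request
print(session.cookies.get_dict())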
Scraping token information
Now, we need the token created within the current session. For that, we need to scrape it with BeautifulSoup4 and a bit of Python magic.
# Parse the page and locate the form that carries the CSRF token
soup = bs4.BeautifulSoup(response.text, 'html5lib')
form = soup.find('form', {'id': 'logout-form'})

# This will return the token we wanted
token = form.find('input', {'name': '_token'})['value']
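Note that logout-form is specific to this site’s markup. The _token field is a Laravel convention, and many Laravel-style pages also expose the same token in a meta tag, so if the form lookup ever fails you could try something like this instead (whether nepse-data actually ships that meta tag is an assumption on my part):

# Hypothetical alternative: some Laravel-style pages expose the CSRF token
# in a <meta name="csrf-token"> tag rather than a hidden form input
meta = soup.find('meta', {'name': 'csrf-token'})
if meta is not None:
    token = meta['content']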
POST request to the API
Since we have the information about the payload, we can easily send a POST request within the same session. Because it’s the same session and we’re sending the genuine token, our POST request will be recognized as a valid request.
data_response = session.post(
    url,
    data={
        'symbol': 'NEPSE',
        'specific_date': '2021-10-05',
        'start_date': '2021-09-15',
        'end_date': '2021-10-05',
        'filter_type': 'date-range',
        '_token': token,  # the token we scraped from this session
    },
)
print(data_response.json())
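If the token is wrong or the session has expired, the server may answer with an error page instead of JSON, so a small defensive sketch like this can save you some confusion:

# Fail fast on HTTP errors (4xx/5xx) before trying to parse JSON
data_response.raise_for_status()
try:
    result = data_response.json()
except ValueError:
    # The server replied with something other than JSON (e.g. an error page)
    print('Unexpected response:', data_response.text[:200])
else:
    print(result)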
Here’s the clean JSON data we get back:
{
"data": [
{
"symbol": "NEPSE",
"open": "2569.79",
"high": "2620.32",
"low": "2569.78",
"close": "2612.65",
"f_date": "2021-10-05",
"percent_change": "1.43",
"volume": "2996819683.01"
},
{
"symbol": "NEPSE",
"open": "2577.36",
"high": "2596.67",
"low": "2542.09",
"close": "2570.81",
"f_date": "2021-10-04",
"percent_change": "-0.47",
"volume": "3627799935.4"
},
...,
{
"symbol": "NEPSE",
"open": "2933.23",
"high": "2936.89",
"low": "2881.53",
"close": "2889.03",
"f_date": "2021-09-15",
"percent_change": "-1.44",
"volume": "5358097250.54"
}
]
}
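To make this easier to read, or to keep it for later analysis, you can pretty-print the JSON or dump the data list to a CSV file. A minimal sketch, assuming the response shape shown above (the file name nepse_data.csv is just an example):

import csv
import json

records = data_response.json()['data']

# Pretty-print the raw records for inspection
print(json.dumps(records, indent=2))

# Write the records to a CSV file for later analysis
with open('nepse_data.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)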
That’s all, we did it! If you need the source code for this project, check the link.