Python BS4 Code
BS4 stands for BeautifulSoup version 4.x. BeautifulSoup is a Python library for pulling data out of HTML and XML files from a Python program. It was created primarily for web scraping.
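As a minimal sketch of what "pulling data out of HTML" looks like, the snippet below parses a small HTML string and reads a few values from it. The markup is made up for illustration. It uses Python's built-in "html.parser" so that no extra parser has to be installed.

```python
from bs4 import BeautifulSoup

# A small, hypothetical HTML document held in a string
html_doc = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <p class="intro">Hello, BeautifulSoup!</p>
    <a href="https://example.com/a">Link A</a>
    <a href="https://example.com/b">Link B</a>
  </body>
</html>
"""

# Parse the markup with Python's built-in HTML parser
soup = BeautifulSoup(html_doc, "html.parser")

# Pull individual pieces of data out of the parse tree
print(soup.title.string)                          # Sample Page
print(soup.find("p", class_="intro").get_text())  # Hello, BeautifulSoup!
print([a["href"] for a in soup.find_all("a")])    # ['https://example.com/a', 'https://example.com/b']
```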
What is web scraping?
Web scraping is a technique for automatically extracting large amounts of data from websites. It is useful when we want to collect data from many websites using just a few lines of code. The following are some important points about the web scraping process:
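The sketch below shows what "a few lines of code" can look like in practice. It downloads several pages with the standard library's urllib.request.urlopen and extracts each page's title. The helper name scrape_titles is hypothetical. The sketch assumes every URL is reachable and that scraping it is permitted. A real scraper would also need error handling and a polite request rate.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup


def scrape_titles(urls):
    """Hypothetical helper: fetch each URL and return a {url: page title} dict.

    Assumes every URL is reachable and that scraping it is allowed.
    """
    titles = {}
    for url in urls:
        # Download the raw page bytes; BeautifulSoup detects the encoding itself
        with urlopen(url, timeout=10) as response:
            soup = BeautifulSoup(response.read(), "html.parser")
        # soup.title is None when a page has no <title> element
        titles[url] = soup.title.get_text(strip=True) if soup.title else None
    return titles
```

For example, scrape_titles(["https://example.com/"]) would return a dict that maps that URL to its page title. A page without a <title> element maps to None instead.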
- The data we extract from websites is in an unstructured format.
- Web scraping lets us convert this unstructured website data into a structured format.
- One important point is that web scraping is legal only when we extract data from sites that allow it to be used.
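To illustrate the conversion from unstructured to structured data, the sketch below turns a hypothetical HTML product list into a list of Python dictionaries with typed fields. The markup, the "products" id, and the "name"/"price" classes are all assumptions made for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical product listing, as it might appear in a page's HTML
html = """
<ul id="products">
  <li><span class="name">Pen</span> <span class="price">1.50</span></li>
  <li><span class="name">Notebook</span> <span class="price">3.25</span></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Turn each <li> (unstructured markup) into a dict (structured record)
products = [
    {
        "name": item.find("span", class_="name").get_text(strip=True),
        "price": float(item.find("span", class_="price").get_text(strip=True)),
    }
    for item in soup.find("ul", id="products").find_all("li")
]

print(products)  # [{'name': 'Pen', 'price': 1.5}, {'name': 'Notebook', 'price': 3.25}]
```

Once the data is in this form, it can be written to CSV, JSON, or a database with ordinary Python tools.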
Coming back to Python BS4 code: the BeautifulSoup library works together with a parser. It gives us ways to navigate, search, and modify the resulting parse tree during web scraping.
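Here is one small example of each of the three operations on a made-up HTML fragment: searching with find, navigating with parent and find_next_sibling, and modifying a tag's text and attributes. The "data-edited" attribute is an arbitrary name chosen for illustration.

```python
from bs4 import BeautifulSoup

html = '<div id="box"><p class="msg">Old text</p><p>Second</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Searching: find the first <p> whose class is "msg"
msg = soup.find("p", class_="msg")

# Navigating: move from that tag to its parent and to its next <p> sibling
print(msg.parent["id"])                    # box
print(msg.find_next_sibling("p").string)   # Second

# Modifying: replace the text and add a new attribute
msg.string = "New text"
msg["data-edited"] = "yes"
print(soup)  # <div id="box"><p class="msg" data-edited="yes">New text</p><p>Second</p></div>
```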
At the time of writing, the latest version of BeautifulSoup (BS4) was 4.9.3, and that is the version used in this part of the tutorial. Next, we will install the BeautifulSoup library on our device. Only after that can we start using it in our Python programs.
Installation of Python BeautifulSoup (BS4) library:
BeautifulSoup is not part of Python's standard library, so we have to install it before we can use it. We can install it with the pip installer by following the steps below:
Step 1: Open the command prompt (terminal) on the system.
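Step 2: Type the following command in the command prompt. The official name of the package on PyPI is beautifulsoup4; the shorter name bs4 is a placeholder package that installs the same library.
pip install beautifulsoup4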
Step 3: Press the Enter key. pip will then download and install the latest version of BeautifulSoup on our device.
Once the command finishes, the latest version of the BeautifulSoup library is installed on our system.
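One way to confirm that the installation worked is to import the package from Python and print its version. The exact number printed depends on what pip installed. A quick parse of a tiny fragment also checks that the library runs.

```python
import bs4
from bs4 import BeautifulSoup

# Print the installed version to confirm the package can be imported
print(bs4.__version__)

# Smoke test: parse a tiny fragment with the built-in parser
print(BeautifulSoup("<b>ok</b>", "html.parser").b.string)  # ok
```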
Installing a Parser:
BeautifulSoup can use Python's built-in HTML parser as well as several third-party parsers. The third-party parsers have to be installed separately, depending on our needs. The following are some of the HTML and XML parsers that BeautifulSoup supports, along with their typical usage:
- Python's html.parser: The typical usage of Python's html.parser is 'BeautifulSoup(markup, "html.parser")'.
- lxml's XML parser: The typical usage of lxml's XML parser is 'BeautifulSoup(markup, "lxml-xml")'.
- lxml's HTML parser: The typical usage of lxml's HTML parser is 'BeautifulSoup(markup, "lxml")'.
- html5lib: The typical usage of the html5lib parser is 'BeautifulSoup(markup, "html5lib")'.
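The parsers differ mainly in speed and in how they repair broken markup. The sketch below feeds the same malformed fragment to each HTML parser so that their outputs can be compared. Only "html.parser" ships with Python. The other two parsers raise bs4.FeatureNotFound unless the lxml or html5lib packages are installed, so the code records those cases instead of failing. The exact output depends on the installed parser versions.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately malformed markup: the <p> and <b> tags are never closed
broken = "<p>Unclosed <b>tag"

results = {}
for parser in ("html.parser", "lxml", "html5lib"):
    try:
        results[parser] = str(BeautifulSoup(broken, parser))
    except FeatureNotFound:
        # lxml and html5lib are third-party packages and may be missing
        results[parser] = None

for parser, output in results.items():
    print(f"{parser}: {output if output is not None else 'not installed'}")
```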
In this tutorial, we will use the html5lib parser with the BeautifulSoup library. It can be installed with the command pip install html5lib.
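A notable property of html5lib is that it parses markup the way a web browser does. Even a bare fragment is therefore wrapped in a complete <html><head></head><body>...</body></html> document. The sketch below shows this behavior and falls back gracefully if html5lib has not been installed yet.

```python
from bs4 import BeautifulSoup, FeatureNotFound

fragment = "<p>Hello</p>"

try:
    # html5lib builds a full, browser-style document around the fragment
    soup = BeautifulSoup(fragment, "html5lib")
except FeatureNotFound:
    # Raised when html5lib is missing; install it with: pip install html5lib
    soup = None
    print("html5lib is not installed")

if soup is not None:
    print(soup)                 # <html><head></head><body><p>Hello</p></body></html>
    print(soup.body.p.string)   # Hello
```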