Web scraping is a technique for extracting information from a website programmatically. In this blog post, we’ll learn how to scrape product data from Walmart’s website using Python. We’ll use the requests library to fetch the HTML content of the webpage, BeautifulSoup to parse the HTML, and the json module to save the extracted data to a JSON file.
Prerequisites:
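Before we begin, make sure you have the following libraries installed: requests and beautifulsoup4 (which provides the bs4 module). The json module is part of Python’s standard library and needs no installation.
You can install the two third-party libraries using pip: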
pip install requests beautifulsoup4
Step 1: Import the Required Libraries
First, we need to import the necessary libraries:
import requests
from bs4 import BeautifulSoup
import json
Step 2: Obtain the Access Token
To scrape data from Walmart’s website, we’ll use the QuickScraper API, which requires an access token. You can obtain your access token by signing up at app.quickscraper.co.
access_token = 'L5vnM4n1pI1J8WZYNh'  # Replace with your own access token from app.quickscraper.co
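If you would rather not hard-code the token in the script, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name QUICKSCRAPER_ACCESS_TOKEN and the helper load_access_token are made up for this example and are not part of the QuickScraper API.

```python
import os

# Hypothetical variable name, chosen for this example only.
TOKEN_ENV_VAR = "QUICKSCRAPER_ACCESS_TOKEN"


def load_access_token(environ=os.environ):
    """Return the access token from the environment, or raise if it is missing."""
    token = environ.get(TOKEN_ENV_VAR, "").strip()
    if not token:
        raise RuntimeError(f"Set the {TOKEN_ENV_VAR} environment variable first")
    return token
```

After exporting the variable in your shell, access_token = load_access_token() can stand in for the hard-coded assignment.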
Step 3: Construct the URL
Next, we’ll construct the URL to fetch the HTML content of the Walmart search page for mobile phones. We’ll use the access token obtained in the previous step and the requests.get() method to retrieve the HTML content.
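# Route the request through the QuickScraper API, passing the Walmart search URL as a parameter
url = f"https://api.quickscraper.co/parse?access_token={access_token}&url=https://www.walmart.com/search?q=mobile"
print(url)
response = requests.get(url)
html_content = response.content  # raw bytes of the returned HTML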
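The target URL carries its own ? and = characters, so it can be safer to percent-encode it before placing it inside the API URL. Here is a minimal sketch using only the standard library. It assumes the QuickScraper endpoint accepts a percent-encoded url parameter, which is standard for query strings but is not confirmed in this post. The helper name build_scraper_url is hypothetical.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.quickscraper.co/parse"  # endpoint used in this post


def build_scraper_url(access_token, target_url):
    """Build the API request URL with the target URL percent-encoded."""
    # urlencode escapes ':', '/', '?' and '=' inside each parameter value.
    query = urlencode({"access_token": access_token, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


# Placeholder token; substitute your own.
url = build_scraper_url("YOUR_ACCESS_TOKEN", "https://www.walmart.com/search?q=mobile")
```

The resulting url can be passed to requests.get(url, timeout=30), and calling response.raise_for_status() afterwards surfaces HTTP errors early instead of parsing an error page.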
Step 4: Parse the HTML Content Using BeautifulSoup
After fetching the HTML content, we’ll use BeautifulSoup to parse it and extract the product data. We’re interested in the div elements that carry the classes mb0, ph1, ph0-xl, pt0-xl, pb3-m, and bb, which contain product information. Note that passing a list to class_ matches any div that has at least one of the listed classes.
soup = BeautifulSoup(html_content, 'html.parser')
productItems = soup.find_all('div', class_=['mb0', 'ph1', 'ph0-xl', 'pt0-xl', 'pb3-m', 'bb'])
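To see how a list passed to class_ behaves, the sketch below runs find_all on a tiny hand-written HTML snippet and compares it with a CSS selector that requires every class. The markup is invented for illustration and is not Walmart’s real page structure.

```python
from bs4 import BeautifulSoup

# Invented sample markup, for illustration only.
sample_html = """
<div class="mb0 ph1 bb">product card</div>
<div class="bb">unrelated divider</div>
<div class="footer">footer</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# A list matches any div that has at least one of the listed classes.
any_match = soup.find_all("div", class_=["mb0", "ph1", "bb"])

# A chained CSS selector matches only divs that have all of the classes.
all_match = soup.select("div.mb0.ph1.bb")

print(len(any_match), len(all_match))
```

If find_all returns more elements than expected on the real page, switching to soup.select with the chained class names is one way to narrow the match.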
Step 5: Extract Product Data
Now, we’ll loop through each product item and extract the title, price, and image URL. We’ll store this data in a dictionary and append it to a list.
products = []
for product in productItems:
    # Look up each element once; find() returns None when nothing matches
    title_tag = product.find("span", class_="w_iUH7")
    price_tag = product.find("div", class_=["lh-title"])
    img = product.find("img")
    foundItem = {
        "title": title_tag.text.strip() if title_tag else None,
        "price": price_tag.text.strip() if price_tag else None,
        # Guard against cards without an image or without a src attribute
        "img_tag": img.get("src") if img else None,
    }
    products.append(foundItem)
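The extraction logic can be tried without any network call by running it against a small hand-written snippet. The sketch below is an illustration: the sample HTML is invented (it only reuses the class names from this post), and extract_product is a hypothetical helper name. The second card deliberately lacks a price and an image to show the None fallbacks.

```python
from bs4 import BeautifulSoup


def extract_product(card):
    """Return title, price and image URL for one product card (None if absent)."""
    title_tag = card.find("span", class_="w_iUH7")
    price_tag = card.find("div", class_="lh-title")
    img_tag = card.find("img")
    return {
        "title": title_tag.get_text(strip=True) if title_tag else None,
        "price": price_tag.get_text(strip=True) if price_tag else None,
        "img_tag": img_tag.get("src") if img_tag else None,
    }


# Invented sample markup, for illustration only.
sample_html = """
<div class="mb0 ph1">
  <span class="w_iUH7">Example Phone 64GB</span>
  <div class="lh-title">$99.00</div>
  <img src="https://example.com/phone.jpg">
</div>
<div class="mb0 ph1">
  <span class="w_iUH7">Phone Without Image</span>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")
cards = soup.find_all("div", class_=["mb0", "ph1"])
products = [extract_product(card) for card in cards]
print(products)
```

Keeping the per-card logic in its own function makes it easy to adjust a single selector when the page markup changes.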
Step 6: Save the Data to a JSON File
Finally, we’ll save the extracted product data to a JSON file named products.json.
with open("products.json", "w") as file:
    json.dump(products, file, indent=4)
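Product titles may contain non-ASCII characters, so it can help to write the file as UTF-8 and to read it back as a quick check. The following is a minimal sketch using the standard json module. The helper names save_products and load_products are hypothetical.

```python
import json


def save_products(products, path):
    """Write the product list to `path` as UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as file:
        # ensure_ascii=False keeps characters such as accents readable in the file.
        json.dump(products, file, indent=4, ensure_ascii=False)


def load_products(path):
    """Read the product list back from `path`."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)
```

Calling save_products(products, "products.json") and then load_products("products.json") should return the same list that was written.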
Conclusion:
In this blog post, we learned how to scrape product data from Walmart’s website using Python. We used the requests library to fetch the HTML content through the QuickScraper API, BeautifulSoup to parse the HTML, and the json module to save the extracted data in a JSON file. The same steps apply to other websites once you adjust the target URL and the tag and class names to match the page you want to scrape.