How to Scrape Walmart Product Data Using Python

Web scraping is a technique for extracting information from a website programmatically. In this blog post, we’ll learn how to scrape product data from Walmart’s website using Python. We’ll use the requests library to fetch the HTML content of the page through the QuickScraper API, BeautifulSoup to parse the HTML, and the built-in json module to save the extracted data in a JSON file.

Prerequisites:

Before we begin, make sure you have the following libraries installed:

  • requests
  • beautifulsoup4

You can install them using pip:

pip install requests beautifulsoup4

Step 1: Import the Required Libraries

First, we need to import the necessary libraries:

import requests
from bs4 import BeautifulSoup
import json

Step 2: Obtain the Access Token

To scrape data from Walmart’s website, we’ll use the QuickScraper API, which requires an access token. You can obtain your access token by signing up at app.quickscraper.co.

access_token = 'L5vnM4n1pI1J8WZYNh'  # Replace with your own access token from app.quickscraper.co
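Hard-coding a token in a script makes it easy to leak when the file is shared. As a minimal sketch, the token could instead be read from an environment variable. The variable name QUICKSCRAPER_ACCESS_TOKEN and the helper load_access_token are hypothetical names chosen for this example, not something the QuickScraper API defines.

```python
import os


def load_access_token(env_var="QUICKSCRAPER_ACCESS_TOKEN"):
    """Return the access token stored in an environment variable.

    The variable name is an assumption made for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to your access token.")
    return token


# Usage, after exporting the variable in your shell:
# access_token = load_access_token()
```

The rest of the script can then use access_token exactly as before.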

Step 3: Construct the URL

Next, we’ll construct the URL to fetch the HTML content of the Walmart search page for mobile phones. We’ll use the access token obtained in the previous step and the requests.get() method to retrieve the HTML content.

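# The request goes to the QuickScraper API, which fetches the Walmart search page for us
url = f"https://api.quickscraper.co/parse?access_token={access_token}&url=https://www.walmart.com/search?q=mobile"
print(url)
response = requests.get(url)
response.raise_for_status()  # Stop early if the request failed
html_content = response.content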
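The target URL in the request above is embedded in the query string as-is. That works for a single search parameter, but a target URL with several parameters (for example an extra &page=2) would have its own ampersands read as parameters of the API call. A possible way to avoid this is to percent-encode the query with the standard library, as sketched below. The helper name build_api_url is hypothetical, and the sketch assumes the API decodes its url parameter in the usual way.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://api.quickscraper.co/parse"


def build_api_url(access_token, target_url):
    """Build the API request URL with both parameters percent-encoded."""
    query = urlencode({"access_token": access_token, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


example = build_api_url(
    "YOUR_ACCESS_TOKEN",
    "https://www.walmart.com/search?q=mobile",
)
print(example)
```

The resulting string can be passed to requests.get() in place of the hand-built f-string.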

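Step 4: Parse the HTML Content Using BeautifulSoup

After fetching the HTML content, we’ll use BeautifulSoup to parse it and extract the product data. We’re interested in the div elements that wrap each product, which carry the classes 'mb0', 'ph1', 'ph0-xl', 'pt0-xl', 'pb3-m', and 'bb'. Note that when class_ is given a list, find_all() returns every div that has at least one of the listed classes, not only the divs that have all of them.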

soup = BeautifulSoup(html_content, 'html.parser')
productItems = soup.find_all('div', class_=['mb0', 'ph1', 'ph0-xl', 'pt0-xl', 'pb3-m', 'bb'])
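To make the matching behaviour of a class list concrete, here is a small sketch on made-up HTML (the three div elements are invented for the example and are not Walmart markup). It compares find_all() with a list of classes against a CSS selector passed to select(), which requires every listed class to be present.

```python
from bs4 import BeautifulSoup

# Invented sample markup, used only to illustrate class matching
sample_html = """
<div class="mb0 ph1 bb">full card</div>
<div class="bb">border only</div>
<div class="other">unrelated</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# A list matches divs that have ANY of the listed classes
any_match = sample_soup.find_all("div", class_=["mb0", "ph1", "bb"])

# A chained CSS selector matches divs that have ALL of the listed classes
all_match = sample_soup.select("div.mb0.ph1.bb")

print(len(any_match), len(all_match))
```

If the list-based search returns unrelated elements on a real page, the stricter select() form may be a better fit.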

Step 5: Extract Product Data

Now, we’ll loop through each product item and extract the title, price, and image URL. We’ll store this data in a dictionary and append it to a list.

products = []
for product in productItems:
    # Look up each element once; find() returns None when nothing matches
    title_tag = product.find("span", class_="w_iUH7")
    price_tag = product.find("div", class_="lh-title")
    img = product.find("img")
    foundItem = {
        "title": title_tag.text.strip() if title_tag else None,
        "price": price_tag.text.strip() if price_tag else None,
        "img_tag": img.get("src") if img else None,  # image URL, if the item has an image
    }
    products.append(foundItem)
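Because each field falls back to None when its element is missing, the list can contain partial records. One possible clean-up step, sketched here with a hypothetical helper drop_incomplete and invented sample records, is to keep only the items that have every required field.

```python
def drop_incomplete(items, required=("title", "price")):
    """Keep only the items whose required fields are all non-empty."""
    return [item for item in items if all(item.get(key) for key in required)]


# Invented records that mimic the shape of the scraped dictionaries
sample_products = [
    {"title": "Phone A", "price": "$199.00", "img_tag": "https://example.com/a.jpg"},
    {"title": None, "price": "$99.00", "img_tag": None},
    {"title": "Phone C", "price": None, "img_tag": None},
]

complete_products = drop_incomplete(sample_products)
print(len(complete_products))
```

Which fields count as required is a design choice; here the image URL is treated as optional.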

Step 6: Save the Data to a JSON File

Finally, we’ll save the extracted product data to a JSON file named products.json.

with open("products.json", "w") as file:
    json.dump(products, file, indent=4)
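As a quick check that the file was written correctly, the data can be loaded back and compared with the original list. The sketch below uses a temporary directory and an invented sample record so it can run on its own; the helpers save_products and load_products are hypothetical names. It also sets encoding="utf-8" and ensure_ascii=False, an optional choice that keeps non-ASCII characters in product titles readable in the file.

```python
import json
import os
import tempfile


def save_products(items, path):
    """Write the product list to a UTF-8 encoded JSON file."""
    with open(path, "w", encoding="utf-8") as file:
        json.dump(items, file, indent=4, ensure_ascii=False)


def load_products(path):
    """Read the product list back from a JSON file."""
    with open(path, encoding="utf-8") as file:
        return json.load(file)


# Invented sample record, written to a temporary directory
sample_products = [{"title": "Café Phone", "price": "$199.00", "img_tag": None}]

with tempfile.TemporaryDirectory() as tmp_dir:
    json_path = os.path.join(tmp_dir, "products.json")
    save_products(sample_products, json_path)
    reloaded = load_products(json_path)

print(reloaded == sample_products)
```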

Conclusion:

In this blog post, we learned how to scrape product data from Walmart’s website using Python. We used the requests library to fetch the HTML content through the QuickScraper API, BeautifulSoup to parse the HTML, and the json module to save the extracted data in a JSON file. The same four steps of fetching, parsing, extracting, and saving can be applied to other websites by changing the target URL and the tags and class names used to locate the data.
