How to Scrape an E-Commerce Website Using Quick Scraper

Data-driven businesses, researchers, and individuals alike use web scraping to gather information. E-commerce websites, in particular, are gold mines of valuable data, ranging from product information to pricing and customer reviews. The process of extracting this data, however, can seem daunting to someone unfamiliar with web scraping.

In this blog post, we’re going to scrape an e-commerce site using Python, the BeautifulSoup library, and the Quick Scraper API. We’ll use eBay’s “Outdoor Sports” category as our case study and walk through the entire process, from setting up the environment to extracting and storing the desired data.

Prerequisites

Before we begin, ensure that you have the following installed on your system:

  • Python 3.x
  • pip (Python’s package installer)
  • requests library (pip install requests)
  • BeautifulSoup library (pip install beautifulsoup4)

Additionally, you’ll need a basic understanding of Python programming, HTML, and web development concepts.
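
To confirm that everything installed correctly, you can run a quick sanity check (the printed version numbers will vary by machine):

import requests
import bs4

print(requests.__version__)  # e.g. 2.x
print(bs4.__version__)       # e.g. 4.x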

Step 1: Import Required Libraries

First, let’s import the necessary libraries for our project:

import requests
from bs4 import BeautifulSoup
import csv

Here, we’re importing the requests library to fetch the web pages, BeautifulSoup for parsing the HTML content, and csv to store our scraped data in a CSV file.

Step 2: Define the Target URL

Next, we’ll build the request URL. Rather than fetching eBay directly, we pass the target page (the eBay “Outdoor Sports” category) to the Quick Scraper API along with our access token:

access_token = '6JQrJqjzL0MwEZ7EB4yap'  # Get your access token from app.quickscraper.co
url = f"https://api.quickscraper.co/parse?access_token={access_token}&url=https://www.ebay.com/b/Outdoor-Sports/159043/bn_1855398/"
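
A small aside: the eBay URL is embedded here as a raw query parameter. Assuming the Quick Scraper endpoint accepts standard percent-encoded parameters, it’s a little safer to build the URL with urllib.parse, which handles the encoding for you; a minimal sketch:

from urllib.parse import urlencode

target_url = 'https://www.ebay.com/b/Outdoor-Sports/159043/bn_1855398/'
params = {'access_token': access_token, 'url': target_url}
url = f"https://api.quickscraper.co/parse?{urlencode(params)}"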

Step 3: Fetch the HTML Content

We’ll use the requests library to fetch the HTML content of the target URL:

response = requests.get(url)
html_content = response.content

The requests.get(url) function sends a GET request to the specified URL and retrieves the response. We then store the HTML content of the page in the html_content variable.
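
In practice, it’s worth adding a timeout and failing fast on HTTP errors before you try to parse anything; a hedged variant of the fetch step:

# Time out after 30 seconds and raise an HTTPError on 4xx/5xx responses
response = requests.get(url, timeout=30)
response.raise_for_status()
html_content = response.content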

Step 4: Parse the HTML Content

Now, we’ll use BeautifulSoup to parse the HTML content and create a navigable tree-like structure:

soup = BeautifulSoup(html_content, 'html.parser')

The BeautifulSoup(html_content, 'html.parser') function creates a BeautifulSoup object, which represents the entire HTML document as a nested data structure. The second argument, 'html.parser', specifies the parser to be used for parsing the HTML content.
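
As a side note, if you have the optional lxml package installed (pip install lxml), you can swap it in as the parser; it’s generally faster than the built-in one, and the rest of the code stays the same:

# Optional alternative: the lxml parser (requires pip install lxml)
soup = BeautifulSoup(html_content, 'lxml')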

Step 5: Extract the Desired Data

With the HTML parsed, we can now start extracting the desired data. Let’s assume we want to scrape the following information for each product:

  • Product Title
  • Product Price
  • Product URL

Here’s how we can extract this data:

products = soup.find_all('li', {'class': 'carousel__snap-point'})

product_data = []

for product in products:
    title_element = product.find('div', {'class': 'b-info__title'})
    title = title_element.text.strip() if title_element else None

    price_element = product.find('div', {'class': 'b-info__price clearfix'})
    price = price_element.text.strip() if price_element else None

    url_element = product.find('a', {'class': 'b-tile'})
    url = url_element.get('href') if url_element else None

    product_data.append({
        'Title': title,
        'Price': price,
        'URL': url
    })

Let’s break down this code:

  1. products = soup.find_all('li', {'class': 'carousel__snap-point'}): This line finds all the HTML elements (li tags) with the class 'carousel__snap-point', the elements that hold the product information on this page.
  2. product_data = []: We create an empty list to store the extracted product data.
  3. for product in products:: We iterate over each product found on the page.
  4. title_element = product.find('div', {'class': 'b-info__title'}): We look up the <div> tag with the class 'b-info__title' within each product. If it exists, we extract its text content and remove any leading/trailing whitespace using the strip() method; otherwise, we set title to None.
  5. price_element = product.find('div', {'class': 'b-info__price clearfix'}): Similar to the title extraction, we look up the <div> tag with the class 'b-info__price clearfix' and extract its stripped text content. If the price element is not found, we set price to None.
  6. url_element = product.find('a', {'class': 'b-tile'}): We find the <a> tag with the class 'b-tile' and store it in url_element.
  7. url = url_element.get('href') if url_element else None: We retrieve the href attribute from url_element, which contains the product URL. If url_element is None, we set url to None.
  8. product_data.append({ 'Title': title, 'Price': price, 'URL': url }): We create a dictionary containing the extracted product title, price, and URL, and append it to the product_data list.
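
Before moving on, it’s worth printing a few records to confirm the selectors actually matched something; if eBay has changed its class names since this post was written, every field will come back as None. A quick check:

# Peek at the first few records to verify the selectors matched
for item in product_data[:3]:
    print(item)

print(f"Scraped {len(product_data)} products in total")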

Step 6: Store the Data in a CSV File

Finally, we can store the extracted data in a CSV file for further analysis or processing:

with open('product_data.csv', 'w', newline='', encoding='utf-8') as csvfile:
    fieldnames = ['Title', 'Price', 'URL']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()

    for product in product_data:
        writer.writerow(product)

Here’s what’s happening:

  1. with open('product_data.csv', 'w', newline='', encoding='utf-8') as csvfile:: We open a new CSV file named 'product_data.csv' in write mode ('w'). The newline='' argument is used to avoid extra blank lines between rows, and encoding='utf-8' ensures that non-ASCII characters are handled correctly.
  2. fieldnames = ['Title', 'Price', 'URL']: We define the column names (field names) for the CSV file.
  3. writer = csv.DictWriter(csvfile, fieldnames=fieldnames): We create a DictWriter object, which allows us to write dictionaries (rows) to the CSV file. We pass the csvfile object and the fieldnames list as arguments.
  4. writer.writeheader(): This line writes the column headers (field names) to the CSV file.
  5. for product in product_data:: We iterate over each product in the product_data list.
  6. writer.writerow(product): For each product, we write its data (title, price, and URL) as a row in the CSV file.

After running this code, you should have a product_data.csv file in the same directory containing the scraped product data from the eBay “Outdoor Sports” category.
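
As an alternative, if you already use pandas (pip install pandas), the same data can be written out in a couple of lines; a sketch of the equivalent step:

import pandas as pd

df = pd.DataFrame(product_data)
df.to_csv('product_data.csv', index=False, encoding='utf-8')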

Conclusion

Web scraping can be a powerful tool for extracting valuable data from e-commerce websites, but it should be used responsibly and within legal boundaries. Always ensure that you respect the website’s terms of service and robots.txt file, and avoid overwhelming the server with excessive requests.
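
Python’s standard library even ships a helper for the robots.txt check; a minimal sketch (whether a given path is allowed depends on the site’s rules at the time you run it):

from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url('https://www.ebay.com/robots.txt')
rp.read()
print(rp.can_fetch('*', 'https://www.ebay.com/b/Outdoor-Sports/159043/bn_1855398/'))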

In this blog post, we covered the fundamental steps involved in scraping an e-commerce website using Python, BeautifulSoup, and the requests library. We explored how to fetch and parse HTML content, extract desired data, and store it in a CSV file for further analysis or processing.

Remember, the code provided in this blog post is specific to the eBay “Outdoor Sports” category and may need to be adapted for other websites or categories, as the HTML structure and class names can vary. However, the general approach and principles remain the same. 
