Web scraping is the automated extraction of data from websites. It lets you gather large amounts of data far faster than collecting it by hand. In this blog post, we’ll learn how to scrape Google Search results using the MechanicalSoup library in Python.
Prerequisites
Before we start, you’ll need to have the following installed on your system:
- Python 3.x
- MechanicalSoup library
- BeautifulSoup4 library
- Requests library
You can install these libraries using pip:
pip install mechanicalsoup
pip install beautifulsoup4
pip install requests
Step 1: Import the Required Libraries
import mechanicalsoup
import requests
from bs4 import BeautifulSoup
import csv
import json
Step 2: Connect to the Website
# Connect to Website
browser = mechanicalsoup.StatefulBrowser()
access_token = '6JrJjz0MVZ7EBN584a' # Access token from app.quickscraper.co
# Pass the Google URL through params so it is percent-encoded; left unencoded,
# its own & parameters would be read as parameters of the API request
target_url = "https://www.google.com/search?q=laptop&rlz=1C1CHBF_enIN979IN979&oq=laptop&gs_lcrp=EgZjaHJvbWUqDAgAEEUYOxixAxiABDIMCAAQRRg7GLEDGIAEMgYIARBFGEAyDQgCEAAYgwEYsQMYgAQyCggDEAAYsQMYgAQyDQgEEAAYgwEYsQMYgAQyBggFEEUYPTIGCAYQRRg8MgYIBxBFGDzSAQc5NTVqMGo3qAIAsAIA&sourceid=chrome&ie=UTF-8"
page = browser.get("https://api.quickscraper.co/parse", params={"access_token": access_token, "url": target_url})
Note: In the provided code, we’re using the api.quickscraper.co service to bypass Google’s anti-scraping measures. You’ll need to replace the access_token value with your own token from the service.
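Because the Google URL is itself passed as a query parameter, its own `?` and `&` characters need to be percent-encoded, or the API may treat `rlz`, `oq`, and the rest as its own parameters. Here is a minimal sketch of building the request URL with the standard library. `build_api_url` is a hypothetical helper name, the token is a placeholder, and the endpoint is the one used in the code above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API_ENDPOINT = "https://api.quickscraper.co/parse"  # endpoint from the code above


def build_api_url(access_token, target_url):
    """Return the API URL with the target URL percent-encoded as a single parameter."""
    query = urlencode({"access_token": access_token, "url": target_url})
    return f"{API_ENDPOINT}?{query}"


# Placeholder token and a shortened search URL, for illustration only
api_url = build_api_url("YOUR_ACCESS_TOKEN", "https://www.google.com/search?q=laptop&ie=UTF-8")

# Decoding the query string gives back the target URL in one piece
decoded = parse_qs(urlsplit(api_url).query)
print(decoded["url"][0])
```

The decoded query contains only two parameters, `access_token` and `url`, so nothing from the Google URL leaks into the API call.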
Step 3: Parse the HTML
# Parse HTML
soup = BeautifulSoup(page.content, 'html.parser')
items = soup.find_all('div', class_=['g', 'Ww4FFb', 'vt6azd', 'asEBEc', 'tF2Cxc'])
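Passing a list to `class_` matches any tag that carries at least one of the listed classes, not all of them. Here is a small sketch on made-up HTML. The markup is invented for illustration and is far simpler than a real Google results page.

```python
from bs4 import BeautifulSoup

# Invented sample markup; only the class names are borrowed from the code above
sample_html = """
<div class="g"><h3 class="LC20lb">First</h3></div>
<div class="other"><h3>Ignored</h3></div>
<div class="tF2Cxc extra"><h3 class="LC20lb">Second</h3></div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# A div is returned if any one of its classes appears in the list
blocks = sample_soup.find_all("div", class_=["g", "tF2Cxc"])
titles = [block.find("h3").text for block in blocks]
print(titles)
```

Since one match is enough, a nested pair of divs that each carry a listed class will both be returned, so the same search result can show up more than once in `items`.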
Step 4: Extract the Search Results Data
google_search_items = []
for item in items:
    # Look up each element once; any of them may be missing from a result block
    title_element = item.find('h3', class_=['LC20lb', 'MBeuO', 'DKV0Md'])
    description_element = item.find('div', class_=['VwiC3b', 'yXK7lf', 'lVm3ye', 'r025kc', 'hJNv6b', 'Hdw6tb'])
    url_element = item.find('a', {'jsname': 'UWckNb'})
    found_item = {
        "title": title_element.text.strip() if title_element else None,
        "description": description_element.text.strip() if description_element else None,
        "url": url_element.get('href') if url_element else None,
    }
    google_search_items.append(found_item)
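The loop above appends an entry for every matched div, so the list can contain entries with no URL, as well as repeats when nested divs match the same result. One possible cleanup pass is sketched below. `clean_results` is a hypothetical helper, and the sample data is made up.

```python
def clean_results(results):
    """Drop entries without a URL and keep only the first entry for each URL."""
    seen_urls = set()
    cleaned = []
    for result in results:
        url = result.get("url")
        if not url or url in seen_urls:
            continue
        seen_urls.add(url)
        cleaned.append(result)
    return cleaned


# Made-up sample in the same shape as google_search_items
sample = [
    {"title": "Laptop A", "url": "https://example.com/a"},
    {"title": "Laptop A", "url": "https://example.com/a"},
    {"title": None, "url": None},
    {"title": "Laptop B", "url": "https://example.com/b"},
]
cleaned = clean_results(sample)
print(len(cleaned))
```

In the scraper, this would be applied as `google_search_items = clean_results(google_search_items)` before saving.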
Step 5: Save the Data to a JSON File
with open("google_search_items.json", "w") as file:
    json.dump(google_search_items, file, indent=4)
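Step 1 also imports `csv`, which the script never uses. If a spreadsheet-friendly file is wanted alongside the JSON, here is a rough sketch of one way to write the same list as CSV. `write_results_csv` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io


def write_results_csv(results, file):
    """Write result dictionaries as CSV rows, using the first row's keys as the header."""
    if not results:
        return
    writer = csv.DictWriter(file, fieldnames=list(results[0].keys()))
    writer.writeheader()
    writer.writerows(results)


# Made-up rows; an in-memory buffer stands in for a real file here
sample = [
    {"title": "Laptop A", "url": "https://example.com/a"},
    {"title": "Laptop B", "url": "https://example.com/b"},
]
buffer = io.StringIO()
write_results_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, open the file with `open("google_search_items.csv", "w", newline="", encoding="utf-8")` and pass it in place of the buffer.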
Conclusion
Congratulations! You’ve learned how to scrape Google Search results using the MechanicalSoup library in Python. This technique can be useful for data analysis, market research, or content aggregation. However, it’s essential to respect website terms of service and use web scraping responsibly.
Remember to replace the access_token value with your own token from the app.quickscraper.co service, as using the provided token may result in errors or rate limiting.
Happy scraping!