
Go With The Flow: Automating Amazon Data Scraping with Bookmarklets and Chrome Extensions
Seller Sessions Amazon FBA and Private Label
Summary
In this episode of Seller Sessions, Danny and Ritu discuss automating the extraction of information from Amazon product pages, a task typically done by hand. They explore different approaches, including bookmarklets and Chrome extensions, highlighting how these tools can scrape data such as product availability and customer reviews. Danny shares his method of using Claude in Chrome and Rufus to gather insights, emphasizing the importance of framing questions to surface customer objections. He also previews a new tool designed to reduce the cost of generating high-quality images and videos for Amazon listings, aiming to blend the scientific and design sides of optimization.
Chapters
- 00:00:00
Introduction to Automating Amazon Page Scraping
The episode begins with an introduction to the hosts, Danny and Ritu, and a brief overview of their recent projects. Danny mentions completing a taxonomy database for Amazon's catalog after a thousand hours of work, emphasizing that the catalog system is deterministic but driven by filtering. He explains how the content in a listing can affect its assigned product type, citing an example of a product being categorized as orthopedic because the words "pain" and "wrist" appeared close together. The hosts then transition to the main topic: automating the scraping of information from Amazon pages, noting that they each approached the problem from a different angle but achieved similar results.
- 00:04:12
The Basics of Automating Amazon Data Extraction
Ritu introduces the core problem: automating the mundane task of scraping Amazon pages for information. She mentions existing tools like Keepa, which provide API access to some of this data, but notes that not everything, such as output from Rufus, is captured. The discussion then shifts to different approaches to automation, including generative-AI and browser-based methods. Danny highlights his lack of a programming background and his focus on building a UI to extract value from the scraped data. Ritu stresses the importance of understanding the fundamentals before deciding whether to build or buy a solution.
- 00:08:36
Dissecting an Amazon Product Page for Scraping
Ritu dissects the structure of an Amazon product page, explaining how the page is constructed server-side and then rendered in the browser. She introduces the concept of the DOM (Document Object Model) and how it allows interactive inspection of page elements. The discussion emphasizes that scrapers read the DOM to extract information, which is organized into frames and containers. The goal is to extract the specific frames containing the desired information and then use an AI, such as Claude, to format and analyze the extracted data. Examples include extracting customer reviews to monitor their impact on conversion rates and checking product availability.
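The workflow described here, pulling a specific container out of the DOM and handing its text to an AI for analysis, can be sketched in browser JavaScript. This is a minimal illustration, not the hosts' actual code; the `data-hook="review-body"` selector is an assumption about Amazon's markup and may need adjusting.

```javascript
// Sketch: pull review text out of the page's DOM, then build a prompt
// to hand to an AI assistant. The selector is an assumption about
// Amazon's markup and may differ on the live site.
function collectReviews(root) {
  // root is a Document or Element; returns an array of review strings
  return [...root.querySelectorAll('[data-hook="review-body"]')]
    .map(el => el.textContent.trim())
    .filter(Boolean);
}

// Format the extracted reviews into a question framed around
// customer objections, as discussed in the episode.
function buildPrompt(reviews) {
  return 'Summarize the main customer objections in these reviews:\n\n' +
    reviews.map((r, i) => `${i + 1}. ${r}`).join('\n');
}
```

Running `collectReviews(document)` in the browser console on a product page, then pasting `buildPrompt(...)`'s output into Claude, approximates the extract-then-analyze loop described above.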
- 00:12:38
Considerations and Approaches to Amazon Scraping
Danny discusses the challenges of using browser automation, including the potential for high token usage and the presence of dynamic HTML. He points out that a single product detail page (PDP) can contain a large amount of data, including information about other products, which can lead to confusion. Danny also mentions that while scraping is against Amazon's terms of service, there's a fair usage understanding, as even Amazon scrapes the web. He contrasts the complexity of setting up automated scraping workflows with the simplicity of copy-pasting data, arguing that manual methods can be more efficient for individual problem diagnosis. He also notes the importance of interrogating the data offline and questioning AI tools like Claude to ensure accuracy.
- 00:16:09
Utilizing Bookmarklets for Data Extraction
Ritu introduces bookmarklets as a method for running code directly within a browser. She explains that a bookmarklet is essentially a bookmark that contains JavaScript code, which executes when the bookmark is clicked. This allows for automating tasks on a specific page. Ritu demonstrates an autocomplete bookmarklet that simulates typing keywords into the Amazon search bar and extracts the autocomplete suggestions. This hands-free operation copies the results into a pop-up window, showcasing a simple yet effective use case for bookmarklets.
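A bookmarklet of the kind Ritu demonstrates is just a bookmark whose URL is a `javascript:` snippet, executed against the current page when clicked. The sketch below shows the general shape; the selectors (`#twotabsearchtextbox` for the search box, `.s-suggestion` for the autocomplete items), the example keyword, and the one-second delay are all assumptions, not details taken from the episode.

```javascript
// Sketch of an autocomplete bookmarklet: type a keyword into the search
// box, wait for suggestions to render, then show them in a popup.
// Selectors and timing are assumptions about Amazon's markup.
const run = () => {
  const box = document.querySelector('#twotabsearchtextbox');
  box.value = 'wrist brace';                        // simulate typing a keyword
  box.dispatchEvent(new Event('input', { bubbles: true }));
  setTimeout(() => {
    const suggestions = [...document.querySelectorAll('.s-suggestion')]
      .map(el => el.textContent.trim());
    const popup = window.open('', '_blank', 'width=400,height=500');
    popup.document.body.innerText = suggestions.join('\n');
  }, 1000);                                         // wait for suggestions to load
};

// Packing the function into a javascript: URL gives the bookmarklet
// itself; paste this string into a bookmark's URL field.
const bookmarklet = 'javascript:(' + encodeURIComponent(run.toString()) + ')()';
```

Clicking the saved bookmark then runs the code hands-free on whatever Amazon page is open, matching the pop-up behavior described above.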
Keywords
Taxonomy database
A structured system for classifying and organizing information, in this case, Amazon's entire product catalog. Danny mentions building one for Amazon, which includes every node, item type keyword, and product type.
GL (General Ledger product group)
Refers to the classification of a product within Amazon's product catalog. The speakers discuss how content in a listing can influence the product type assigned by Amazon.
Highlights
What people don't realize is I need to show some stuff for people to understand it, because when I say your GL might not be wrong, they go, "What?" Because people try to fix stuff that they think is broken, but there are inputs and there are outputs.
What's really interesting at the moment is looking at things through a different lens, which goes back to what we're going to discuss today. We're both doing something, but we're getting the same result from different angles.
Transcript Preview
Hey guys, welcome back to another Seller Sessions.
It's the final Friday, sorry, Friday.
It's the final Tuesday of the month, which means one thing, go with the flow with someone very,
very smart and it's a joy to work with.
We were planning the show yesterday and it like took five minutes and I love that.
You know when you've got people where they're full of ideas. And the irony was
that we were both doing exactly the same thing without knowing we were doing exactly the same thing,
but then we come from different angles with it.