Scrape Anything with GPT-4o (Vision Based Scraping)

TABLE OF CONTENTS

What you’ll learn

Key steps

Next step

Links or resource

通过 GPT-4o 实现任意数据抓取（基于视觉的抓取）

Scrape Anything with GPT-4o (Vision Based Scraping)

Learn how to automate web scraping using GPT-4's vision capabilities and Make.com to extract and save data from web page screenshots.

Bear Liu

Jul 11, 2024

Learn how to automate web scraping using GPT-4's vision capabilities and Make.com to extract and save data from web page screenshots.

Use Case: Using GPT-4's vision capabilities to scrape data from screenshots of web pages and save it to a spreadsheet.

Tool: GPT-4 Vision, Make.com, Scraping Bee

Time for Learning: 20-30 minutes to set up the vision-based scraper.

Summary

This tutorial demonstrates how to use GPT-4's vision capabilities to perform web scraping by interpreting and extracting data from screenshots. By combining tools like Make.com and Scraping Bee, the process involves taking screenshots of web pages, analyzing the content using GPT-4, and saving the extracted data into a spreadsheet. The video focuses on setting up the scraper to gather key information from LinkedIn company profiles, such as company name, followers, employees, and website. This vision-based approach bypasses traditional scraping barriers, making it a versatile solution for extracting web data without coding experience.

Bear’s take

This example is a great demonstration of using GPT-4.0 to scrape web data by taking screenshots, analyzing the content, and saving it to a spreadsheet. Using Make.com for this process is particularly helpful for those without coding experience. I found it especially useful for simplifying complex data extraction tasks, and I'll definitely be trying it out for my own projects.

What you’ll learn

You’ll learn how to set up a vision-based web scraper using GPT-4 and Make.com. The video guides you through creating a new scenario in Make.com, setting up a spreadsheet to watch for new rows, and using Scraping Bee to take screenshots of web pages. You'll see how to handle cookies for authenticated sessions and how to configure GPT-4 to analyze the screenshot and extract specific data points. The tutorial also covers encoding images in base64 format for GPT-4 processing and parsing the extracted data to populate a Google Sheet. This method provides an efficient way to scrape web data while overcoming traditional barriers.

Key steps

Set Up Make.com Scenario: Create a new scenario in Make.com for the vision-based scraper.

Connect Spreadsheet: Link your Google Sheet to Make.com to store the scraped data.

Use Scraping Bee: Set up Scraping Bee to take screenshots of web pages.

Handle Cookies: Capture and use cookies to bypass login screens for authenticated sessions.

Encode Images: Convert screenshots to base64 format for GPT-4 processing.

Analyze with GPT-4: Configure GPT-4 to analyze the screenshot and extract specified data points.

Parse and Save Data: Parse the extracted data and update the Google Sheet with the results.

Next step

Experiment with different web pages to see how GPT-4 Vision can extract various types of data.

Explore additional features of Make.com and GPT-4 to enhance your web scraping projects.

Consider using this method for data collection tasks in your personal or professional projects.

Links or resource

Make.com

Scraping Bee

OpenAI GPT-4

Hi there 👋 I'm Bear - a Product Designer, Podcaster, and Author who loves design, tech, and productivity.

Check my learnings at Bearwith.AI, and subscribe to my newsletter for more awesome AI tips: newsletter.bearwith.AI 🐻 & 🤖

通过 GPT-4o 实现任意数据抓取（基于视觉的抓取）

学习如何利用 GPT-4 的视觉功能和 Make.com 来自动抓取网页截图中的数据并保存。

使用场景: 使用 GPT-4 的视觉功能从网页截图中提取数据并保存到电子表格中。

工具: GPT-4 Vision, Make.com, Scraping Bee

学习时间: 20-30 分钟设置基于视觉的数据抓取工具。

https://youtube.com/watch?v=99sfhJXh0ZQ&si=3wccXgamn5vHbniC

摘要

该教程展示了如何使用 GPT-4 的视觉功能，通过解析和提取截图中的数据来进行网页抓取。通过结合 Make.com 和 Scraping Bee 等工具，该过程包括对网页进行截图，利用 GPT-4 分析内容，并将提取的数据保存到电子表格中。视频重点介绍了如何设置抓取工具以收集 LinkedIn 公司资料中的关键信息，如公司名称、关注者、员工人数和网站。这种基于视觉的方法避开了传统抓取障碍，使其成为无需编码经验的多功能网页数据提取解决方案。

Bear 的观点

这个示例很好地展示了如何使用 GPT-4.0 通过截图、分析内容并将其保存到电子表格中来抓取网页数据。使用 Make.com 进行此过程对那些没有编码经验的人特别有帮助。我发现它特别有助于简化复杂的数据提取任务，并且我一定会在自己的项目中尝试它。

你将学到的内容

你将学会如何使用 GPT-4 和 Make.com 设置一个基于视觉的网页抓取工具。视频将指导你在 Make.com 中创建一个新场景，设置一个监控新行的电子表格，并使用 Scraping Bee 对网页进行截图。你将了解到如何处理经过认证会话的 Cookie，以及如何配置 GPT-4 来分析截图并提取特定的数据点。教程还介绍了如何将图像编码为 Base64 格式以供 GPT-4 处理，并解析提取的数据以填充 Google 表格。这种方法提供了一种高效的方式来抓取网页数据，同时克服了传统的障碍。