Scrape Anything with GPT-4o (Vision Based Scraping)

Scrape Anything with GPT-4o (Vision Based Scraping)

Learn how to automate web scraping using GPT-4's vision capabilities and Make.com to extract and save data from web page screenshots.

Jul 11, 2024
Learn how to automate web scraping using GPT-4's vision capabilities and Make.com to extract and save data from web page screenshots.
Learn how to automate web scraping using GPT-4's vision capabilities and Make.com to extract and save data from web page screenshots.
  • Use Case: Using GPT-4's vision capabilities to scrape data from screenshots of web pages and save it to a spreadsheet.
  • Tool: GPT-4 Vision, Make.com, Scraping Bee
  • Time for Learning: 20-30 minutes to set up the vision-based scraper.

Video preview

Summary

This tutorial demonstrates how to use GPT-4's vision capabilities to perform web scraping by interpreting and extracting data from screenshots. By combining tools like Make.com and Scraping Bee, the process involves taking screenshots of web pages, analyzing the content using GPT-4, and saving the extracted data into a spreadsheet. The video focuses on setting up the scraper to gather key information from LinkedIn company profiles, such as company name, followers, employees, and website. This vision-based approach bypasses traditional scraping barriers, making it a versatile solution for extracting web data without coding experience.

Bear’s take

This example is a great demonstration of using GPT-4.0 to scrape web data by taking screenshots, analyzing the content, and saving it to a spreadsheet. Using Make.com for this process is particularly helpful for those without coding experience. I found it especially useful for simplifying complex data extraction tasks, and I'll definitely be trying it out for my own projects.

What you’ll learn

You’ll learn how to set up a vision-based web scraper using GPT-4 and Make.com. The video guides you through creating a new scenario in Make.com, setting up a spreadsheet to watch for new rows, and using Scraping Bee to take screenshots of web pages. You'll see how to handle cookies for authenticated sessions and how to configure GPT-4 to analyze the screenshot and extract specific data points. The tutorial also covers encoding images in base64 format for GPT-4 processing and parsing the extracted data to populate a Google Sheet. This method provides an efficient way to scrape web data while overcoming traditional barriers.

Key steps

  1. Set Up Make.com Scenario: Create a new scenario in Make.com for the vision-based scraper.
  1. Connect Spreadsheet: Link your Google Sheet to Make.com to store the scraped data.
  1. Use Scraping Bee: Set up Scraping Bee to take screenshots of web pages.
  1. Handle Cookies: Capture and use cookies to bypass login screens for authenticated sessions.
  1. Encode Images: Convert screenshots to base64 format for GPT-4 processing.
  1. Analyze with GPT-4: Configure GPT-4 to analyze the screenshot and extract specified data points.
  1. Parse and Save Data: Parse the extracted data and update the Google Sheet with the results.

Next step

  • Experiment with different web pages to see how GPT-4 Vision can extract various types of data.
  • Explore additional features of Make.com and GPT-4 to enhance your web scraping projects.
  • Consider using this method for data collection tasks in your personal or professional projects.

Links or resource


Hi there 👋  I'm Bear - a Product Designer, Podcaster, and Author who loves design, tech, and productivity.
Check my learnings at Bearwith.AI, and subscribe to my newsletter for more awesome AI tips: newsletter.bearwith.AI 🐻 & 🤖

通过 GPT-4o 实现任意数据抓取(基于视觉的抓取)

学习如何利用 GPT-4 的视觉功能和 Make.com 来自动抓取网页截图中的数据并保存。
  • 使用场景: 使用 GPT-4 的视觉功能从网页截图中提取数据并保存到电子表格中。
  • 工具: GPT-4 Vision, Make.com, Scraping Bee
  • 学习时间: 20-30 分钟设置基于视觉的数据抓取工具。


摘要

该教程展示了如何使用 GPT-4 的视觉功能,通过解析和提取截图中的数据来进行网页抓取。通过结合 Make.com 和 Scraping Bee 等工具,该过程包括对网页进行截图,利用 GPT-4 分析内容,并将提取的数据保存到电子表格中。视频重点介绍了如何设置抓取工具以收集 LinkedIn 公司资料中的关键信息,如公司名称、关注者、员工人数和网站。这种基于视觉的方法避开了传统抓取障碍,使其成为无需编码经验的多功能网页数据提取解决方案。

Bear 的观点

这个示例很好地展示了如何使用 GPT-4.0 通过截图、分析内容并将其保存到电子表格中来抓取网页数据。使用 Make.com 进行此过程对那些没有编码经验的人特别有帮助。我发现它特别有助于简化复杂的数据提取任务,并且我一定会在自己的项目中尝试它。

你将学到的内容

你将学会如何使用 GPT-4 和 Make.com 设置一个基于视觉的网页抓取工具。视频将指导你在 Make.com 中创建一个新场景,设置一个监控新行的电子表格,并使用 Scraping Bee 对网页进行截图。你将了解到如何处理经过认证会话的 Cookie,以及如何配置 GPT-4 来分析截图并提取特定的数据点。教程还介绍了如何将图像编码为 Base64 格式以供 GPT-4 处理,并解析提取的数据以填充 Google 表格。这种方法提供了一种高效的方式来抓取网页数据,同时克服了传统的障碍。

关键步骤

  1. 设置 Make.com 场景:Make.com 中创建一个新的基于视觉的抓取工具场景。
  1. 连接电子表格: 将你的 Google 表格链接到 Make.com 以存储抓取的数据。
  1. 使用 Scraping Bee: 设置 Scraping Bee 对网页进行截图。
  1. 处理 Cookie: 捕获和使用 Cookie 以绕过登录屏幕的认证会话。
  1. 图像编码: 将截图转换为 Base64 格式以供 GPT-4 处理。
  1. 使用 GPT-4 分析: 配置 GPT-4 分析截图并提取指定的数据点。
  1. 解析和保存数据: 解析提取的数据并将结果更新到 Google 表格中。

下一步

  • 尝试不同的网页,看看 GPT-4 Vision 如何提取各种类型的数据。
  • 探索 Make.com 和 GPT-4 的附加功能,以增强你的网页抓取项目。
  • 考虑将此方法用于你的个人或专业项目中的数据收集任务。

链接或资源