Tuesday, October 13, 2020

Modern Web Automation With Python and Selenium

 

TOPICS:

Motivation: Tracking Listening Habits

Setup

Test Driving a Headless Browser

Groovin’ on Tunes

Exploring the Catalogue

Building a Class

Collecting Structured Data

What’s Next and What Have You Learned?


Motivation: Tracking Listening Habits

Suppose that you have been listening to music on Bandcamp for a while now, and you find yourself wishing you could remember a song you heard a few months back.


Sure, you could dig through your browser history and check each song, but that might be a pain… All you remember is that you heard the song a few months ago and that it was in the electronic genre.


“Wouldn’t it be great,” you think to yourself, “if I had a record of my listening history? I could just look up the electronic songs from two months ago, and I’d surely find it.”


Today, you will build a basic Python class called BandLeader that connects to bandcamp.com, streams music from the “discovery” section of the front page, and keeps track of your listening history.


The listening history will be saved to disk in a CSV file. You can then explore that CSV file in your favorite spreadsheet application or even with Python.
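As a rough idea of what saving and exploring that history could look like, here is a minimal sketch using only the standard library. The Track record, its field names, and both helper functions are assumptions made for illustration, not the layout the BandLeader class ends up using.

```python
import csv
from collections import namedtuple
from pathlib import Path

# Hypothetical record layout; the field names are assumptions for illustration.
Track = namedtuple("Track", ["title", "artist", "genre", "played_at"])


def append_history(csv_path, track):
    """Append one played track to the CSV file, writing a header row if the file is new."""
    path = Path(csv_path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(Track._fields)
        writer.writerow(track)


def find_by_genre(csv_path, genre):
    """Return every saved row (as a dict) whose genre column matches exactly."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [row for row in csv.DictReader(f) if row["genre"] == genre]
```

With a file like this on disk, looking up "the electronic songs from two months ago" becomes a call to find_by_genre followed by a filter on the played_at column.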


If you have had some experience with web scraping in Python, you are familiar with making HTTP requests and using Pythonic APIs to navigate the DOM. You will do more of the same today, with one difference.


Today you will use a full-fledged browser running in headless mode to do the HTTP requests for you.


A headless browser is just a regular web browser, except that it has no visible user interface. As you’d expect, it can do more than make requests: it can also render HTML (though you cannot see it), keep session information, and even perform asynchronous network communications by running JavaScript code.
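To make that concrete, the sketch below starts a headless browser, loads a page, and reads back the rendered title. It assumes Selenium 4 or later, Firefox, and a working geckodriver on the machine; the function name fetch_page_title is made up for this example.

```python
def fetch_page_title(url):
    """Load url in headless Firefox and return the rendered page title."""
    # Imported lazily so this module still loads where Selenium is not installed.
    from selenium.webdriver import Firefox
    from selenium.webdriver.firefox.options import Options

    opts = Options()
    opts.add_argument("-headless")  # run Firefox without a visible window
    browser = Firefox(options=opts)
    try:
        browser.get(url)  # the browser makes the HTTP requests and runs the page's JavaScript
        return browser.title
    finally:
        browser.quit()  # always shut the browser process down
```

Calling fetch_page_title("https://bandcamp.com") would open no window at all, yet the page is fetched and rendered just as it would be in a visible browser.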


If you want to automate the modern web, headless browsers are essential.


https://realpython.com/modern-web-automation-with-python-and-selenium/
