Build Better Music Tools by Scraping Spotify Playlist Data


Music streaming platforms like Spotify house mountains of data—track titles, artists, album info—that can power analytics, apps, or even personal projects. Extracting that data isn’t always simple, but Python makes it surprisingly accessible.
If you want to dive deep into Spotify playlists, grab detailed info, and automate the process, this guide is for you. We’ll cover everything: setting up the right Python tools, scraping with Selenium and BeautifulSoup, using Spotify’s API, and saving your data cleanly.

What You Need

Run these commands first:

pip install beautifulsoup4 selenium requests lxml

Here’s the deal:

BeautifulSoup is your go-to for parsing static HTML content. Think: extracting a fixed list of tracks from a loaded page.
Selenium handles the tricky parts — dynamic content that loads on scroll or after clicking. It simulates a real browser’s behavior.
Requests is for straightforward API calls or simple HTTP requests.
Each has a unique strength. Use them wisely.

Prepare Selenium Using ChromeDriver

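Selenium doesn’t work alone. It needs a web driver, such as ChromeDriver, to operate the browser behind the scenes. Recent Selenium releases (4.6 and later) can fetch a matching driver automatically, but a manual setup still works:
Download the ChromeDriver build that matches your installed Chrome version from the official source.
Unzip and save it somewhere easy to find.
Test it quickly: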

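from selenium import webdriver
from selenium.webdriver.chrome.service import Service

driver_path = "C:/webdriver/chromedriver.exe"  # Adjust this path
# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(driver_path))
driver.get("https://google.com")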

If Chrome opens and loads Google, you’re good to go.

Scrape Spotify Playlist Data

Spotify playlists load tracks dynamically. So just fetching the page HTML won’t cut it. You need to:
Launch the page with Selenium.
Scroll down to load every song.
Parse the fully loaded HTML with BeautifulSoup.
Extract track title, artist, and duration from HTML elements.
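Here’s a simplified illustration of the kind of HTML you’ll be parsing (the class names on the live page are auto-generated and look very different):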

<div class="tracklist-row">
  <span class="track-name">Song Title</span>
  <span class="artist-name">Artist</span>
  <span class="track-duration">3:45</span>
</div>
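
To make the parsing step concrete, here is a minimal sketch that runs BeautifulSoup against the simplified markup above. The class names (tracklist-row, track-name, and so on) come from that illustration, not from Spotify’s live page, and parse_tracks is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Simplified markup from the illustration above; the live Spotify page
# uses different, auto-generated class names.
SAMPLE_HTML = """
<div class="tracklist-row">
  <span class="track-name">Song Title</span>
  <span class="artist-name">Artist</span>
  <span class="track-duration">3:45</span>
</div>
"""

def parse_tracks(html):
    """Return one dict per tracklist-row element found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    tracks = []
    for row in soup.find_all("div", class_="tracklist-row"):
        tracks.append({
            "track title": row.find(class_="track-name").get_text(strip=True),
            "artist": row.find(class_="artist-name").get_text(strip=True),
            "duration": row.find(class_="track-duration").get_text(strip=True),
        })
    return tracks

print(parse_tracks(SAMPLE_HTML))
```

The sketch uses Python’s built-in html.parser so it runs without extra parser packages.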

Core Function to Scrape Spotify Playlist

Here’s a practical Python function that does the job:

from selenium import webdriver
from bs4 import BeautifulSoup
import time

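def get_spotify_playlist_data(playlist_url):
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # Run without opening a window
    driver = webdriver.Chrome(options=options)

    try:
        driver.get(playlist_url)
        time.sleep(5)  # Allow page to load

        # Scroll until the page height stops growing (capped at 20 passes)
        # so lazily loaded tracks get a chance to render
        last_height = 0
        for _ in range(20):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(3)
            height = driver.execute_script("return document.body.scrollHeight")
            if height == last_height:
                break
            last_height = height

        html = driver.page_source
    finally:
        driver.quit()  # Close the browser even if something fails

    soup = BeautifulSoup(html, "lxml")
    tracks = []

    # These auto-generated class names change often; re-inspect the page
    # and update them if the function returns an empty list
    for track in soup.find_all(class_="IjYxRc5luMiDPhKhZVUH UpiE7J6vPrJIa59qxts4"):
        name = track.find(class_="e-9541-text encore-text-body-medium encore-internal-color-text-base btE2c3IKaOXZ4VNAb8WQ standalone-ellipsis-one-line")
        artist = track.find(class_="e-9541-text encore-text-body-small")
        duration = track.find(class_="e-9541-text encore-text-body-small encore-internal-color-text-subdued l5CmSxiQaap8rWOOpEpk")
        link = artist.find("a") if artist is not None else None
        if name is None or link is None or duration is None:
            continue  # Skip rows that do not match the expected layout
        tracks.append({"track title": name.text, "artist": link.text, "duration": duration.text})

    return tracks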

Put It to Work

Replace the example URL with your own playlist link (the sample below happens to point at an album page):

playlist_url = "https://open.spotify.com/album/7aJuG4TFXa2hmE4z1yxc3n?si=W7c1b1nNR3C7akuySGq_7g"
data = get_spotify_playlist_data(playlist_url)

for track in data:
    print(track)

Watch your console fill up with neatly scraped track info. Simple and effective.
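
Since the scraper returns durations as text like "3:45", a small conversion step helps if you want to feed the results into any kind of analytics. The sketch below is one way to do it; duration_to_seconds and total_runtime are hypothetical helper names, and the sample list is made up to mirror the shape of the scraper’s output.

```python
def duration_to_seconds(duration):
    """Convert "m:ss" or "h:mm:ss" text into a number of seconds."""
    seconds = 0
    for part in duration.strip().split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def total_runtime(tracks):
    """Sum the "duration" field across a list of scraped track dicts."""
    return sum(duration_to_seconds(t["duration"]) for t in tracks)

# Hypothetical sample in the same shape the scraper returns
sample = [
    {"track title": "Song A", "artist": "Artist A", "duration": "3:45"},
    {"track title": "Song B", "artist": "Artist B", "duration": "1:02:03"},
]
print(total_runtime(sample), "seconds in total")
```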

Get Access to Spotify’s Official API

The API is the cleaner, officially supported, and more robust way to get data, provided you have the right credentials.

Step 1: Register your app
Head to the Spotify Developer Dashboard. Create your app. Copy your Client ID and Client Secret.

Step 2: Get your access token

import requests
import base64

CLIENT_ID = "your_client_id"
CLIENT_SECRET = "your_client_secret"

credentials = f"{CLIENT_ID}:{CLIENT_SECRET}"
encoded_credentials = base64.b64encode(credentials.encode()).decode()

url = "https://accounts.spotify.com/api/token"
headers = {
    "Authorization": f"Basic {encoded_credentials}",
    "Content-Type": "application/x-www-form-urlencoded"
}
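# Named payload rather than data, so the scraped track list is not overwritten
payload = {"grant_type": "client_credentials"}

response = requests.post(url, headers=headers, data=payload)
response.raise_for_status()  # Stop here if the credentials were rejected
token = response.json().get("access_token")

print("Access Token:", token)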
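
The token response also carries an expires_in field (typically 3600 seconds for this flow), so a long-running script will eventually need a fresh token. Here is a minimal sketch of one way to reuse a token until shortly before it expires. TokenCache, fetch_token, and margin are hypothetical names, and the fake fetcher stands in for the real request so the example runs offline.

```python
import time

class TokenCache:
    """Reuse an access token until shortly before it expires."""

    def __init__(self, fetch_token, clock=time.time, margin=60):
        self._fetch_token = fetch_token  # callable returning the token JSON as a dict
        self._clock = clock              # injectable so the demo can fake time
        self._margin = margin            # refresh this many seconds early
        self._token = None
        self._expires_at = 0.0

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            payload = self._fetch_token()
            self._token = payload["access_token"]
            self._expires_at = now + payload.get("expires_in", 3600) - self._margin
        return self._token

# Offline demo with a fake fetcher and a fake clock
calls = []

def fake_fetch():
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "expires_in": 3600}

fake_now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()    # fetched
second = cache.get()   # served from the cache
fake_now[0] = 4000.0   # pretend more than an hour has passed
third = cache.get()    # fetched again
print(first, second, third)
```

In a real script, fetch_token would wrap the requests.post call from Step 2 and return response.json().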

Step 3: Use the token

artist_id = "6qqNVTkY8uBg9cP3Jd7DAH"
url = f"https://api.spotify.com/v1/artists/{artist_id}"

headers = {"Authorization": f"Bearer {token}"}
response = requests.get(url, headers=headers)
artist_data = response.json()
print(artist_data)
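
The same token can be pointed at a playlist instead of an artist. The sketch below assumes the Web API’s playlist items endpoint (/v1/playlists/{playlist_id}/tracks), which returns results in pages linked by a "next" URL; whether a given playlist is accessible depends on your app’s API access. The helper names and the get_json parameter are hypothetical, and the fake pages exist only so the example runs without a network call.

```python
from urllib.parse import urlparse

def playlist_id_from_url(url):
    """Pull the ID out of a link like https://open.spotify.com/playlist/<id>?si=..."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2 or parts[-2] != "playlist":
        raise ValueError(f"Not a playlist URL: {url}")
    return parts[-1]

def fetch_playlist_tracks(playlist_id, get_json):
    """Follow the "next" links and collect every track.

    get_json is any callable that takes a URL and returns decoded JSON.
    """
    url = f"https://api.spotify.com/v1/playlists/{playlist_id}/tracks"
    tracks = []
    while url:
        page = get_json(url)
        for item in page.get("items", []):
            track = item.get("track")
            if not track:  # skip entries with no track data
                continue
            tracks.append({
                "track title": track["name"],
                "artist": ", ".join(a["name"] for a in track.get("artists", [])),
                "duration_ms": track["duration_ms"],
            })
        url = page.get("next")  # None on the last page ends the loop
    return tracks

# Made-up pages that imitate the paged response shape
fake_pages = {
    "https://api.spotify.com/v1/playlists/abc123/tracks": {
        "items": [{"track": {"name": "Song A", "artists": [{"name": "Artist A"}],
                             "duration_ms": 225000}}],
        "next": "page-2",
    },
    "page-2": {
        "items": [{"track": None},
                  {"track": {"name": "Song B",
                             "artists": [{"name": "Artist X"}, {"name": "Artist Y"}],
                             "duration_ms": 180000}}],
        "next": None,
    },
}

playlist_id = playlist_id_from_url("https://open.spotify.com/playlist/abc123?si=xyz")
api_tracks = fetch_playlist_tracks(playlist_id, fake_pages.__getitem__)
print(api_tracks)
```

With a real token, get_json could be a small function that calls requests.get(url, headers=headers) with the Bearer header from Step 3 and returns response.json().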

Save Data for Later

Don’t lose your scraped gold. Save the track list returned by get_spotify_playlist_data as JSON:

import json

with open('tracks.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
print("Data saved to tracks.json")
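
If you would rather open the results in a spreadsheet, CSV is an easy alternative. This is a minimal sketch using the standard csv module; write_tracks_csv is a hypothetical helper, and the sample list is made up to match the scraper’s output shape. It writes to an in-memory buffer so it runs without touching disk.

```python
import csv
import io

def write_tracks_csv(tracks, file_obj):
    """Write track dicts as CSV rows, using the first track's keys as the header."""
    if not tracks:
        return 0
    writer = csv.DictWriter(file_obj, fieldnames=list(tracks[0].keys()))
    writer.writeheader()
    writer.writerows(tracks)
    return len(tracks)

# Hypothetical sample in the same shape the scraper returns
sample_tracks = [
    {"track title": "Song A", "artist": "Artist A", "duration": "3:45"},
    {"track title": "Song B", "artist": "Artist B", "duration": "4:10"},
]

buffer = io.StringIO()
written = write_tracks_csv(sample_tracks, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, pass a handle opened with open("tracks.csv", "w", newline="", encoding="utf-8") instead of the buffer.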

Respect the Rules While Scraping

Prefer the official Spotify API for compliance and stability.
If scraping, check robots.txt to respect site rules.
Slow down your requests. Avoid overwhelming servers.
Proxies can help avoid blocks, but use them ethically.
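
Two of these habits are easy to automate. The sketch below checks a URL against robots.txt rules with the standard library’s robotparser and spaces out requests with a fixed delay. The rules, URLs, user agent string, and helper names are all hypothetical and only illustrate the idea; they are not any real site’s robots.txt.

```python
import time
from urllib.robotparser import RobotFileParser

def build_robots_checker(robots_txt):
    """Return a can_fetch(user_agent, url) function for the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch

def throttled(urls, delay_seconds=2.0, sleep=time.sleep):
    """Yield URLs one at a time, pausing between consecutive items."""
    for index, url in enumerate(urls):
        if index:
            sleep(delay_seconds)
        yield url

# Made-up rules for illustration only
EXAMPLE_ROBOTS = """
User-agent: *
Disallow: /private/
"""

can_fetch = build_robots_checker(EXAMPLE_ROBOTS)
allowed = can_fetch("my-scraper", "https://example.com/playlist/123")
blocked = can_fetch("my-scraper", "https://example.com/private/data")
print(allowed, blocked)

# Record the pauses instead of sleeping so the demo finishes instantly
pauses = []
visited = list(throttled(["a", "b", "c"], delay_seconds=1.5, sleep=pauses.append))
print(visited, pauses)
```

For a live site, you would download its robots.txt first (RobotFileParser can do this with set_url and read) and leave sleep at its default.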

Final Thoughts

Combining Python’s scraping power with Spotify’s API opens the door to rich music data. Whether you’re building analytics dashboards, apps, or just geeking out on playlists, these tools make your workflow smooth and scalable.
