Scraping the UK charts – a 2025 guide
There are guides elsewhere on the web, but they don’t seem to be working at the moment, so I’ve had a go at extracting the information myself. I’m using Python with the requests library to fetch the pages and the Beautiful Soup library (the bs4 package) to help with the scraping.
The first thing I did was to write a program which downloaded all of the charts. You can find the charts for any date by going to https://www.officialcharts.com/charts/singles-chart/yyyymmdd, so that’s what we’re going to start downloading first.
import datetime
import os

import requests


def download_file(url):
    # Use the last part of the URL (the yyyymmdd date) as the local filename
    local_filename = url.strip('/').split('/')[-1]
    # Skip any chart that has already been downloaded
    if not os.path.exists(local_filename):
        with requests.get(url, stream=True) as r:
            r.raise_for_status()
            with open(local_filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    if chunk:
                        f.write(chunk)
    return local_filename


# First chart date and number of weeks to download
start = datetime.date(1952, 11, 14)
for week in range(0, 20):
    date = start + datetime.timedelta(week * 7)
    print(date)
    page = f"https://www.officialcharts.com/charts/singles-chart/{date.strftime('%Y%m%d')}"
    download_file(page)
Here I’ve started at 14 November 1952, the date of the first chart, and I’m downloading 20 weeks (you can see the two lines with these variables on them). It’s just to test it’s working. It is working for me, so I’m going to change the dates and download the rest of them.
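Note that the chart week has started on a different day at various times, so the chart dates don't follow one unbroken seven-day sequence. We need to change the start date in the script and run it once for each period. (I could put in a few extra lines to accommodate this, but I preferred to keep an eye on the output in case of a problem.)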
From 14 November 1952 the charts were calculated from 00:01 on Friday to midnight on Thursday
On 10 March 1960 the chart week changed to run from Thursday to Wednesday
On 5 July 1967 it changed to run from Wednesday to Tuesday
On 3 August 1969 it changed to run from Sunday to Saturday
On 10 July 2015 it changed back to Friday to Thursday
So the chart dates we want to collect are:
14-11-52 to 04-03-60
10-03-60 to 29-06-67
05-07-67 to 30-07-69
03-08-69 to 05-07-15
10-07-15 to date
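The five date ranges above can also be generated in one go rather than by editing the start date for each run. This is only a sketch, not the script used above: ERA_STARTS and chart_dates are names made up for this example, and it assumes every chart within a period is dated exactly seven days after the previous one.

```python
import datetime

# First chart date of each period, taken from the list above
ERA_STARTS = [
    datetime.date(1952, 11, 14),
    datetime.date(1960, 3, 10),
    datetime.date(1967, 7, 5),
    datetime.date(1969, 8, 3),
    datetime.date(2015, 7, 10),
]


def chart_dates(until):
    """Return every weekly chart date from the first chart up to `until`."""
    dates = []
    for i, start in enumerate(ERA_STARTS):
        # A period stops where the next one begins; the last one is open-ended
        next_start = ERA_STARTS[i + 1] if i + 1 < len(ERA_STARTS) else None
        current = start
        while current <= until and (next_start is None or current < next_start):
            dates.append(current)
            current += datetime.timedelta(days=7)
    return dates


if __name__ == "__main__":
    for date in chart_dates(datetime.date.today())[:5]:
        print(date.strftime("%Y%m%d"))
```

Each date could then be formatted with strftime('%Y%m%d') and passed to download_file as in the script above.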
So now I have 3,769 files from the UK charts website. So far, so good. I’ve put these in a sub-folder called raw/charts, which is where the next script looks for them.
My next script goes through these one by one, looking at the text in all the <span> tags. I’ve found a pattern where each chart entry has “Number 1”, “Number 2” etc. in one of these tags, followed three spans later by the title, and one span after that by the artist. So my script needs to pull these out of each file and save them to a new .csv file.
from bs4 import BeautifulSoup
import csv
import glob
import os


# Function to process a single file and extract the required information
def process_file(file_path):
    # Extract the filename without extension to use as the CSV filename
    base_name = os.path.basename(file_path).split('.')[0]
    output_file = f"{base_name}.csv"

    # Open and read the HTML file
    with open(file_path, 'r', encoding='utf-8') as file:
        soup = BeautifulSoup(file, 'html.parser')

    # Find all <span> elements and extract their text
    span_texts = [span.get_text() for span in soup.find_all('span')]

    # List to hold the extracted information
    extracted_info = []

    # Iterate through the list and look for items called "Number "
    for idx, text in enumerate(span_texts):
        if text.startswith("Number "):
            try:
                number = text.split()[1]
                title = span_texts[idx + 3]
                artist = span_texts[idx + 4]
                extracted_info.append((number, artist, title))
            except IndexError:
                # Handle cases where there might not be enough elements following the "Number "
                continue

    # Write the extracted information to a CSV file
    with open(output_file, "w", encoding="utf-8", newline='') as csvfile:
        csvwriter = csv.writer(csvfile)
        # Write the header
        csvwriter.writerow(["Number", "Artist", "Title"])
        # Write the extracted information
        for number, artist, title in extracted_info:
            csvwriter.writerow([number, artist, title])

    print(f"Data has been written to {output_file}")


# Process all HTML files in the raw/charts folder
for file_path in glob.glob("raw/charts/*"):
    process_file(file_path)
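To see the offset logic on its own, here is a small sketch that runs the same loop over a plain list of strings instead of a downloaded page. The function name extract_entries and the "filler" strings are made up for this example; I haven't recorded what the real in-between spans contain, only that the title is three spans after "Number N" and the artist four.

```python
def extract_entries(span_texts):
    """Pick out (number, artist, title) using the +3 / +4 span offsets."""
    entries = []
    for idx, text in enumerate(span_texts):
        if text.startswith("Number "):
            try:
                number = text.split()[1]
                title = span_texts[idx + 3]
                artist = span_texts[idx + 4]
            except IndexError:
                # Not enough spans left after this "Number N", so skip it
                continue
            entries.append((number, artist, title))
    return entries


# Made-up span texts: the "filler" items stand in for the unknown spans
sample = [
    "Number 1", "filler", "filler", "IF", "TELLY SAVALAS",
    "Number 2", "filler", "filler", "THE SECRETS THAT YOU KEEP", "MUD",
    "Number 3", "filler",  # cut short on purpose
]

for entry in extract_entries(sample):
    print(entry)
```

The truncated third entry is dropped rather than raising an error, which is the same behaviour as the try/except in the full script.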
So now I have 3,769 .csv files all looking like this. So far, so good. (This is the one from 2 March 1975, named 19750302.csv.) [Note number 7: if an artist or title has a comma in it, the script saves it in the .csv in quote marks. I’m not sure what happens if an artist or a title has both quote marks and commas in it though …]
Number,Artist,Title
1,TELLY SAVALAS,IF
2,STEVE HARLEY AND COCKNEY REBEL,MAKE ME SMILE (COME UP AND SEE ME)
3,MUD,THE SECRETS THAT YOU KEEP
4,FOX,ONLY YOU CAN
5,FRANKIE VALLI,MY EYES ADORED YOU
6,THE CARPENTERS,PLEASE MR POSTMAN
7,SHIRLEY AND COMPANY,"SHAME, SHAME, SHAME"
8,BAY CITY ROLLERS,BYE BYE BABY
9,THE AVERAGE WHITE BAND,PICK UP THE PIECES
10,WIGAN'S CHOSEN FEW,FOOTSEE
11,JOHNNY MATHIS,I'M STONE IN LOVE WITH YOU
12,DANA,PLEASE TELL HIM THAT I SAID HELLO
13,SUPERTRAMP,DREAMER
14,LOVE UNLIMITED,IT MAY BE WINTER OUTSIDE (BUT IN MY HEART IT'S SPRING)
15,SLADE,HOW DOES IT FEEL?
16,ALVIN STARDUST,GOOD LOVE CAN NEVER DIE
17,SYREETA,YOUR KISS IS SWEET
18,DAVID BOWIE,YOUNG AMERICANS
19,SHOWADDYWADDY,SWEET MUSIC
20,HELEN REDDY,ANGIE BABY
21,BARRY MANILOW,MANDY
22,HAMILTON BOHANNON,SOUTH AFRICAN MAN
23,JOHN LENNON,NO 9 DREAM
24,PILOT,JANUARY
25,MAC AND KATIE KISSOON,SUGAR CANDY KISSES
26,GUYS AND DOLLS,THERE'S A WHOLE LOT OF LOVING
27,ARROWS,MY LAST NIGHT WITH YOU
28,JOHNNY WAKELIN AND THE KINSHASA BAND,BLACK SUPERMAN (MUHAMMAD ALI)
29,ELTON JOHN BAND,PHILADELPHIA FREEDOM
30,BARRY WHITE,WHAT AM I GONNA DO WITH YOU
31,RUBETTES,I CAN DO IT
32,GLITTER BAND,GOODBYE MY LOVE
33,THE OSMONDS,HAVING A PARTY
34,THE STYLISTICS,STAR ON A TV SHOW
35,MOMENTS AND WHATNAUTS,GIRLS
36,KENNY,FANCY PANTS
37,THE DRIFTERS,LOVE GAMES
38,QUEEN,NOW I'M HERE
39,GARY LEWIS AND THE PLAYBOYS,MY HEART'S SYMPHONY
40,KENNY,THE BUMP
41,RUPIE EDWARDS,LEGO SKANGA
42,BETTY WRIGHT,SHOORAH SHOORAH
43,JOHN HOLT,HELP ME MAKE IT THROUGH THE NIGHT
44,DEAN PARRISH,I'M ON MY WAY
45,GLORIA GAYNOR,REACH OUT I'LL BE THERE
46,BACHMAN-TURNER OVERDRIVE,ROLL ON DOWN THE HIGHWAY
47,DUANE EDDY,PLAY ME LIKE YOU PLAY YOUR GUITAR
48,THE SHADOWS,LET ME BE THE ONE
49,ELVIS PRESLEY,PROMISED LAND
50,SUZI QUATRO,YOUR MAMA WON'T LIKE ME
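On the question of a title or artist containing both quote marks and commas: with its default settings, Python's csv module wraps such a field in quote marks and doubles any quote marks inside it, and csv.reader undoes this when reading the file back. A quick sketch to check this, using a made-up row rather than a real chart entry:

```python
import csv
import io

# Made-up row: both fields contain quote marks, one also contains a comma
row = ["1", 'EXAMPLE "NICKNAME" ARTIST', 'A "QUOTED", COMMA TITLE']

# Write the row to an in-memory file with the default csv settings
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(row)
line = buffer.getvalue()
print(line)

# Read it back and confirm nothing was lost
buffer.seek(0)
read_back = next(csv.reader(buffer))
print(read_back == row)
```

So as long as the files are read back with a proper CSV parser (rather than splitting each line on commas), these entries should survive the round trip.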
Thursday 23 January 2025