
Scraping the UK charts – a 2025 guide

There are guides elsewhere on the web, but they don’t seem to be working at the moment, so I’ve had a go at extracting the information myself. I’m using Python and have downloaded the beautifulsoup library to help with the scraping.

The first thing I did was to write a program which downloaded all of the charts. You can find the charts for any date by going to https://www.officialcharts.com/charts/singles-chart/yyyymmdd, so that’s what we’re going to start downloading first.

import datetime
import os

import requests

def download_file(url):
    # Name the local file after the last part of the URL, e.g. "19521114"
    local_filename = url.strip('/').split('/')[-1]
    # Skip pages that have already been downloaded
    if not os.path.exists(local_filename):
        with requests.get(url, stream=True, timeout=30) as r:
            r.raise_for_status()
            # Write the response to disk in chunks
            with open(local_filename, 'wb') as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
    return local_filename

start = datetime.date(1952,11,14)

for week in range(0,20):
    date = start + datetime.timedelta(week*7)
    print(date)
    page = f"https://www.officialcharts.com/charts/singles-chart/{date.strftime('%Y%m%d')}"
    download_file(page)

Here I’ve started at 14 November 1952, the date of the first chart, and I’m downloading 20 weeks (you can see the two lines with these variables on them). It’s just to test it’s working. It is working for me, so I’m going to change the dates and download the rest of them.

Note that the day on which the chart week starts has changed a few times, so we need to change the start date in the script and run it once for each period. (I could put in a few extra lines to accommodate this, but I preferred to keep an eye on the output in case of a problem.)

From 14 November 1952 the charts were calculated from 00.01 on Friday to midnight on Thursday
From 10 March 1960 the chart week ran from Thursday to Wednesday
From 5 July 1967 it ran from Wednesday to Tuesday
From 3 August 1969 it ran from Sunday to Saturday
From 10 July 2015 it went back to Friday to Thursday

So the chart dates we want to collect are:
14-11-52 to 04-03-60
10-03-60 to 29-06-67
05-07-67 to 30-07-69
03-08-69 to 05-07-15
10-07-15 to date
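The "few extra lines" mentioned above could look something like the sketch below, which walks through the five periods in one go instead of editing the start date by hand. The names `PERIODS`, `chart_dates` and `chart_url` are my own inventions for illustration, and the period boundaries are simply the dates from the list above.

```python
import datetime

# (first chart date, last chart date) for each period listed above.
# None means "keep going up to the end date passed in".
PERIODS = [
    (datetime.date(1952, 11, 14), datetime.date(1960, 3, 4)),
    (datetime.date(1960, 3, 10), datetime.date(1967, 6, 29)),
    (datetime.date(1967, 7, 5), datetime.date(1969, 7, 30)),
    (datetime.date(1969, 8, 3), datetime.date(2015, 7, 5)),
    (datetime.date(2015, 7, 10), None),
]

def chart_dates(end):
    """Yield every weekly chart date up to and including `end`."""
    for first, last in PERIODS:
        last = end if last is None else min(last, end)
        date = first
        while date <= last:
            yield date
            date += datetime.timedelta(weeks=1)

def chart_url(date):
    """Build the chart page address for one date (yyyymmdd at the end)."""
    return f"https://www.officialcharts.com/charts/singles-chart/{date.strftime('%Y%m%d')}"

if __name__ == "__main__":
    # Show the hand-over between the first two periods
    for date in list(chart_dates(datetime.date(1960, 3, 20)))[-4:]:
        print(date, chart_url(date))
```

Each URL from `chart_url` can be passed straight to `download_file`. I'd still keep the `print(date)` so that any problem shows up in the output.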

So now I have 3,769 files from the UK charts website. So far, so good. I’ve put these in a sub-folder called /raw/charts.
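The download script above saves each page into whatever folder it is run from. If you'd rather have the files land in the sub-folder directly, a variation along these lines might do it. This is only a sketch: `local_path` and `download_all` are hypothetical names, the one-second pause is my own assumption about being polite to the site, and it uses the standard library's `urllib.request` in place of `requests` purely to avoid a dependency.

```python
import time
import urllib.request
from pathlib import Path

def local_path(url, out_dir="raw/charts"):
    """Return the path a chart page is saved under, e.g. raw/charts/19521114."""
    return Path(out_dir) / url.strip("/").split("/")[-1]

def download_all(urls, out_dir="raw/charts", delay=1.0):
    """Download each URL into out_dir, skipping files that already exist."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for url in urls:
        target = local_path(url, out_dir)
        if target.exists():
            continue
        # urlopen raises an HTTPError for 4xx/5xx responses
        with urllib.request.urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())
        time.sleep(delay)  # assumed courtesy pause between requests
```

Because files that already exist are skipped, the script can be stopped and restarted without downloading everything again.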

My next script will go through these one by one, looking at all the text in the <span> tags. I’ve found a pattern where each chart has “Number 1”, “Number 2” etc. in these tags, followed three spans later by the title, and one span after that by the artist. So my script needs to pick these out of each file and save them to a new .csv file.

from bs4 import BeautifulSoup
import csv
import glob
import os

# Function to process a single file and extract the required information
def process_file(file_path):
    # Extract the filename without extension to use as the CSV filename
    base_name = os.path.basename(file_path).split('.')[0]
    output_file = f"{base_name}.csv"

    # Open and read the HTML file
    with open(file_path, 'r', encoding='utf-8') as file:
        soup = BeautifulSoup(file, 'html.parser')

    # Find all <span> elements and extract their text
    span_texts = [span.get_text() for span in soup.find_all('span')]

    # List to hold the extracted information
    extracted_info = []

    # Iterate through the list and look for items called "Number "
    for idx, text in enumerate(span_texts):
        if text.startswith("Number "):
            try:
                number = text.split()[1]
                title = span_texts[idx + 3]
                artist = span_texts[idx + 4]
                extracted_info.append((number, artist, title))
            except IndexError:
                # Handle cases where there might not be enough elements following the "Number "
                continue

    # Write the extracted information to a CSV file
    with open(output_file, "w", encoding="utf-8", newline='') as csvfile:
        csvwriter = csv.writer(csvfile)
        # Write the header
        csvwriter.writerow(["Number", "Artist", "Title"])

        # Write the extracted information
        for number, artist, title in extracted_info:
            csvwriter.writerow([number, artist, title])

    print(f"Data has been written to {output_file}")

# Process all HTML files in the raw/charts folder
for file_path in glob.glob("raw/charts/*"):
    process_file(file_path)
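The "three spans later, four spans later" logic is the fragile part, so it can be worth trying it on its own with a plain list of strings before running it over thousands of files. In the sketch below, `extract_rows` is a hypothetical helper, and the sample list is made up just to mimic the pattern described above; the real pages will have different filler text. It is also a little stricter than the script, in that it only accepts "Number " followed by digits.

```python
def extract_rows(span_texts):
    """Return (number, artist, title) tuples using the +3 / +4 offsets."""
    rows = []
    for idx, text in enumerate(span_texts):
        parts = text.split()
        # Only accept "Number <digits>", and only if enough spans follow it
        if len(parts) == 2 and parts[0] == "Number" and parts[1].isdigit():
            if idx + 4 < len(span_texts):
                title = span_texts[idx + 3].strip()
                artist = span_texts[idx + 4].strip()
                rows.append((parts[1], artist, title))
    return rows

# Made-up span texts that only mimic the pattern
sample = [
    "Number 1", "filler", "filler", "IF", "TELLY SAVALAS",
    "Number 2", "filler", "filler",
    "MAKE ME SMILE (COME UP AND SEE ME)", "STEVE HARLEY AND COCKNEY REBEL",
    "Number 3", "filler",  # cut off: not enough spans left, so it is skipped
]

for row in extract_rows(sample):
    print(row)
```

If the site ever changes its layout, the offsets 3 and 4 are the first thing I'd check.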

So now I have 3,769 .csv files all looking like this. So far, so good. (This is the one from 2 March 1975, named 19750302.csv.) [Note no. 7: if an artist or title has a comma in it, the script saves it in the .csv in quote marks. I’m not sure what happens if an artist or a title has both quote marks and commas in it though …]

Number,Artist,Title
1,TELLY SAVALAS,IF
2,STEVE HARLEY AND COCKNEY REBEL,MAKE ME SMILE (COME UP AND SEE ME)
3,MUD,THE SECRETS THAT YOU KEEP
4,FOX,ONLY YOU CAN
5,FRANKIE VALLI,MY EYES ADORED YOU
6,THE CARPENTERS,PLEASE MR POSTMAN
7,SHIRLEY AND COMPANY,"SHAME, SHAME, SHAME"
8,BAY CITY ROLLERS,BYE BYE BABY
9,THE AVERAGE WHITE BAND,PICK UP THE PIECES
10,WIGAN'S CHOSEN FEW,FOOTSEE
11,JOHNNY MATHIS,I'M STONE IN LOVE WITH YOU
12,DANA,PLEASE TELL HIM THAT I SAID HELLO
13,SUPERTRAMP,DREAMER
14,LOVE UNLIMITED,IT MAY BE WINTER OUTSIDE (BUT IN MY HEART IT'S SPRING)
15,SLADE,HOW DOES IT FEEL?
16,ALVIN STARDUST,GOOD LOVE CAN NEVER DIE
17,SYREETA,YOUR KISS IS SWEET
18,DAVID BOWIE,YOUNG AMERICANS
19,SHOWADDYWADDY,SWEET MUSIC
20,HELEN REDDY,ANGIE BABY
21,BARRY MANILOW,MANDY
22,HAMILTON BOHANNON,SOUTH AFRICAN MAN
23,JOHN LENNON,NO 9 DREAM
24,PILOT,JANUARY
25,MAC AND KATIE KISSOON,SUGAR CANDY KISSES
26,GUYS AND DOLLS,THERE'S A WHOLE LOT OF LOVING
27,ARROWS,MY LAST NIGHT WITH YOU
28,JOHNNY WAKELIN AND THE KINSHASA BAND,BLACK SUPERMAN (MUHAMMAD ALI)
29,ELTON JOHN BAND,PHILADELPHIA FREEDOM
30,BARRY WHITE,WHAT AM I GONNA DO WITH YOU
31,RUBETTES,I CAN DO IT
32,GLITTER BAND,GOODBYE MY LOVE
33,THE OSMONDS,HAVING A PARTY
34,THE STYLISTICS,STAR ON A TV SHOW
35,MOMENTS AND WHATNAUTS,GIRLS
36,KENNY,FANCY PANTS
37,THE DRIFTERS,LOVE GAMES
38,QUEEN,NOW I'M HERE
39,GARY LEWIS AND THE PLAYBOYS,MY HEART'S SYMPHONY
40,KENNY,THE BUMP
41,RUPIE EDWARDS,LEGO SKANGA
42,BETTY WRIGHT,SHOORAH SHOORAH
43,JOHN HOLT,HELP ME MAKE IT THROUGH THE NIGHT
44,DEAN PARRISH,I'M ON MY WAY
45,GLORIA GAYNOR,REACH OUT I'LL BE THERE
46,BACHMAN-TURNER OVERDRIVE,ROLL ON DOWN THE HIGHWAY
47,DUANE EDDY,PLAY ME LIKE YOU PLAY YOUR GUITAR
48,THE SHADOWS,LET ME BE THE ONE
49,ELVIS PRESLEY,PROMISED LAND
50,SUZI QUATRO,YOUR MAMA WON'T LIKE ME
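On the question of a title with both quote marks and commas in it: Python's csv module, with its default settings, wraps the field in quote marks and doubles any quote marks inside it, and csv.reader undoes this when the file is read back. The small check below shows it. The second row is a made-up example, not a real chart entry.

```python
import csv
import io

rows = [
    ["7", "SHIRLEY AND COMPANY", "SHAME, SHAME, SHAME"],
    ["99", "MADE-UP ARTIST", 'SAY "HELLO", WAVE GOODBYE'],  # hypothetical row
]

# Write to an in-memory buffer instead of a file so nothing is left on disk
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(rows)
text = buffer.getvalue()
print(text)

# Reading the text back gives the original fields, quote marks and all
read_back = list(csv.reader(io.StringIO(text)))
print(read_back == rows)
```

So as long as the files are read with csv.reader (or another proper CSV parser) and not split on commas by hand, titles like this should survive the round trip.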
Thursday 23 January 2025

