SQL in iPython in VS Code

3 minute read

Published:

Yesterday, during working hours, I was studying for a data engineering certification, which I need to know in my new role. To practice anything in this course, I wanted to practice - however not on the company server. It is possible to set up a data engineering platform on cloud services such as Azure, however I do not have the money to invest in something like that, nor do I want to use the bulky Jupyter notebooks. So, I needed a solution to this problem. I needed to set a platform like databricks on my own computer. I wanted to use python in notebooks, but also be able to use SQL in the same notebook, without using Jupyter, Datalore, Collab, or other cloud or bulky software. This post describes that attempt on a MacBook.

Setting up a project in VS Code

First, lets create the project directory:

mkdir notebooks
cd notebooks

Then I installed a new project an package manager for Python:

curl -LsSf https://astral.sh/uv/install.sh | sh

…and initialise the project:

uv python list
uv init --name notebooks --vcs git --author-from git --python cpython-3.13.1-macos-x86_64-none

In this project I will be using Python and iPython in VS Code, so

uv pip install ipython ipykernel
source ./.venv/bin/activate

Now we can open this project in VS Code and run de code.

SQL in the notebooks

Next, we wanna use a database as well, locally. but without creating a database connection manually. For this we will use a plugin for iPython. Let’s install even more dependencies.

brew install postgresql libpq
brew services start postgresql@14
uv pip install ipython-sql SQLAlchemy psycopg2

Now this almost works. However, if you try to create a connection, as described below, you will get an error saying that a file is missing. So, lets find and create a softlink to that file.

which postgres
# /usr/local/bin/postgres

ls -Alh /usr/local/bin/postgres
# /usr/local/bin/postgres -> ../Cellar/postgresql@14/14.15/bin/postgres

cd /usr/local/Cellar/libpq/17.2/

find . -type f -iname "*dylib"
# ./lib/libpq.5.dylib # this is that file
# ./lib/libecpg.6.dylib
# ./lib/libpgtypes.3.dylib
# ./lib/libecpg_compat.3.dylib

ln -s /usr/local/Cellar/libpq/17.2/lib/libpq.5.dylib /usr/local/lib/libpq.5.dylib

Setting up a database

Lets set up a database

psql postgres

Lets create a user

CREATE USER etl_user;
ALTER USER etl_user WITH PASSWORD '5ome_h4rd_P4s5w0r7';

We can list the databases/schemas using \l. but we can also just create one like so:

CREATE DATABASE etl_test;
\c etl_test
CREATE TABLE cars (
    id BIGSERIAL PRIMARY KEY,
    model VARCHAR(255),
    year INT
);

And fill that database manually:

INSERT INTO cars (model, year) VALUES ('test_2', 2024);
INSERT INTO cars (model, year) VALUES ('test', 2008);
INSERT INTO cars (model, year) VALUES ('test_2', 2020);
INSERT INTO cars (model, year) VALUES ('test_3', 2024);
INSERT INTO cars (model, year) VALUES ('test_2', 2021);
INSERT INTO cars (model, year) VALUES ('test', 2019);

Working in the notebook in VS Code

We first need to load the iPython plugin, and set its table style:

%load_ext sql
%config SqlMagic.style = '_DEPRECATED_DEFAULT'

I will set the variables in an .env file:

.env
PSQL_USER=vannijnatten
PSQL_PASSWORD=postgresql
PSQL_SCHEMA=vannijnatten

Then create a connection string for the SQL plugin:

import os

user = os.getenv('PSQL_USER')
password = os.getenv('PSQL_PASSWORD')
schema = os.getenv('PSQL_SCHEMA')
connection_string = f"postgresql://{user}:{password}@localhost/{schema}"
print(f"{connection_string=}")
%sql $connection_string

And test the connection

%%sql
SELECT id, model, year FROM cars;

…or…

%%sql
SELECT
    id,
    model,
    year,
    rank() OVER w,
    sum(year) OVER w
FROM cars
WINDOW w AS (PARTITION BY model ORDER BY year)
ORDER BY model, rank;

For me, this works and is very lightweight, in my favorite IDE (and in an environment in Python, so a clean setup). If I wanted to make things easier, I would just have used Marimo with SQL and no IDE.