How to run Apache Cassandra through Jupyter Notebook in Docker Desktop

Photo by Ian Taylor on Unsplash

Apache Cassandra, a popular NoSQL database, is introduced in the Udacity Data Engineering Nanodegree. For the second project, Udacity provides a workspace in which the connection between Cassandra and Jupyter Notebook is already set up. As a student, you don’t need to do anything about the connection if you work in that workspace.

However, you might want to run it on your local computer and set up the connection yourself, so that you are more confident about what you have learned. Searching for a solution can be frustrating, though. As Cassandra has no official installation guide for Windows, the search is time-consuming and may turn out fruitless. On the other hand, the Nanodegree has a time limit, and you have to keep moving forward.

Therefore, I wrote this story to summarize my experience, and I hope it saves you time on your learning journey.

What is the problem

I assume you are reading this article to solve the following problem. When you connect to Cassandra from a notebook as below:

import cassandra
from cassandra.cluster import Cluster
try:
    cluster = Cluster(['127.0.0.1'])  # If you have a locally installed Apache Cassandra instance
    session = cluster.connect()
except Exception as e:
    print(e)

The following error appears:

('Unable to connect to any servers', {'127.0.0.1:9042': ConnectionRefusedError(10061, "Tried connecting to [('127.0.0.1', 9042)]. Last error: No connection could be made because the target machine actively refused it")})

The connection is refused because no Apache Cassandra instance is running locally, so we have to find a way to start one.
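A quick way to confirm this diagnosis is to test whether anything is listening on the port at all. The sketch below uses only the Python standard library; `is_port_open` is a helper name introduced here for illustration, and 9042 is the port reported in the error message above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (hypothetical usage):
# is_port_open("127.0.0.1", 9042)
```

If the call returns False, nothing is accepting connections on that port, which matches the "actively refused" message.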

After investigating various methods, I chose to use Docker Desktop to solve this problem.

Deploying Docker Desktop

On the Docker website, you can find the system requirements for Docker Desktop.

System Requirements

  • Windows 10 64-bit: Pro, Enterprise, or Education (Build 16299 or later).

For Windows 10 Home, see Install Docker Desktop on Windows Home.

  • Hyper-V and Containers Windows features must be enabled.
  • The following hardware prerequisites are required to successfully run Client Hyper-V on Windows 10:
      • 64 bit processor with Second Level Address Translation (SLAT)
      • 4GB system RAM
      • BIOS-level hardware virtualization support must be enabled in the BIOS settings. For more information, see Virtualization.

Windows 10 Home vs Windows 10 Pro

At first, my computer ran Windows 10 Home. When I tried to install Docker Desktop on Windows 10 Home, it required me to use Docker Toolbox. The most obvious difference between Home and Pro is that Pro includes Hyper-V for managing virtual machines, which I think is necessary for deploying Cassandra successfully, so I stopped experimenting with Windows 10 Home and upgraded to Windows 10 Pro.

Even after Windows 10 Pro is installed, some of the features we need, e.g. Hyper-V and CPU virtualization, might be disabled by default. You can check them and change them manually.

How to check whether the settings of your computer meet the requirements:

  1. Windows features: open Control Panel -> Programs -> Turn Windows features on or off, and check that both Hyper-V and Virtual Machine Platform are ticked.

  2. CPU virtualization: open Task Manager -> Performance. If CPU virtualization is on, the CPU tab shows "Virtualization: Enabled".

If CPU virtualization is disabled, you have to enable it at the BIOS level. How to enter the BIOS depends on the brand of your computer.

Now Docker Desktop can be installed. You can create a Cassandra container and work with it using Cassandra Query Language (CQL). But to run Cassandra through Jupyter Notebook, I found that Docker Compose is needed (please correct me if you can run it without Docker Compose).

Run Cassandra through Jupyter Notebook

Build docker-compose (for Cassandra and Jupyter Notebook)

  1. Create a file named exactly docker-compose.yml and save it in the folder where you will run the commands. For me, it is c:\users\user_name\docker-compose.yml
version: '3.13'
services:
  my_cassandra:
    image: cassandra:latest
  jupyter:
    build:
      context: ./jupyter
      dockerfile: Dockerfile
    ports:
      - 8888:8888
    volumes:
      - ./notebooks:/home/jovyan/work

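version: the version of the Compose file format, not the version of Docker you have installed. Recent Docker Compose releases treat this field as informational only.

my_cassandra: the name of the Cassandra service. Compose makes this name resolvable as a hostname from the other containers, so it is the address the notebook uses to connect.

context: ./jupyter: the directory that contains the Dockerfile.

volumes:
- ./notebooks:/home/jovyan/work: with this setting, notebooks are saved to your local notebooks folder. Without the volume, you will find that the notebook can’t be saved locally.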
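The service name matters because Compose attaches both containers to a shared network on which each service is reachable under its own name. That is also why 127.0.0.1 does not reach Cassandra from inside the Jupyter container. As a minimal sketch, the check below can be run in a notebook cell to see whether a name resolves; `resolve_service` is a hypothetical helper, not part of any library.

```python
import socket
from typing import Optional


def resolve_service(hostname: str) -> Optional[str]:
    """Return the IPv4 address for hostname, or None if it does not resolve."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None


# Inside the jupyter container (assumption: the Compose file above is running):
# resolve_service("my_cassandra")  # an address on the Compose network, or None
```

Run on the host machine instead, the same call would normally return None, since the name only exists on the Compose network.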

2. Organize the Dockerfile and requirements.txt to match the paths in docker-compose.yml: both files go in a jupyter folder next to docker-compose.yml, because the build context is ./jupyter.

The Dockerfile:

FROM jupyter/datascience-notebook
COPY ./requirements.txt /tmp/
RUN pip install --requirement /tmp/requirements.txt

The requirements.txt file:

cassandra-driver==3.24.0
boto3==1.15.6
ipython-sql==0.4.0

If the layout is unclear, please check the folder structure in my GitHub repository.
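Before building, it can help to verify that the files sit where the Compose file expects them. The following is a small sketch under the assumption that the layout is exactly the one implied by docker-compose.yml and the Dockerfile above; `EXPECTED` and `missing_files` are names made up for this example.

```python
from pathlib import Path
from typing import List

# Paths implied by docker-compose.yml, relative to the folder that holds it.
EXPECTED = [
    "docker-compose.yml",
    "jupyter/Dockerfile",
    "jupyter/requirements.txt",
]


def missing_files(root: str) -> List[str]:
    """Return the expected files that are absent under root."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]


# Example call (hypothetical usage), run from the folder with docker-compose.yml:
# print(missing_files("."))
```

An empty list means the three files are in place. The notebooks folder is not checked here.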

How to run it

  1. Run the command below at the command line, from the folder that contains docker-compose.yml, to create the containers for Cassandra and Jupyter Notebook:
docker-compose up

In Command-Line, it looks like:

In Docker Desktop, it will look like:

cassandra_1 is another Cassandra service that I created later with the same Compose file but a different name. Both ‘cassandra_1’ and ‘my_cassandra’ can be reached from Jupyter Notebook; I created two only for testing purposes.

2. Leave that command window as it is, open another command-line window, and run the command below:

docker-compose logs jupyter

In the Command-Line, it will look like:

Copy the URL http://127.0.0.1:8888/?token=xxxxxx from the log into your web browser, and the notebook will appear. Now try the following code again:

from cassandra.cluster import Cluster
try:
    cluster = Cluster(['my_cassandra'])
    session = cluster.connect()
    print("Connection Established")
except Exception as e:
    print("Connection Failed")

The connection is created successfully, as the picture below shows:

Once the connection is established, the code in that project runs without problems. As this story focuses on creating the connection, I will not explain Cassandra's query syntax here.
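One practical caveat: Cassandra can take some time to start accepting connections after docker-compose up, so a first attempt made too early may still fail. The sketch below retries a connection a few times before giving up. `connect_with_retry` and its default values are assumptions chosen for illustration, not part of the cassandra-driver API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T], attempts: int = 10, delay: float = 5.0) -> T:
    """Call connect() until it succeeds, waiting `delay` seconds between tries."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except Exception as error:  # the driver raises on an unreachable host
            last_error = error
            print(f"Attempt {attempt}/{attempts} failed: {error}")
            if attempt < attempts:
                time.sleep(delay)
    raise RuntimeError(f"Could not connect after {attempts} attempts") from last_error


# In the notebook (assumes cassandra-driver is installed via requirements.txt):
# from cassandra.cluster import Cluster
# session = connect_with_retry(lambda: Cluster(['my_cassandra']).connect())
```

Passing the connection step as a callable keeps the helper independent of the driver, so it can be tried out without a running cluster.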

I hope this helps. Have fun!

