# JupyterLab Reference Architecture

[![pipeline status](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab/badges/main/pipeline.svg)](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab/-/commits/main) [![Latest Release](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab/-/badges/release.svg)](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab/-/releases)

[JupyterLab](https://jupyterlab.readthedocs.io/en/3.4.x/) is a flexible development environment for notebooks, code, and data. It provides a web-based interface for data science, computing, and machine learning. Thanks to its modular design, it is also highly expandable.

Using the JupyterLab reference architecture, you can provision a virtual machine hosting a JupyterLab instance initialized within a Docker container. The reference architecture also enables optional GPU usage.

You can provision this reference architecture [manually](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab#manual-deployment-on-elkh-cloud) or using [Terraform and Ansible](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab#prerequisites).

## Environments ##

The JupyterLab reference architecture includes different environments that provide additional packages on top of the JupyterLab IDE. You can choose an environment during deployment. The currently available environments are the following:

- The base environment is a bare-bones JupyterLab installation. Other environments are built on top of the base option.
- [TensorFlow](https://www.tensorflow.org/overview) is an open-source platform for machine learning. Its rich and flexible tools, libraries and community resources make it optimal for modern, machine learning based application development.
- [R](https://www.r-project.org/about.html) is a programming language and free software providing an environment for statistical computing and graphics. Its integrated software suite and its extensibility make it ideal for data analysis and display.

## Manual deployment on ELKH Cloud

Since this reference architecture consists of just a single virtual machine running one Docker container, it can also be deployed manually on [ELKH Cloud](https://science-cloud.hu/en) with the following steps:
- Navigate to **Compute > Instances**, and click **Launch Instance**.
- In the popup window, you have to set basic parameters such as Instance name, flavor, image, network, and key pair (optional).
- Please make sure your firewall settings permit ingress traffic on the 8888 port, as this port will be used by JupyterLab.
- In the **Configuration** tab, you have to insert a customization script which will run the JupyterLab container:

![jupyterlab_cloud_init](docs/pics/jupyterlab-cloud-init.png "Customization script for starting the JupyterLab container")
- You can copy the script from here:
### CPU-only ###
```
#cloud-config

runcmd:
  - mkdir -p /home/ubuntu/jupyterlab && chown -R ubuntu:ubuntu /home/ubuntu/jupyterlab
  - docker run -d --name jupyterlab -p 8888:8888 -v /home/ubuntu/jupyterlab:/home --restart=always -e JUPYTER_PASSWORD='elkhcloud' git.sztaki.hu:5050/science-cloud/reference-architectures/jupyterlab/jupyterlab-cpu:latest
```
### GPU ###
```
#cloud-config

runcmd:
  - mkdir -p /home/ubuntu/jupyterlab && chown -R ubuntu:ubuntu /home/ubuntu/jupyterlab
  - docker run -d --name jupyterlab -p 8888:8888 -v /home/ubuntu/jupyterlab:/home --restart=always -e JUPYTER_PASSWORD='elkhcloud' --gpus all git.sztaki.hu:5050/science-cloud/reference-architectures/jupyterlab/jupyterlab-gpu:latest
```
- The containers start with the base JupyterLab environment by default (the `latest` tag). To work with one of the other [available environments](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab#environments), append its name in lowercase to the image tag (e.g. `jupyterlab-gpu:latest-tensorflow` or `jupyterlab-gpu:latest-r`).
- By default, the password for accessing JupyterLab is set to `elkhcloud`. You can change the password by editing the value of the `JUPYTER_PASSWORD` variable in the `docker run` command.
- After your instance has been created successfully, you will find it in the **Instances** menu. To access it, you must associate a floating IP with it under **Actions**. JupyterLab is then available at the associated floating IP, on port 8888. It might take a few minutes for JupyterLab to become available after deployment.
- You can mount files into the container by connecting to the JupyterLab server with SSH and placing them in the directory `/home/ubuntu/jupyterlab`. Your files will be available in the file browser of the JupyterLab interface.
- For additional details regarding JupyterLab, please refer to the [official documentation](https://jupyterlab.readthedocs.io/en/3.4.x/).
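
The tag naming rule described above can be sketched as a small shell helper. This is only an illustration: the function name `jupyterlab_image` is hypothetical and not part of the project, and it assumes that the CPU image accepts the same environment suffixes as the GPU image shown in the examples.

```shell
#!/bin/sh
# Build the image reference for a variant ("cpu" or "gpu") and an optional
# environment ("tensorflow", "r", or empty for the base environment).
# NOTE: jupyterlab_image is a hypothetical helper, not shipped with the project.
jupyterlab_image() {
    variant="$1"
    environment="${2:-}"
    registry="git.sztaki.hu:5050/science-cloud/reference-architectures/jupyterlab"
    tag="latest"
    if [ -n "$environment" ]; then
        # Environment names are appended to the tag in lowercase.
        environment=$(printf '%s' "$environment" | tr '[:upper:]' '[:lower:]')
        tag="latest-$environment"
    fi
    printf '%s/jupyterlab-%s:%s\n' "$registry" "$variant" "$tag"
}

# Print the base CPU image and the GPU image with the TensorFlow environment.
jupyterlab_image cpu
jupyterlab_image gpu TensorFlow
```

The printed reference can then replace the image name at the end of the `docker run` line in the customization script.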

## Deployment using Terraform and Ansible

### Prerequisites

- Configuring an SSH key on ELKH Cloud
- Terraform and Ansible are required
  - You can install them by running the commands below, or omit the installation and use the [RefArch Toolset Docker image](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab#deployment-using-the-refarch-toolset-docker-image), which includes these tools.

Installation of Terraform in accordance with the [Official guide](https://learn.hashicorp.com/tutorials/terraform/install-cli):

```
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform=1.1.4
```

Installation of Ansible in accordance with the [Official guide](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html):

```
sudo apt install python3-pip
sudo python3 -m pip install ansible==5.2.0
```
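
Before moving on to deployment, it may help to confirm that both tools are actually on your `PATH`. The following is a minimal sketch; the function name `require_tools` is hypothetical and only used for this illustration.

```shell
#!/bin/sh
# Report which of the given commands are missing from PATH.
# Returns 0 only if every command was found.
# NOTE: require_tools is a hypothetical helper, not part of the project.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Print the installed versions only when both tools are available.
if require_tools terraform ansible; then
    terraform version
    ansible --version
fi
```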

### Deployment

1. Download and extract descriptor files:

```
wget https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab/-/archive/main/jupyterlab-main.tar.gz -O jupyterlab-ra.tar.gz
tar -zxvf jupyterlab-ra.tar.gz
```

2. Enter the directory of OpenStack descriptors:

```
cd jupyterlab-main/terraform_openstack
```

3. Customize OpenStack descriptors:

- An **application credential** and your **authentication URL** are needed to enable Terraform to access your resources.

    - You can create an application credential on the OpenStack web interface under **Identity > Application Credentials**. Please note that application credentials are valid only for the project selected at the time of their creation.
    - You can find your authentication URL under **Project > API Access**, as the endpoint of the entry labeled 'Identity'
- In the `auth_data.auto.tfvars` file, authentication information must be set according to the following format:
```
auth_data = ({
    credential_id= "SET_YOUR_CREDENTIAL_ID"
    credential_secret= "SET_YOUR_CREDENTIAL_SECRET"
    auth_url = "SET_YOUR_AUTH_URL"
})
```

- In the `resources.auto.tfvars` file, properties of the deployment must be set:
  - In the `jupyterlab_server` block, the desired properties of the JupyterLab server node must be set. Please choose an appropriate volume size for your needs. The minimum recommended volume size is 15 GB with GPU enabled, and 5 GB with GPU disabled.
  - In the `jupyterlab_network` block, the name and subnet range of the network we wish to connect the JupyterLab server to must be set.
  - In the `user_config` block, we can perform additional customization, using variables. The following options are available:
    - `jupyter_password`: Set the password for accessing JupyterLab.
    - `enable_gpu`: Enable the use of GPU resources in the JupyterLab container. Requires NVIDIA drivers and the NVIDIA Container Runtime on the virtual machine.
    - `environment`: Choose one of the available environments for additional packages to be installed (e.g. `tensorflow` or `r`). Leave empty for the base environment.

4. Adding the private SSH key to the SSH agent:

```
eval $(ssh-agent -s) && echo "$(cat PATH_TO_YOUR_KEY)" | tr -d '\r' | ssh-add -
```
- In place of `PATH_TO_YOUR_KEY`, the path to the private part of the previously configured SSH key must be set.

5. Provisioning the JupyterLab server:

```
terraform init
terraform apply --auto-approve
```

6. (Optional) Terminating the JupyterLab server:

```
terraform destroy
```
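
As an alternative to `--auto-approve` in step 5, you may prefer to review the planned changes before anything is created. The sketch below saves the plan to a file and applies exactly that plan after confirmation. The function name `provision_with_review` and the plan file name `tfplan` are hypothetical choices for this illustration, not something the project prescribes.

```shell
#!/bin/sh
# Review the planned changes before applying them, instead of --auto-approve.
# NOTE: provision_with_review and the file name "tfplan" are hypothetical.
provision_with_review() {
    terraform init || return 1
    # Save the plan so that exactly what was reviewed gets applied.
    terraform plan -out=tfplan || return 1
    printf 'Apply this plan? [y/N] '
    read -r answer
    case "$answer" in
        y|Y) terraform apply tfplan ;;
        *)   echo "Aborted; nothing was applied." ;;
    esac
}
```

After sourcing or pasting the function into your shell, call `provision_with_review` from the `terraform_openstack` directory.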

### Deployment using the RefArch Toolset Docker image
- Using the RefArch Toolset image, you can omit the installation of extra tools such as Terraform and Ansible.

When running the container, you must mount a working directory and the private part of your configured SSH key into the container:
```
docker run -it -v PATH_TO_WORKDIR:/home/refarch -v PATH_TO_PRIVATE_KEY:/root/.ssh/id_rsa:ro git.sztaki.hu:5050/science-cloud/reference-architectures/refarch-toolset bash
```
- In place of 'PATH_TO_WORKDIR', the path to your working directory must be set.
  - This directory will contain the descriptor files.
  - You can perform the download and customization of the descriptor files (steps 1-3 of [Deployment](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab#deployment)) in this working directory before running the container, or inside the container after running it.
  - The container starts in the mounted working directory.
- In place of 'PATH_TO_PRIVATE_KEY', the path to the file containing the private part of your configured SSH key must be set.
  - The private key file is mounted with the read only option.
  - Adding the private SSH key to the SSH agent (step 4 of [Deployment](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab#deployment)) can be omitted when using the container.
- After the initial steps, you can perform provisioning and termination (steps 5, 6 of [Deployment](https://git.sztaki.hu/science-cloud/reference-architectures/jupyterlab#deployment)) within the container freely.

### Accessing the JupyterLab interface
- JupyterLab will be accessible at the configured floating IP of the JupyterLab server, on port 8888.
- The password will be the one configured in `resources.auto.tfvars`, or `elkhcloud` by default.
- You can mount files into the container by connecting to the JupyterLab server with SSH and placing them in the directory `/home/ubuntu/jupyterlab`. Your files will be available in the file browser of the JupyterLab interface.
- For additional details regarding JupyterLab, please refer to the [official documentation](https://jupyterlab.readthedocs.io/en/3.4.x/).
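
Because JupyterLab can take a few minutes to come up after provisioning, a small polling loop can tell you when the interface is reachable. This is a minimal sketch that assumes JupyterLab is served over plain HTTP on port 8888. The function name `wait_for_jupyterlab` and its default retry values are hypothetical.

```shell
#!/bin/sh
# Poll the JupyterLab port until it answers or the attempts run out.
# Arguments: host, number of attempts (default 30), delay in seconds (default 10).
# NOTE: wait_for_jupyterlab and its defaults are hypothetical; plain HTTP is assumed.
wait_for_jupyterlab() {
    host="$1"
    attempts="${2:-30}"
    delay="${3:-10}"
    i=1
    while [ "$i" -le "$attempts" ]; do
        # -s: silent, -o /dev/null: discard the body, --max-time: cap each request.
        if curl -s -o /dev/null --max-time 5 "http://$host:8888/"; then
            echo "JupyterLab is reachable at http://$host:8888/"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "JupyterLab did not respond after $attempts attempts" >&2
    return 1
}
```

Call it with the floating IP of your server, for example `wait_for_jupyterlab FLOATING_IP`, where `FLOATING_IP` is a placeholder for your own address.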