# RStudio Reference Architecture

[![pipeline status](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio/badges/main/pipeline.svg)](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio/-/commits/main) [![Latest Release](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio/-/badges/release.svg)](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio/-/releases)

[R](https://www.r-project.org/about.html) is a programming language and free software providing an environment for statistical computing and graphics. Its integrated software suite and its extensibility make it ideal for data analysis and display.

[RStudio](https://support--rstudio-com.netlify.app/products/rstudio/features/) is an open-source IDE for the R language. It includes a variety of coding tools, a user-friendly and highly productive interface, and support for numerous file types and interactive graphics. RStudio provides a toolset suitable for data science at both the personal and the enterprise level.

Using the RStudio reference architecture, you can provision a virtual machine hosting RStudio initialized within a Docker container. The reference architecture also enables GPU usage.

You can provision this reference architecture [manually](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio#manual-deployment-on-elkh-cloud) or using [Terraform and Ansible](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio#prerequisites).

## Manual deployment on ELKH Cloud

Since this reference architecture consists of just a single virtual machine and Docker container, it can also be deployed manually on [ELKH Cloud](https://science-cloud.hu/en) with the following steps:
- Navigate to **Compute > Instances**, and click **Launch Instance**.
- In the popup window, set the basic parameters such as instance name, flavor, image, network, and key pair (optional).
- Make sure your firewall settings permit ingress traffic on port 8787, as this port will be used by RStudio.
- In the **Configuration** tab, insert a customization script which will run the RStudio container:

![rstudio_cloud_init](docs/pics/rstudio-cloud-init.png "Customization script for starting the RStudio container")
- Copy the script that matches your instance (CPU-only or GPU) from below:
### CPU-only ###
```
#cloud-config

runcmd:
  - mkdir -p /home/ubuntu/rstudio && chown -R ubuntu:ubuntu /home/ubuntu/rstudio
  - docker run -d --name rstudio -p 8787:8787 -v /home/ubuntu/rstudio:/home/rstudio/mount --restart=always -e RSTUDIO_PASSWORD='elkhcloud' git.sztaki.hu:5050/science-cloud/reference-architectures/rstudio/rstudio-cpu
```
### GPU ###
```
#cloud-config

runcmd:
  - mkdir -p /home/ubuntu/rstudio && chown -R ubuntu:ubuntu /home/ubuntu/rstudio
  - docker run -d --name rstudio -p 8787:8787 -v /home/ubuntu/rstudio:/home/rstudio/mount --restart=always -e RSTUDIO_PASSWORD='elkhcloud' --gpus all git.sztaki.hu:5050/science-cloud/reference-architectures/rstudio/rstudio-gpu
```
- You can log into RStudio with the username `rstudio` and the password `elkhcloud`. You can change the password by editing the value of the `RSTUDIO_PASSWORD` variable in the `docker run` command.
- After your instance has been successfully created, you will find it in the **Instances** menu. You must associate a floating IP with it under **Actions** in order to access it. You can then access RStudio at the associated floating IP, on port 8787. It might take RStudio a few minutes to become available.
- You can mount files into the container by connecting to the RStudio server with SSH and placing them in the directory `/home/ubuntu/rstudio`. Your files will be available in the `mount` directory in the file browser of the RStudio interface.
- For additional details regarding RStudio, please refer to the [official documentation](https://support--rstudio-com.netlify.app/products/rstudio/features/).
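
Since the default password `elkhcloud` is public, you may prefer to generate the customization script with your own password instead of editing it by hand. The following is a minimal sketch, not part of this repository: `render_cloud_config` is a hypothetical helper name, and it assumes the password contains only letters and digits (a single quote, `#`, or `: ` in the password would break the quoting or the YAML).

```shell
# Print a cloud-config that starts the RStudio container with a custom password.
# render_cloud_config is a hypothetical helper, not shipped with the repository.
# Usage: render_cloud_config PASSWORD [cpu|gpu]
render_cloud_config() {
  password="$1"
  variant="${2:-cpu}"   # "cpu" (default) or "gpu"
  registry="git.sztaki.hu:5050/science-cloud/reference-architectures/rstudio"
  gpu_flag=""
  if [ "$variant" = "gpu" ]; then
    gpu_flag="--gpus all "
  fi
  # Same two commands as the scripts above; only the password, the GPU flag
  # and the image name vary.
  cat <<EOF
#cloud-config

runcmd:
  - mkdir -p /home/ubuntu/rstudio && chown -R ubuntu:ubuntu /home/ubuntu/rstudio
  - docker run -d --name rstudio -p 8787:8787 -v /home/ubuntu/rstudio:/home/rstudio/mount --restart=always -e RSTUDIO_PASSWORD='${password}' ${gpu_flag}${registry}/rstudio-${variant}
EOF
}
```

For example, `render_cloud_config "$(openssl rand -hex 12)" gpu` prints a GPU script with a random 24-character hexadecimal password, which you can paste into the **Configuration** tab.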

## Deployment using Terraform and Ansible

### Prerequisites

- An SSH key must be configured on ELKH Cloud
- Terraform and Ansible are required
  - You can install them by running the commands below, or omit the installation and use the [RefArch Toolset Docker image](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio#deployment-using-the-refarch-toolset-docker-image), which includes these tools.

Install Terraform following the [official guide](https://learn.hashicorp.com/tutorials/terraform/install-cli):

```
sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl
curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install terraform=1.1.4
```

Install Ansible following the [official guide](https://docs.ansible.com/ansible/latest/installation_guide/intro_installation.html):

```
sudo apt install python3-pip
sudo python3 -m pip install ansible==5.2.0
```
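
Before moving on to deployment, it can help to confirm that both tools are actually on your `PATH`. A minimal sketch; `require_tool` is a hypothetical helper name and not part of this repository.

```shell
# Report whether a command is available on PATH.
# require_tool is a hypothetical helper used only for illustration.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $1"
  else
    echo "missing: $1" >&2
    return 1
  fi
}
```

With this in place, `require_tool terraform && terraform version` and `require_tool ansible && ansible --version` print the installed versions, which you can compare against the versions pinned above.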

### Deployment

1. Download and extract descriptor files:

```
wget https://git.sztaki.hu/science-cloud/reference-architectures/rstudio/-/archive/main/rstudio-main.tar.gz -O rstudio-ra.tar.gz
tar -zxvf rstudio-ra.tar.gz
```

2. Enter the directory of OpenStack descriptors:

```
cd rstudio-main/terraform_openstack
```

3. Customize OpenStack descriptors:

- An **application credential** and your **authentication URL** are needed to enable Terraform to access your resources.
    - You can create an application credential on the OpenStack web interface under **Identity > Application Credentials**. Please note that application credentials are valid only for the project selected at the time of their creation.
    - You can find your authentication URL under **Project > API Access**, as the endpoint of the entry labeled 'Identity'.
- In the `auth_data.auto.tfvars` file, authentication information must be set according to the following format:
```
auth_data = ({
    credential_id= "SET_YOUR_CREDENTIAL_ID"
    credential_secret= "SET_YOUR_CREDENTIAL_SECRET"
    auth_url = "SET_YOUR_AUTH_URL"
})
```

- In the `resources.auto.tfvars` file, properties of the deployment must be set:
  - In the `rstudio_server` block, the desired properties of the RStudio server node must be set. Please choose an appropriate volume size for your needs. The minimum recommended volume size is 15 GB with GPU enabled, and 5 GB with GPU disabled.
  - In the `rstudio_network` block, the name and subnet range of the network you wish to connect the RStudio server to must be set.
  - In the `user_config` block, you can perform additional customization using variables. The following options are available:
    - `rstudio_password`: Set the password for accessing RStudio.
    - `enable_gpu`: Enables the use of GPU resources in the RStudio container. Requires NVIDIA drivers and the NVIDIA Container Runtime on the virtual machine.
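
If you would rather not type the credential secret into an editor, the `auth_data.auto.tfvars` file described above can also be written from the shell. This is a minimal sketch under stated assumptions: `write_auth_data` is a hypothetical helper, and the layout simply mirrors the format shown earlier. The file is created with owner-only permissions when it does not exist yet; an existing file keeps its current permissions.

```shell
# Write an auth_data.auto.tfvars file in the format shown above.
# write_auth_data is a hypothetical helper, not part of the repository.
# Usage: write_auth_data FILE CREDENTIAL_ID CREDENTIAL_SECRET AUTH_URL
write_auth_data() {
  out_file="$1"
  (
    umask 077   # new files are readable and writable by the owner only
    cat > "$out_file" <<EOF
auth_data = ({
    credential_id= "$2"
    credential_secret= "$3"
    auth_url = "$4"
})
EOF
  )
}
```

For example, with the values held in environment variables of your choosing (the names here are only an illustration): `write_auth_data auth_data.auto.tfvars "$CRED_ID" "$CRED_SECRET" "$AUTH_URL"`.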

4. Add the private SSH key to the SSH agent:

```
eval $(ssh-agent -s) && echo "$(cat PATH_TO_YOUR_KEY)" | tr -d '\r' | ssh-add -
```
- Replace `PATH_TO_YOUR_KEY` with the path to the private part of the previously configured SSH key.
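
The `tr -d '\r'` part of the command above removes carriage returns, which typically appear when a key file was saved with Windows line endings and can stop the key from loading. The sketch below isolates that step so its effect can be inspected; `strip_cr` is a hypothetical helper name.

```shell
# Print a file with all carriage-return characters removed.
# strip_cr is a hypothetical helper used only for illustration.
strip_cr() {
  tr -d '\r' < "$1"
}
```

With an agent already running, `strip_cr PATH_TO_YOUR_KEY | ssh-add -` loads the key in the same way as the command above, and `ssh-add -l` lists the keys the agent currently holds.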

5. Provision the RStudio server:

```
terraform init
terraform apply --auto-approve
```

6. (Optional) Terminate the RStudio server:

```
terraform destroy
```

### Deployment using the RefArch Toolset Docker image
- Using the RefArch Toolset image, you can omit the installation of extra tools such as Terraform and Ansible.

When running the container, you must mount a working directory and the private part of your configured SSH key into the container:
```
docker run -it -v PATH_TO_WORKDIR:/home/refarch -v PATH_TO_PRIVATE_KEY:/root/.ssh/id_rsa:ro git.sztaki.hu:5050/science-cloud/reference-architectures/refarch-toolset bash
```
- Replace `PATH_TO_WORKDIR` with the path to your working directory.
  - This directory will contain the descriptor files.
  - You can perform the download and customization of the descriptor files (steps 1-3 of [Deployment](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio#deployment)) in this working directory before running the container, or inside the container after running it.
  - The mounted working directory will be the entrypoint of the container.
- Replace `PATH_TO_PRIVATE_KEY` with the path to the file containing the private part of your configured SSH key.
  - The private key file is mounted read-only.
  - Adding the private SSH key to the SSH agent (step 4 of [Deployment](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio#deployment)) can be omitted when using the container.
- After the initial steps, you can perform provisioning and termination (steps 5 and 6 of [Deployment](https://git.sztaki.hu/science-cloud/reference-architectures/rstudio#deployment)) within the container.
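
Docker has traditionally treated a non-absolute source in `-v` as a volume name rather than a host path, so it is safest to pass absolute paths for both mounts. The sketch below only prints the resulting command so that it can be reviewed before running; `toolset_command` is a hypothetical helper, and it assumes the paths contain no spaces.

```shell
# Print the docker run command for the toolset image with absolute mount paths.
# toolset_command is a hypothetical helper, not part of the repository.
# Usage: toolset_command WORKDIR PRIVATE_KEY_FILE
toolset_command() {
  workdir="$1"
  key="$2"
  [ -d "$workdir" ] || { echo "no such directory: $workdir" >&2; return 1; }
  [ -f "$key" ] || { echo "no such key file: $key" >&2; return 1; }
  # Resolve both arguments to absolute paths.
  abs_workdir="$(cd "$workdir" && pwd)"
  abs_key="$(cd "$(dirname "$key")" && pwd)/$(basename "$key")"
  echo "docker run -it -v ${abs_workdir}:/home/refarch -v ${abs_key}:/root/.ssh/id_rsa:ro git.sztaki.hu:5050/science-cloud/reference-architectures/refarch-toolset bash"
}
```

For example, `toolset_command . ~/.ssh/id_rsa` prints the command for the current directory and the given key file.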

### Accessing RStudio
- RStudio will be accessible at the configured floating IP of the RStudio server, on port 8787.
- You can log in with the username `rstudio` and the password configured in `resources.auto.tfvars`, or `elkhcloud` by default.
- You can mount files into the container by connecting to the RStudio server with SSH and placing them in the directory `/home/ubuntu/rstudio`. Your files will be available in the `mount` directory in the file browser of the RStudio interface.
- For additional details regarding RStudio, please refer to the [official documentation](https://support--rstudio-com.netlify.app/products/rstudio/features/).
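
Because the container can take a few minutes to start, a small polling loop can tell you when the login page is ready. This is a minimal sketch under stated assumptions: RStudio is assumed to be served over plain HTTP on port 8787, `curl` is assumed to be installed, and both function names are hypothetical.

```shell
# Build the RStudio URL for a given floating IP (plain HTTP is an assumption).
rstudio_url() {
  echo "http://$1:8787"
}

# Poll the URL until it answers or the attempts run out.
# Usage: wait_for_rstudio FLOATING_IP [ATTEMPTS] [DELAY_SECONDS]
wait_for_rstudio() {
  url="$(rstudio_url "$1")"
  attempts="${2:-30}"
  delay="${3:-10}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # --fail turns HTTP errors into a non-zero exit code;
    # --location follows the redirect to the sign-in page.
    if curl --silent --fail --location --output /dev/null --max-time 5 "$url"; then
      echo "RStudio is up at $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "RStudio did not answer at $url" >&2
  return 1
}
```

Calling `wait_for_rstudio FLOATING_IP` with the defaults checks every 10 seconds for up to 30 attempts and returns a non-zero status if the server never responds.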