Horovod tags
https://git.sztaki.hu/science-cloud/reference-architectures/horovod/-/tags
2024-03-07T15:20:06Z
https://git.sztaki.hu/science-cloud/reference-architectures/horovod/-/tags/v0.4.0
v0.4.0
<p data-sourcepos="1:1-1:9" dir="auto"><strong data-sourcepos="1:1-1:9">Added</strong></p>
<ul data-sourcepos="3:1-6:0" dir="auto">
<li data-sourcepos="3:1-3:39">Support for HUN-REN Cloud Wigner site</li>
<li data-sourcepos="4:1-4:54">RefArch Toolset (Terraform + Ansible) install script</li>
<li data-sourcepos="5:1-6:0">Architecture diagram</li>
</ul>
<p data-sourcepos="7:1-7:11" dir="auto"><strong data-sourcepos="7:1-7:11">Changed</strong></p>
<ul data-sourcepos="9:1-11:35" dir="auto">
<li data-sourcepos="9:1-9:39">Upgrade to Prometheus version v2.48.0</li>
<li data-sourcepos="10:1-10:35">Upgrade to Grafana version 9.5.14</li>
<li data-sourcepos="11:1-11:35">Updated documentation and diagram</li>
</ul>
2024-03-07T15:20:06Z
Póra Krisztián
https://git.sztaki.hu/science-cloud/reference-architectures/horovod/-/tags/v0.3.0
v0.3.0
<p data-sourcepos="1:1-1:9" dir="auto"><strong data-sourcepos="1:1-1:9">Added</strong></p>
<ul data-sourcepos="3:1-6:0" dir="auto">
<li data-sourcepos="3:1-4:39">Customizable JupyterLab port
<ul data-sourcepos="4:3-4:39">
<li data-sourcepos="4:3-4:39">With dynamic security rule creation</li>
</ul>
</li>
<li data-sourcepos="5:1-6:0">Self-signed certificate generation</li>
</ul>
<p data-sourcepos="7:1-7:11" dir="auto"><strong data-sourcepos="7:1-7:11">Changed</strong></p>
<ul data-sourcepos="9:1-18:0" dir="auto">
<li data-sourcepos="9:1-9:36">Upgrade to Horovod version v0.28.1</li>
<li data-sourcepos="10:1-10:37">Upgrade to JupyterLab version 4.0.2</li>
<li data-sourcepos="11:1-11:37">JupyterLab is now served over HTTPS</li>
<li data-sourcepos="12:1-12:50">Ansible roles rearranged into a separate directory</li>
<li data-sourcepos="13:1-14:104">Updated readme
<ul data-sourcepos="14:3-14:104">
<li data-sourcepos="14:3-14:104">Documented performance evaluation results, retention size, JupyterLab port setting and HTTPS access.</li>
</ul>
</li>
<li data-sourcepos="15:1-15:26">Updated welcome notebook</li>
<li data-sourcepos="16:1-16:45">SSH key generation is now done with OpenSSL</li>
<li data-sourcepos="17:1-18:0">JupyterLab password is now set with PasswordIdentityProvider (ServerApp.password is deprecated)</li>
</ul>
<p data-sourcepos="19:1-19:9" dir="auto"><strong data-sourcepos="19:1-19:9">Fixed</strong></p>
<ul data-sourcepos="21:1-22:0" dir="auto">
<li data-sourcepos="21:1-22:0">Query for attached disk device can now handle different storage types</li>
</ul>
<p data-sourcepos="23:1-23:12" dir="auto"><strong data-sourcepos="23:1-23:12">Versions</strong></p>
<table data-sourcepos="24:1-31:23" dir="auto">
<thead>
<tr data-sourcepos="24:1-24:21">
<th data-sourcepos="24:2-24:10">package</th>
<th data-sourcepos="24:12-24:20">version</th>
</tr>
</thead>
<tbody>
<tr data-sourcepos="26:1-26:21">
<td data-sourcepos="26:2-26:10">horovod</td>
<td data-sourcepos="26:12-26:20">0.28.1</td>
</tr>
<tr data-sourcepos="27:1-27:23">
<td data-sourcepos="27:2-27:13">tensorflow</td>
<td data-sourcepos="27:15-27:22">2.9.2</td>
</tr>
<tr data-sourcepos="28:1-28:18">
<td data-sourcepos="28:2-28:8">keras</td>
<td data-sourcepos="28:10-28:17">2.9.0</td>
</tr>
<tr data-sourcepos="29:1-29:25">
<td data-sourcepos="29:2-29:8">torch</td>
<td data-sourcepos="29:10-29:24">1.12.1+cu113</td>
</tr>
<tr data-sourcepos="30:1-30:24">
<td data-sourcepos="30:2-30:14">tensorboard</td>
<td data-sourcepos="30:16-30:23">2.9.1</td>
</tr>
<tr data-sourcepos="31:1-31:23">
<td data-sourcepos="31:2-31:13">jupyterlab</td>
<td data-sourcepos="31:15-31:22">4.0.2</td>
</tr>
</tbody>
</table>
2023-06-30T10:22:53Z
Póra Krisztián
https://git.sztaki.hu/science-cloud/reference-architectures/horovod/-/tags/v0.2.2
v0.2.2
<p data-sourcepos="1:1-1:9" dir="auto"><strong data-sourcepos="1:1-1:9">Added</strong></p>
<ul data-sourcepos="3:1-9:0" dir="auto">
<li data-sourcepos="3:1-3:25">GPU resource monitoring</li>
<li data-sourcepos="4:1-4:27">General Horovod dashboard</li>
<li data-sourcepos="5:1-5:32">Welcome notebook in JupyterLab</li>
<li data-sourcepos="6:1-6:42">Playbook-based CI testing for monitoring</li>
<li data-sourcepos="7:1-7:33">Package version table in readme</li>
<li data-sourcepos="8:1-9:0">Prometheus snapshot support</li>
</ul>
<p data-sourcepos="10:1-10:11" dir="auto"><strong data-sourcepos="10:1-10:11">Changed</strong></p>
<ul data-sourcepos="12:1-21:0" dir="auto">
<li data-sourcepos="12:1-12:36">Upgrade to Horovod version v0.26.1</li>
<li data-sourcepos="13:1-13:37">Upgrade to JupyterLab version 3.5.1</li>
<li data-sourcepos="14:1-14:35">Monitoring stack version upgrades</li>
<li data-sourcepos="15:1-15:55">Optional floating IP assignment for monitoring server</li>
<li data-sourcepos="16:1-16:33">Improved cleanup solution in CI</li>
<li data-sourcepos="17:1-17:60">Prometheus data retention configuration based on resources</li>
<li data-sourcepos="18:1-18:30">Updated architecture diagram</li>
<li data-sourcepos="19:1-19:43">Updated toolkit versions in documentation</li>
<li data-sourcepos="20:1-21:0">Default shared volume size set to 128 GB</li>
</ul>
<p data-sourcepos="22:1-22:9" dir="auto"><strong data-sourcepos="22:1-22:9">Fixed</strong></p>
<ul data-sourcepos="24:1-26:0" dir="auto">
<li data-sourcepos="24:1-24:34">Apt lock conflict in Docker role</li>
<li data-sourcepos="25:1-26:0">Condition-based task execution in monitoring roles</li>
</ul>
<p data-sourcepos="27:1-27:12" dir="auto"><strong data-sourcepos="27:1-27:12">Versions</strong></p>
<table data-sourcepos="28:1-35:22" dir="auto">
<thead>
<tr data-sourcepos="28:1-28:21">
<th data-sourcepos="28:2-28:10">package</th>
<th data-sourcepos="28:12-28:20">version</th>
</tr>
</thead>
<tbody>
<tr data-sourcepos="30:1-30:19">
<td data-sourcepos="30:2-30:10">horovod</td>
<td data-sourcepos="30:12-30:18">0.26.1</td>
</tr>
<tr data-sourcepos="31:1-31:22">
<td data-sourcepos="31:2-31:13">tensorflow</td>
<td data-sourcepos="31:15-31:21">2.9.2</td>
</tr>
<tr data-sourcepos="32:1-32:17">
<td data-sourcepos="32:2-32:8">keras</td>
<td data-sourcepos="32:10-32:16">2.9.0</td>
</tr>
<tr data-sourcepos="33:1-33:24">
<td data-sourcepos="33:2-33:8">torch</td>
<td data-sourcepos="33:10-33:23">1.12.1+cu113</td>
</tr>
<tr data-sourcepos="34:1-34:23">
<td data-sourcepos="34:2-34:14">tensorboard</td>
<td data-sourcepos="34:16-34:22">2.9.1</td>
</tr>
<tr data-sourcepos="35:1-35:22">
<td data-sourcepos="35:2-35:13">jupyterlab</td>
<td data-sourcepos="35:15-35:21">3.5.1</td>
</tr>
</tbody>
</table>
2023-01-26T10:34:41Z
Póra Krisztián
https://git.sztaki.hu/science-cloud/reference-architectures/horovod/-/tags/v0.2.1
v0.2.1
<p data-sourcepos="1:1-1:9" dir="auto"><strong data-sourcepos="1:1-1:9">Added</strong></p>
<ul data-sourcepos="2:1-3:0" dir="auto">
<li data-sourcepos="2:1-3:0">JupyterLab and Grafana test playbooks</li>
</ul>
<p data-sourcepos="4:1-4:11" dir="auto"><strong data-sourcepos="4:1-4:11">Changed</strong></p>
<ul data-sourcepos="5:1-6:0" dir="auto">
<li data-sourcepos="5:1-6:0">Updated documentation</li>
</ul>
<p data-sourcepos="7:1-7:9" dir="auto"><strong data-sourcepos="7:1-7:9">Fixed</strong></p>
<ul data-sourcepos="8:1-8:19" dir="auto">
<li data-sourcepos="8:1-8:19">Apt lock conflict</li>
</ul>
2022-08-04T13:32:07Z
Póra Krisztián
https://git.sztaki.hu/science-cloud/reference-architectures/horovod/-/tags/v0.2
v0.2
<p data-sourcepos="1:1-1:9" dir="auto"><strong data-sourcepos="1:1-1:9">Added</strong></p>
<ul data-sourcepos="2:1-5:0" dir="auto">
<li data-sourcepos="2:1-2:71">Optional ELKH Cloud dedicated network, implemented with dynamic resource blocks</li>
<li data-sourcepos="3:1-3:37">Custom password setting for Grafana</li>
<li data-sourcepos="4:1-5:0">Option to install the NVIDIA driver and NVIDIA Container Runtime with the official Ansible roles</li>
</ul>
<p data-sourcepos="6:1-6:11" dir="auto"><strong data-sourcepos="6:1-6:11">Changed</strong></p>
<ul data-sourcepos="7:1-14:0" dir="auto">
<li data-sourcepos="7:1-7:43">Security groups declared in separate file</li>
<li data-sourcepos="8:1-8:38">OpenStack auth URL is now set in the auth vars file</li>
<li data-sourcepos="9:1-9:81">Default shell of JupyterLab terminal set to bash, default directory to /horovod</li>
<li data-sourcepos="10:1-10:58">Separated shared volume for neural network training data</li>
<li data-sourcepos="11:1-11:59">Separated volume for monitoring data on monitoring server</li>
<li data-sourcepos="12:1-12:34">Upgrade to Horovod version v0.25</li>
<li data-sourcepos="13:1-14:0">Upgrade to JupyterLab version 3.4.3</li>
</ul>
<p data-sourcepos="15:1-15:9" dir="auto"><strong data-sourcepos="15:1-15:9">Fixed</strong></p>
<ul data-sourcepos="16:1-19:0" dir="auto">
<li data-sourcepos="16:1-16:103">Cluster description and hostlist generation now happens sequentially to prevent accidental overwrites</li>
<li data-sourcepos="17:1-17:75">Deprecated Notebook classes replaced with Server in docker-entrypoint.sh</li>
<li data-sourcepos="18:1-19:0">Stronger default password for JupyterLab</li>
</ul>
<p data-sourcepos="20:1-20:11" dir="auto"><strong data-sourcepos="20:1-20:11">Removed</strong></p>
<ul data-sourcepos="21:1-22:0" dir="auto">
<li data-sourcepos="21:1-22:0">Floating IP access mode for monitoring server</li>
</ul>
<p data-sourcepos="23:1-23:12" dir="auto"><strong data-sourcepos="23:1-23:12">Versions</strong></p>
<table data-sourcepos="24:1-32:22" dir="auto">
<thead>
<tr data-sourcepos="24:1-24:21">
<th data-sourcepos="24:2-24:10">package</th>
<th data-sourcepos="24:12-24:20">version</th>
</tr>
</thead>
<tbody>
<tr data-sourcepos="26:1-26:19">
<td data-sourcepos="26:2-26:10">horovod</td>
<td data-sourcepos="26:12-26:18">0.25.0</td>
</tr>
<tr data-sourcepos="27:1-27:22">
<td data-sourcepos="27:2-27:13">tensorflow</td>
<td data-sourcepos="27:15-27:21">2.6.5</td>
</tr>
<tr data-sourcepos="28:1-28:17">
<td data-sourcepos="28:2-28:8">keras</td>
<td data-sourcepos="28:10-28:16">2.6.0</td>
</tr>
<tr data-sourcepos="29:1-29:23">
<td data-sourcepos="29:2-29:8">torch</td>
<td data-sourcepos="29:10-29:22">1.8.1+cu111</td>
</tr>
<tr data-sourcepos="30:1-30:23">
<td data-sourcepos="30:2-30:14">tensorboard</td>
<td data-sourcepos="30:16-30:22">2.6.0</td>
</tr>
<tr data-sourcepos="31:1-31:19">
<td data-sourcepos="31:2-31:10">ipython</td>
<td data-sourcepos="31:12-31:18">8.4.0</td>
</tr>
<tr data-sourcepos="32:1-32:22">
<td data-sourcepos="32:2-32:13">jupyterlab</td>
<td data-sourcepos="32:15-32:21">3.4.3</td>
</tr>
</tbody>
</table>
2023-01-26T10:00:39Z
Póra Krisztián
https://git.sztaki.hu/science-cloud/reference-architectures/horovod/-/tags/v0.1
v0.1
<p data-sourcepos="1:1-1:12" dir="auto"><strong data-sourcepos="1:1-1:12">Features</strong></p>
<ul data-sourcepos="2:1-11:1" dir="auto">
<li data-sourcepos="2:1-2:65">Highly customizable reference architecture for OpenStack clouds</li>
<li data-sourcepos="3:1-3:73">Automated provisioning and configuration based on Terraform and Ansible</li>
<li data-sourcepos="4:1-4:59">Automatically created network settings and firewall rules</li>
<li data-sourcepos="5:1-5:66">Dockerized Horovod cluster supporting distributed deep learning</li>
<li data-sourcepos="6:1-6:44">Accelerated deep learning with NVIDIA GPUs</li>
<li data-sourcepos="7:1-7:27">File sharing based on NFS</li>
<li data-sourcepos="8:1-8:26">JupyterLab web interface</li>
<li data-sourcepos="9:1-9:41">Prometheus- and Grafana-based monitoring</li>
<li data-sourcepos="10:1-11:1">Automated testing using GitLab CI/CD</li>
</ul>
<p data-sourcepos="12:1-12:12" dir="auto"><strong data-sourcepos="12:1-12:12">Versions</strong></p>
<table data-sourcepos="13:1-21:22" dir="auto">
<thead>
<tr data-sourcepos="13:1-13:21">
<th data-sourcepos="13:2-13:10">package</th>
<th data-sourcepos="13:12-13:20">version</th>
</tr>
</thead>
<tbody>
<tr data-sourcepos="15:1-15:19">
<td data-sourcepos="15:2-15:10">horovod</td>
<td data-sourcepos="15:12-15:18">0.23.0</td>
</tr>
<tr data-sourcepos="16:1-16:22">
<td data-sourcepos="16:2-16:13">tensorflow</td>
<td data-sourcepos="16:15-16:21">2.5.0</td>
</tr>
<tr data-sourcepos="17:1-17:17">
<td data-sourcepos="17:2-17:8">keras</td>
<td data-sourcepos="17:10-17:16">2.6.0</td>
</tr>
<tr data-sourcepos="18:1-18:23">
<td data-sourcepos="18:2-18:8">torch</td>
<td data-sourcepos="18:10-18:22">1.8.1+cu111</td>
</tr>
<tr data-sourcepos="19:1-19:23">
<td data-sourcepos="19:2-19:14">tensorboard</td>
<td data-sourcepos="19:16-19:22">2.6.0</td>
</tr>
<tr data-sourcepos="20:1-20:20">
<td data-sourcepos="20:2-20:10">ipython</td>
<td data-sourcepos="20:12-20:19">7.31.1</td>
</tr>
<tr data-sourcepos="21:1-21:22">
<td data-sourcepos="21:2-21:13">jupyterlab</td>
<td data-sourcepos="21:15-21:21">3.2.8</td>
</tr>
</tbody>
</table>
2022-08-05T08:56:15Z
Administrator