<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://ladco.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Ramboll_Modeling_on_the_Cloud_Contract_Results</id>
	<title>Ramboll Modeling on the Cloud Contract Results - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://ladco.org/wiki/index.php?action=history&amp;feed=atom&amp;title=Ramboll_Modeling_on_the_Cloud_Contract_Results"/>
	<link rel="alternate" type="text/html" href="https://ladco.org/wiki/index.php?title=Ramboll_Modeling_on_the_Cloud_Contract_Results&amp;action=history"/>
	<updated>2026-06-13T08:53:57Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.31.16</generator>
	<entry>
		<id>https://ladco.org/wiki/index.php?title=Ramboll_Modeling_on_the_Cloud_Contract_Results&amp;diff=548&amp;oldid=prev</id>
		<title>Zac: Created page with &quot; = WRF Benchmarking = * Emulating WRF 2016 12/4/1.3 grids * Purpose for estimating costing for CPUs, RAM and Storage * CPU: 8 Cores: 5.5 day run = 4 days; 24 Cores: 3 days * R...&quot;</title>
		<link rel="alternate" type="text/html" href="https://ladco.org/wiki/index.php?title=Ramboll_Modeling_on_the_Cloud_Contract_Results&amp;diff=548&amp;oldid=prev"/>
		<updated>2021-03-26T17:18:44Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot; = WRF Benchmarking = * Emulating WRF 2016 12/4/1.3 grids * Purpose for estimating costing for CPUs, RAM and Storage * CPU: 8 Cores: 5.5 day run = 4 days; 24 Cores: 3 days * R...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
= WRF Benchmarking =&lt;br /&gt;
* Emulating WRF 2016 12/4/1.3 grids&lt;br /&gt;
* Purpose: estimate costs for CPUs, RAM, and storage&lt;br /&gt;
* CPU: a 5.5-day run takes 4 days on 8 cores and 3 days on 24 cores&lt;br /&gt;
* RAM: ~22 GB RAM/run (2.5 GB/core)&lt;br /&gt;
* Storage&lt;br /&gt;
** Tested netCDF4 and netCDF with no compression&lt;br /&gt;
** Compression reduces output to about 1/3 of the uncompressed netCDF size (~70% compression)&lt;br /&gt;
** Downstream programs need to be linked against the HDF and netCDF4 libraries with compression support&lt;br /&gt;
** Estimated about 5.8 TB for the year with compression, versus 16.9 TB without&lt;br /&gt;
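As a quick check of the storage figures above, this minimal Python sketch computes the compressed-to-uncompressed ratio from the two annual estimates. The variable names are illustrative and not part of the original analysis.&lt;br /&gt;

```python
# Annual WRF output estimates quoted in the storage notes (terabytes).
compressed_tb = 5.8
uncompressed_tb = 16.9

# Fraction of the uncompressed size that remains after compression.
ratio = compressed_tb / uncompressed_tb
# Fraction of space saved by compression.
savings = 1.0 - ratio

print(f"compressed/uncompressed = {ratio:.2f}")
print(f"space saved = {savings:.0%}")
```

The ratio comes out near one third, which is roughly in line with the ~70% compression noted above.&lt;br /&gt;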
&lt;br /&gt;
= Conceptual Approach to WRF on the Cloud =&lt;br /&gt;
* Cluster management would launch a head node and compute nodes&lt;br /&gt;
* 77 5.5-day chunks: 20 computers for 16 days (or 80 computers for 4 days)&lt;br /&gt;
* Head node running constantly&lt;br /&gt;
* Compute nodes running over the length of the project&lt;br /&gt;
* Memory-optimized machines performed better than compute-optimized machines for CAMx&lt;br /&gt;
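The machine-count and duration figures above can be reproduced with a short sketch. It assumes each machine runs one chunk at a time, that chunks are processed in waves, and that each chunk takes about 4 days (the 8-core benchmark timing); the function name is hypothetical.&lt;br /&gt;

```python
import math

def wall_clock_days(n_segments, n_machines, days_per_segment):
    """Wall-clock days when each machine runs one segment at a time, in waves."""
    waves = math.ceil(n_segments / n_machines)
    return waves * days_per_segment

# Figures from the notes: 77 segments, about 4 days each on 8 cores.
for machines in (20, 80):
    print(machines, "machines:", wall_clock_days(77, machines, 4), "days")
```

With 20 machines the 77 chunks need four waves (16 days); with 80 machines they fit in a single wave (4 days).&lt;br /&gt;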
&lt;br /&gt;
= Cost Analysis =&lt;br /&gt;
* [https://www.ladco.org/wp-content/uploads/Projects/WRF-Cloud/WRF_cloud_computing_costs.pdf Analysis Spreadsheet]&lt;br /&gt;
&lt;br /&gt;
= Storage Analysis =&lt;br /&gt;
* AWS&lt;br /&gt;
** Avoid local storage because the data would need to be moved/migrated&lt;br /&gt;
** Put the data on a storage appliance (S3) while running, and then push it off to longer-term storage (Glacier)&lt;br /&gt;
** Glacier is archival storage: access requests must be submitted through the console, with response times listed as 1-5 minutes&lt;br /&gt;
* Azure&lt;br /&gt;
** Fast and slower lake storage for offline&lt;br /&gt;
** Managed disks for online&lt;br /&gt;
&lt;br /&gt;
= Data Transfer Analysis =&lt;br /&gt;
* Estimate based on 5.8 TB&lt;br /&gt;
* AWS&lt;br /&gt;
** Internet transfer will cost ~$928 for 5.8 TB&lt;br /&gt;
** Snowball: about 10 days to get the data off a disk; costs $200 for the entire WRF run (smallest device was 50 TB)&lt;br /&gt;
* Azure&lt;br /&gt;
** Online transfer&lt;br /&gt;
** Databox option (like snowball)&lt;br /&gt;
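The internet-transfer figure can be sanity-checked by backing out the implied per-gigabyte rate. This sketch assumes the data volume is the 5.8 TB annual estimate from the benchmarking notes and that 1 TB = 1000 GB; the resulting rate is inferred from the numbers here and is not a quoted AWS price.&lt;br /&gt;

```python
# Assumed inputs: annual compressed output (TB) and the quoted transfer cost (USD).
data_tb = 5.8
transfer_cost_usd = 928.0

# Decimal terabytes to gigabytes (an assumption about the units used).
data_gb = data_tb * 1000
implied_rate = transfer_cost_usd / data_gb

print(f"implied rate: ${implied_rate:.2f} per GB")
```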
&lt;br /&gt;
= Cluster Management Tools (interface analysis) =&lt;br /&gt;
* 3-4 tools seemed to work best across several cloud solutions&lt;br /&gt;
* Alces Flight (works on AWS and Azure): used to bring up 40 nodes; set up a Torque queuing system; trouble with using an AMI, need to pay to use an AMI with this solution; can use Docker if we want to use containers, but Ramboll is not positioned to use containers for this project&lt;br /&gt;
* CfnCluster: slower development, but now available as AWS ParallelCluster (CfnCluster reincarnated), with improved tools and a package in the Python Package Index (can be installed with pip); lets you spin everything up from the command line and could be scripted&lt;br /&gt;
* Haven&amp;#039;t yet explored AWS ParallelCluster/CfnCluster in detail; similar to experience with StarCluster; seems to be the best solution because you can use your own custom AMI; instance types are independent of the cluster management tools&lt;br /&gt;
&lt;br /&gt;
= Next Steps =&lt;br /&gt;
* LADCO to create a WRF AMI on AWS: WRF 3.9.1, netCDF4 with compression, MPICH2, PGI compiler, AMET&lt;br /&gt;
* LADCO to create a login for Ramboll in our AWS organization&lt;br /&gt;
* Ramboll to explore AWS ParallelCluster and then prototype with the LADCO WRF AMI&lt;br /&gt;
* Next call 12/5 @ 3 Central&lt;br /&gt;
&lt;br /&gt;
= Ramboll Recommendations =&lt;br /&gt;
== WRF ==&lt;br /&gt;
* Use netCDF4 with compression&lt;br /&gt;
* Use 8 cores per 5.5-day segment and submit all segments for annual run to cluster at once&lt;br /&gt;
== Cloud Service ==&lt;br /&gt;
* Costs are equivalent between Azure and AWS so use AWS because of familiarity&lt;br /&gt;
* Use one memory optimized instance (EC2-r5.2xlarge, 8 cores, 64 GB RAM) for each segment&lt;br /&gt;
* Use S3 Standard storage for the lifetime of the project and migrate to S3 Infrequent Access or Glacier for long-term storage&lt;br /&gt;
* Use Snowball to transfer completed project to local site&lt;br /&gt;
&lt;br /&gt;
== HPC Platforms ==&lt;br /&gt;
* Use AWS ParallelCluster (formerly CfnCluster)&lt;br /&gt;
** Provides a CLI, allowing for Linux script automation&lt;br /&gt;
** Allows for custom AMIs&lt;br /&gt;
** Provides a variety of schedulers: sge, torque, slurm, or awsbatch&lt;br /&gt;
** Is actively being developed and enhanced&lt;br /&gt;
** Additional investigation/test of WRF/CAMx test cases needed to verify tool integrity and performance&lt;br /&gt;
* Other HPC platforms have demonstrated issues&lt;br /&gt;
** StarCluster: Problematic auto-scaling; outdated and inactive&lt;br /&gt;
** AlcesFlight: Fee-based ability to use custom AMIs, problems with auto-scaling for large instance counts&lt;/div&gt;</summary>
		<author><name>Zac</name></author>
		
	</entry>
</feed>