Ramboll Modeling on the Cloud Contract Results
Contents
WRF Benchmarking
- Emulated the 2016 WRF run on the 12/4/1.3-km grids
- Purpose: estimate costs for CPU, RAM, and storage
- CPU: a 5.5-day run takes 4 days on 8 cores, 3 days on 24 cores
- RAM: ~22 GB/run (~2.5 GB/core)
- Storage
- Tested netCDF4 with compression against classic netCDF with no compression
- Compression saves substantial space: output is ~1/3 the size of uncompressed netCDF (~70% savings)
- Downstream programs must be linked against the HDF5 and netCDF4 libraries (with compression support) to read the output
- Estimated ~5.8 TB for the year with compression; grows to 16.9 TB without
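The quoted annual totals imply roughly a 3:1 compression ratio; a quick back-of-envelope check using only the numbers in these notes (the "~70%" above is approximate; the totals work out to ~66%):

```python
# Back-of-envelope check of the annual storage estimate using the
# figures quoted above (16.9 TB uncompressed, 5.8 TB compressed).
uncompressed_tb = 16.9
compressed_tb = 5.8

ratio = compressed_tb / uncompressed_tb   # fraction of original size
savings = 1.0 - ratio                     # space saved by compression

print(f"compressed output is {ratio:.0%} of uncompressed")
print(f"compression saves {savings:.0%}")
```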
Conceptual Approach to WRF on the Cloud
- Cluster management would launch a head node and compute nodes
- Annual run = 77 5.5-day chunks: 20 nodes for 16 days, or 80 nodes for 4 days
- Head node running constantly
- Compute nodes running over the length of project
- Memory-optimized instances performed better than compute-optimized for CAMx
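The node-count/wall-clock figures above follow directly from the 4-day-per-chunk benchmark; a minimal sketch of the scheduling arithmetic:

```python
import math

# Scheduling arithmetic for the annual run, assuming the benchmark
# figure above: each 5.5-day chunk takes ~4 wall-clock days on 8 cores.
chunks = 77
days_per_chunk = 4

for nodes in (20, 80):
    waves = math.ceil(chunks / nodes)  # rounds of concurrent chunks
    print(f"{nodes} nodes -> {waves * days_per_chunk} days wall clock")
```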
Cost Analysis
Storage Analysis
- AWS
- Avoid instance-local storage because the data would need to be moved/migrated
- Put the data on a storage service (S3) while running, then push it off to longer-term storage (Glacier)
- Glacier is archival; access requests are submitted through the console, with response times listed as 1-5 minutes
- Azure
- Faster and slower (data lake) storage tiers for offline data
- Managed disks for online storage
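The run-then-archive pattern on AWS (S3 while running, Glacier afterward) can be automated with an S3 lifecycle rule rather than a manual migration; a sketch, where the bucket prefix and 90-day transition are placeholder choices, not figures from the notes:

```json
{
  "Rules": [
    {
      "ID": "wrf-output-to-glacier",
      "Status": "Enabled",
      "Filter": { "Prefix": "wrf-output/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "GLACIER" }
      ]
    }
  ]
}
```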
Data Transfer Analysis
- Estimates based on the 5.8 TB compressed annual output
- AWS
- Internet transfer will cost ~$928 for 5.8 TB
- Snowball: ~10 days to get the data off on disk; ~$200 for the entire WRF run (smallest appliance is 50 TB)
- Azure
- Online transfer
- Data Box option (analogous to Snowball)
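The ~$928 internet-transfer figure is consistent with an egress rate of about $0.16/GB on 5.8 TB; the rate here is inferred from the quoted cost itself, not stated in the notes:

```python
# Rough egress-cost check: ~$0.16/GB is an assumed blended rate,
# back-calculated from the quoted ~$928 figure for 5.8 TB.
data_tb = 5.8
rate_per_gb = 0.16               # assumed $/GB internet egress rate
cost = data_tb * 1000 * rate_per_gb
print(f"estimated egress cost: ${cost:.0f}")
```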
Cluster Management Tools (interface analysis)
- 3-4 tools seemed to work best across several cloud solutions
- Alces Flight (works on AWS and Azure): used it to bring up 40 nodes and set up a Torque queuing system; had trouble using a custom AMI, and custom AMIs require paying with this solution; supports Docker if we want containers, but Ramboll is not positioned to use containers for this project
- CfnCluster: development had slowed, but it has been reincarnated as AWS ParallelCluster with improved tools; published on the Python Package Index (installable with pip); lets you spin everything up from the command line, so it can be scripted
- Haven't yet explored AWS ParallelCluster in detail; similar to our experience with StarCluster; seems to be the best solution because you can use your own custom AMI; instance types are independent of the cluster management tool
Next Steps
- LADCO to create a WRF AMI on AWS: WRF 3.9.1, netCDF4 with compression, MPICH2, PGI compiler, AMET
- LADCO to create a login for Ramboll in our AWS organization
- Ramboll to explore AWS Parallel cluster and then prototype with LADCO WRF AMI
- Next call 12/5 @ 3 Central
Ramboll Recommendations
WRF
- Use netCDF4 with compression
- Use 8 cores per 5.5-day segment and submit all segments of the annual run to the cluster at once
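For the compression recommendation, one way to convert classic netCDF output to deflate-compressed netCDF4 is the stock `nccopy` utility that ships with the netCDF library; a sketch with placeholder file names and a mid-range deflate level:

```shell
# Convert classic netCDF to netCDF4 with deflate level 4.
# File names are placeholders, not from the notes.
nccopy -k 4 -d 4 wrfout_d01_raw.nc wrfout_d01_compressed.nc
```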
Cloud Service
- Costs are equivalent between Azure and AWS, so use AWS because of familiarity
- Use one memory-optimized instance (EC2 r5.2xlarge: 8 cores, 64 GB RAM) for each segment
- Use S3 Standard storage for the lifetime of the project, then migrate to S3 Infrequent Access or Glacier for long-term storage
- Use Snowball to transfer the completed project to the local site
HPC Platforms
- Use AWS ParallelCluster (formerly CfnCluster)
- Provides a CLI, allowing for Linux shell-script automation
- Allows for custom AMIs
- Provides a variety of schedulers: sge, torque, slurm, or awsbatch
- Is actively being developed and enhanced
- Additional investigation/test of WRF/CAMx test cases needed to verify tool integrity and performance
- Other HPC tools have demonstrated issues
- StarCluster: Problematic auto-scaling; outdated and inactive
- Alces Flight: custom AMIs require a paid tier; problems with auto-scaling for large instance counts
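The ParallelCluster recommendation maps onto a config file like the following sketch, in the ParallelCluster 2.x INI format; the setting names are real ParallelCluster options, but the region choice, key pair, AMI ID, and VPC/subnet IDs are placeholders:

```ini
# Sketch of a ParallelCluster 2.x config for the annual WRF run.
[global]
cluster_template = wrf

[aws]
aws_region_name = us-east-1

[cluster wrf]
key_name = <your-keypair>
base_os = centos7
scheduler = sge
master_instance_type = c5.large
# Memory-optimized compute nodes, per the recommendation above
compute_instance_type = r5.2xlarge
initial_queue_size = 0
# Enough capacity to run all 77 segments at once
max_queue_size = 80
custom_ami = <LADCO-WRF-AMI-id>
vpc_settings = public

[vpc public]
vpc_id = <vpc-id>
master_subnet_id = <subnet-id>
```

With this in place, `pcluster create wrf` brings the cluster up from the command line, which fits the scripted workflow noted above.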