WRF on the Cloud

Revision as of 21:29, 28 November 2018

LADCO is seeking to understand the best practices for submitting and managing multiprocessor computing jobs on a cloud computing platform. In particular, LADCO would like to develop a WRF production environment that utilizes cloud-based computing. The goal of this project is to prototype a WRF production environment on a public, on-demand high performance computing service in the cloud to create a WRF platform-as-a-service (PaaS) solution. The WRF PaaS must meet the following objectives:

  • Configurable computing and storage to scale, as needed, to meet the needs of different WRF applications
  • Configurable WRF options to enable changing grids, simulation periods, physics options, and input data (a minimal configuration sketch follows this list)
  • Flexible cloud deployment from a command line interface to initiate computing clusters and spawn WRF jobs in the cloud
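
To make the second objective concrete, the Python sketch below shows one possible way a PaaS wrapper could turn a small job description (grid dimensions, simulation period, physics options) into the corresponding fragments of a WRF namelist.input. The function name, the job dictionary layout, and the example values are hypothetical; only the namelist variable names (run_days, start_year, max_dom, e_we, e_sn, dx, mp_physics, cu_physics) are standard WRF options.

  import datetime

  def render_namelist(job):
      """Render a minimal namelist.input fragment from a job description (sketch only)."""
      start, end = job["start"], job["end"]
      run_days = (end - start).days
      lines = [
          "&time_control",
          f" run_days   = {run_days},",
          f" start_year = {start.year}, start_month = {start.month:02d}, start_day = {start.day:02d},",
          f" end_year   = {end.year}, end_month   = {end.month:02d}, end_day   = {end.day:02d},",
          "/",
          "&domains",
          f" max_dom = {len(job['grids'])},",
          f" e_we    = {', '.join(str(g['e_we']) for g in job['grids'])},",
          f" e_sn    = {', '.join(str(g['e_sn']) for g in job['grids'])},",
          f" dx      = {', '.join(str(g['dx_m']) for g in job['grids'])},",  # dx in meters per domain
          "/",
          "&physics",
          f" mp_physics = {job['physics']['mp_physics']},",
          f" cu_physics = {job['physics']['cu_physics']},",
          "/",
      ]
      return "\n".join(lines)

  # Hypothetical two-domain (12/4 km) example; the values are placeholders, not LADCO's configuration.
  job = {
      "start": datetime.date(2016, 1, 1),
      "end": datetime.date(2016, 1, 6),
      "grids": [
          {"e_we": 200, "e_sn": 200, "dx_m": 12000},
          {"e_we": 400, "e_sn": 400, "dx_m": 4000},
      ],
      "physics": {"mp_physics": 8, "cu_physics": 1},
  }
  print(render_namelist(job))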

Call Notes

  • WRF benchmarking (emulating the WRF 2016 12/4/1.3 grid run): costing for CPUs, RAM, and storage
  • CPU: with 8 cores, a 5.5-day run takes ~4 days; with 24 cores, ~3 days
  • RAM: ~22 GB of RAM per run (~2.5 GB/core)
  • Storage: tested netCDF4 with compression against uncompressed netCDF; compression shrinks the output to about 1/3 of the uncompressed size (~70% compression), but downstream programs must be linked against the HDF5 and netCDF4 libraries with compression support; annual output is estimated at ~5.8 TB compressed versus 16.9 TB uncompressed (see the arithmetic check after this list)
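
The figures above can be sanity-checked with basic arithmetic. The Python snippet below is not from the call; it just recomputes per-core memory, the compression ratio, and the implied 8-to-24-core speedup from the numbers quoted in the notes, under the assumption that the 4-day and 3-day figures are wall-clock times for the same simulation.

  # Arithmetic check on the benchmarking numbers quoted above (assumed interpretations noted inline).

  ram_per_run_gb = 22.0          # ~22 GB of RAM per run (from the notes)
  cores = 8
  print(f"RAM per core: {ram_per_run_gb / cores:.2f} GB")   # ~2.75 GB/core, vs ~2.5 GB/core in the notes

  compressed_tb = 5.8            # annual output with netCDF4 compression
  uncompressed_tb = 16.9         # annual output without compression
  ratio = compressed_tb / uncompressed_tb
  print(f"Compressed output is {ratio:.0%} of uncompressed ({1 - ratio:.0%} reduction)")  # ~34% / ~66%

  # Assumption: the 8-core and 24-core figures are wall-clock times (4 and 3 days) for the same run.
  speedup = 4.0 / 3.0
  core_ratio = 24 / 8
  print(f"Speedup from 8 to 24 cores: {speedup:.2f}x on {core_ratio:.0f}x the cores "
        f"({speedup / core_ratio:.0%} parallel efficiency)")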

Costing analysis

  • Cluster management would launch a head node and compute nodes
  • 77 chunks, 20 computers for 16 days
  • Head node running constantly
  • Compute nodes running over the length of the project
  • Can probably use 80 computers for 4 days instead of 20 computers for 16 days (node-day arithmetic is sketched after this list)
  • Memory-optimized machines performed better than compute-optimized machines for CAMx
  • Storage
    • Don't want to use local storage because the data will need to be moved/migrated
    • Put the data on a storage service (S3) while running, and then push it off to longer-term storage (a minimal S3 lifecycle sketch follows this list)
    • Glacier is archival storage; access requests have to be submitted through the console, with response times listed as 1-5 minutes
  • Storage (Azure)
    • Fast and slower data lake storage tiers for offline data
    • Managed disks for online (active) data
  • Transfer (estimate based on the ~5.8 TB annual output)
    • Internet transfer will cost ~$928 for 5.5 TB (the implied per-TB rate is computed in the arithmetic sketch after this list)
    • Snowball: ~10 days to get the data off a disk; costs ~$200 for the entire WRF run (the smallest appliance was 50 TB)
  • Transfer (Azure)
    • Online transfer
    • Data Box option (similar to Snowball)
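
The cluster-sizing and transfer numbers in the list above can be cross-checked with simple arithmetic. In the sketch below, the node-day comparison uses the 20-nodes-for-16-days versus 80-nodes-for-4-days figures from the notes, and the implied egress rate is simply the quoted $928 divided by the quoted 5.5 TB; no prices are looked up independently, so treat the output as a consistency check rather than a cost estimate.

  # Cross-check of the cluster-sizing and data-transfer numbers quoted in the notes above.

  # Two ways to run the same 77 chunks (from the notes): node-days are identical,
  # so on-demand compute cost should be roughly the same either way.
  plan_a_nodes, plan_a_days = 20, 16
  plan_b_nodes, plan_b_days = 80, 4
  print(f"Plan A: {plan_a_nodes * plan_a_days} node-days; Plan B: {plan_b_nodes * plan_b_days} node-days")

  # The head node runs for the whole project in either plan, so its cost scales with wall-clock days.
  print(f"Head-node days: Plan A = {plan_a_days}, Plan B = {plan_b_days}")

  # Internet egress: ~$928 quoted for 5.5 TB of output.
  egress_cost_usd = 928.0
  egress_tb = 5.5
  print(f"Implied internet egress rate: ${egress_cost_usd / egress_tb:.0f}/TB")

  # Snowball alternative quoted at ~$200 for the whole run (plus ~10 days turnaround).
  snowball_cost_usd = 200.0
  print(f"Snowball vs internet for {egress_tb} TB: ${snowball_cost_usd:.0f} vs ${egress_cost_usd:.0f}")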
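
The S3-then-archive storage pattern described above maps onto an S3 lifecycle rule. The boto3 sketch below is a minimal illustration of that idea only: the bucket name (ladco-wrf-output), key prefix, output filename, and 30-day transition age are all assumptions, not LADCO's actual configuration.

  import boto3

  s3 = boto3.client("s3")

  bucket = "ladco-wrf-output"   # hypothetical bucket name
  prefix = "wrf/2016/"          # hypothetical key prefix for this run's output

  # Upload a finished output chunk to S3 while the run is in progress
  # (assumes the wrfout file exists in the working directory).
  s3.upload_file("wrfout_d01_2016-01-01_00:00:00", bucket, prefix + "wrfout_d01_2016-01-01_00:00:00")

  # Lifecycle rule: after 30 days (assumed retention window), transition objects
  # under the prefix to Glacier for long-term archival.
  s3.put_bucket_lifecycle_configuration(
      Bucket=bucket,
      LifecycleConfiguration={
          "Rules": [
              {
                  "ID": "archive-wrf-output",
                  "Filter": {"Prefix": prefix},
                  "Status": "Enabled",
                  "Transitions": [{"Days": 30, "StorageClass": "GLACIER"}],
              }
          ]
      },
  )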