ZIMS Infrastructure and Network: Business Continuity/Disaster Recovery Planning

 

 

Revised 5 March 2025

 

Species360 Business Continuity/Disaster Recovery Plan


  1. Outage Mitigation - Species360 takes a proactive approach to avoiding or mitigating the impact of unexpected outages on the ZIMS platform, so that our members have an always-on, globally available species management system. This includes:
    1. Co-location in a high-availability datacenter with multiple isolated internet routers and redundant internal networks.
    2. Multiple network load-balanced virtual front-end (web) servers, which can be expanded or replaced as needed.
    3. A fully replicated back-end database architecture with less than 15 minutes of lag on database transactions, which can be switched over in less than 5 minutes.
    4. Written playbooks for several identified scenarios in which a data restoration may be required, for the team to use in mitigation.
       
  2. Outage Detection - We understand that not all outages and disasters can be avoided, as many are outside our control, such as natural disasters, construction, and denial-of-service attacks. To help ensure that we can react to these events as quickly as possible, we continually monitor our services from multiple points around the globe and alert on any detected outage. We make every effort to partner with service providers that offer the highest level of support, transparency, and communication. We monitor the services provided and the related costs, assessing opportunities to improve them or to find more cost-effective, equivalent solutions.
     
  3. Disaster Recovery - When disasters do strike, our recovery strategy prioritizes speed of recovery and limits data loss. Depending on the type of outage or disaster, the recovery time and the amount of data loss can vary.
    1. ZIMS maintains a full database replica on a separate server that normally handles reporting tasks for performance reasons. This replica is within just a few minutes of the ZIMS production database; the exact lag depends on the amount of activity on each server.
    2. Full ZIMS database backups are performed once per week. These backups are stored within the same datacenter, separate from the live servers.
    3. Database backups are also transferred securely to an alternate Species360 location in case a cold restore of the data is needed.
    4. Weekly backups are kept for 3 months.
    5. Monthly backups are kept indefinitely.
    6. Database transactions are also backed up every 15 minutes and stored within the same datacenter, separate from the live servers.
    7. An alternate ZIMS cold site with the needed infrastructure is available 24x7.
    8. The database can be online from a cold restore within 12 hours, with a maximum data loss of 15 minutes.
    9. Additional front-end (web) servers can be online and available within minutes.
    10. Validation restores are executed periodically to ensure the backups are viable for use in disaster recovery.
    11. The live ZIMS environment is hosted in a datacenter in Illinois, US. The cold backup site is in Minnesota, US.
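
The retention schedule and recovery targets listed above can be expressed as a small check. The following is a minimal sketch in Python, not Species360's actual tooling: the function names, the `is_monthly` flag, and the approximation of "3 months" as 90 days are assumptions made for illustration only.

```python
from datetime import date, timedelta

# Figures taken from the plan above.
WEEKLY_RETENTION = timedelta(days=90)      # assumption: "3 months" approximated as 90 days
COLD_RESTORE_TARGET = timedelta(hours=12)  # database online from cold restore
MAX_DATA_LOSS = timedelta(minutes=15)      # transaction backups run every 15 minutes


def should_keep(backup_date: date, is_monthly: bool, today: date) -> bool:
    """Return True if a full backup should still be retained on `today`."""
    if is_monthly:
        return True  # monthly backups are kept indefinitely
    return today - backup_date <= WEEKLY_RETENTION


def meets_targets(restore_duration: timedelta, data_loss: timedelta) -> bool:
    """Return True if a validation restore stayed within both stated targets."""
    return restore_duration <= COLD_RESTORE_TARGET and data_loss <= MAX_DATA_LOSS
```

For example, `should_keep(date(2025, 1, 5), False, date(2025, 3, 5))` retains a weekly backup that is 59 days old, while the same backup would no longer be retained on 1 June 2025 unless it is flagged as monthly.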

 

  4. Future - We are currently evaluating our disaster recovery strategy and are in the process of updating the plans to improve multiple aspects. Each of these enhancements will need to be assessed for feasibility, value, and likely additional cost. Member or external sponsorship for these enhancements is always welcome.
    1. Inclusion of a possible hot/warm alternative site.
    2. Reducing cold restore time.
    3. Reducing the window for data loss from a cold restore.
    4. Monitoring system availability and performance externally, from multiple locations around the world, including locations in Europe, Asia, and Australia.
    5. Accelerating outage response times by improving outage alert notifications to the appropriate staff via e-mail, SMS, and voice.
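
As an illustration of how multi-location monitoring and alerting of this kind might fit together, the sketch below treats an outage as confirmed only when several probe locations fail at once, then widens the notification channels the longer an alert goes unacknowledged. This is a hypothetical example, not a description of the Species360 monitoring system: the two-failure threshold, the five-minute escalation step, the channel order, and all function names are assumptions.

```python
from typing import Mapping, Tuple

# Channels named in the plan; the escalation order here is an assumption.
ALERT_CHANNELS = ("e-mail", "SMS", "voice")


def outage_detected(probe_ok: Mapping[str, bool], min_failures: int = 2) -> bool:
    """Return True when at least `min_failures` probe locations report a failure."""
    failures = sum(1 for ok in probe_ok.values() if not ok)
    return failures >= min_failures


def channels_to_notify(minutes_unacknowledged: int, step_minutes: int = 5) -> Tuple[str, ...]:
    """Add one more channel for every `step_minutes` an alert goes unacknowledged."""
    count = min(len(ALERT_CHANNELS), 1 + minutes_unacknowledged // step_minutes)
    return ALERT_CHANNELS[:count]
```

Requiring more than one failing location is one way to avoid alerting on a network problem local to a single probe; a stricter or looser threshold may suit a different set of probe locations.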