Renaming or deleting a replicated RDS DB for SQL Server

If you have ever tried to rename or delete a replicated SQL Server database on RDS, you will have noticed that AWS does not allow either operation in the usual way.

The error we get when trying to rename a database is:

Renaming RDS SQL Server database error

User does not have permission to alter database 'XYZ', the database does not exist, or the database is not in a state that allows access checks. (5011)

The fix is to run a stored procedure which AWS gives us access to. The SQL to rename the database is:

EXEC rdsadmin.dbo.rds_modify_db_name N'XYZ', N'NEW_XYZ'
GO

A similar error is thrown when we try to drop a database:

Deleting RDS SQL Server database error

The database 'XYZ' is enabled for database mirroring. Database mirroring must be removed before you drop the database. (3743)

The issue this time is that on RDS we cannot enable or disable mirroring for a specific database. In a Multi-AZ RDS deployment, all databases are always synchronised.

Luckily, AWS puts another stored procedure at our disposal, which allows us to perform the deletion:

EXECUTE msdb.dbo.rds_drop_database  N'XYZ'
GO

Note that dropping a database will not raise this error if the database has not yet been fully replicated.
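When these calls are scripted, the database names usually come from variables. Below is a minimal sketch, in Python, of building the two statements shown above with the names quoted as T-SQL Unicode literals. The helper names (`tsql_literal`, `rename_statement`, `drop_statement`) are my own and not part of any AWS tooling; only the two stored procedure names come from this post.

```python
def tsql_literal(value: str) -> str:
    """Quote a string as a T-SQL Unicode literal, doubling any single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def rename_statement(old_name: str, new_name: str) -> str:
    """Build the rename call shown in this post (hypothetical helper)."""
    return (
        "EXEC rdsadmin.dbo.rds_modify_db_name "
        f"{tsql_literal(old_name)}, {tsql_literal(new_name)}"
    )


def drop_statement(name: str) -> str:
    """Build the drop call shown in this post (hypothetical helper)."""
    return f"EXECUTE msdb.dbo.rds_drop_database {tsql_literal(name)}"


if __name__ == "__main__":
    print(rename_statement("XYZ", "NEW_XYZ"))
    print(drop_statement("XYZ"))
```

The sketch only produces text; executing it still requires a SQL Server client of your choice.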

AWS Color Palette

If you ever wondered which palette AWS uses in its official diagrams and presentations, below you will find my first attempt at publishing an unofficial color palette. Unfortunately, I could not find an official one; if you ever find one, please do let me know.

Both the dark and the light palettes are using 4 Background Tones with 2 Foreground Tones each, complemented by 8 Accent Colors. Pretty simple but with a consistent and fascinating effect.

First things first, the dark background palette:

And probably the most used one, the light background palette:

I added the RGB values as well, in case you ever want to recreate it in OmniGraffle or Visio for your own needs. This is Generic RGB.

I do not yet know why these specific colors were chosen; I do have the feeling they started with Red 700 as a base Monotone color and worked from there, but this is just a feeling for now.

PS: Please do not ask for an Azure one because I already looked into it and everything is just a mesmerising rainbow there for now.

Deleting Lambda@Edge functions

If you have ever tried to delete a Lambda@Edge function, you probably noticed that once the function has been deployed to AWS edge locations, you get an error and cannot delete it. Which is pretty frustrating, right?

You would get an error similar to:

Error - Deleting Lambda Edge

An error occurred when deleting this version: Lambda was unable to delete arn:aws:lambda:us-east-1:__ACCOUNT__:function:__FUNCTION_NAME__:1 because it is a replicated function. Please see our documentation for Deleting Lambda@Edge Functions and Replicas.

To be able to delete the function, follow the steps below:

  1. For every version of the deployed function, delete the triggers of that specific function.
  2. Wait several hours until AWS has automatically deleted all deployed replicas of that specific function.
  3. Once all replicas are automatically deleted, try again to delete the Lambda function. You should succeed.
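Steps 2 and 3 amount to "retry the delete until the replicas are gone". Here is a minimal sketch of that loop in Python. Everything in it is an assumption made for illustration: the `StillReplicated` exception, the `delete_fn` callable (which in practice might wrap your AWS SDK's delete call and translate its "replicated function" error), and the default of 12 attempts spaced 30 minutes apart.

```python
import time
from typing import Callable


class StillReplicated(Exception):
    """Hypothetical marker: the delete failed because replicas still exist."""


def delete_when_replicas_gone(
    delete_fn: Callable[[], None],
    attempts: int = 12,            # assumed: 12 tries
    wait_seconds: float = 1800.0,  # assumed: 30 minutes between tries
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call delete_fn until it succeeds; return the attempt that worked.

    delete_fn must raise StillReplicated while the function is still
    replicated. Any other exception is passed through untouched.
    """
    for attempt in range(1, attempts + 1):
        try:
            delete_fn()
            return attempt
        except StillReplicated:
            if attempt == attempts:
                raise  # give up after the last attempt
            sleep(wait_seconds)
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the loop easy to try out without actually waiting for hours.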

This feature became publicly available in March 2018. Before that, it was completely impossible to get rid of Lambda@Edge functions, even after the CloudFront triggers had been removed.

Clean-up your AWS account

If you had such issues in the past, now is a good time to start cleaning up your AWS account of all the unused functions which might still be there.

I hope you found this post informative.

AWS - S3 restricted access using OAI

A typical setup for a CloudFront CDN distribution would be to configure an S3 bucket as a public website and configure that S3 bucket as the distribution origin for CloudFront.

The side effect of such a configuration is that your website can now be accessed both using the CloudFront domain name (www.example.com) and using the public website endpoint generated by AWS for the bucket on which website hosting is enabled (http://BUCKET-NAME.s3-website-us-east-1.amazonaws.com/).

AWS offers the possibility for CloudFront to serve a website stored in a private S3 bucket, as long as CloudFront is given access to read the files from that bucket.

Please follow the steps below to accomplish this configuration.

In S3, it is recommended to Disable Static website hosting and Block all public access into the private bucket.

Disable static website hosting

In S3, select the bucket hosting the static files which will be served by CloudFront. Make sure that, at the end, the Access field mentions “Bucket and objects not public”.

Once you have selected the bucket, click on the bucket properties.

Inside the properties tab, make sure Static website hosting is disabled. Edit the settings and disable static website hosting if needed.

Save settings to make sure static website hosting is disabled.

Block all public access

To block all public access to the private bucket, select the Permissions tab.

In the Permissions tab, click on the Block public access tab.

Set settings as needed and save the configuration.

Now that the S3 configuration is done, let’s move to the CloudFront configuration.

First we will need to create an Origin Access Identity. This is done in the main CloudFront settings, under the Security settings.

Click on Create Origin Access Identity:

Give it a name and save the newly created OAI.

At the time of writing, AWS had a limit of 100 different Origin Access Identities for CloudFront, but because an identity can be reused between multiple distributions, in reality a single identity is all we might need.

Now that the new Origin Access Identity is created, it needs to be assigned to the CloudFront distribution which requires access to the private S3 bucket. To do that, select the CloudFront distribution:

Under the Origins and Origin Groups tab, select the origin we want to edit and click Edit.

If the Restrict Bucket Access setting is not present, make sure you empty the Origin Domain Name field and choose a new bucket from the list, even if it is the same bucket as before. Select Restrict Bucket Access, use the existing identity which we created earlier, allow CloudFront to Update Bucket Policy and save.

CloudFront should create a bucket policy similar to the one below on your behalf and update the bucket settings.

{
    "Version": "2008-10-17",
    "Id": "PolicyForCloudFrontPrivateContent",
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity <ID>"
            },
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::<BUCKET>/*"
        }
    ]
}

Poorly defined link anchors

A big difference between hosting an S3 bucket as a website and restricting access as explained in this post is that /index.html must now always be used in internal links. If an internal link points to /about/, for example, instead of /about/index.html, CloudFront will reply with a 404 error. Setting up OAI as a security measure can break an existing website which was poorly written and did not use fully defined anchors. Test before production, as always.
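To find or fix such links before switching, something along these lines could be run over the site's internal links. This is only a sketch: the function name `with_index` is made up, and treating "last path segment has no dot" as "this is a directory" is a heuristic assumption that will not fit every site. Query strings and fragments are not handled.

```python
def with_index(path: str, index: str = "index.html") -> str:
    """Return an internal link path that names the index document explicitly.

    Heuristic (assumption): a path ending in "/" or whose last segment has
    no dot is treated as a directory; anything else is treated as a file.
    """
    if path.endswith("/"):
        return path + index
    last_segment = path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return path + "/" + index
    return path


if __name__ == "__main__":
    for link in ["/about/", "/about", "/css/site.css"]:
        print(link, "->", with_index(link))
```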

I hope you enjoyed this short tutorial and do let me know in the comments if you ever broke a production website by implementing this security measure.

AWS EBS RAID configurations

First things first: EBS (Elastic Block Store) is the AWS service providing block storage for EC2 instances. As announced by AWS in 2017, it has an SLA of at least 99.99%, which I read, very roughly, as a failure rate of about 1 in every 10,000 EBS volumes per year; strictly speaking, an availability SLA is not a per-volume failure rate, so treat this as a rule of thumb. A volume can also be backed up “on-the-fly” using the snapshot feature, which creates an EBS snapshot.

But before going any further, I want to explain two important concepts which are used in architecture when talking about backup and recovery strategies: Recovery Point Objective and Recovery Time Objective.

RPO means how old the data can be, once the system has recovered.

RTO means how long it will take for the service to be back up after a failure.

Defining RPO and RTO

Before architecting a system, RPO and RTO should be clearly defined by the business owners. These values are a hard requirement and are not decided during the design phase.

Recovery Point Objective can be achieved ONLY when:

(RTO + Backup Frequency) < RPO

Common mistake

Less experienced designers believe that if the backup frequency is, let's say, every 6 hours, they can comply with an RPO of 6 hours. Unfortunately, it takes time to restore a failed volume, and how long it will take should be provided by the operational teams who are actually responsible for the restoration. Please note that the number of snapshots kept also matters, as the last few taken can be corrupted as well, having been taken from a corrupted volume.

In a typical production scenario, RTO is provided by the operational teams based on what they feel comfortable with. It can be affected by the types of shifts they have, how 24/7 support is handled, internal procedures, etc.
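To make the mistake concrete, here is the rule (RTO + Backup Frequency) < RPO as a small Python sketch, with "backup frequency" expressed as the interval between backups in hours. The function names and the 2-hour restore time in the example are assumptions for illustration only.

```python
def rpo_is_achievable(rpo_hours: float, rto_hours: float,
                      backup_interval_hours: float) -> bool:
    """Apply the rule from this post: (RTO + backup interval) < RPO."""
    return rto_hours + backup_interval_hours < rpo_hours


def longest_backup_interval(rpo_hours: float, rto_hours: float) -> float:
    """Upper bound (exclusive) on the backup interval for a given RPO and RTO."""
    return rpo_hours - rto_hours


if __name__ == "__main__":
    # Backups every 6 hours, RPO of 6 hours, assumed restore time of 2 hours.
    print(rpo_is_achievable(rpo_hours=6, rto_hours=2, backup_interval_hours=6))
    # With the same RPO and RTO, backups must be less than this many hours apart.
    print(longest_backup_interval(rpo_hours=6, rto_hours=2))
```

Under these assumed numbers, the 6-hour backup schedule does not meet the 6-hour RPO; the interval would have to be shorter than 4 hours.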

Now, coming back to EBS: if a RAID0 configuration is used, on-the-fly snapshots of the individual EBS volumes become useless. The snapshots are not taken at the same instant, so the data on the volumes will not be aligned, and the RAID cannot be recovered from multiple non-aligned volumes. To snapshot a RAID0 configuration, the applications requiring write access to the disk should be stopped and all data flushed to disk; only then should the snapshots be taken.

Hard rules for using RAID0 (striped mode):

  1. Use RAID0 when RPO is undefined, because you will have to disable volume snapshots or use some other, more complex backup mechanism.
  2. Use RAID0 when you want to achieve performance higher than what AWS can provide on a single volume. Basically, if you need more than 32,000 IOPS or 16 TB per volume. Nitro-based instances are capable of 64,000 IOPS.
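As a rough sizing aid for rule 2: in a striped array the IOPS of the member volumes add up, so the number of volumes is the target divided by the per-volume limit, rounded up. The sketch below assumes the 32,000 IOPS per-volume figure quoted above as a default and ignores any per-instance ceiling; the function name is hypothetical.

```python
def volumes_needed(target_iops: int, per_volume_iops: int = 32_000) -> int:
    """Smallest number of striped volumes whose combined IOPS reach the target.

    The default per-volume limit is the figure quoted in this post; check
    the current limit for your volume and instance type before relying on it.
    """
    if target_iops <= 0 or per_volume_iops <= 0:
        raise ValueError("IOPS values must be positive")
    # Integer ceiling division, avoiding floating point rounding.
    return -(-target_iops // per_volume_iops)


if __name__ == "__main__":
    print(volumes_needed(100_000))
```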

Hard rules for using RAID1 (mirrored mode):

  1. Use RAID1 if your SLA has to be higher than 99.99% which is the one AWS offers.
  2. Having multiple EBS volumes in the same Availability Zone gives no guarantee that at most one EBS volume will fail at a time. AWS does not yet allow attaching an EBS volume across Availability Zones.
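For a feel of what mirroring can buy under rule 1, here is a back-of-the-envelope sketch. It rests on two loud assumptions: that the 99.99% figure can be treated as the availability of a single volume, and that the mirrored volumes fail independently, which is exactly what rule 2 says is not guaranteed within one Availability Zone. Read the result as an optimistic upper bound, not a promise.

```python
def mirrored_availability(single_volume: float, copies: int = 2) -> float:
    """Availability of a mirror, ASSUMING independent volume failures.

    The mirror is unavailable only when every copy is unavailable at once.
    """
    if not 0.0 <= single_volume <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1.0 - (1.0 - single_volume) ** copies


if __name__ == "__main__":
    print(f"{mirrored_availability(0.9999):.8f}")
```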

What about RAID10?

RAID10 will have a mix of constraints from both RAID0 and RAID1. Volume snapshots become useless; if RPO is a requirement, another mechanism for backups will need to be created. It will not be the cheapest option.

Volumes using XFS as the journaling file system have the advantage of xfs_freeze, which basically freezes all access to a mounted volume. Used in combination with EBS snapshots, it can make snapshots of striped RAID levels usable, as the volumes will not be corrupted in case of a recovery. Again, the application will have to tolerate a temporary freeze of storage access.
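The important detail when scripting this is that the file system must be unfrozen again even if the snapshot step fails. A minimal sketch of that ordering in Python follows. The `take_snapshots` callable is a placeholder for whatever creates the EBS snapshots in your environment (an assumption, not shown here), and running `xfs_freeze` for real requires root on a machine with the XFS volume mounted.

```python
import subprocess
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def snapshot_while_frozen(
    mount_point: str,
    take_snapshots: Callable[[], T],
    run: Callable[[Sequence[str]], object] = subprocess.check_call,
) -> T:
    """Freeze the XFS mount, take snapshots, and always unfreeze afterwards."""
    run(["xfs_freeze", "-f", mount_point])  # -f: freeze the file system
    try:
        return take_snapshots()
    finally:
        # -u: unfreeze, executed even when take_snapshots() raises
        run(["xfs_freeze", "-u", mount_point])
```

Passing `run` as a parameter makes it possible to dry-run the sequence by logging the commands instead of executing them.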

What about other RAID configurations like RAID 5 or RAID 6?

Amazon does not recommend these variations, as the usable IOPS of the resulting array will be 20-30% lower than with RAID0. This is because some of the IOPS are consumed by parity write operations. Cost also becomes prohibitive; a better ROI can be achieved with off-the-shelf EBS volumes or a combination of RAID0 and/or RAID1 volumes.
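To put a number on the 20-30% figure, the sketch below computes the range of usable IOPS left from a given RAID0 baseline. It simply applies the two percentages quoted above; the function name is made up and the real loss depends on the array configuration.

```python
from typing import Tuple


def parity_raid_usable_iops(raid0_iops: int) -> Tuple[int, int]:
    """Return (worst case, best case) usable IOPS after a 30% and a 20% loss."""
    if raid0_iops < 0:
        raise ValueError("IOPS must not be negative")
    # Integer arithmetic keeps the results exact.
    return raid0_iops * 70 // 100, raid0_iops * 80 // 100


if __name__ == "__main__":
    print(parity_raid_usable_iops(64_000))
```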

AWS Reference Architecture 1 - IAM with Review Control

As large companies want to control the security of their cloud environment as much as possible for their mission-critical applications, there is a need for a well-architected review control where no single user has the power to submit IAM (Identity and Access Management) changes without a well-defined process.

The diagram below shows one option of implementing a full review process for IAM changes, which can include for example a Cloud SME, an End to End Architect and a Security Architect.

AWS Reference Architecture #1

The detailed production workflow is as follows:

  1. Users will submit via a Git Push a JSON file with the IAM changes which they would like to be accepted. This change will be submitted into a separate branch and will not be part of the master branch until approved and cherry-picked.

  2. Power Users, who have the approval rights, will review the submitted change and either propose changes or approve it from their own perspective.

  3. Once all approvals are in place, the change can be cherry-picked and merged into the master branch.

  4. While Gerrit supports an Active-Active scenario where both servers act as a master and are capable of processing requests, the backend database which Gerrit supports (either MySQL or PostgreSQL) will be deployed in a Master-Slave scenario.

  5. One-way replication of the Master Database into the Slave Database will take place. With the Multi-AZ deployment feature of AWS, automatic failover will be in place in case the Master Database goes down, with the Slave Database automatically promoted to master if the sync link is lost.

  6. Once the Git Push is cherry-picked into the master branch, Gerrit supports mirroring the master branch into a remote repository, which in this scenario is hosted in AWS using CodeCommit. Users do not require access to the CodeCommit repository; they only require access for submitting changes into Gerrit.

  7. Using a Git Webhook, once a new commit is present in the master branch, a Lambda function will process the JSON file and activate the access which the user requested.
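Whatever runs in step 7 should refuse malformed change files before touching IAM. The post does not define the JSON format, so the schema below (keys `user`, `policy_arn`, `action`) is entirely hypothetical; the sketch only shows the validation step in Python, and the actual IAM call through your AWS SDK is left out.

```python
import json
from typing import Any, Dict

# Hypothetical schema: the post does not specify the file format.
REQUIRED_KEYS = {"user", "policy_arn", "action"}
ALLOWED_ACTIONS = {"attach", "detach"}


def parse_iam_change(text: str) -> Dict[str, Any]:
    """Parse and validate one IAM change request; raise ValueError if invalid."""
    try:
        change = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(change, dict):
        raise ValueError("change must be a JSON object")
    missing = REQUIRED_KEYS - change.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if change["action"] not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {change['action']!r}")
    return change
```

A Lambda handler would call this first and only then apply the change, so that a bad merge fails loudly instead of half-applying.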

Minimum requirement!

Users will require basic knowledge of how Git and Gerrit systems work from a User perspective.

For security reasons, the architecture is using two different VPCs so that a Security Group is in place between the two tiers of the solution.

For high-availability, the architecture is spanning two different Availability Zones, with no single point of failure.