SQL Azure Archives - Thomas LaRock
https://thomaslarock.com/category/sql-azure/

SQL Plan Warnings
Tue, 24 Mar 2020
https://thomaslarock.com/2020/03/sql-plan-warnings/

There are many methods available for optimizing the performance of SQL Server. One method in particular is examining your plan cache, looking for query plan warnings. Plan warnings include implicit conversions, key or RID lookups, and missing indexes to name a few. Each of these warnings is the optimizer giving you the opportunity to take action and improve performance. Unfortunately, these plan warnings are buried inside the plan cache, and not many people want to spend time mining their plan cache. That sounds like work.

That’s why last year our company (SolarWinds) launched a free tool called SQL Plan Warnings. Mining the plan cache often involves custom scripts and forces you to work with text output only. We wanted to make things easier by providing a graphical interface. A GUI gives the user basic application functionality: things like connecting to more than one instance at a time, or filtering results with a few clicks.

Let me give a quick tour of SQL Plan Warnings.

Connect to an instance

The first thing noteworthy here is how SQL Plan Warnings supports connecting to a variety of flavors of SQL Server. There’s the Earthed version, Azure SQL Database, Azure SQL Database Managed Instance, and Amazon RDS for SQL Server, as shown here:

From there you fill in your connection details. The login you choose will need either the VIEW SERVER STATE or SELECT permission for the following DMVs: dm_exec_query_stats, dm_exec_sql_text, and dm_exec_text_query_plan. I’ve provided links to the Microsoft docs for each, so you can review the permissions defined there.
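
If you are curious what that kind of plan cache mining looks like without the tool, here is a minimal sketch in Python with SQLAlchemy that reads those same three DMVs and checks the plan XML for warning elements. The connection string and database name are placeholders, the TOP 100 by CPU time mirrors the tool’s default, and this is not the query the tool itself runs.

import sqlalchemy as sa

# Placeholder connection string; swap in your own login, password, and server.
engine = sa.create_engine(
    "mssql+pymssql://login:password@dbserver.database.windows.net:1433/Clinic"
)

# Top 100 plans by CPU time, pulled from the same DMVs listed above,
# with a crude check for warning elements inside the plan XML.
query = sa.text("""
SELECT TOP (100)
       qs.total_worker_time,
       st.text,
       qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
CROSS APPLY sys.dm_exec_text_query_plan(qs.plan_handle,
        qs.statement_start_offset, qs.statement_end_offset) AS qp
ORDER BY qs.total_worker_time DESC;
""")

with engine.connect() as conn:
    for total_worker_time, sql_text, plan_xml in conn.execute(query):
        # Warnings such as implicit conversions and missing join predicates
        # show up inside a <Warnings> element in the showplan XML.
        if plan_xml and "<Warnings" in plan_xml:
            print(total_worker_time, sql_text[:80])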

Being able to easily connect to instances of SQL Server, no matter where they are located, is a must-have these days.

SQL Plan Warnings Settings

After you connect to your instance, SQL Plan Warnings will return the top 100 plans, sorted by CPU time by default. However, it is possible that you will see no results after connecting. This is likely due to the default settings for SQL Plan Warnings. You get to the settings by clicking on the gear icon in the upper-right corner. Here is what the default settings look like:

If you are not seeing any results, change the default settings and refresh the plan analysis. In my case, I simply changed the filter to executions, with 1 as the minimum. This returns a lot of noise, so you need to discover what makes the most sense for your particular instance.

Please note these default settings apply to each connected instance. Think of them as the highest-level filter for all your connected sources. You may find yourself adjusting these settings frequently, depending on the instance, the workload, and your query tuning goals.

Reviewing the SQL Plan Warnings Results

After plan analysis is complete, you will see a list of warnings found. It should look like this:

Note that a plan can have multiple warnings, so this list could come from one plan or from many.

From here we are able to filter on a specific warning type with a simple click. This allows us to narrow our focus. Perhaps today we want to focus on Key and RID lookups. We select that filter, then open the plan:

From here we can zoom and scroll, and view the node that has the lookup warning:

If we select the node, a properties dialog opens to the right. We also see the other warnings included in this plan, should we want or need to investigate those at this time. We also have the ability to download the plan, if desired.

Summary

The SQL Plan Warnings tool is easy to use and allows you to be proactive in optimizing your environment. The GUI allows for quick filtering at the plan cache level as well as on the plan warnings themselves. This lets you focus on the plan warnings with the most impact.

One thing to note is the size of the plan cache you choose to analyze. Instances with larger plan caches (1 GB or greater) will have a larger number of plans to parse for warnings, so analysis may take longer.

You can download the SQL Plan Warnings tool here.

Use SQLMap to Connect Directly to Azure SQL Database
Thu, 12 Mar 2020
https://thomaslarock.com/2020/03/use-sqlmap-to-connect-directly-to-azure-sql-database/

I’ve written before about using sqlmap to perform SQL injection testing against a website. It is also possible to use sqlmap to connect directly to a database. In this post I will show you how to use sqlmap to connect directly to Azure SQL Database. Once connected you can enumerate objects, open a shell, or run custom SQL injection scripts.

The sqlmap documentation is good, but not perfect. For example, if you go looking for details and examples on how to connect directly to a database, you will find the following:

[Screenshot of the sqlmap documentation for the -d direct database connection option]

There is no example given for SQL Server, so I assume ‘mssql’ is the correct choice for DBMS. A quick test against my Contoso Clinic website database had me trying the following code (you will need to put in the correct login, password, and server host name should you try to replicate my scenarios):

c:\python38\python.exe .\sqlmap.py --batch --flush-session -d "mssql://login:password@dbserver.database.windows.net:1433/Clinic"

This resulted in an error:

[CRITICAL] SQLAlchemy connection issue ('InterfaceError: (pyodbc.InterfaceError) ('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')')

At first I focused my attention on the driver, thinking that my Surface laptop was not configured properly. I had just rebuilt the machine a few weeks ago, so it was reasonable to think something was amiss. However, it soon dawned on me that my attention should focus on SQLAlchemy, as that was being used by sqlmap to create the connection. So I decided that I would start running some tests using SQLAlchemy.

Use SQLAlchemy to Connect Directly to Azure SQL Database

Here’s the Python script I used as a first test:

import sqlalchemy as sa 

engine = sa.create_engine('mssql://login:password@dbserver.database.windows.net:1433/Clinic')

connection = engine.connect()
result = connection.execute("select username from users")
for row in result:
    print("username:", row['username'])
connection.close()

This script threw the same error message, so I considered that to be a sign of progress. Now I set about researching how to connect to Azure SQL Database using SQLAlchemy. A few Google results later, I arrived at the following syntax, which allowed for a successful connection:

"mssql+pymssql://login@dbserver:password@dbserver.database.windows.net:1433/Clinic"

I needed to add the @dbserver to the end of the login, and I needed to assign a default driver. Here I chose to use pymssql. This syntax allows me to connect SQLAlchemy to an Azure SQL Database. Now that I was able to make a connection from my laptop, I went back to sqlmap.
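
For completeness, here is the earlier test script again with that working connection string dropped in; everything else is unchanged, and the login, password, and server name are still placeholders:

import sqlalchemy as sa

# Same test as before, now using the login@dbserver format and the pymssql driver.
engine = sa.create_engine('mssql+pymssql://login@dbserver:password@dbserver.database.windows.net:1433/Clinic')

connection = engine.connect()
result = connection.execute("select username from users")
for row in result:
    print("username:", row['username'])
connection.close()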

Use SQLMap to Connect Directly to Azure SQL Database

The first thing I tried was the following:

c:\python38\python.exe .\sqlmap.py --batch --flush-session -d "mssql+pymssql://login@dbserver:password@dbserver.database.windows.net:1433/Clinic"

This resulted in the following error:

[CRITICAL] invalid target details, valid syntax is for instance 'mysql://USER:PASSWORD@DBMS_IP:DBMS_PORT/DATABASE_NAME' or 'access://DATABASE_FILEPATH'

Again, I consider this to be a sign of progress. It is a different error message; here sqlmap is clearly telling me there is a syntax error. Since I made two changes to the string, I decided to remove one and see if that worked. My next test was the following:

c:\python38\python.exe .\sqlmap.py --batch --flush-session -d "mssql://login@dbserver:password@dbserver.database.windows.net:1433/Clinic"

Success! We are able to create a connection:

[INFO] connection to Microsoft SQL Server server 'dbserver.database.windows.net:1433' established

Summary

Connecting to Azure SQL Database with sqlmap is easy; just remember the login@dbserver format. From there you can enumerate objects, open a shell, or run custom SQL injection scripts. This flexibility makes sqlmap a great tool to use for penetration testing. I also use sqlmap to test alerts configured with Advanced Threat Protection.

Reviewing the GigaOM SQL Transactional Processing Price-Performance Testing
Wed, 30 Oct 2019
https://thomaslarock.com/2019/10/reviewing-the-gigaom-sql-transactional-processing-price-performance-testing/

Earlier this month Microsoft and GigaOM announced a new benchmark study comparing AWS RDS to Azure SQL Database. This study was authored by the same people that wrote the previous GigaOM data warehouse benchmark last year. I enjoyed the data warehouse study. I found it to be fair and thorough enough to help the reader understand how to conduct their own benchmark testing. I was eager to read the new SQL Transactional Processing Price-Performance Testing study.

I found this latest effort to be a good start, but it fell short of the effort the authors put forth in their previous benchmark for data warehousing.

Before I go any further, I want to thank the authors for putting together their results. I recognize that these are humans, working hard, putting forth their best efforts at being fair and thorough. Comparing cloud services is not an easy task. I found this latest effort to be good, but not great. If they were students of mine I would grade this latest paper from them a solid B-.

Let’s break it down.

The Good Stuff

First, the good stuff. I love how they drove everything towards a formula, price/performance, where performance is tracked in transactions per second. The downside to price/performance is that not every workload is focused on transactions per second. Still, I’d like to see this formula adopted as a standard way of comparing services.

In the past I’ve focused only on total price as shown by the online pricing calculators. This is because (1) you aren’t supposed to publish benchmarks without permission from the company (Microsoft, AWS) and (2) I can’t bankroll this level of test AND maintain my scotch and bacon addictions. By using price/performance you level the playing field somewhat. A service may cost more, but if it runs your query in half the time, the cost may be worth it.

I also liked the choice of TPC-E as their test; I believe it to be a fair way to compare how the services will handle a workload. And I liked how they explained the difficulties in comparing services and the associated hardware. That’s something I’ve written about previously. Many times, really.

It is frustrating to compare the data services being offered between Azure and AWS. Part of me thinks this is done on purpose by both companies in an effort to win our favor without giving away more information than is necessary. This is a common practice, and I’m not bashing either company for doing what has been done for centuries. I’m here to help others figure out how to make the right choice for their needs. At the end of the day, I believe both Amazon and Microsoft want the same thing: happy customers.

But it is not in their best interest to make it easy for anyone to compare costs. This is how utilities operate. Make no mistake, AWS and Azure are the new electric company.

Now, for the items that I didn’t like as much. I’ll capture the quote from the article and explain my concern.

The Not As Good Stuff

“There are no exact matches in processors or memory.” – This is a bit of nitpicking, but I took issue here with the use of the word “or”. As someone who charges (and receives) top dollar for performing technical reviews of books, it bugged me. The authors are correct in saying that it is hard to find exact matches. However, I can certainly find a match for vCPU, but not for memory. Azure publishes memory in weird increments, starting at 10.2 GB, while AWS shows traditional increments of 8, 16, etc. So, yeah, it’s a nitpick. But it was this exact item that caught my eye and made me dig deeper to fact check everything. Warrants mentioning.

“Thus, R4 seemed a suitable instance class to use for AWS RDS.” – The authors explain why they chose the R4 (memory optimized instance) versus the M4 (general purpose). I have no issue with this except that neither the M5 nor the R5 was considered. This study just came out, so why were those instances not considered? And since the authors went out of their way to tell us what AWS says about the R4, let me tell you what AWS also says about the R5:

“R5 instances deliver 5% additional memory per vCPU than R4 and the largest size provides 768 GiB of memory. In addition, R5 instances deliver a 10% price per GiB improvement and a ~20% increased CPU performance over R4.”

I can’t think of any reason why the authors chose R4 here. But let’s move past this, because now it’s time for the hard part: finding a suitable match for Azure SQL Database.

“On the Azure side, we expect customers to gravitate towards SQL Database Business Critical (BC) offerings…” – Well, Azure doesn’t offer a memory optimized version of SQL Database, so I guess using BC is fine. But the question I have now is why not consider using Managed Instance? In the data warehouse benchmark study they tried a variety of sizes against the workload. This study focused ONLY on one machine size. This is part of the reason they got a B-; they weren’t thorough enough for my liking. I’d send them back and tell them to run more tests against different sized machines and include Managed Instance. At the very least they could have made an effort to simply use general purpose, as it would have been closer to an apples-to-apples comparison.

“Therefore, we chose the BC_Gen5_80 instance, which has more CPUs than R4.16xlarge, but less memory at 408 GB.” – Yes, finding an exact match is difficult. Here’s a breakdown of what they chose:

But this image shows AWS at 64,000 provisioned IOPS, and further in the study they say they tested against 32,000 provisioned IOPS. So, which is it? I’ve no idea. Neither do you. But I do know that provisioning 32,000 IOPS added about $6k to the monthly bill.

“…the monthly cost of Microsoft Azure comes to $40,043.71. The monthly cost for AWS comes to $65,239.43.” – Verified, I can get the same prices using the AWS and Azure calculators. But the small detail that is glossed over here is single versus multi-zone. The AWS calculator is clear, if you deploy multi-zone, the price doubles. The Azure calculator doesn’t have this option, it only exists when you create your SQL Database. I’d be shocked to find out that deploying multi-zone in Azure didn’t bump the price as well. But the chart above clearly states “in a single availability zone”. So, which is it?

I’ve no idea. Neither do you.

Summary

Some quick math tells me that if we drop the multi-zone from AWS RDS, the price/performance result comes in at $1,269.85, slightly cheaper than the $1,410.04 for SQL Database. And this is why I like price/performance as a metric. A database service may have a slightly higher price but offer greater throughput.

This was the exact conclusion from the data warehouse study, too. The cost for Azure SQL Data Warehouse was just a tad more than AWS Redshift, but the performance with Azure was better. I wanted to see a similar conclusion in this study.

Instead, we have a report with a handful of inaccuracies. Perhaps in an effort to rush to publish ahead of Ignite, they simply used a wrong graph, or missed doing one final round of edits. When you are doing this work it is easy to have such things fall through the cracks.

I’d love to see this study republished without these errors. I’d also love for AWS and Azure to find a way to make it easier to compare costs and services.

REFERENCES:

Azure vs. AWS Data Services Comparison
Updated Data Services Comparison: AWS vs. Azure
Azure vs AWS Analytics and Big Data Services Comparison
Updated Analytics and Big Data Comparison: AWS vs. Azure
Azure SQL Data Warehouse Costs vs AWS Redshift
Azure pricing calculator
AWS pricing calculator
Amazon EC2 Instance Types
Sizes for Windows virtual machines in Azure
Azure SQL Database pricing
Data Warehouse in the Cloud Benchmark
SQL Transactional Processing Price-Performance Testing

No, You Don’t Need a Blockchain
Thu, 01 Nov 2018
https://thomaslarock.com/2018/11/no-you-dont-need-a-blockchain/


The hype around blockchain technology is reaching a fever pitch these days. Visit any tech conference and you’ll find more than a handful of vendors offering blockchain. This includes Microsoft, IBM, and AWS. Each of those companies offers a public blockchain as a service.

Blockchain is also the driving force behind cryptocurrencies, allowing Bitcoin owners to purchase drugs on the internet without the hassle of showing their identity. So, if that sounds like you, then yes, you should consider using blockchain. A private one, too.

Or, if you’re running a large logistics company with one or more supply chains made up of many different vendors, and need to identify, track, trace, or source the items in the supply chain, then blockchain may be the solution for you as well.

Not every company has such needs. In fact, there’s a good chance you are being persuaded to use blockchain as a solution to a current logistics problem. It wouldn’t be the first time someone has tried to sell you a piece of technology software you don’t need.

Before we can answer the question of whether you need a blockchain, let’s take a step back and make certain we understand blockchain technology, what it solves, and the issues involved.

What is a blockchain?

The simplest explanation is that a blockchain serves as a ledger. This ledger is a long series of transactions, and it uses cryptography to verify each transaction in the chain. Put another way, think of a very long sequence of small files. Each file is based upon a hash value of the previous file, combined with new bits of data and the answer to a math problem.

Put another way, blockchain is a database—one that is never backed up, grows forever, and takes minutes or hours to update a record. Sounds amazing!
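
To make the “sequence of small files” picture concrete, here is a minimal sketch of a hash-chained ledger in Python. It is illustrative only; a real blockchain adds proof-of-work difficulty, peer-to-peer distribution, and consensus rules on top of this basic structure.

import hashlib
import json

def make_block(previous_hash, data):
    # Each block ties new data to the hash of the block before it.
    block = {"previous_hash": previous_hash, "data": data}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

# A tiny three-block chain.
genesis = make_block("0" * 64, "genesis")
block1 = make_block(genesis["hash"], "Alice pays Bob 5")
block2 = make_block(block1["hash"], "Bob pays Carol 2")

# Tampering with an earlier block changes its hash,
# which breaks every block that follows it.
print(block2["previous_hash"] == block1["hash"])  # True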

What does blockchain solve?

Proponents of blockchain believe it solves the issue of data validation and trust. For systems needing to verify transactions between two parties, you would consider blockchain. Supply chain logistics is one problem people believe is solved by blockchain technology. Food sourcing and traceability are good examples.

Other examples include Walmart requiring food suppliers to use a blockchain provided by IBM starting in 2019. Another is Albert Heijn using blockchain technology along with the use of QR codes to solve issues with orange juice. Don’t get me started on the use of QR codes; we can save it for a future post.

The problem with blockchain

Blockchain should make your system more trustworthy, but it does the opposite.

Blockchain pushes the burden of trust onto individuals adding transactions to the blockchain. This is how all distributed systems work. The burden of trust goes from a central entity to all participants. And this is the inherent problem with blockchain.

[Warrants mentioning – many cryptocurrencies rely on trusted third parties to handle payouts. So, they use blockchain to generate coins, but don’t use blockchain to handle payouts. Because of the issues involved around trust. Let that sink in for a moment.]

Here’s another issue with blockchain: data entry. In 2006, Walmart launched a system to help track bananas and mangoes from field to store, only to abandon the system a few years later. The reason? Because it was difficult to get everyone to enter their data. Even when data is entered, blockchain will not do anything to validate that the data is correct. Blockchain will validate the transaction took place but does nothing to validate the actions of the entities involved. For example, a farmer could spray pesticides on oranges but still call it organic. It’s no different than how I refuse to put my correct cell phone number into any form on the internet.

In other words, blockchain, like any other database, is only as good as the data entered. Each point in the ledger is a point of failure. Your orange, or your ground beef, may be locally sourced, but that doesn’t mean it’s safe. Blockchain could show the point of contamination, but it won’t stop it from happening.

Do you need a blockchain?

Maybe. All we need to do is ask ourselves a few questions.

Do you need a [new] database? If you need a new database, then you might need a blockchain. If an existing database or database technology would solve your issue, then no, you don’t.

Let’s assume you need a database. The next question: Do you have multiple entities needing to update the database? If no, then you don’t need a blockchain.

OK, let’s assume we need a new database and we have many entities needing to write to the database. Are all the entities involved known, and do they trust each other? If the answer is yes, then you don’t need a blockchain. If the entities have a third party everyone can trust, then you also don’t need a blockchain. A blockchain should remove the use of a third party.

OK, let’s assume we know we need a database, with multiple entities updating it, and without mutual trust or a trusted third party between them. The final question: Do you need this database distributed in a peer-to-peer network? If the answer is no, then you don’t need a blockchain.

If you have different answers, then a private or public blockchain may be the right solution for you.
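
Here is that decision flow captured as a small Python function. The parameter names are mine, but the logic follows the questions above.

def need_blockchain(need_new_database, multiple_writers,
                    writers_trust_each_other, trusted_third_party_exists,
                    need_peer_to_peer):
    # Walk the questions above; blockchain only makes sense
    # when every answer points that way.
    if not need_new_database:
        return False  # an existing database solves the problem
    if not multiple_writers:
        return False  # a single writer does not need distributed trust
    if writers_trust_each_other or trusted_third_party_exists:
        return False  # mutual trust or a clearinghouse removes the need
    return need_peer_to_peer

# Example: known partners with a trusted clearinghouse means no blockchain.
print(need_blockchain(True, True, False, True, True))  # False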

Summary

No, you don’t need a blockchain.

Unless you do need one, but that’s not likely.

And it won’t solve basic issues of data validation and trust between entities. If we can trust each other, then we would be able to trust a central clearinghouse, too.

Don’t buy a blockchain solution unless you know for certain you need one.

[This article first appeared on Orange Matter. Head over there and check out the great content.]

The Future Isn’t In Databases, But In the Data
Mon, 04 Jun 2018
https://thomaslarock.com/2018/06/the-future-isnt-in-databases-but-in-the-data/


In the past year, you may have heard me mention my certificates from the Microsoft Professional Program. One certificate was in Data Science, the other in Big Data. I’m currently working on a third certificate, this one in Artificial Intelligence.

You might be wondering why a database guy would be spending so much time on data science, analytics, and AI. Well, I’ll tell you.

The future isn’t in databases, but in the data.

Let me explain why.

 

Databases Are Cheap and Plentiful

Take a look at the latest DB-Engines rankings. You will find there are 343 distinct database systems listed; 138 of those are relational databases. And I’m not sure it is a complete list, either. But it should help make my point: you have no idea which one of 343 database systems is the right one. It could be none of them. It could be all of them.

Sure, you can narrow the list of options by looking at categories. You may know you want a relational, or a key-value pair, or even a graph database. Each category will have multiple options, and it will be up to you to decide which one is the right one.

Decisions are made to go with whatever is easiest. And “easiest” doesn’t always mean “best.” It just means you’ve made a decision allowing the project to move forward.

Here’s the fact I want you to understand: Data doesn’t care where or how it is stored. Neither do the people curating the data. Nobody ever stops and says “wait, I can’t use that, it’s stored in JSON.” If they want (or need) the data, they will take it, no matter what format it is stored in to start.

And the people curating the data don’t care about endless debates on MAXDOP and NUMA and page splits. They just want their processing to work.

And then there is this #hardtruth – It’s often easier to throw hardware at a problem than to talk to the DBA.

 

Technology Trends Over the Past Ten Years

Here’s a handful of technology trends over the past ten years. These trends are the main technology drivers for the rise of data analytics during this timeframe.

Business Intelligence software – The ability to analyze and report on data has become easier with each passing year. The Undisputed King of all business analytics, Excel, is still going strong. Tableau shows no signs of slowing down. PowerBI has burst onto the scene in just the past few years. Data analytics is embedded into just about everything. You can even run R and Python through SQL Server.

Real-time analytics – Software such as Hadoop, Spark, and Kafka allow for real-time analytic processing. This has allowed companies to gather quality insights into data at a faster rate than ever before. What used to take weeks or months can now be done in minutes.

Data-driven decisions – Companies can use real-time analytics and enhanced BI reporting to build a data-driven culture. We can move away from “hey, I think I’m right, and I found data to prove me right” to a world of “hey, the data says we should make a change, so let’s make the change and not worry about who was right or wrong.” In other words, we can remove the human factor from decision making, and let the data help guide our decisions instead.

Cloud computing – It’s easy to leverage cloud providers such as Microsoft Azure and Amazon Web Services to allocate hardware resources for our data analytic needs. Data warehousing can be achieved on a global scale, with low latency and massive computing power. What once cost millions of dollars to implement can be done for a few hundred dollars and some PowerShell scripts.

 

Technology Trends Over the Next Ten Years

Now, let’s look at a handful of current trends. These trends will affect the data industry for the next ten years.

Predictive analytics – Artificial intelligence, machine learning, and deep learning are just starting to become mainstream. AWS is releasing DeepLens this year. Azure Machine Learning makes it easy to deploy predictive web services. Azure Machine Learning Workbench lets you build your own facial recognition program in just a few clicks. It’s never been easier to develop and deploy predictive analytic solutions.

DBA as a service – Every company making database software (Microsoft, AWS, Google, Oracle, etc.) is actively building automation for common DBA tasks. Performance tuning and monitoring, disaster recovery, high availability, low latency, auto-scaling based upon historical workloads, the lists go on. The current DBA role, where lonely people work in a basement rebuilding indexes, is ending, one page at a time.

Serverless functions – Serverless functions are also hip these days. Services such as IFTTT make it easy for a user to configure an automated response to whatever trigger they define. Azure Functions and AWS Lambda are where the hipster programmers hang out, building automated processes to help administrators do more with less.

More chatbots – We are starting to see a rise in the number of chatbots available. It won’t be long before you are having a conversation with a chatbot playing the role of a DBA. The only way you’ll know it is a chatbot and not a DBA is because it will be a pleasant conversation for a change. Chatbots are going to put a conversation on top of the automation of the systems underneath. As new people enter the workforce, interaction with chatbots will be seen as the norm.

 

Summary

There is a dearth of people able to analyze data today.

Data analytics is the biggest growth opportunity I see for the next ten years. The industry needs people to help collect, curate, and analyze data.

We also need people to build data visualizations. Something more than an unreadable pie chart. But I will save that rant for a different post.

We are always going to need an administrator to help keep the lights on. But as time goes on we will need fewer administrators. This is why I’m advocating a shift for data professionals to start learning more about data analytics.

Well, I’m not just advocating it, I’m doing it.

Azure Cosmos DB Pricing Compared to DynamoDB and NeptuneDB
Thu, 10 May 2018
https://thomaslarock.com/2018/05/azure-cosmos-db-pricing-compared-to-dynamodb-and-neptunedb/

This week at the Microsoft Build conference a new provisioning option for Cosmos DB was announced. The new option, to provision throughput for a set of containers, is a wonderful new feature. However, this meant I needed to take some time to understand Azure Cosmos DB pricing compared to DynamoDB and NeptuneDB.

This new provisioning feature for Cosmos DB offers more granularity than before. Now, we are allowed to provision a set of containers, say with 50,000 RU/s to be shared. Then, you can create collections that each take a piece of the 50,000, instead of needing to create a new Cosmos DB for your applications that have different throughput needs.

The 50,000 number isn’t something I pulled out of thin air. It is the minimum number you are allowed to use. A quick look at the pricing calculator shows us the cost, at a minimum, for this new feature:

 

Azure CosmosDB Pricing

 

At $3k a month, this new provisioning option seems expensive. The minimum for CosmosDB is 400 RU/s, and that’s only about $30/month to get started. This 50k minimum and its $3k/month cost have been discussed a bit online, mostly by people complaining that the cost is too much. My first thought on seeing such complaints was “this isn’t the right solution for you”, followed by “how much would a similar offering from AWS cost?”

That’s what I want to do here today. Let’s break this down.

 

Pricing Specifications

We need to set the stage for our comparison. Here’s the reference I will use as a starting point:

 

Azure CosmosDB Pricing

 

That table shows us some examples of throughput capacity. It breaks down item size along with reads and writes. This shows us the total number of RU/s needed.

The fifth line is the one we will use as our base. We are going to assume an item size of 64kb, and a 5:1 ratio of reads to writes. At a 50,000 RU/s minimum for the new provisioning option, that implies we should have about 2500 reads and 500 writes per second. We will also use 100GB as our storage requirement.

Lastly, we will split the workload in Cosmos DB to be 50-50 between graph and non-graph. The reason is that AWS doesn’t have an all-in-one service like Cosmos DB. We will need to compare to AWS DynamoDB and AWS NeptuneDB.

 

Amazon DynamoDB Pricing

I am going to configure the AWS monthly calculator for 50 GB of storage, and we will assume 40 GB is egress and 10 GB is ingress. We will set the item size to 64 KB, the reads/sec to 1,250, and the writes/sec to 250. The Cosmos DB numbers were set to a consistency level of “session”, which is in the middle between strong and eventual. Since AWS doesn’t offer this level, I will go with eventual consistency, which lowers the price overall:

 

AWS DynamoDB Pricing

 

So, that would cost over $9k, and wouldn’t include our graph database needs. So, let’s get that number next.

 

Amazon NeptuneDB Pricing

The AWS monthly calculator does not have NeptuneDB as an option yet. I am guessing this is because NeptuneDB is still in preview. So we need to do this by hand.

From the Neptune pricing page, the billing involves the size of the instance, storage, I/O, and data transfer. For our purposes, we will use the lower-end instance (db.r3.large). We will use 50 GB for storage, with 40 GB egress and 10 GB ingress. For the requests, we need to do some math: we need to calculate the total number of requests per month in Cosmos DB in order to convert into the pricing metric for NeptuneDB. Half the Cosmos DB workload would be 25k RU/s, which works out to (25,000 * 720 * 3,600) = 64,800,000,000 requests a month. NeptuneDB charges $0.20 per million requests, so that works out to (64,800,000,000 / 1,000,000 * $0.20) = $12,960.

So, the total for NeptuneDB would be:

– The db.r3.large instance is $252/month
– The 50 GB storage is $5/month
– The 64,800 million requests are $12,960/month
– The data transfer rate is $3.60/month

That’s a total of $13,220.60, just for NeptuneDB. And that makes the AWS offering(s) a total of $22,636.43/month.

Just a tad more expensive than the CosmosDB monthly price.
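
As a sanity check, here is the NeptuneDB arithmetic from above as a short script, using the same quoted prices:

# Reproduce the NeptuneDB math from the paragraphs above.
ru_per_second = 25_000              # half of the 50,000 RU/s Cosmos DB workload
seconds_per_month = 720 * 3_600     # 720 hours in the month used above
requests_per_month = ru_per_second * seconds_per_month

io_cost = requests_per_month / 1_000_000 * 0.20  # $0.20 per million requests
instance_cost = 252.00              # db.r3.large
storage_cost = 5.00                 # 50 GB
transfer_cost = 3.60

neptune_total = instance_cost + storage_cost + io_cost + transfer_cost
print(f"{requests_per_month:,} requests per month")   # 64,800,000,000
print(f"${io_cost:,.2f} for I/O")                     # $12,960.00
print(f"${neptune_total:,.2f} NeptuneDB total")       # $13,220.60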

 

Summary

The new provisioning option for Cosmos DB allows for greater flexibility in how you manage workloads for your specific containers. Previously you would have had to provision new Cosmos DB instances in order to meet your RU/s requirements. Allowing customers to group a set of containers and share throughput is a wonderful new feature.

But this feature has a minimum requirement of 50,000 RU/s, which has a price tag of $3k/month. That is likely going to price some people out for now. I can see a scenario where Microsoft reduces the 50k minimum. They’ve reduced costs before, doing so again would not be unprecedented.

And when you compare the throughput evenly across AWS DynamoDB and NeptuneDB, $3k is a bargain. This becomes more apparent when you consider additional things such as performance, recovery, availability, and multi-master.

Azure vs. AWS Data Services Comparison
Tue, 20 Mar 2018
https://thomaslarock.com/2018/03/azure-versus-aws-data-services-comparison/

Both Microsoft Azure and Amazon Web Services offer a lot of data services. So many services that it can be hard to comprehend how they compare without a scorecard. So, that’s what I did here: I put together a quick image to help you make sense of all the offerings currently available (as of March 2018). Essentially, I wanted to build a cheatsheet for Azure vs. AWS data services comparison purposes.

It is my hope that this post will be a starting guide for you when you need to research these services. I have included relevant links for each service, along with some commentary, in the text of this post below. I’ve done my best to align the services, but there is some overlap between offerings. Some offerings, like data warehousing and cache, are easy to discern.

 

Azure and Amazon AWS data services comparison

 

OK, let’s break these down into groups. I’m not going to do a feature comparison here because these systems evolve so quickly I’d spend all day updating the info. Instead, you get links to the documentation for everything and you can do your own comparisons as needed.

 

Relational

Azure offerings: SQL Database, Database for MySQL, Database for PostgreSQL

AWS offerings: RDS, Aurora

RDS is an umbrella term, as it is six engines in total, and it includes Amazon Aurora, MySQL, MariaDB, Oracle, Microsoft SQL Server, and PostgreSQL. I’ve listed Aurora as a distinct offering because it is the high-end service dedicated to MySQL and PostgreSQL. Since Azure also offers those distinct services, it made sense to break Aurora out from RDS. (Or, to put it another way, if I didn’t call out Aurora here you’d finish this post and say ‘what about Aurora’, and now you don’t have to ask that question.)

 

NoSQL – Key/Value

Azure offerings: Cosmos DB, Table Storage

AWS offerings: DynamoDB, SimpleDB

Cosmos DB is the major NoSQL player for Azure, as it does everything (key/value, document, graph) except relational.

 

NoSQL – Document

Azure offerings: Cosmos DB

AWS offerings: DynamoDB

Azure used to offer DocumentDB, but that platform was sunset when Cosmos DB came alive.

 

NoSQL – Graph

Azure offerings: Cosmos DB

AWS offerings: Neptune

As of March 2018, Neptune is in Preview, so the documentation is likely to change in the coming weeks (well, that’s my assumption, because Neptune has been in Preview since November.)

 

Data Warehouse

Azure offerings: SQL Data Warehouse

AWS offerings: Redshift

It feels like these two services have been around forever. That’s because, in internet years, they have. Redshift goes back to 2012, and SQL DW goes back to 2009. That’s a lot of time for both Azure and AWS to learn about data warehousing as a service.

 

Cache

Azure offerings: Redis Cache

AWS offerings: ElastiCache

Both of these services are built upon Redis, so the real question here is whether you want to use Redis-as-a-service from a 3rd party provider as opposed to just running Redis yourself.

 

Pricing

Azure Pricing calculator: https://azure.microsoft.com/en-us/pricing/calculator/

AWS Pricing Calculator: https://calculator.s3.amazonaws.com/index.html

The pricing calculators give you the best understanding of capacity. You could spend days trying to figure out the resource limits for each service listed on this page. But if you start with the calculator you get an idea of the most important thing, the cost of the service. Here’s an example of what I mean. Let’s look at something that should be an easy comparison: SQL Data Warehouse versus Redshift. I will compare a 100% utilized instance for each.

Here is the pricing summary for Azure SQL Data Warehouse, optimized for capacity, and with storage of 10 TB:

 

Azure SQL Data Warehouse pricing

 

The calculator tells me the two most important things I need to know: That I pay for storage, and for something called a DWU. So, that’s the stuff to research next.

For Redshift, we have this:

 

AWS Redshift Pricing

 

AWS seems to be charging for compute power only and not for storage. Also, this is the cost for only one node, whereas SQL Data Warehouse will use more than one node to distribute the workload. And this doesn’t help explain failovers, maintenance, disaster recovery, etc.

It can be frustrating to compare the data services being offered between Azure and AWS. Part of me thinks this is done on purpose by both companies in an effort to win our favor without giving away more information than is necessary. This is a common practice, and I’m not bashing either company for doing what has been done for centuries. I’m here to help others figure out how to make the right choice for their needs. At the end of the day, I believe both Amazon and Microsoft want the same thing: happy customers.

By starting at the pricing pages I can then dive into the specific costs, and use that as a first-level comparison between the services. If you start by looking at resource limits and maximums you will spend a lot of time trying to compare apples to oranges. Just focus on costs first, then resources, throughput, and DR. That should be a good start to help you determine the cost, benefit, and risk of each service.

[UPDATE: I did a quick comparison of Azure Cosmos DB costs vs DynamoDB and Neptune costs in this post. You’re welcome]

 

Summary

I hope you find this page useful for referencing the many data service offerings from both Microsoft Azure and Amazon Web Services. I will do my best to update this page as necessary, and offer more details and use cases as I am able.
