What is AWS Athena | A Complete Guide about AWS Athena


What is AWS Athena?

AWS Athena is a service offered by Amazon Web Services (AWS), a leader in cloud computing, and one of the most popular services in the Amazon analytics domain. It enables data analysts to query data in S3 (Simple Storage Service) using standard SQL (Structured Query Language) syntax. Because Athena is a serverless query service, there is no infrastructure to manage and no need to load S3 data elsewhere before analysing it. An analyst can access Athena through the AWS Management Console or an application programming interface (API). The user then defines a schema and starts executing SQL queries.

What is Amazon Athena used for?

An Amazon Athena user can query encrypted data with keys managed by AWS Key Management Service (KMS) and can also encrypt the query results. Athena is thus an interactive, responsive query service that helps organisations analyse data stored in Amazon S3. It also supports cross-account access to S3 buckets owned by other users.
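As a rough sketch of how querying through the API with encrypted results might look, the snippet below builds the arguments for the `start_query_execution` call of the boto3 Athena client. The table, database, bucket, and key ARN are hypothetical placeholders, and `run_query` assumes boto3 is installed and AWS credentials are configured.

```python
def build_query_request(sql, database, output_location, kms_key_arn=None):
    """Build keyword arguments for boto3's Athena start_query_execution call."""
    result_config = {"OutputLocation": output_location}
    if kms_key_arn is not None:
        # Ask Athena to encrypt the query results with a KMS-managed key.
        result_config["EncryptionConfiguration"] = {
            "EncryptionOption": "SSE_KMS",
            "KmsKey": kms_key_arn,
        }
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": result_config,
    }


def run_query(request):
    """Submit the request; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without it

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# All names below are hypothetical placeholders.
request = build_query_request(
    sql="SELECT * FROM access_logs LIMIT 10",
    database="example_db",
    output_location="s3://example-athena-results/",
    kms_key_arn="arn:aws:kms:us-west-2:111122223333:key/example-key-id",
)
```

Keeping the request builder separate from the network call makes it easy to inspect exactly what will be sent before any query is run.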

Athena users can manage data catalogues that store the metadata and schemas for the Amazon S3 data they query. Athena can process unstructured, semi-structured, and structured data sets, which makes it useful for research, log analysis, and online analytical processing.

How does AWS Athena work?

AWS Athena works directly with data stored in S3. It runs queries with Presto, a distributed SQL engine, and uses Apache Hive for creating and altering tables and partitions.

Before we look at how to work with Athena, let us go through the prerequisites:

  • The user needs an AWS account.
  • Make sure that you have enabled your account to export the cost and usage data into an S3 bucket.
  • Prepare the buckets that AWS Athena will connect to.
  • Every time Athena writes to the bucket, AWS creates manifest files using the metadata.
  • To keep things simple, use a single region, e.g. the US West 2 (us-west-2) region.
  • The last step is to download the credentials for the new IAM user. These credentials map directly to the database credentials used to connect.

1. Create Databases – To begin, create both an Athena database and a table. Remember that the table must match the format of the CSV files in the S3 billing bucket.
A few pitfalls that are easy to miss are as follows:

  • You can parse the CSV files with the OpenCSV SerDe plugin.
  • The plugin supports gzip files but not zip files, so you will need to convert the compression format to gzip or another supported format.
  • The plugin claims to support the skip.header.line.count property for skipping header rows, but it appears to be broken. If it does not work for you, you will have to rewrite the CSV files without the header.
  • You can run the Data Definition Language (DDL) statements that create the database and table either from the AWS web console or from another client tool.

2. Partitioning Data – Users generally store their data as a time series and need to query it for a specific period, such as a day, month, or year. Without partitions, Athena scans all the data every time a query runs. Partitioning restricts how much data Athena scans, which reduces query time and cost and improves performance.

3. Convert data into columnar formats – Customers can use open-source columnar formats such as Apache ORC and Apache Parquet. This saves cost and improves performance.

4. Compare the Performance – The user can compare the performance of the same query on Parquet files and on text files.

AWS Athena charges users by the amount of data scanned per query. By converting the data to columnar formats, partitioning it, and compressing it, you save cost and also get better performance.
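The steps above can be sketched as two SQL statements, generated here from Python: one DDL statement for an external table over the CSV files, and one CREATE TABLE AS SELECT (CTAS) statement that rewrites that table as partitioned Parquet. This is a minimal sketch; the table names, column names, and bucket paths are hypothetical, every column is declared as a string for simplicity, and the header-skipping table property is the one the article warns may not work in every case.

```python
def csv_table_ddl(table, columns, location):
    """Return DDL for an external table over CSV files stored in S3."""
    column_lines = ",\n".join(f"  `{name}` string" for name in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n{column_lines}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count' = '1')"
    )


def parquet_ctas(source, target, location, columns, partition_column):
    """Return a CTAS statement that rewrites `source` as partitioned Parquet."""
    # Athena expects partition columns to come last in the SELECT list.
    ordered = [c for c in columns if c != partition_column] + [partition_column]
    return (
        f"CREATE TABLE {target}\n"
        "WITH (\n"
        "  format = 'PARQUET',\n"
        f"  external_location = '{location}',\n"
        f"  partitioned_by = ARRAY['{partition_column}']\n"
        ")\n"
        f"AS SELECT {', '.join(ordered)} FROM {source}"
    )


COLUMNS = ["usage_month", "service", "cost"]  # hypothetical billing columns
ddl = csv_table_ddl("billing_csv", COLUMNS, "s3://example-billing-bucket/csv/")
ctas = parquet_ctas(
    "billing_csv", "billing_parquet",
    "s3://example-billing-bucket/parquet/", COLUMNS, "usage_month",
)
print(ddl)
print(ctas)
```

Running the same query against `billing_csv` and `billing_parquet` is then one way to carry out the performance comparison described in step 4.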

AWS Athena Benefits

1. Serverless: Because AWS Athena is serverless, you do not need any infrastructure to use it. You can query data quickly without setting up servers or data warehouses, which makes all of your data easy to reach. All you need to do is set up a schema and start querying with the built-in query editor.

2. Cost Effective: AWS Athena users pay only for the queries they run, based on the amount of data scanned per query. There are no additional Athena charges for storage, because queries run directly against data in S3.

3. Widely Accessible: Athena is accessible to everyone from developers and engineers to business analysts and other data specialists, because it is simply a query tool that uses standard Structured Query Language. The queries are simple, so almost anyone can use it.

4. Flexibility: The versatile, open architecture of AWS Athena means that the user is not tied to a single provider, tool, or technology. For instance, the user can work with any number of open-source file formats and swap among query engines without changing the schema.

5. Secure: Security is a priority for Amazon. The effectiveness of its security is regularly tested and verified by third-party auditors as part of the AWS compliance programs.

6. Integration: AWS Athena integrates with a variety of tools such as AWS Glue, Key Management Service, and Amazon QuickSight. For example, a user who integrates Athena with Glue can access the Glue catalogue, which helps to create a metadata repository shared across various services.

Amazon Athena Use case

Let's take an example. The diagram for this use case depicts a data pipeline in which data is retrieved from many sources and put into S3 buckets. At this stage the data is unprocessed, meaning it has not been transformed yet. The user can already point AWS Athena at the data in S3 and start analysing it.

There is no need to set up a database or use external tools to query the raw data, so the approach is very straightforward. Once you complete your research and get the desired results, you can use EMR, as seen in the diagram, to run data transformations and complex analytics, and then write the cleaned and processed data back to S3.

The user can then use Athena to query the processed data and analyse it further. QuickSight can also connect directly to AWS Athena and create visualisations of the data stored in S3. You can also migrate the data to Amazon Redshift, an MPP (massively parallel processing) data warehouse, for data analysis, and then use QuickSight to view the data from Redshift.

Optimization Techniques for AWS Athena

AWS Athena is reasonably priced, but its billing is not simple. It tells you how much has been spent, but it is difficult to see how and when.

The following optimization techniques can help you optimise AWS Athena:

Partitioning the data in S3: Partitioning divides the data into parts and keeps related data together based on column values such as date, region, and country. Partitions act as virtual columns. You define them when you create the table, and you can then restrict a query to specific partitions, which limits the amount of data scanned. Athena therefore scans only the necessary data and charges accordingly.
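To make the effect of partition filters concrete, here is a small simulation in plain Python. It is not how Athena is implemented; it only assumes a hypothetical set of Hive-style partition prefixes with made-up sizes and adds up the bytes in the partitions that match a filter.

```python
# Hypothetical partitions: Hive-style S3 prefix -> size in bytes.
PARTITIONS = {
    "year=2023/month=01/": 40_000_000_000,
    "year=2023/month=02/": 35_000_000_000,
    "year=2023/month=03/": 45_000_000_000,
    "year=2024/month=01/": 50_000_000_000,
}


def parse_partition(prefix):
    """Turn 'year=2023/month=01/' into {'year': '2023', 'month': '01'}."""
    return dict(part.split("=", 1) for part in prefix.strip("/").split("/"))


def bytes_scanned(partitions, **filters):
    """Sum the sizes of the partitions whose values match every filter."""
    total = 0
    for prefix, size in partitions.items():
        values = parse_partition(prefix)
        if all(values.get(key) == value for key, value in filters.items()):
            total += size
    return total


full_scan = bytes_scanned(PARTITIONS)  # no filter: every partition is read
pruned_scan = bytes_scanned(PARTITIONS, year="2023", month="02")
print(f"full scan:   {full_scan:,} bytes")
print(f"pruned scan: {pruned_scan:,} bytes")
```

In this made-up data set, filtering on both partition columns reads a single partition instead of all four, and the billed amount shrinks in the same proportion.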

Use Data Compression Techniques: AWS Athena supports many compression formats for both reading and writing data. It can read a table stored in the Parquet file format even when some files are compressed with Snappy and others with GZIP. The same applies to the text file, ORC, and JSON storage formats. The compression formats that Athena supports include GZIP, LZO, SNAPPY, and DEFLATE, among others.
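As a small illustration of why compression matters when you pay per byte scanned, the sketch below gzips some synthetic CSV rows with Python's standard library and compares the sizes. The rows are invented and highly repetitive, so the ratio here is only indicative; real data will compress differently.

```python
import gzip

# Synthetic, repetitive CSV rows (date, region, value); purely illustrative.
rows = [f"2023-01-{day:02d},us-west-2,{day * 10}\n" for day in range(1, 29)] * 500
raw = "".join(rows).encode("utf-8")

compressed = gzip.compress(raw)
restored = gzip.decompress(compressed)  # round trip to confirm nothing is lost

ratio = len(compressed) / len(raw)
print(f"raw: {len(raw):,} bytes, gzip: {len(compressed):,} bytes, ratio: {ratio:.3f}")
```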

Optimise JOIN conditions in queries: Choosing the right join order is important for query performance. When you join two tables, put the larger table on the left side of the join and the smaller table on the right. The engine distributes the right-side table to the worker nodes and then streams the left-side table through them to perform the join. If the right-side table is the smaller one, less memory is used and the query runs faster.
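The reasoning behind this join order can be illustrated with a simplified hash join in plain Python. This is a teaching sketch, not Athena's actual implementation: the right-side table is loaded into an in-memory hash table (the part that costs memory), and the left-side rows are streamed past it one at a time. The `orders` and `countries` tables are hypothetical.

```python
def hash_join(left_rows, right_rows, key):
    """Join two lists of dicts on `key`, building the hash table from the right side."""
    # Build phase: the whole right-side table is held in memory.
    build = {}
    for row in right_rows:
        build.setdefault(row[key], []).append(row)
    # Probe phase: left-side rows are streamed and never stored.
    for left in left_rows:
        for right in build.get(left[key], []):
            yield {**right, **left}


# Hypothetical tables: many orders (left), few countries (right).
orders = [
    {"order_id": 1, "country_code": "US"},
    {"order_id": 2, "country_code": "DE"},
    {"order_id": 3, "country_code": "US"},
]
countries = [
    {"country_code": "US", "name": "United States"},
    {"country_code": "DE", "name": "Germany"},
]

joined = list(hash_join(orders, countries, "country_code"))
for row in joined:
    print(row)
```

Because only the right-side table is materialised, putting the smaller table there keeps the hash table, and therefore memory use, small.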

Amazon Athena Pricing

AWS Athena is very cost-effective. Charges are calculated on the quantity of data scanned during query execution, at around $5 per terabyte. You can lower the cost further by compressing data, using columnar formats, and routinely deleting old result sets.

Additional charges: Because Amazon Athena reads data stored in Amazon S3, there are nominal S3 storage charges, which depend on how the data is stored. Query history and results are also stored in an S3 bucket, so there are nominal charges for the data stored in that bucket too.

For instance – Consider a table on S3 with three columns of equal size, stored as an uncompressed text file of 3 TB in total. Because text formats cannot be split by column, a query that extracts data from a single column of the table still requires AWS Athena to scan the whole file.

Thus the query would cost approximately $15: 3 TB of data scanned at $5 per TB, i.e. 3 × 5 = $15.
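The arithmetic can be written as a tiny helper. The price constant is the figure quoted in this article and may differ from current pricing. The second calculation is an assumption added for comparison: it supposes the same table is stored in a columnar format at the same total size, so that a query on one of the three equal columns scans one third of the data.

```python
PRICE_PER_TB_USD = 5.0  # figure quoted in this article; check current pricing


def query_cost(tb_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Cost of a query in USD for the given number of terabytes scanned."""
    return tb_scanned * price_per_tb


text_cost = query_cost(3.0)          # whole 3 TB text file is scanned
columnar_cost = query_cost(3.0 / 3)  # assumption: only one of three equal columns is read
print(f"text file: ${text_cost:.2f}, single column in columnar format: ${columnar_cost:.2f}")
```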

Pros and Cons of AWS Athena

Pros:
  • Data can be queried 24x7 without running servers.
  • It is cost-effective compared with traditional databases.
  • Glue and Athena use the same data catalog, which makes data analysis easier.
  • No configuration is needed.
  • It is well suited to ad hoc checks of the data.
  • It integrates easily with QuickSight.

Cons:
  • Athena cannot optimise the underlying data; only the queries can be optimised.
  • All users of AWS Athena across the globe share the same resources while running queries.
  • It does not support data manipulation operations. AWS Athena is purely a query service and has no DML interface to insert, delete, and update data.
  • It requires data partitioning.
  • It does not have an indexing feature.

Conclusion

AWS Athena is thus very cost-effective compared with its competitors in the market, because it charges only when the user runs queries. Athena lets the user run standard SQL statements on data stored in S3 buckets. As we have seen, Amazon Athena has many benefits along with a few limitations. Understand the requirements of your organisation and then make a sound decision.





What is Salesforce?

Salesforce is described as the world's trusted Customer Relationship Management (CRM) platform, allowing business organizations and their teams to work together on a single technology. It helps organizations stay in contact with customers and clients online, which leads to business growth.

What is the background to the development of Salesforce? An organization and its teams work hard to stay accountable to their customers' business needs. Supporting this inside an organization used to mean developing a Customer Relationship Management solution in-house, which was costly and undoubtedly time-consuming. The solution also had to be accessible to all users at any point in time. The CRMs that existed before Salesforce were difficult to handle as well. Salesforce was built to overcome these drawbacks: it is affordable CRM software delivered online as a service.

Salesforce is booming because it is simple and easy to learn and implement. Many of us have heard of Salesforce, and interest in it keeps growing. Its rapid success comes from replacing lengthy installation processes and moving everything online, so users have the flexibility to work from anywhere with an internet connection. Cost is one of the main factors every organization focuses on, and Salesforce is an answer for reducing costs.

Salesforce started as a Software as a Service (SaaS) customer relationship management company. The platform is built on a multi-tenant architecture, which allows multiple users to share the same technology. It lets users and developers create and distribute custom software. Salesforce enables an organization to manage customer interactions across different media such as communities, social media, and phone calls, and it handles customer relationships across sales, support, and marketing processes. In simpler terms, Salesforce is a CRM platform that helps manage and organize customer relationships, stay connected with customers, streamline processes, and improve the profitability of an organization.

Why is Salesforce used?

Salesforce is a platform with unique and advanced features. Have you ever wondered why Salesforce is used, and why it has become so pervasive and trusted? There are many reasons behind its success and growth in a short time, and becoming popular that quickly is not easy. I would say it is because of its usage, statistics, and features, which bring business teams and their connections online.

  • Salesforce uses a building strategy that gives users the fastest path to results.
  • It includes tools and frameworks specifically designed to enhance customer relationships, bringing high profit and growth to the organization.
  • Because Salesforce is in the cloud, it is easily accessible everywhere via the internet, which many other CRMs failed to deliver.
  • Salesforce is known for its integration with third-party applications. It is affordable and can be adopted in a short time, unlike other CRMs.
  • Salesforce brings companies and customers together. It helps a business engage with customers in a way that is relevant and empathetic.
  • It allows products and services to be sold to customers in a smarter way and helps organizational growth. It also empowers teams to work from anywhere around the globe.

What are the benefits and advantages of using Salesforce?

Salesforce is an advanced CRM whose many benefits have made it one of the leading CRM platforms today. Some of the key benefits are listed below:

1. Easily manageable platform:

A platform designed for customer engagement should be flexible and easy to manage, and Salesforce is. It does not require IT knowledge: users without a technical background can also work in Salesforce and make the required changes on the administrator side.

2. Standard API links:

Salesforce also allows applications to be linked. One of its most significant benefits is that a great many suppliers offer standard API links to Salesforce. If the API you have selected does not meet your needs, you can approach other suppliers for an alternative. Trusted tools like these allow applications to be linked within a short period of time.

3. Flexibility:

The Salesforce CRM is known for its high degree of adaptability. The objects available in Salesforce can be set up as you wish without any issues. As a user, you are not restricted to one particular layout, process, or workflow. This makes Salesforce more flexible than the other systems available in the market.

4. Multiple options available with different apps:

Apart from the cloud services that we customize and design for ourselves, the Salesforce environment also provides applications that can be purchased via AppExchange. Many systems do not allow tools to be integrated, whereas the Salesforce ecosystem is built for it. AppExchange offers many kinds of apps that support different processes such as finance, marketing, and recruitment.

5. Proactive Service and Customer information tracking:

Salesforce maintains and tracks customer information, both qualitative and quantitative. It helps in analysing the target audience along with the information or statistics of business users or accounts, and it keeps organizational profiles in an organized format. Salesforce is comprehensive, convenient, and responsive to use.

Cloud Services offered by Salesforce:

Salesforce offers multiple cloud services, which are an essential part of its growth and popularity. Users and teams in an organization rely on these services to develop customer engagement, sales, growth, and profit. Let us have a quick review of the cloud services offered by Salesforce.

  • Salesforce Sales Cloud: The Sales Cloud is a platform that helps manage an organization's sales, marketing, and related support facets. It brings customers and their information together in a single integrated platform. It helps in winning leads and sales, which are the most significant part of the organization. This cloud is best suited to B2B or B2C organizations that want to achieve business goals and objectives.
  • Salesforce Marketing Cloud: The Marketing Cloud is an influential digital marketing platform. It helps in engaging with customers using customer data. Marketers use it to reach more customers by managing the customer journey via email, mobile, web personalization, content creation, web analytics, etc.
  • Salesforce Service Cloud: The Service Cloud is developed for managing services to customers. Customers need support and services, and the support team works on many cases every year. The team maintains a log of the cases to help find a solution for each problem, which also helps agents resolve problems faster.
  • Salesforce Community Cloud: The Community Cloud is a platform that helps employees and the organization connect and communicate with each other. It allows data and images to be exchanged in real time.
  • Salesforce Commerce Cloud: The Commerce Cloud is a platform designed and developed for providing the best customer service and experience, whether online or offline. It is the best choice if you are looking to deliver a positive, engaging customer experience.
  • Salesforce Analytics Cloud: The Analytics Cloud is a business intelligence platform developed to run comprehensive analysis of organizational data. It includes graphs, charts, data files, and other representations of the data that make data analytics efficient. In simpler terms, the Analytics Cloud helps in visualizing the data.
  • Salesforce App Cloud: The App Cloud is a platform designed to provide the flexibility to develop and run custom apps. It holds the different tools required for developing custom applications, such as AppExchange, Force.com, Heroku, and Salesforce Sandbox.
  • Salesforce IoT Cloud: The IoT Cloud is used when an organization wants to store and process IoT data. It helps in handling the massive volumes of data generated by websites, apps, sensors, etc.
  • Salesforce Health Cloud: The Health Cloud is designed for managing doctor-patient relationships and records. For a healthcare organization, Salesforce Health Cloud is the best fit.

Which organizations use Salesforce?

Salesforce has spread world-wide in a short period of time. Many organizations across many industries use it because it is a secure, web-based CRM with efficient features. Now that you are aware of Salesforce, here are some of the organizations that use it.

Salesforce is used across different industries such as finance, communications, media, healthcare, and high tech. Here is a brief idea of some of these industries, each with an example company.

  • Communications (Comcast Spectacor): Salesforce helps Comcast Spectacor to maintain and manage customer profiles and find its most significant fans in order to develop effective marketing.
  • Finance (American Express): American Express uses Salesforce to connect and communicate across organizations, branches, and time zones.
  • Healthcare (Health Leads): Health Leads uses Salesforce to view and update patient data and to coordinate with doctors.
  • Media (Coca-Cola): The Coca-Cola Company uses Salesforce to connect with people and staff and to run the organization efficiently and profitably.

Conclusion:

I have given you a brief idea of what Salesforce is, why it is used, and its benefits, along with the products and services it offers. Salesforce has positioned itself at the forefront of innovation, and organizations are growing faster than expected. As Salesforce continues to grow thanks to its extensive support for organizations, learning Salesforce will be an added advantage. Getting trained and certified in Salesforce can help you succeed in your career.
