Education

Partitioning in Hive – Hadoop Online Tutorials

outoff.com.co

Published

September 19, 2018

What is Hive Partition?

Hive Partitions is a way to organize tables into partitions. In Apache, it is done by dividing tables into different parts based on the partition keys. This is used when a table has one or many partition keys. Partition keys are the basic elements for shaping how the data is stored in a table.

Why Hive Partition is needed:

The Hive Partition was introduced to make the process of data querying easy. The Apache Hive reads the full dataset when we submit a SQL query and then submits it to a cluster. Apache Hive makes the partitions very easy by creating partitions by its partition method at the time of table creation. In this method, all the table data is divided into many partitions and each partition matches to a specific value of partition column. So it reduces the input and output time which is required by the query. As a result of this, the performance speed will be increased.

Types of Hive Partition:

There are two types of Hive Map. They are

Static Partition

Dynamic Partition

Static Partition:

If we insert data files one by one into a partition table, it is called Static Partition. While loading big files into the hive table static partition is preferred. The partition also helps in saving our time in loading data. This is because here we add a partition statically in the table and move the file into the partition of the table and also we can alter it. This static partition will also be in Strict Mode. It can be performed in the Hive manage table or in an external table too.

Dynamic Partition:

Dynamic Partition is the single insertion into a partition table. This partition loads the data from a non-partitioned table. But this Dynamic Partition takes much more time in loading all the data when we compare it to static partition. When there is a large data to be stored in a table this dynamic partition is very useful. It is also useful when we do not know how many columns are to be partitioned. Here we cannot alter the Dynamic Partition.

Advantages of Hive Partition:

This distributes all the elements horizontally

If there is a low volume of data a very fast execution takes place and hence time can be saved.

Disadvantages of Hive Partition:

Here searching a single record in the entire table is very difficult.

If there are many directories too manysmall partitions will be made

If there is a high volume of data it is very difficult to handle as it takes a very long time for execution.

Pig interview questions:

In Apache, Pig, it is an Apache open-source project and spark partitions will be there for qualifying a better developer. This Apache pig can be either in local mode or in a cluster mode. To start Apache Pig in a local mode, the option “-x local” should be used. If no option is stated then by default it will be started in the cluster mode.

In this article:

Click to comment

You must be logged in to post a comment Login

Health

7 Best Home Remedies For Yellow Jacket Stings

Have you ever stung by a yellow jacket? If the answer is “yes”, you certainly suffer from the feeling of pain, itching, redness, swelling...

outoff.com.coJune 9, 2018

Beauty

The best eyeliner for contact lens

Using contact lenses is the most effective means for women who do not like to wear glasses. However, this makes it quite difficult for...

outoff.com.coFebruary 13, 2018

Digital Marketing

Technology has a more significant influence on marketing. Know how?

Do I need to say that modern technology has vastly empowered marketing, and it has become stronger than earlier? Technology and Marketing are intertwined...

outoff.com.coApril 20, 2020

Automotive

Important Things to Keep In Mind When Buying an Off-Road Vehicle

Buying an off-road vehicle is an exciting decision and we’re sure you just want to get out on the open road — or off...

outoff.com.coJune 24, 2020

Outoff.com.co

Education

Partitioning in Hive – Hadoop Online Tutorials