oracle aide

December 14, 2013

Top-N with group by in Hive – without analytic functions or ranking UDFs

Filed under: Uncategorized — Tags: — oracleaide @ 2:31 am

Since Hive 0.11 and its analytic functions are not available for my current project I have to resort to simple remedies.

Such as a massive union of top-N queries:

select * from (

select * from (select * from ( select * from v_diff where group_id = 1 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 2 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 3 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 4 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 5 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 6 ) aa order by abs(diff) desc ) bb limit 100
union all
select * from (select * from ( select * from v_diff where group_id = 7 ) aa order by abs(diff) desc ) bb limit 100
union all

) uu;

 

Where v_diff is a simple view comparing two partitions in a single table:

desc v_diff;

group_id string

diff float

Contrary to my expectations – it took a while to complete the query.

According to job tracker, all the individual select statement went into a single queue.

A small change in the hive session settings made a difference. 

set hive.exec.parallel=true;

set hive.exec.parallel.thread.number=16;

With parallel execution turned ON – Hadoop launched 16 queries in parallel and completed the whole select much faster. 

Something I would take for granted in an Oracle database with parallel execution. 

 

Advertisements

Leave a Comment »

No comments yet.

RSS feed for comments on this post. TrackBack URI

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Create a free website or blog at WordPress.com.

%d bloggers like this: