[Hive] Counting rows per interval in Hive SQL

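Requirement:

The table (named table here as a placeholder) has a text column. We need to count how many rows fall into each text-length interval: [0,20], [20,40], [40,60], ... The list of intervals should extend automatically up to the one that contains the maximum length, max.

Implementation:

Method 1: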

select count(case when length(text) between 0 and 20 then 1 end) as text1,
count(case when length(text) between 21 and 40 then 1 end) as text2,
count(case when length(text) between 41 and 60 then 1 end) as text3,
count(case when length(text) between 61 and 80 then 1 end) as text4,
count(case when length(text) between 81 and 100 then 1 end) as text5,
count(case when length(text) > 100 then 1 end) as text6
from `table`;

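The problem with this version is that you have to know max in advance. As max grows, the SQL statement gets longer and longer, and someone has to edit it by hand each time.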

Method 2:

select floor(length(text)/20) * 20 as bucket_start, count(*) as cnt from `table` group by floor(length(text)/20) * 20;

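This approach divides each text's length by 20, rounds down, and multiplies by 20 again, which yields the lower bound of the interval the row belongs to. Grouping by that value gives the final counts directly: no loop, no need to know max beforehand, and the statement stays the same length however large max gets. Note that the intervals here are half-open, [0,20), [20,40), [40,60), ..., so a length of exactly 20 lands in the second interval, whereas Method 1 counts it in the first. Intervals that contain no rows simply do not appear in the result.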
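To make the arithmetic behind Method 2 concrete, here is a minimal Python sketch of the same floor-and-multiply grouping. It is only an illustration, not Hive code: the helper names bucket_start and bucket_counts and the sample strings are made up for this example, and it assumes every text value is non-null.

```python
from collections import Counter


def bucket_start(length: int, width: int = 20) -> int:
    """Lower bound of the half-open interval [start, start + width) containing length."""
    # Same idea as floor(length / width) * width in the SQL above.
    return (length // width) * width


def bucket_counts(texts, width: int = 20) -> dict:
    """Count texts per interval, keyed by the interval's lower bound, in ascending order."""
    counts = Counter(bucket_start(len(t), width) for t in texts)
    return dict(sorted(counts.items()))


# Hypothetical sample data with lengths 0, 19, 20, 21, 40 and 95.
sample = ["", "a" * 19, "a" * 20, "a" * 21, "a" * 40, "a" * 95]
result = bucket_counts(sample)

for start, n in result.items():
    print(f"[{start}, {start + 20}): {n}")
```

With this sample, length 20 is grouped with length 21 rather than with length 19, and no line is printed for [60, 80) because nothing falls into it. If empty intervals have to show up as zero counts, they would need to be filled in separately.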

CoinIdea's tech blog: Life is random - notes on growing, bit by bit
