Wednesday 8 January 2014

Hive Example - Loading From .txt File on HDFS

Background

This is a simple Hive script that uses a .txt file stored on HDFS as its data source.

The text file being loaded is a space-separated list with one entry per line. Here is an example of the format:

Lewis 210210201201 156 Iolaire 2006-11-01T00:00:00Z donald16a ds16a
Sam 21021987501 110 LARS 2006-11-01T00:00:00Z donald16a ds16a
J0hn 210207896201 516 Sproule 2006-11-01T00:00:00Z kayleigh9a k8a
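etc.

The Hive script maps the data onto the specified schema and then runs a query against it. Because the table is external, Hive reads the file in place rather than copying it. The start and end times for the query are passed on the command line with -hiveconf and read inside the script as ${hiveconf:starttime} and ${hiveconf:endtime}. HDFS_q1_hive.hql is the name of the Hive script.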
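
To see how one line of the file lines up with the columns declared in the Hive script, the split can be reproduced locally. This is only a rough sketch: label_fields is a made-up helper name, and awk stands in for Hive's delimiter handling by splitting on every single space.

```shell
# Label each space-separated field of one sample line with the column
# name it would get from the CREATE TABLE statement (illustration only).
line='Lewis 210210201201 156 Iolaire 2006-11-01T00:00:00Z donald16a ds16a'

label_fields() {
  # -F'[ ]' splits on each single space, like FIELDS TERMINATED BY ' '.
  printf '%s\n' "$1" | awk -F'[ ]' '{
    split("type aid rid title ts uname uid", names, " ")
    for (i = 1; i <= NF; i++) printf "%s=%s\n", names[i], $i
  }'
}

label_fields "$line"
```

A line with more than seven fields, for example a title containing a space, would push the later values into the wrong columns, so it is worth checking the data for that before querying it.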

Bash Command

hive -f HDFS_q1_hive.hql -hiveconf starttime='2006-11-01T00:00:00Z' -hiveconf endtime='2007-11-11T00:00:00Z'

Hive Script

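-- Dropping an external table removes only its metadata; the files on HDFS are kept.
DROP TABLE IF EXISTS table1;

-- LOCATION must be the HDFS directory that holds the text file, not the file itself.
CREATE EXTERNAL TABLE table1 (type STRING, aid BIGINT, rid BIGINT, title STRING, ts STRING, uname STRING, uid STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
STORED AS TEXTFILE
LOCATION '/textfilelocation/textfile';

-- The start time is inclusive and the end time is exclusive.
SELECT aid, rid FROM table1
WHERE ts >= '${hiveconf:starttime}' AND ts < '${hiveconf:endtime}' AND type = 'REVISION';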
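
The ts column is a STRING, so the WHERE clause compares timestamps as text. That works because ISO 8601 timestamps written in the same format and time zone sort alphabetically in the same order as chronologically. The sketch below repeats the same filter locally with awk as a sanity check. The filter_rows name and the four input rows are invented for illustration and are not taken from the real data set.

```shell
# Local stand-in for the Hive query: keep REVISION rows whose timestamp
# falls in [start, end) and print aid and rid.
filter_rows() {
  # $1 = start (inclusive), $2 = end (exclusive); rows are read from stdin.
  awk -F'[ ]' -v start="$1" -v end="$2" '
    $1 == "REVISION" && $5 >= start && $5 < end { print $2, $3 }
  '
}

# Hypothetical rows covering both range boundaries and a non-matching type.
printf '%s\n' \
  'REVISION 1001 11 PageA 2006-11-01T00:00:00Z alice u1' \
  'REVISION 1002 12 PageB 2007-11-11T00:00:00Z bob u2' \
  'COMMENT 1003 13 PageC 2007-01-15T12:30:00Z carol u3' \
  'REVISION 1004 14 PageD 2007-06-30T08:00:00Z dave u4' |
  filter_rows '2006-11-01T00:00:00Z' '2007-11-11T00:00:00Z'
```

With these rows the output is "1001 11" and "1004 14": the row stamped exactly at the start time is kept, the row stamped exactly at the end time is dropped, and the COMMENT row fails the type test.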
