
Wednesday 8 January 2014

Pig Script Example - Loading From .txt File on HDFS


Background

This is a simple Pig script that uses a .txt file on HDFS as its data source.

The text file being loaded is a space-separated list with one entry per line.  Here is an example of the format:

16485442 7896 11
21512131 2151516 9761651
20899996 12 7896
etc.

The Pig script loads the data into the specified schema, keeps only the rows whose timestamp falls between a start date and an end date, and outputs the other two columns.  A UDF is used to convert an ISO timestamp to a Unix long.  A param_file stores the start and end timestamps to test against.  HDFS_iq1 is the name of the Pig script.

Bash Command

pig -param_file paramfile.txt HDFS_iq1
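
The post does not show the contents of paramfile.txt, so here is a minimal sketch of what it might look like. It assumes the two parameter names the script references (STARTDATE and ENDDATE) and Pig's one-pair-per-line name=value format. The dates are placeholders, not values from the original run.

```shell
# Write a hypothetical parameter file; both dates are placeholders.
cat > paramfile.txt <<'EOF'
# One name=value pair per line; lines starting with # are comments.
STARTDATE=2014-01-01T00:00:00.000Z
ENDDATE=2014-01-08T00:00:00.000Z
EOF

# Show the file that the pig command will read via -param_file.
cat paramfile.txt
```

The values are left unquoted because the script already wraps '$STARTDATE' and '$ENDDATE' in single quotes. Adding -dryrun to the pig command should print the script with the parameters substituted, without running it, which is a quick way to check the file.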

Pig Script

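-- ISOToUnix must be available to the script (for example via REGISTER/DEFINE of the jar that provides it).
-- The input is space separated, so override PigStorage's default tab delimiter.
data = LOAD '../output_folder/textfile' USING PigStorage(' ') AS (ts:long, a_id, rev_id);
-- Keep rows with STARTDATE <= ts < ENDDATE.
b = FILTER data BY (ts >= ISOToUnix('$STARTDATE')) AND (ts < ISOToUnix('$ENDDATE'));
-- Output a_id and rev_id (positions $1 and $2).
out = FOREACH b GENERATE a_id, rev_id;
dump out;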
