Pig Example - Loading From txt File on HDFS
Background
This is a simple Pig script that uses a .txt file on HDFS as its data source.
The text file that is being loaded is a space-separated list with a newline character between each entry. Here is an example of the format:
16485442 7896 11
21512131 2151516 9761651
20899996 12 7896
etc
The Pig script loads the data into the specified schema (ts, a_id, rev_id), keeps only the rows whose timestamp falls between a start and an end date, and outputs the a_id and rev_id columns of those rows. A UDF is used to convert an ISO timestamp to a Unix long. A param_file is also used to store the timestamps that will be tested. HDFS_iq1 is the name of the Pig script.
Bash Command
pig -param_file paramfile.txt HDFS_iq1
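The contents of paramfile.txt are not shown in this post. As a minimal sketch, assuming the file only needs to define the two names the script references (STARTDATE and ENDDATE) as plain NAME=value lines, it could be created like this. The two timestamp values are made up for illustration.

```shell
# Hypothetical parameter file for the command above; the two timestamps
# are made-up values, not the ones used in the original post.
cat > paramfile.txt <<'EOF'
# Lines starting with # are comments; each remaining line is NAME=value.
STARTDATE=2012-01-01T00:00:00.000Z
ENDDATE=2012-02-01T00:00:00.000Z
EOF

# Show what Pig will read.
cat paramfile.txt
```

Pig substitutes these values wherever $STARTDATE and $ENDDATE appear in the script before it runs.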
Pig Script
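-- ISOToUnix must already be available to the script (for example via REGISTER and DEFINE).
-- The input is space separated, so PigStorage needs an explicit ' ' delimiter (its default is tab).
data = LOAD '../output_folder/textfile' USING PigStorage(' ') AS (ts, a_id, rev_id);
-- Keep rows whose timestamp is in the range [STARTDATE, ENDDATE).
b = FILTER data BY (ts >= ISOToUnix('$STARTDATE')) AND (ts < ISOToUnix('$ENDDATE'));
-- $1 and $2 are the second and third fields: a_id and rev_id.
out = FOREACH b GENERATE $1, $2;
DUMP out;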
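For a quick sanity check without a cluster, the same filter and projection can be approximated locally with awk on the three sample rows from the Background section. This is only a sketch: sample.txt, START, END, and filter_rows are hypothetical names, and the bounds are made-up Unix longs standing in for the results of ISOToUnix('$STARTDATE') and ISOToUnix('$ENDDATE').

```shell
# Sample rows from the post: ts, a_id, rev_id separated by single spaces.
cat > sample.txt <<'EOF'
16485442 7896 11
21512131 2151516 9761651
20899996 12 7896
EOF

# Hypothetical bounds, already expressed as Unix longs (stand-ins for
# ISOToUnix('$STARTDATE') and ISOToUnix('$ENDDATE')).
START=20000000
END=21000000

# Keep rows with START <= ts < END and print a_id and rev_id.
filter_rows() {
  awk -v s="$START" -v e="$END" '$1 >= s && $1 < e { print $2, $3 }' "$1"
}

filter_rows sample.txt
```

With these bounds only the third row passes the filter, so the sketch prints 12 7896. The Pig script's DUMP would show the same row as the tuple (12,7896).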