Pig Example - Loading Data From HBase
Background
This is a simple Pig script that uses HBase as its data source.
SmallIndex is the Column Family in this case.
hbase_iq1_pig is the name of the pig script.
The Pig script loads the data into the specified schema and then performs several operations on it. A UDF converts the composite bytearray into its two separate parts. A param_file stores the timestamps that will be tested. The schema has two columns:
Column1 - timestamp - long
Column2 - composite numbers - bytearray
Bash Command
pig -param_file paramfile.txt hbase_iq1_pig.pig
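The command above expects paramfile.txt to define the two parameters that the script substitutes, STARTDATE and ENDDATE. The following is a minimal sketch of how such a file could be created. The numeric values are placeholders chosen for illustration (assumed here to be epoch-style timestamps), not values from the original setup.

```shell
# Create a Pig parameter file: one NAME=value pair per line.
# Both values below are hypothetical placeholders, not real data.
cat > paramfile.txt <<'EOF'
# Lines starting with '#' are treated as comments by Pig.
STARTDATE=1000000000
ENDDATE=2000000000
EOF

# Show what Pig will read.
cat paramfile.txt
```

To check the substitution without running the job, Pig's -dryrun option should write out a copy of the script with the parameters filled in.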
Pig Script
REGISTER ./ConvertCompositeKey.jar;

-- UDFs that each extract one part of the composite value
DEFINE article allan.myudf.ConvertFirst();
DEFINE revision allan.myudf2.ConvertFirst();

-- With -loadKey true, the row key is returned as the first field (ts)
raw = LOAD 'hbase://Table'
    USING org.apache.pig.backend.hadoop.hbase.HBaseStorage(
        'smallIndex:1', '-loadKey true -caster HBaseBinaryConverter')
    AS (ts:long, comp:bytearray);

-- Keep rows whose timestamp lies between STARTDATE and ENDDATE (inclusive)
modify = FILTER raw BY ((long)'$STARTDATE' <= ts) AND ((long)'$ENDDATE' >= ts);

titles = FOREACH modify GENERATE article(comp), revision(comp);
DUMP titles;