Tuesday, 10 June 2014

Dreamweaver CS6 and Subversion Integration

To set up Subversion with Dreamweaver CS6 (this should also work with older versions):
  1. Click on Manage Sites then New Site
  2. Choose a Site Name and Storage Location
  3. Click on the Version Control tab on the left-hand side
  4. Set Access to Subversion
  5. Set Protocol to SVN
  6. Set Server Address to the hostname of your Subversion server (pabbay in this example)
  7. Set Repository Path to the file(s) you wish to check out, or leave it blank for the whole project
  8. Leave Server Port blank unless a custom port has been configured
  9. Set Username to your username (allan in this example)
  10. Set Password to your password
  11. Click Test and you should see the prompt "Server and project are accessible"
  12. Click Save then Done
These steps assume that you have already set up your Subversion repository on a server, so make sure the Subversion service or daemon is running on that server first.
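
If the Test button fails, it can help to confirm that something is actually listening on the server before looking at the Dreamweaver settings. The following is only a rough sketch: it assumes the svn:// protocol on its default port of 3690 and reuses the hostname pabbay from the steps above. The class and method names are made up for this illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class SvnPortCheck {
    // 3690 is the default port for the svn:// protocol (svnserve).
    static final int DEFAULT_SVN_PORT = 3690;

    /** Returns true if a TCP connection to host:port succeeds within timeoutMs. */
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers refused connections, timeouts and unknown hosts.
            return false;
        }
    }

    public static void main(String[] args) {
        // Hostname taken from the example above; pass your own as the first argument.
        String host = args.length > 0 ? args[0] : "pabbay";
        System.out.println(host + ":" + DEFAULT_SVN_PORT + " listening? "
                + isListening(host, DEFAULT_SVN_PORT, 2000));
    }
}
```

A successful connection only shows that the port is open. It does not prove that the repository path or the credentials are correct, which is what the Test button checks.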

Checking Out

Dreamweaver is now connected to Subversion. The next step is to check out the latest version of the project:
  1. Right-click the site in the Files pane
  2. Go to Version Control then Get Latest Versions
This should begin transferring the files across the network for you to use within Dreamweaver.

Committing Changes (Checking In)

Committing your changes is very simple.
  1. Make your changes to the file
  2. Right-click the file in Local view
  3. Go to Version Control then Commit
  4. Leave a comment in the box - it is good practice to leave short, succinct comments
  5. Click Commit

Reverting To An Older Version

With version control it is easy to revert a file, or the whole project for that matter, to an older version.
To revert to a previous version:
  1. Right-click the file or folder you wish to revert
  2. Go to Version Control and then Manage Revisions
  3. Select the file or folder revision you wish to revert to and click Promote To Current

Test Whether a String/Piece of Text is English (Java)

Being able to tell whether a piece of text looks like valid English has many applications, especially in cryptography when performing a ciphertext-only attack. This form of brute-force attack tries every possible key and determines which key is correct by checking the output and selecting the key that produced the plaintext most like the English language.
The extract below was designed for that purpose. Use the testEnglish() method to calculate the score for the plaintext that each key produces. Store each of these scores in some kind of structure (such as an ArrayList) and at the end select the key with the highest score.
This was written in Java. The letter frequency, bigram and trigram scores are percentages based on how often each one appears in the English language. Remember to edit the alpha, beta and gamma values to your own liking.
import java.util.ArrayList;
import java.util.Arrays;

//The fields and method below are an extract and belong inside a class of your own

//List of single letters (and space) and percentage score of how common they are in the English language
static String[] letterFreq = {" ", "e", "t", "a", "o", "i", "n", "s", "h", "r", "d", "l", "u", "c", "m", "f", "w", "g", "y", "p", "b", "v", "k"};
static Double[] letterFreqScoring = {13.0, 12.6, 9.1, 8.0, 7.6, 7.0, 7.0, 6.3, 6.2, 6.0, 4.3, 4.1, 2.8, 2.6, 2.6, 2.3, 2.2, 2.0, 2.0, 1.8, 1.5, 1.0, 0.7};
 
//Most common letter pairs (bigrams) and their percentage scores
static String[] bi = {"th", "he", "in", "er", "an", "re", "nd", "on", "en", "at", "ou", "ed", "ha", "to", "or", "it", "is", "hi", "es", "ng"};
static Double[] bigramScoring = {3.8, 3.6, 2.2, 2.2, 2.1, 1.8, 1.6, 1.4, 1.4, 1.3, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1, 1.1, 1.1, 1.1, 1.1};
 
//Most common trigrams and their percentage scores
//(this array holds 21 values for 20 trigrams, so the final value is never read)
static String[] tri = {"the", "and", "ing", "her", "hat", "his", "tha", "ere", "for", "ent", "ion", "ter", "was", "you", "ith", "ver", "all", "wit", "thi", "tio"};
static Double[] trigramScoring = {8.0, 3.5, 1.6, 1.1, 0.8, 0.7, 0.6, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5, 0.5, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4};
 
//Lists built from the arrays above so that contains() and indexOf() can be used
static ArrayList<String> bigram = new ArrayList<String>(Arrays.asList(bi));
static ArrayList<String> trigram = new ArrayList<String>(Arrays.asList(tri));
static ArrayList<String> letterFrequency = new ArrayList<String>(Arrays.asList(letterFreq));
 
private static double testEnglish(String text){
 
  //Weights that give a higher score to bigrams and trigrams and slightly reduce single letter frequency
  double alpha = 0.5;
  double beta = 3;
  double gamma = 7;
 
  double score = 0;
  int i;
  text = text.toLowerCase();
 
  //Score every single character, including the last one
  for (i = 0; i + 1 <= text.length(); i++){
   if (letterFrequency.contains(text.substring(i, i+1)))
    score += alpha * letterFreqScoring[letterFrequency.indexOf(text.substring(i, i+1))];
  }
 
  //Score every pair of adjacent characters
  for (i = 0; i + 2 <= text.length(); i++){
   if (bigram.contains(text.substring(i, i+2)))
    score += beta * bigramScoring[bigram.indexOf(text.substring(i, i+2))];
  }
 
  //Score every run of three adjacent characters
  for (i = 0; i + 3 <= text.length(); i++){
   if (trigram.contains(text.substring(i, i+3)))
    score += gamma * trigramScoring[trigram.indexOf(text.substring(i, i+3))];
  }
 
  return score;
}
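
The "store every score, then pick the highest" step described above can be sketched as follows. This is only an illustration under stated assumptions: it uses a Caesar shift cipher as the thing being brute-forced, and the score() method is a deliberately crude stand-in that just counts very common characters. In real use you would call testEnglish() from the extract above instead. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BestKeyDemo {
    // Stand-in scorer (an assumption for this sketch): counts very common English
    // characters. Swap in testEnglish() for real use.
    static double score(String text) {
        double total = 0;
        for (char c : text.toLowerCase().toCharArray()) {
            if (" etaoinshr".indexOf(c) >= 0) {
                total += 1;
            }
        }
        return total;
    }

    // Shifts each letter back by 'key' places; other characters pass through unchanged.
    static String decrypt(String cipherText, int key) {
        StringBuilder sb = new StringBuilder();
        for (char c : cipherText.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                sb.append((char) ('a' + (c - 'a' - key + 26) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                sb.append((char) ('A' + (c - 'A' - key + 26) % 26));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Tries every key, stores each score, then returns the key with the highest score.
    static int bestKey(String cipherText) {
        List<Double> scores = new ArrayList<Double>();
        for (int key = 0; key < 26; key++) {
            scores.add(score(decrypt(cipherText, key)));
        }
        int best = 0;
        for (int key = 1; key < scores.size(); key++) {
            if (scores.get(key) > scores.get(best)) {
                best = key;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String cipherText = "Wkh txlfn eurzq ira mxpsv ryhu wkh odcb grj";
        int key = bestKey(cipherText);
        System.out.println("key=" + key + " plaintext=" + decrypt(cipherText, key));
    }
}
```

Because the index into the list is the key itself, no separate key-to-score mapping is needed here. For a cipher with a larger key space, a map from key to score (or just tracking the running best) would probably be a better fit.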

Wednesday, 8 January 2014

Pig Example - Loading Data From HBase

Background

This is a simple Pig script that uses HBase as its data source. The data is loaded as two fields:

Column1 - timestamp - long
Column2 - composite numbers - bytearray
The Pig script loads the data into the schema specified and then performs the various operations on it.  A UDF is used to split the composite bytearray into its two separate parts.  A param_file is also used to store the timestamps that will be tested.

smallIndex is the column family in this case.

hbase_iq1_pig.pig is the name of the Pig script.

Bash Command

pig -param_file paramfile.txt hbase_iq1_pig.pig

Pig Script

REGISTER ./ConvertCompositeKey.jar; 
DEFINE article allan.myudf.ConvertFirst();
DEFINE revision allan.myudf2.ConvertFirst(); 

-- The alias is named raw because 'in' is a reserved keyword in recent Pig releases
raw = LOAD 'hbase://Table' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('smallIndex:1' , '-loadKey true -caster HBaseBinaryConverter')
 AS (ts: long, comp:bytearray);
modify = FILTER raw BY (((long)'$STARTDATE') <= ts) AND (((long)'$ENDDATE') >= ts);
titles = FOREACH modify GENERATE article(comp), revision(comp);
DUMP titles;

Hive Example - Loading From .txt File on HDFS

Background

This is a simple Hive script that uses a .txt file stored on HDFS as its data source.

The text file that is being loaded is a space-separated list with one entry per line.  Here is an example of the format:

Lewis 210210201201 156 Iolaire 2006-11-01T00:00:00Z donald16a ds16a
Sam 21021987501 110 LARS 2006-11-01T00:00:00Z donald16a ds16a
J0hn 210207896201 516 Sproule 2006-11-01T00:00:00Z kayleigh9a k8a
etc
The Hive script loads the data into the schema specified and then performs the various operations on it.  The parameters here are given through the command line.  HDFS_q1_hive.hql is the name of the Hive script.

Bash Command

hive -f HDFS_q1_hive.hql -hiveconf starttime='2006-11-01T00:00:00Z' -hiveconf endtime='2007-11-11T00:00:00Z'

Hive Script

DROP TABLE IF EXISTS table1;

CREATE EXTERNAL TABLE table1(type STRING, aid BIGINT, rid BIGINT, title STRING, ts STRING, uname STRING, uid STRING) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' STORED AS TEXTFILE LOCATION '/textfilelocation/textfile';

SELECT aid, rid from table1 WHERE (ts >= '${hiveconf:starttime}' AND ts < '${hiveconf:endtime}' AND type == 'REVISION');
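
The WHERE clause above compares the timestamps as plain strings. That works because fixed-width ISO 8601 timestamps in the same time zone sort in the same order alphabetically as they do chronologically. The sketch below shows the same half-open range check in Java on one of the sample lines. The class and method names are made up for this illustration, and the field position is taken from the table definition above.

```java
public class IsoRangeFilterDemo {
    // Field positions follow the table definition: type, aid, rid, title, ts, uname, uid.
    static final int TS_FIELD = 4;

    /** True if the line's timestamp falls in [start, end), compared as plain strings. */
    static boolean inRange(String line, String start, String end) {
        String[] fields = line.split(" ");
        String ts = fields[TS_FIELD];
        return ts.compareTo(start) >= 0 && ts.compareTo(end) < 0;
    }

    public static void main(String[] args) {
        String line = "Lewis 210210201201 156 Iolaire 2006-11-01T00:00:00Z donald16a ds16a";
        // Same bounds as the hive command line above.
        System.out.println(inRange(line, "2006-11-01T00:00:00Z", "2007-11-11T00:00:00Z"));
    }
}
```

The comparison would stop being reliable if the timestamps mixed formats or time zone offsets, in which case converting to a numeric timestamp first is the safer route.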

Pig Script Example - Loading From .txt File on HDFS

Background

This is a simple Pig script that uses a .txt file as its data source.

The text file that is being loaded is a space-separated list with one entry per line.  Here is an example of the format:

16485442 7896 11
21512131 2151516 9761651
20899996 12 7896
etc
The Pig script loads the data into the schema specified and then performs the various operations on it.  A UDF is used to convert an ISO timestamp to a Unix long.  A param_file is also used to store the timestamps that will be tested.  HDFS_iq1 is the name of the Pig script.

Bash Command

pig -param_file paramfile.txt HDFS_iq1

Pig Script

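-- ISOToUnix must be REGISTERed and DEFINEd before this point (Piggybank ships a UDF with this name)
-- PigStorage(' ') is needed because the file is space separated; the default delimiter is a tab
data = LOAD '../output_folder/textfile' USING PigStorage(' ') AS (ts, a_id, rev_id);
b = FILTER data BY (ts >= ISOToUnix('$STARTDATE')) AND (ts < ISOToUnix('$ENDDATE'));
out = FOREACH b GENERATE $1,$2;
dump out;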

Tuesday, 7 January 2014

Hive UDF - Convert Date to Unix Timestamp Example

Background

This little UDF converts a date, in any specified format, into a Unix timestamp.  To specify the date format, just edit the pattern string passed to SimpleDateFormat. So here is how we did it.

I have also left in the imports; you will need to find the jar files that contain these classes.

Implementation

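package allan.DtoT;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

import org.apache.hadoop.hive.ql.exec.UDF;

public class DateToTime extends UDF{
 // Returns the parsed date as milliseconds since the Unix epoch, or -1 if it cannot be parsed
 public long evaluate(final String d){
  if (d == null) {return -1;}
  try{
   // SimpleDateFormat is not thread-safe, so a new instance is created on each call
   SimpleDateFormat sf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
   // The quoted 'Z' is only matched as a literal character, so UTC is set explicitly
   sf.setTimeZone(TimeZone.getTimeZone("UTC"));
   return sf.parse(d.trim()).getTime();
  } catch (ParseException pe) {return -1;}
 }
}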
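
To show what "edit the string in the SimpleDateFormat" means in practice, here is a small standalone sketch that takes the pattern as a parameter, so it can be tried outside Hive. It assumes the input dates are in UTC. The class name, the method name and the second pattern are made up for this illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

public class DateFormatDemo {
    /** Parses text with the given pattern as UTC; returns epoch milliseconds, or -1 on failure. */
    static long toMillis(String pattern, String text) {
        try {
            SimpleDateFormat sf = new SimpleDateFormat(pattern);
            sf.setTimeZone(TimeZone.getTimeZone("UTC"));
            sf.setLenient(false); // reject impossible dates instead of rolling them over
            return sf.parse(text.trim()).getTime();
        } catch (ParseException pe) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // The same instant written in two different formats prints the same value twice.
        System.out.println(toMillis("yyyy-MM-dd'T'HH:mm:ss'Z'", "2006-11-01T00:00:00Z"));
        System.out.println(toMillis("dd/MM/yyyy HH:mm", "01/11/2006 00:00"));
        // Text that does not match the pattern gives -1.
        System.out.println(toMillis("yyyy-MM-dd", "not a date"));
    }
}
```

Note that getTime() returns milliseconds. If you need a Unix timestamp in seconds, divide the result by 1000.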

Pig UDF Example

Background

This little UDF converts the first 8 bytes of an HBase key into a long.  The key that we had was a composite key made up of two 8-byte longs, and we needed to convert the first 8 bytes and then the second 8 bytes to get them separately. So here is how we did it.

I have also left in the imports; you will need to find the jar files that contain these classes.

Implementation

package allan.myudf;
import java.io.IOException;

import org.apache.hadoop.hbase.util.Bytes;
import org.apache.pig.EvalFunc;
import org.apache.pig.backend.hadoop.hbase.HBaseBinaryConverter;
import org.apache.pig.data.DataByteArray;
import org.apache.pig.data.Tuple;


public class ConvertFirst extends EvalFunc<Long> {
 public Long exec(Tuple input) throws IOException {
  if (input != null && input.size() == 1) {
   try {
    DataByteArray a = (DataByteArray) input.get(0);
    HBaseBinaryConverter b = new HBaseBinaryConverter();
    return Bytes.toLong(b.toBytes(a),0,8);
   } catch (IllegalArgumentException e) {
    System.err.println("...");
   }
  }
  return null;
 }
}
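
To make the layout of the composite key concrete, here is a small sketch that builds a 16-byte key from two longs and reads each half back. It uses java.nio.ByteBuffer from the standard library rather than the HBase classes, so it runs on its own. ByteBuffer is big-endian by default, which is the same byte order that Bytes.toLong reads. The class and method names and the sample values are made up for this illustration.

```java
import java.nio.ByteBuffer;

public class CompositeKeyDemo {
    /** Builds a 16-byte key from two longs (big-endian, ByteBuffer's default order). */
    static byte[] compose(long first, long second) {
        return ByteBuffer.allocate(16).putLong(first).putLong(second).array();
    }

    /** Reads the 8-byte big-endian long that starts at the given offset. */
    static long longAt(byte[] key, int offset) {
        return ByteBuffer.wrap(key, offset, 8).getLong();
    }

    public static void main(String[] args) {
        byte[] key = compose(16485442L, 7896L);
        System.out.println(longAt(key, 0)); // first half of the key
        System.out.println(longAt(key, 8)); // second half of the key
    }
}
```

In the UDF above, the second half would presumably be read the same way by changing the offset, that is Bytes.toLong(b.toBytes(a),8,8) instead of Bytes.toLong(b.toBytes(a),0,8).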
