Showing posts with label string. Show all posts

Tuesday, 10 June 2014

Test Whether a String/Piece of Text is English (Java)

Testing Whether a String/Piece of Text is English

Being able to tell whether a piece of text looks like valid English has many applications, especially in cryptography when performing a ciphertext-only attack. This form of brute-force attack tries every possible key and determines which key is correct by checking the output and selecting the key that produced the plaintext most like the English language.
The extract below was designed for that purpose. Use the testEnglish() method to calculate the score of a key from the plaintext it produced. Store each of these scores in some kind of structure (such as an ArrayList) and at the end select the key with the highest score.
This was written in Java. The letter frequency, bigram and trigram scores are percentages based on how often each appears in the English language. Remember to edit the alpha, beta and gamma values to your own liking.
//Requires: import java.util.ArrayList; import java.util.Arrays;
//List of single characters (including space) and percentage score of how common they are in the English language
static String[] letterFreq = {" ", "e", "t", "a", "o", "i", "n", "s", "h", "r", "d", "l", "u", "c", "m", "f", "w", "g", "y", "p", "b", "v", "k"};
static Double[] letterFreqScoring = {13.0, 12.6, 9.1, 8.0, 7.6, 7.0, 7.0, 6.3, 6.2, 6.0, 4.3, 4.1, 2.8, 2.6, 2.6, 2.3, 2.2, 2.0, 2.0, 1.8, 1.5, 1.0, 0.7};
 
static String[] bi = {"th", "he", "in", "er", "an", "re", "nd", "on", "en", "at", "ou", "ed", "ha", "to", "or", "it", "is", "hi", "es", "ng"};
static Double[] bigramScoring = {3.8, 3.6, 2.2, 2.2, 2.1, 1.8, 1.6, 1.4, 1.4, 1.3, 1.3, 1.3, 1.3, 1.2, 1.2, 1.1, 1.1, 1.1, 1.1, 1.1};
 
static String[] tri = {"the", "and", "ing", "her", "hat", "his", "tha", "ere", "for", "ent", "ion", "ter", "was", "you", "ith", "ver", "all", "wit", "thi", "tio"};
//Note: there are 21 scores here for 20 trigrams, so the final score is never used
static Double[] trigramScoring = {8.0, 3.5, 1.6, 1.1, 0.8, 0.7, 0.6, 0.6, 0.6, 0.6, 0.5, 0.5, 0.5, 0.5, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4};
 
//Lists built from the arrays above so that indexOf() can be used for lookups
static ArrayList<String> bigram = new ArrayList<String>(Arrays.asList(bi));
static ArrayList<String> trigram = new ArrayList<String>(Arrays.asList(tri));
static ArrayList<String> letterFrequency = new ArrayList<String>(Arrays.asList(letterFreq));
 
private static double testEnglish(String text){

  //Weights that apply a higher score to bigrams and trigrams and slightly reduce single letter frequency
  double alpha = 0.5;
  double beta = 3;
  double gamma = 7;

  double score = 0;
  int i;
  int index;
  text = text.toLowerCase();

  //Score every single character
  for (i = 0; i < text.length(); i++){
    index = letterFrequency.indexOf(text.substring(i, i+1));
    if (index >= 0)
      score += alpha * letterFreqScoring[index];
  }

  //Score every pair of adjacent characters
  for (i = 0; i < text.length() - 1; i++){
    index = bigram.indexOf(text.substring(i, i+2));
    if (index >= 0)
      score += beta * bigramScoring[index];
  }

  //Score every run of three adjacent characters
  for (i = 0; i < text.length() - 2; i++){
    index = trigram.indexOf(text.substring(i, i+3));
    if (index >= 0)
      score += gamma * trigramScoring[index];
  }

  return score;
}
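
The key search described at the top of this post (score the output of every key, store the scores, pick the highest) could be sketched roughly as follows. This is a minimal, hypothetical example that assumes a Caesar (shift) cipher. The class name KeySearchSketch, the decrypt() and bestKey() helpers, and the simplified score() function (single-letter frequencies only, standing in for testEnglish()) are illustrative assumptions and not part of the original extract.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: brute-force a Caesar cipher and keep the best-scoring key.
class KeySearchSketch {

    // Single-character table taken from the post (space first, then letters).
    static final String LETTERS = " etaoinshrdlucmfwgypbvk";
    static final double[] SCORES = {13.0, 12.6, 9.1, 8.0, 7.6, 7.0, 7.0, 6.3, 6.2, 6.0, 4.3, 4.1,
            2.8, 2.6, 2.6, 2.3, 2.2, 2.0, 2.0, 1.8, 1.5, 1.0, 0.7};

    // Simplified stand-in for testEnglish(): sums single-character scores only.
    static double score(String text) {
        double total = 0;
        for (char c : text.toLowerCase().toCharArray()) {
            int index = LETTERS.indexOf(c);
            if (index >= 0) {
                total += SCORES[index];
            }
        }
        return total;
    }

    // Shifts each letter back by key positions; other characters are left unchanged.
    static String decrypt(String cipherText, int key) {
        StringBuilder plain = new StringBuilder();
        for (char c : cipherText.toCharArray()) {
            if (c >= 'a' && c <= 'z') {
                plain.append((char) ('a' + (c - 'a' - key + 26) % 26));
            } else if (c >= 'A' && c <= 'Z') {
                plain.append((char) ('A' + (c - 'A' - key + 26) % 26));
            } else {
                plain.append(c);
            }
        }
        return plain.toString();
    }

    // Tries all 26 keys, stores each score, then returns the key with the highest score.
    static int bestKey(String cipherText) {
        List<Double> scores = new ArrayList<Double>();
        for (int key = 0; key < 26; key++) {
            scores.add(score(decrypt(cipherText, key)));
        }
        int best = 0;
        for (int key = 1; key < 26; key++) {
            if (scores.get(key) > scores.get(best)) {
                best = key;
            }
        }
        return best;
    }
}
```

For example, KeySearchSketch.bestKey("hhhh") returns 3, because shifting back by 3 yields "eeee" and e is the highest-scoring letter in the table. On real ciphertext, the bigram and trigram terms of testEnglish() should separate candidate keys more reliably than single letters alone.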

Tuesday, 23 July 2013

RESTful Java Web Service For Solr


You have your Solr server set up, now what? You want people to be able to perform queries while keeping some control over what can be input. For that you need a Java Web Service! This is a short guide on how to create one. It may not be tailored to your particular needs, but you can tweak it as you please.

Set-Up

To set up the Java Web Service you will need:
  • Netbeans 7 with GlassFish
  • Solr Set-up and Running


Steps Involved

Project Set-Up

  • Start Netbeans and go to File > New Project > Java Web > Web Application and then hit Next.
  • Give your project a name and then hit Next
  • Ensure that Glassfish is the selected server and hit Finish

HelloResource.java

  • Right click the Default Package and create a new Java Class and call it HelloResource.java
  • Enter the following code:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.net.URLConnection;
    import java.net.URLEncoder;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.QueryParam;

    /**
     * REST resource that forwards a search query to Solr and returns the XML response.
     *
     * @author alamil
     */

    //Looks for hello in the pathname
    @Path("hello")
    public class HelloResource {

        /**
         * @param arg    the keywords to search for
         * @param rows   the number of rows to return (defaults to 50)
         * @param filter comma-separated list of fields to return (defaults to id,title)
         * @return the XML document returned by Solr
         */
        @GET
        @Path("/query")     //Looks for /query in the pathname
        @Produces("text/xml")       //Returns xml
        public String hello(@QueryParam("q") String arg, @QueryParam("rows") String rows, @QueryParam("filter") String filter){

            //Without a query term there is nothing to search for
            if (arg == null)
                return "<error>Missing q parameter</error>";

            //If the user has not selected a number of rows to display then 50 is set as the default
            if (rows == null)
                rows = "50";

            //If the user has not selected fields to filter on then it uses the default
            if (filter == null)
                filter = "id,title";

            try{

                //Builds the Solr query (matches on the url field are boosted) and URL-encodes each parameter
                String query = "url:(" + arg + ")^25 text:(" + arg + ")";
                URL solr = new URL("http://localhost:8080/solr/select?q=" + URLEncoder.encode(query, "UTF-8")
                        + "&fl=" + URLEncoder.encode(filter, "UTF-8") + "&rows=" + URLEncoder.encode(rows, "UTF-8"));

                //Tries to connect to the Solr Server with the query
                URLConnection yc = solr.openConnection();

                //Reads the returned xml document from the server; the reader is closed automatically
                StringBuilder xmldoc = new StringBuilder();
                try (BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream(), "UTF-8"))){
                    String inputLine;
                    while ((inputLine = in.readLine()) != null){
                        xmldoc.append(inputLine);
                    }
                }

                //Return the xml document (Replacing the %20's with spaces)
                return xmldoc.toString().replaceAll("%20", " ");
            }catch (Exception e){
                //Keeps the response well-formed xml even when the request fails
                return "<error>Exception</error>";
            }
        }
    }

RESTConfig.java

  • Right click the Default Package and create a new Java Class and call it RESTConfig.java
  • Enter the following code:

    import javax.ws.rs.core.Application;
    import javax.ws.rs.ApplicationPath;
    /**
     *
     * @author alamil
     */
    @ApplicationPath("SearchInt")
    public class RESTConfig extends Application {
     
    }

Testing

  • Go to Run > Run Project and your server should start
  • A browser window will open on the project's index.html page, which you can edit to your own liking
  • Start your Solr server
  • In your browser, navigate to:
    localhost:8080/HelloRest/SearchInt/hello/query?q=[query term]&filter=[filter term]&rows=[rows]
  • The input parameters set up are:
    1. q=… : the keywords in the query
    2. rows=… : the number of rows you wish to have returned
    3. filter=… : which fields are displayed to the user
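Note: your web server and Solr server may both be trying to run on the same port (8080 in the examples above), which can cause problems. If so, configure one of them to use a different port.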
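
The same request can also be made from code instead of the browser. The sketch below shows one way to build the request URL with properly encoded parameters and read the response. The class name SearchClientSketch and its buildUrl() and fetch() helpers are hypothetical, and the base address assumes the service is deployed as HelloRest on localhost:8080, as in the URL above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical client sketch for the web service described in this post.
class SearchClientSketch {

    // Assumed base address, taken from the browser URL used in the Testing section.
    static final String BASE = "http://localhost:8080/HelloRest/SearchInt/hello/query";

    // Builds the request URL, encoding the query and filter so spaces and commas are safe.
    static String buildUrl(String query, String filter, int rows) {
        try {
            return BASE + "?q=" + URLEncoder.encode(query, "UTF-8")
                    + "&filter=" + URLEncoder.encode(filter, "UTF-8")
                    + "&rows=" + rows;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen in practice.
            throw new IllegalStateException(e);
        }
    }

    // Performs a GET request and returns the response body as a single string.
    static String fetch(String address) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new URL(address).openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        return body.toString();
    }
}
```

Calling SearchClientSketch.buildUrl("solr search", "id,title", 10) produces a URL ending in query?q=solr+search&filter=id%2Ctitle&rows=10, which can then be passed to fetch() once both servers are running.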

Storing text in an array splitting by whitespace (C#)

If you have a string and you wish to store the individual words from that string in an array, split by whitespace, then in C#:

string[] ssize = myStr.Split(null);
The Split method assumes whitespace to be the splitting character when you specify null, and you now have an array of strings with each word from the original text.
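
For comparison with the Java posts on this page, a rough Java equivalent might look like the sketch below. The class and method names are hypothetical. Note one deliberate difference: this version trims the text and treats runs of whitespace as a single separator, so it never returns empty strings, whereas the C# call above keeps an empty entry between consecutive whitespace characters.

```java
// Hypothetical sketch: split a string into words on any run of whitespace.
class WhitespaceSplitSketch {

    static String[] splitWords(String text) {
        // Remove leading and trailing whitespace so no empty first element is produced.
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // "\\s+" matches one or more whitespace characters (spaces, tabs, newlines).
        return trimmed.split("\\s+");
    }
}
```

For example, splitWords("the  quick\tbrown fox ") gives the four words the, quick, brown and fox.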