Few words about ConcurrentHashMap

Posted:  May 1 / 3:52pm by Ruslan Ciurca

There is a popular belief that, because ConcurrentHashMap is part of the java.util.concurrent package then it is designed to deal with multi-threading only. However, ConcurrentHashMap also has quite a useful property that I will exploit in this post, without mentioning multi-threading at all.

Let's start with the following code:

import java.util.Map;
import java.util.HashMap;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

Nothing complicated, we generate a map and process it within the updateMap method. Everything works fine at this point. However, let's imagine that we need a slightly more complicated processing of the map, e.g. if we encounter the key "C", then we need to add a new entry to the map, something like this:

// updates the map, i.e. increases the values by one
public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
}

Now, when we execute the updated code, it fails (run time) with the following exception:

Exception in thread "main" java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
        at java.util.HashMap$KeyIterator.next(Unknown Source)
        at HashMapTest.updateMap(HashMapTest.java:19)
        at HashMapTest.main(HashMapTest.java:55)

And this is exactly what we are told by the specification:

"Returns a Set view of the keys contained in this map. The set is backed by the map, so changes to the map are reflected in the set, and vice-versa. If the map is modified while an iteration over the set is in progress (except through the iterator's own remove operation), the results of the iteration are undefined."

In other words, HashMap (in fact, this isn't specific to HashMap only) doesn't allow for "updates while iterating", more specifically adding new entries with new keys.

One way to fix the problem is to create a copy of the key set, something like this:

import java.util.Map;
import java.util.HashMap;
import java.util.HashSet;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new HashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  HashSet<String> keys = new HashSet<String>(map.keySet());
  for (String str : keys) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

But, if we look at the output, we will mention that "C1" node is unprocessed:

...
A -- 1
B -- 1
C -- 1
C1 -- 0
H -- 1
...

As a result, we will need a new routine to process the unprocessed data. There is a different solution though, to use ConcurrentHashMap, which according to the specification:

"Retrievals reflect the results of the most recently completed update operations holding upon their onset."

And here is the final code:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HashMapTest {
 // creates a map with the keys from an array and
 // initializes the values with zero 
 public static Map<String, Integer> prepareMap(String[] arr) {
  Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
  for (String str : arr) {
   map.put(str, new Integer(0));
  }
  return map;
 }

 // updates the map, i.e. increases the values by one
 public static void updateMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   Integer i = map.get(str);
   map.put(str, new Integer(i.intValue() + 1));
   if (str.equals("C")) {
    // we have a special case, let's treat it
    map.put("C1", new Integer(0));
   }
  }
 }

 // prints the map content
 public static void printMap(Map<String, Integer> map) {
  for (String str : map.keySet()) {
   System.out.println(str + " -- " + map.get(str));
  }
 }

 static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  Map<String, Integer> map = prepareMap(arr);
  printMap(map);
  updateMap(map);
  printMap(map);
 }
}

All the entries are processed accordingly.

Now, let's check the efficiency, by executing the following code:

static public void main(String[] args) {
  String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

  long total = 0;
  int max = 1000000;
  for (int i = 0; i < max; i++) {
   long start = System.currentTimeMillis();
   Map<String, Integer> map = prepareMap(arr);
   updateMap(map);
   long end = System.currentTimeMillis();
   total += end - start;
  }
  System.out.println("Average time " + ((1.0*total)/max) + "ms");
}

Average for the ConcurrentHashMap version:

Average time 0.0022ms

Average for the HashMap and HashSet version:

Average time 0.0010ms

Indeed, ConcurrentHashMap brings some (worth to consider) performance penalties. But it isn't like it is twice slower than the HashMap and HashSet version, because in this example we are operating with a small number of entries:

String[] arr = {"A", "B", "C", "D", "E", "F", "G", "H"};

If we increase the number of entries to e.g. 259 (from "A0", "A1", ... to "Z9", excluding "C1"), then we have the following figures:

For the ConcurrentHashMap version:

Average time 0.048ms

For the HashMap and HashSet version:

Average time 0.033ms
Tags:  Programming   Technology  

Comments (0)

The Game Of Life

Posted:  Mar 1 / 9:55am by Ruslan Ciurca

Did I manage to drag your attention with the title? Good. Let's proceed to the lyrical part of this post then. Few years ago I watched this film and found the statistics starting at 1:16:56 being quite alarming. Not that I didn't know about most of those details, but it is totally different when you see them altogether. I encourage you to watch it as well. However, for the sake of this particular post, let's retain the following statistics: "20% of the world's population consumes 80% of its resources".

Now, let's proceed to the technical part of the post. My son started to learn Pascal at school and from time to time I give him "advanced" home assignments, to polish his programming (and thinking) technique. After few complaints from my son, aka "I am tired of doing nonsenses like calculate π, sorting etc. I want to do something more real" (well, he is just 12 years old), I decided to give him something "more real", a very basic Monte-Carlo simulation that I called "The Game Of Life". The reaction was quite unexpectedly expected, he got dragged by the title :)

So, what is the game about? Basically, let's imagine a world with limited resources (I don't think it is difficult to imagine this), which in the language of the game is called simply "wealth". We have a population of players with the total number of PLAYERS=100. Each player has two attributes, "wealth" and "power". The game starts with the equal distribution of wealth between the players and the "power=0" for each player. With each iteration, we can call this "year", each player plays against another randomly picked player (the players can't play with themselves). The rules for winning a play are quite simple:

  • The player with greater "power" wins.
  • If the players have the same power, the richest one wins.
  • If the players have the same power and wealth, the winner is picked randomly.
  • The winner gets a reward, which means their "power" increases by 1 and if the loser has sufficient "wealth", the winner's "wealth" increases by 1, the loser's wealth decreases by 1 respectively (so we keep the resources constant).

The game continues for 10000 iterations.

 

Here is the code (slightly adjusted with my help):

program montecarlosimulation;
uses wincrt;
const
        PLAYERS = 100;
        YEARS = 10000;

var
        i,j:integer;
        x,y:integer;
        wealth:array[1..PLAYERS] of longint;
        power :array[1..PLAYERS] of longint;
        tmp:longint;
        poor:integer;

// p1 is the winner!!!
procedure reward(p1, p2:integer);
begin
        if wealth[p2] > 0 then begin
                wealth[p1] := wealth[p1] + 1;
                wealth[p2] := wealth[p2] - 1;
        end;
        power[p1] := power[p1] + 1;
end;

// decide who is the winner and apply the reward
function decision(a, b:integer):integer;
var
        d:integer;
begin
        // same power
        if power[b] = power[a] then begin
               // same wealth
                if wealth[a] = wealth[b] then begin
                        // pick the winner randomly
                        d := random(2);
                        if d = 0 then reward(a,b)
                        else reward(b,a);
                end
                else begin
                        if wealth[b] > wealth[a] then reward(b,a)
                        else reward(a,b);
                end;
        end
        else begin
                if power[b] > power[a] then reward(b,a)
                else reward(a,b);
        end;
end;

begin
        // initialise the players' values
        // total wealth is PLAYERS*1000
        for i := 1 to PLAYERS do begin
                wealth[i] := 1000;
                power[i] := 0;
        end;

        // the actual game
        randomize;
        for i := 1 to YEARS do begin
                for x := 1 to PLAYERS do begin
                        y := random(PLAYERS) + 1;
                        // the actual play
                        if x <> y then decision(y, x);
                end;
        end;

        // sorting the results, wealthiest on top
        for i := 1 to PLAYERS do begin
                for j:=i+1 to PLAYERS do begin
                        if wealth[i] < wealth[j] then begin
                                tmp := wealth[i];
                                wealth[i] := wealth[j];
                                wealth[j] := tmp;

                                tmp := power[i];
                                power[i] := power[j];
                                power[j] := tmp;
                        end;
                end;
        end;

        // printing the results
        poor := 0;
        for i := 1 to PLAYERS do begin
           if (wealth[i] = 0) then poor := poor + 1;
           writeln('person[', i,'] has ', wealth[i],' wealth and ',power[i],' power');
        end;

        // printing statistics
        writeln(poor, ' of ', PLAYERS, ' are poor.');
end.

This code was compiled using Free Pascal IDE.

 

Now, let's analyse the results. After executing this code several times, the number of wealth players (those with "wealth" > 0) is in between 20-23, from 100. This is ~20%. Setting YEARS = 15000 will decrease the result to 16%.But let's get back to the 20% figure, is this a coincidence with the 20% from the lyrical part on top of the post? The decision to set YEARS = 10000 was somewhat a random one, but later we (I and my son) figured out that ... check this out "Human prehistory begins in the Paleolithic Era, or "Early Stone Age". Later, during the Neolithic Era (New Stone Age), came the Agricultural Revolution(between 8000 and 5000 BCE) in the Fertile Crescent, where humans first began the systematic husbandry of plants and animals".

Couple of interesting facts as you can see. Obviously the code is a way too simple to pretend to model a real game of life, e.g. the number of players is constant, there is no chance for the poor players to ever win (which isn't the case in our reality), lots of rich people are also philanthropists, "brain" (and "faith") attribute isn't considered at all, two or more poor players with total "power" greater than the "power" of a rich player can compete against that player, an "insufficiently" rich player can bribe a richer and more powerful player to win the play with another player etc. This will obviously make the game more real and complicated (for my son, who still learns Pascal, while his father "plays" the game).

Anyway, if I managed to drag your attention to this subject, feel free to write a better version of "The Game Of Life" and drop me a note with your results. I am really curious.

Tags:  Programming   Technology  

Comments (0)

Scope of variables in Java

Posted:  Jan 30 / 2:01pm by Ruslan Ciurca

While playing with some performance improvements this week, I came across this article. Weird you'd say, but let me show my version of the test scenario.

First of all, I am using Java 1.6.0_24 HotSpot 64-bit and executing the code with the following options (i.e. limiting the heap size to 64Mb):

java -Xms64m -Xmx64m MemTest

Here is the first version of the code:

public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   for (int i = 0; i < nums.length; i++) nums[i] = i;

   int avr = avg(nums);
   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

There is nothing complicated in this code, we allocate an array (int[] nums) outside the main "for" loop (this is the key point of this post by the way) and calculate the average. Everything works fine so far. Now let's remove the "for" loop responsible for assignments:

public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   int avr = avg(nums);
   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

Now the execution fails with the OutOfMemoryError. Odd, isn't it? What do we typically do in such cases? Increase the heap size? Well, but the previous version was working fine with the same heap size. Ok, now let's do something which isn't considered as a good practise by adding the line "nums = null;":

public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  int[] nums = null;

  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   nums = new int[sz];
   int avr = avg(nums);
   nums = null; // <<-- HERE IS THE NEW LINE

   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

Hey, fantastic, it works again! And now let's do it properly by changing the scope of the "int[] nums":

public class MemTest {
 public static int avg(int[] ns) {
  long sum = 0;
  for (int i = 0; i < ns.length; i++) sum += ns[i];
  return (int) (sum/ns.length);
 }

 public static void main(String[] args) {
  for (int j = 0; j < 1000000; j++) {
   System.out.print("Iteration: ");
   System.out.println(j);

   int sz = 2<<(j % 30);
   if (sz > 11162611) sz = 11162611;

   System.out.print("Dynamic size: ");
   System.out.println(sz);

   int[] nums = new int[sz]; // <<-- LOCAL SCOPE
   int avr = avg(nums);

   System.out.print("Average: ");
   System.out.println(avr);
  }
 }
}

And it works again.

So, why is this happening? I have no idea. It could be a particularity of the JVM implementation, as the author of the article, I mentioned at the top, suggests. But, obviously we can draw a conclusion, variable scope is something worth considering (and probably coding standards aren’t always just "cosmetic" features).

Tags:  Programming   Technology  

Comments (0)

K-Means Clustering Algorithm

Posted:  Dec 13 / 9:21am by Ruslan Ciurca

This post is dedicated to K-Means Clustering Algorithm, used for unsupervised learning. Here is a short introduction into the unsupervised learning subject. This video shows the algorithm at work.

So, how does it work?

1. First of all we have the number K (the number of clusters) and the data set X1,...,Xm as inputs, K < m and Xj ∈ ℜn for ∀j=1..m. Considering that clustering is used as a task in data mining (e.g. including text), it is a nice exercise indeed to formalise a problem in a way where data is represented by vectors :)

2. At this step we randomly initialise K cluster centroids μ1,...,μK, μj ∈ ℜn for ∀j=1..K. The recommended way is to randomly initialise the centroids from the data set X1,...,Xm and make sure that μi≠μj for ∀ i≠j.

3. At this step we execute the loop:

Repeat {
  //cluster assignment step
  For i=1..m {
    Find the closest centroid for X[i], i.e.
    min{||X[i] - μ[j]||}, ∀j=1..K, e.g. it is μ[t];
    Assign c[i]=t;
    //in this case c[i] is the index of 
    //the closest centroid for X[i]
  }

  //move centroids step, if a particular cluster 
  //centroid has no points assigned to it, 
  //then eliminate it
  For j=1..K {
    old_μ[j] = μ[j];
    μ[j] = average (mean) of all X[i] where c[i]=j, ∀i=1..m;
  }
}

This loop ends when all the centroids stop "moving", i.e. ||old_μj - μj|| < ε, ∀j=1..K, where ε is an error we are happy with (e.g. 0.01 or 0.001).

 

This is pretty much the algorithm. However, in this form, there is a risk to get stuck in a local minima. By local minima I mean the local minima of the cost function:
J(X1,...,Xm,c1,...cm1,...,μK) = (1 ⁄ m)⋅∑||Xi – μci||2, sum is for i=1..m

In order to address this, the algorithm (steps 2, 3) are executed several times (e.g. 100). Each time the cost function (J(...)) is computed and the clustering with the lowest J(...) is picked up. E.g.

For p=1..100 {
  Perform step 2;
  Perform step 3;
  Calculate cost function J(...);
}
Pick up clustering with the lowest cost J(...);

 

In order to make it more interactive, I and my son spent couple of hours implementing a JavaScript version of this algorithm using <canvas> tag from HTML5 (my son studies HTML at school, so it was an interesting exercise). Here is the link to the page (excuse me and my son for our non OOP approach to JavaScript :)). If you want to have a look at the sources, feel free to download that page (we didn't provide comments, but hopefully this article explains everything). We tested it with Google Chrome and Internet Explorer 9 (if you have problems with IE9, please consult the following link).

Note: While I was writing this article, a friend of mine suggested this nice JavaScript library for plotting www.jqplot.com.

Comments (0)

Xbox Live: Updated

Posted:  Dec 9 / 3:20pm by Ashley Ellen

As a keen gamer, I often chew the gaming-world-fat with Bradley over the desk partition in the office. (Honestly, that isn’t a euphemism)

Yesterday I downloaded the latest (and much publicised) Xbox Live update, and I have to say at first glance I was extremely impressed.  It has been given a sleek new look and feel with Microsoft’s “Metro” design – the ‘Tile-Style’ already used on Windows Phone and planned for Windows 8.

The menu has Bing, Home, Social, Video, Games and Music - all nicely separated and ready for Apps (more on those in a minute). It feels like a new console. So far, so good…

Although it does look quite different, the biggest change is the expanded voice and motion control. Not too great if you don’t own a Kinect sensor! Fortunately for me – I own one already and so was very happy that there is no longer a need to go to the special “Kinect Hub” to use these control features. The Bing search page is also a welcome addition. Currently a little limited and only able to search for Xbox games, movies and apps, this will be made better if/when Xbox Live enables a full browser. The voice control feels very slick and results were displayed very quickly (“Bing, Tiger Woods” gave me the last 5 golf games featuring Tiger Woods and ability to purchase these by download. Fortunately, Xbox didn’t ‘hear’ “Bang Tiger Woods”!)

There is now an Apps menu that will allow you to download certain applications (once released): on-demand television channels including You Tube, 4OD, music apps and others. These sit next to the existing Xbox Live apps within each menu – for example 4OD would sit next to Sky TV. This was my wife’s highlight – knowing we can watch on demand video content (or catch up on the things we forget to Sky+) on our big screen without linking up the pc/laptop.

Finally, the update has introduced a cloud saving feature that enables gamers to take their profile to their friends’ houses without configuring memory sticks and carrying them round with you. When choosing a storage device for games you are now spoilt for choice – HDD, console memory, or Cloud.

Microsoft appears to have realised (much later than Sony) that it sits in many living rooms and can provide all round entertainment for a family.  It still appeals to the hard-core gamer, but now has much more to offer everyone. If you (Bradley) only use an Xbox to play games these days – you really need to go Live with it.

You wouldn’t buy a PC just to send emails would you?

Tags:  Microsoft   Windows 8   XBox  

Comments (0)

A* Search Algorithm

Posted:  Nov 22 / 2:47pm by Ruslan Ciurca

Algorithms form an important part of the problem solving process in Artificial Intelligence. One useful algorithm is A* Search, which is an extension of another useful algorithm called Dijkstra's algorithm. This video is a short introduction to the subject.

So, the key part of this algorithm is the evaluation function f(node)=g(node)+h(node) (note that in the video "state" is treated as a "node"), where
- g(node) is the cost from the "Start" node to the current node and
- h(node) is the heuristic (estimated) cost from the current node to the "Goal".
The algorithm works by always expanding the path which has the minimum value of f(node) first (cheapest first). The search terminates when the "Goal" node is chosen for expansion or there are no more nodes to expand.

For the heuristic function to be admissible (or optimistic, which is the same) it is important that for any node h(node) ≤ actual cost from the node to the "Goal". If this is true, then "A* Search" will always find the lowest cost path from the "Start" to the "Goal", this video explains this principle with more details.

As you could mention, the last video operates with another term called "search frontier". It is a "characteristic" of the algorithm or the way it progresses to/with expanding the nodes. For example:
- this article shows how the search frontier looks for the Breadth-First Search algorithm and
- this article shows how the search frontier looks for the Depth-First Search algorithm.

An implementation of this algorithm would use a priority queue, where priority is dictated by the function f.

Now, let's consolidate this material with an exercise:

For heuristic function h and cost of action 10 (per step), find the order (1, 2, 3, ...) of the nodes expansion (the node is expanded when it is removed from queue). Start with "1" at the "Start" state at the top. Indicate the nodes that will never be expanded. Is the heuristic function admissible?

 

First of all the heuristic function is not admissible because h(B5)=20 > 10, where 10 is the actual cost from B5 to the "Goal".

Now, let's start with the expansion. The frontier is at the "Start".

1. f(A1)=10+11=21, f(A2)=18, f(A3)=17, f(A4)=16, f(A5)=20. The frontier moves to A1, A2, A3, A4, A5 (this is the content of the queue).

2. A4 is the node (in the queue) with the minimum f (=16), so it's the second node to be expanded (removed from the queue). f(B4)=10+10+5=25, f(B5)=40. The frontier moves to A1, A2, A3, B4, B5, A5.

3. A3 is the next node with the minimum f (=17), so it's the third node to be expanded. But there is nothing to add to the queue. The frontier now is A1, A2, B4, B5, A5.

4. A2 is the next node with the minimum f (=18), so it's the forth node to be expanded. f(B2)=10+10+3=23, f(B3)=29. The frontier moves to A1, B2, B3, B4, B5, A5.

5. A5 is the next node (again, in the queue) with the minimum f (=20), so it's the fifth node to be expanded. f("Goal")=10+10+0=20. The frontier moves to A1, B2, B3, B4, B5, "Goal".

6. The "Goal" is the next node (from the queue) with the minimum f (=20), so it's the sixth node to be expanded and, because it is the "Goal", it is also the last one to be expanded (i.e. end of the search).

Here is the expansion order: "Start"-1, A4-2, A3-3, A2-4, A5-5, "Goal"-6.

The nodes which were never expanded: A1, B1, B2, B3, B4 and B5.

Comments (1)

Endava monthly innovation session - October '11

Posted:  Oct 31 / 3:46pm by Ben Fraser

We had Ellen vK's Digital Media (DM) Innovation session again recently and it was noticeable not only for the breadth of innovation areas with which we are coming into contact, but also for our Client Service Manager, Martin Smith's staggering lack of command of various basic English phrases (detail below)... 

In the session, Karina talked about some interesting Proof of Concept work we are doing for a big (huge) sports federation. We are building a membership provider service, utilising AppFabric cache, distributed across multiple servers. The team is currently using the PoC to identify the fastest options for the DB and we're looking at Raven DB at this stage. 

Martin talked about the iPhone 4s launch, describing it as a “Damp Squid” (sic). There followed a discussion around phones and tablets and there seemed to be quite a bit of interest in the new(ish) Kindle Fire with it's low price point at $200 – basically meaning that Amazon is selling the device as a loss leader, but pre loaded with a bunch of Amazon stuff on there. They have left out a camera and microphone to keep the cost down.

Ben talked about some innovation we saw this week from Sitecore. They have made a recent aquisition of a company called Pectora - giving them capability to print and create PDFs direct from the CMS. The timing of this is actually perfect for a big pitch we are currently working on but it's potentially a very interesting extension to the "Create once, use everywhere" mantra that really ought to apply to all content, in whatever medium it is published. We could see potential benefit for several other clients in this area. As part of the discussion about how we might use this capability, Martin referred to something as a "Mute point" (sic).

Ellen went through how her Rapid Response Team (RRT) have released a new newsletter module. Based on Sitecore Newsletter module, it can be plugged into any other campaign distributor. It uses queing technology, such as Rhino, and exists as a node inside of Sitecore. The integration allows us to create audiences and test lists in Sitecore and they've even added a little icon to see whether the list has synchronised with Exact Target, who we use for a lot of our distribution. The module uses Sitecore controls to create / add content – meaning there is nothing new for editors to learn over and above Sitecore commands, which is a real step forward for us and our clients. It also takes the API one step further in terms of people updating subscriber lists – registrations go straight to Exact Target.

Bradley has been looking at eCommerce vendors. MPP (Times paywall, Sky TV among others) have got some really cool payments systems which solve potential problems for our clients. 

He also talked about how we have just released a new version of the API for Cadbury – making it even easier for 3rd party agencies to use.

 

 

Comments (0)

SEO keywords information about to disappear

Posted:  Oct 31 / 10:14am by Bradley Howard

Over the next few weeks Google will be changing the user journey and reporting abilities in Google search:

  1.  Users who have signed into Google will now be automatically redirected to https://www.google.com (i.e. the SSL version)
  2. When you search from https://www.google.com, websites you visit from our organic search listings will still know that you came from Google, but won't receive information about each individual query

This means that in almost all analytics products will stop being able to report on Search Terms from personalised (i.e. logged in) Google results. My feelings are that this will change because SEO will have become ‘blinded’ by this Google modification.

The full article is here: http://googleblog.blogspot.com/2011/10/making-search-more-secure.html

However, if you register your site with Google Webmaster tools, you'll see the top 1,000 search terms in there, and if you use Google Analytics, you'll see the same results as Wemaster tools.

Quite how Google will be allowed to get away with this 'closed shop', or dare to call it a monopoly, we will have to wait and see, so I still think this will change.

 

Comments (0)

How many defects are there?

Posted:  Oct 28 / 10:14am by Ruslan Ciurca

This week I was asked if there are any statistics there showing
- the number of defects (on average) produced during a software elaboration project or
- the number of defects produced versus the number of lines of code per programming language or technological stack (C/C++, Java, .NET or PHP for example)?

I answered that I haven't seen anything publically available. Any company, publishing such statistics, could damage its reputation. However, internally, any company should collect these statistics for the risk management purpose.

Still, we can apply maths to do some estimation, can't we? For example, let's assume the following:

1. A project consists of one or few iterations.

2. Ideally, code from each iteration is deployed with 0 defects. As a result, we consider what was fixed during the iteration(s). We also assume that what was deployed with the previous iterations is free of defects in the current one.

3. Most of the trivial defects are spotted during the compilation or build process (far before the test team gets engaged) as a result, we count the defects spotted during the unit tests execution. For now we ignore the defects spotted by the test team as this complicates the model :)

4. The unit tests are free of defects :)

Going further:

5. The X-th iteration delivers N new units.

6. Each unit must have at least 2 Unit Tests, for Expected Pass and Expected Failure cases.

7. As a result the X-th iteration has 2⋅N Unit Tests.

8. The probability of a single Unit Test to fail is 1⁄2 (Unexpected Pass or Unexpected Failure).

9. The iteration can have from 0 to 2⋅N defects as a result. The probability that the number of defects is m (0≤m≤2⋅N) is

10. The mean value or the average number of the defects is 2⋅N⋅1⁄2 = N.

So, N units with N defects or roughly 1 defect per unit.

Few words about the maths used. It is the binomial distribution where p=1⁄2 and the mean value is E(X) =∑m⋅P(m) = n⋅p, where n=2⋅N.

This formula also tells us that if we reduce the probability for a Unit Test to fail (p<1⁄2), then we will also reduce the number of defects. Sounds logic, doesn't?

Tags:  analytics   defects   mathematics   testing  

Comments (54)

HTML5 Video – Ready for production?

Posted:  Oct 6 / 10:00am by Rob Phillips

Blog_image

Introduction

It’s been looming on the horizon for a few years now, championed by some, criticised by others, but is HTML5 video worth jumping through the extra hoops for? I think so.

The competition

Currently, the majority of multimedia delivered to the user’s browser is done using Adobe’s flash plugin. This method benefits from massive coverage[1] of the flash plugin in the traditional desktop market, but it also has its drawbacks: performance and usability have both been questioned. Apple have been openly critical of flash, with their incredibly popular portfolio of mobile devices not supporting it, instead they advocate the use of html5. I won’t detail the cons of flash, Steve Jobs does a good enough job of that for me.

Enter video

HTML5 gives us an open standard for an alternative way to delivery multimedia, using its native semantic element and associated API.

This method is by no means perfect, but I think it’s the future and worth spending the extra time to implement. The modern user now expects a complete experience across a whole spectrum of devices, and with html5 video there is now no excuse not to.

The Pros:

  • Supported by the latest versions of major and mobile desktop browsers
  • Previous uses of object and embed are ugly
  • No reliance on 3rd party plugin (and therefore no need for a specialist developer)
  • Ability to provide multiple sources, delivering different encodings to different browsers
  • Easy to use API
  • Uses fewer resources in most cases (looking at you CPU!)

About that API

As a front end developer by trade, I can’t help but get a bit excited about possibilities the MediaElement API affords people like me to help create a more elegant and integrated multimedia solution. I won’t bore you with all the details, but the API provides all the methods and events, to do pretty much anything you want with html and JavaScript, all styled with CSS. Don’t be fooled into thinking you will be limited to a basic result, there are examples about which speak for themselves.

Those Hoops…

It’s difficult to get a solution which covers all your bases without any issues, html5 video certainly isn’t with its pitfalls:

  • Codec support – Because the current HTML5 draft does not specify a codec to use, you will find that each browser vendor typically does not have complete support for all. (My colleague Akin Olusoga already covered this side of the headache )
  • Browser support – The grim reality is that you are probably still going to have to have a flash fall-back, as Internet Explorer 8 and earlier have virtually no support for HTML5 video (Don’t forget about chrome frame though). Having said that all latest releases from each desktop vendor do support it, and many smartphone browsers have supported it from the beginning.

Conclusion

If you are forward thinking, and want to provide a great experience for your users, no matter what device they are using then you are going to want to be using HTML5 video. The waters regarding codecs might be a bit murky, but it’s worth the effort to export a few extra video encodings from Premiere. Add on the performance enhancements over the completion, and the ability to theme custom controls and you have all the reasons why HTML5 Video is something you should be looking into.

References

  1. Adobe Flash player market penetration - http://www.adobe.com/products/player_census/flashplayer/version_penetration.html

HTML5 Video in the wild

  1. http://www.youtube.com/html5
  2. http://www.dailymotion.com/html5
  3. http://www.apple.com/html5/showcase/video/
Tags:  HTML5   development  

Comments (0)