Feature Engineering in Computer Vision systems (with sample Matlab code)
In computer vision systems ,processing and analysing an image can be depicted by what is known as the computer vision pipeline shown below
The challenge of the design of CV system is in the effectiveness of the feature extraction (feature engineering). In CV, a feature is a measurable piece of data in your image that is unique to that specific object. It may be a distinct color or a specific shape such as a line, edge, or image segment. A good feature is used to distinguish objects from one another.
What makes a good feature for object recognition?
A good feature will help us recognize an object in all the ways it may appear. Characteristics of a good feature are as follows:
- Identifiable
- Easily tracked and compared
- Consistent across different scales, lighting conditions, and viewing angles
- Still visible in noisy images or when only part of an object is visible
Feature extraction (engineering) can take place in 2 forms, the traditional method versus the Deep learning form as shown in Fig. 2 below
In traditional ML problems, for manual feature selection and engineering, reliance is on our domain knowledge (or partner with domain experts) to create features that make ML algorithms work better. We then feed the produced features to a classifier like a support vector machine (SVM) or AdaBoost to predict the output. A feature descriptor is a representation of an image or an image patch that simplifies the image by extracting useful information and throwing away extraneous information. Typically, a feature descriptor converts an image of size width x height x 3 (channels ) to a feature vector / array of length n
For this article some of the handcrafted features set are considered as listed below and their implementation using code in Matlab is shown. To compare their symbolism when processed by the respective algorithms, the same image is used, in this case the patron saint of JPEGS Lena Forsen
List of Handcrafted Features surveyed
- Histogram of oriented gradients (HOG)
- Haar Cascades
- Scale-invariant feature transform (SIFT)
- Speeded-Up Robust Feature (SURF)
Code and output images from feature extraction
- Histogram of oriented gradients (HOG)
The HOG descriptor focuses on the structure or the shape of an object. While in the case of edge features, we only identify if the pixel is an edge or not. HOG is able to provide the edge direction as well. This is done by extracting the gradient and orientation (or you can say magnitude and direction) of the edges.
>> img = imread('Lena_test_img.jpg');
[featureVector,hogVisualization] = extractHOGFeatures(img);
figure;
imshow(img);
hold on;
plot(hogVisualization);
3. Scale-invariant feature transform (SIFT)
SIFT helps locate the local features in an image, commonly known as the ‘keypoints‘ of the image. These keypoints are scale & rotation invariant that can be used for various computer vision applications, like image matching, object detection, scene detection. We can also use the keypoints generated using SIFT as features for the image during model training. The major advantage of SIFT features, over edge features or HOG features, is that they are not affected by the size or orientation of the image.
For consistency, the code below is for Matlab and can be copied and pasted to observe the SIFT keypoints in an image. Notice how the keypoints border around the edges.
%code credits Hossein Kami
%this software computes the keypoints in SIFT algorithm and save the rsult%in extrema variable% I--> the input image should be a color(rgb) imagefunction exterma=key_points(I)I=imread('lenna.jpg');I=double(rgb2gray(I));I=I/max(max(I)); % image should be in [0 1][M,N,C] = size(I) ;% Lowe's choicesS=3 ;omin=-1 ; % first octave -1 mmeans I should be doublsized for first octaveO=floor(log2(min(M,N)))-omin-4 ; % Up to 16x16 images% we can decrease the 1.6 for some images to reach better esultssigma0=1.6*2^(1/S) ; % this sigma is for the image in real size% and 1.6 is for doublsized image in first octavesigman=0.5 ;% antialiasing filter sigmathresh = 0.006;% it was .03 in Lowe's paper ,we changed it to .006 herer = 10 ;%calculate Guassian images in different scales and octaveGS = gaussianss(I,O,S,omin,-1,S+1,sigma0) ;%*************************************************************%calculate DOG imagesfor o=1:GS.O %all the octaves[M,N,SS] = size(GS.octave{o}) ;DOG.octave{o} = zeros(M,N,SS-1) ;for s=1:SS-1DOG.octave{o}(:,:,s) = ...GS.octave{o}(:,:,s+1) - GS.octave{o}(:,:,s) ;endend%********************************************************%******** imshow DOG images you can comment this*******% for o=1:GS.O %all the octaves% for s=1:SS-1% imshow(DOG.octave{o}(:,:,s),[])% pause(1)%% end% end%*************************************************************%finding key pointsexterma=zeros(2,1);for o=1:GS.Ofor s=2:SS-2sig=1.6*2^(o-1)*(2^(1/S))^s;current_DOG=DOG.octave{o}(:,:,s);down_DOG=DOG.octave{o}(:,:,s-1);up_DOG=DOG.octave{o}(:,:,s+1);extr = search_exterm(up_DOG,down_DOG,current_DOG ) ;%find exremumif extr(1,1)%accurate localization to subpixel and eliminate some unsuitable%pointsextr=localize_eliminate(extr,up_DOG,down_DOG,current_DOG ,thresh,r);if extr(1,1)extr=2^(o-1+GS.omin) *extr; %stor key pointsexterma=[exterma extr];endendendendimshow(I,[])hold onplot(exterma(2,:),exterma(1,:),'r+','LineWidth',2)end%**************************************************************************function SS = gaussianss(I,O,S,omin,smin,smax,sigma0)%smax--> maximum scale(here it is 4)%smin--> minimum scale(here is -1...image shold be double sized)%omin--> first octave(here is -1)%1.6 as sigma0 is considered for omin=-1% Scale multiplicative stepk = 2^(1/S) ;dsigma0 = sigma0 * sqrt(1 - 1/k^2) ; % Scale step factorsigman = 0.5 ; % Nominal smoothing of the image% Scale space structureSS.O = O ;SS.S = S ;SS.sigma0 = sigma0 ;SS.omin = omin ;SS.smin = smin ;SS.smax = smax ;% If mino < 0, multiply the size of the image.% (The rest of the code is consistent with this.)if omin < 0for o=1:-ominI = doubleSize(I) ;endelseif omin > 0for o=1:ominI = halveSize(I) ;endend[M,N] = size(I) ;% Index offsetso = -smin+1 ;% --------------------------------------------------------------------% First octave% --------------------------------------------------------------------%% The first level of the first octave has scale index (o,s) =% (omin,smin) and scale coordinate%% sigma(omin,smin) = sigma0 2^omin k^smin%% The input image I is at nominal scale sigman. Thus in order to get% the first level of the pyramid we need to apply a smoothing of%% sqrt( (sigma0 2^omin k^smin)^2 - sigman^2 ).%% As we have pre-scaled the image omin octaves (up or down,% depending on the sign of omin), we need to correct this value% by dividing by 2^omin, getting%e% sqrt( (sigma0 k^smin)^2 - (sigman/2^omin)^2 )%if(sigma0 * 2^omin * k^smin < sigman)warning('The nominal smoothing exceeds the lowest level of the scale space.') ;endSS.octave{1} = zeros(M,N,smax-smin+1) ;% we have 6 scale in each octaveSS.octave{1}(:,:,1) = gauss_filter(I,sqrt((sigma0*k^smin)^2 ...- (sigman/2^omin)^2));%HOSSEIN imsmooth(I, ...%sqrt((sigma0*k^smin)^2 - (sigman/2^omin)^2)) ;% sigman and sigma0 applyed simutnously%Not that sigma0 for first octave(double sized image state) is%1.6-->1.6*k*k^smin and smin=-1for s=smin+1:smax% Here we go from (omin,s-1) to (omin,s). The extra smoothing% standard deviation is%% (sigma0 2^omin 2^(s/S) )^2 - (simga0 2^omin 2^(s/S-1/S) )^2%% Aftred dividing by 2^omin (to take into account the fact% that the image has been pre-scaled omin octaves), the% standard deviation of the smoothing kernel is%% dsigma = sigma0 k^s sqrt(1-1/k^2)%dsigma = k^s * dsigma0 ;% smooth Image in prevous scale and just use dsigmaSS.octave{1}(:,:,s +so) =gauss_filter...(squeeze(SS.octave{1}(:,:,s-1 +so)), dsigma);%HOSSEIN imsmooth(squeeze(SS.octave{1}(:,:,s-1 +so)), dsigma ) ;end% --------------------------------------------------------------------% Other octaves% --------------------------------------------------------------------for o=2:O% We need to initialize the first level of octave (o,smin) from% the closest possible level of the previous octave. A level (o,s)% in this octave corrsponds to the level (o-1,s+S) in the previous% octave. In particular, the level (o,smin) correspnds to% (o-1,smin+S). However (o-1,smin+S) might not be among the levels% (o-1,smin), ..., (o-1,smax) that we have previously computed.% The closest pick is%% / smin+S if smin+S <= smax % for me it is% statisfied all the time% (o-1,sbest) , sbest = |% \ smax if smin+S > smax%% The amount of extra smoothing we need to apply is then given by%% ( sigma0 2^o 2^(smin/S) )^2 - ( sigma0 2^o 2^(sbest/S - 1) )^2%% As usual, we divide by 2^o to cancel out the effect of the% downsampling and we get%% ( sigma 0 k^smin )^2 - ( sigma0 2^o k^(sbest - S) )^2%sbest = min(smin + S, smax) ;TMP = halveSize(squeeze(SS.octave{o-1}(:,:,sbest+so))) ;target_sigma = sigma0 * k^smin ;prev_sigma = sigma0 * k^(sbest - S) ;if (target_sigma > prev_sigma)TMP =gauss_filter(TMP, sqrt(target_sigma^2 - prev_sigma^2));end[M,N] = size(TMP) ;SS.octave{o} = zeros(M,N,smax-smin+1) ;SS.octave{o}(:,:,1) = TMP ;for s=smin+1:smax% The other levels are determined as above for the first octave.dsigma = k^s * dsigma0 ;SS.octave{o}(:,:,s +so) =gauss_filter(squeeze(SS.octave{o}...(:,:,s-1 +so)), dsigma);endendend% -------------------------------------------------------------------------% Auxiliary functions% -------------------------------------------------------------------------function J = doubleSize(I)[M,N]=size(I) ;J = zeros(2*M,2*N) ;J(1:2:end,1:2:end) = I ;J(2:2:end-1,2:2:end-1) = ...0.25*I(1:end-1,1:end-1) + ...0.25*I(2:end,1:end-1) + ...0.25*I(1:end-1,2:end) + ...0.25*I(2:end,2:end) ;J(2:2:end-1,1:2:end) = ...0.5*I(1:end-1,:) + ...0.5*I(2:end,:) ;J(1:2:end,2:2:end-1) = ...0.5*I(:,1:end-1) + ...0.5*I(:,2:end) ;endfunction J = halveSize(I)J=I(1:2:end,1:2:end) ;end%*************************************************************************%**************************************************************************function im=gauss_filter(image,sigma)G = fspecial('gaussian',[5 5],sigma);im=imfilter(image,G,'same');end%**************************************************************************%**************************************************************************function [points2]=localize_eliminate(points,up,down,curr,thr,r)points2=zeros(2,1);t=1;for i=1:size(points,2)x=points(1,i);y=points(2,i);fxx= curr(x-1,y)+curr(x+1,y)-2*curr(x,y); % double derivate in x directionfyy= curr(x,y-1)+curr(x,y+1)-2*curr(x,y); % double derivate in y directionfsigmasigma=up(x,y)+down(x,y)-2*curr(x,y); % double derivate in sigma directionfxsigma=((up(x+1,y)-down(x+1,y))-(up(x-1,y)-down(x-1,y)))/4;%derivate in x and sigma directionfysigma=((up(x,y+1)-down(x,y+1))-(up(x,y-1)-down(x,y-1)))/4;%derivate in y and sigma directionfxy= curr(x-1,y-1)+curr(x+1,y+1)-curr(x-1,y+1)-curr(x+1,y-1); %derivate inx and y directionfx=curr(x,y)-curr(x-1,y);%derivate in x directionfy=curr(x,y)-curr(x,y-1);%derivate in y directionfsigma=(up(x,y)-down(x,y))/2;%derivate in sigma direction%localization using Teilor seriA=[fsigmasigma fxsigma fysigma;fxsigma fxx fxy;fysigma fxy fyy];X=-inv(A)*([fsigma fx fy]');x_hat=X(2);y_hat=X(3);if abs(x_hat)<4 && abs(y_hat)<4 %ignor the ofsets > 4px=round(x+x_hat);py=round(y+y_hat);elsepx=x;py=y;%[px py]endD_hat=curr(px,py)+([fsigma fx fy]*X)/2;if abs(D_hat)>thr%% filter some low contrast pointsif (fxx+fyy)^2/(fxx*fyy-fxy^2)<(r+1)^2/r % remove edge pointspoints2(1,t)=px;points2(2,t)=py;t=t+1;endendendend%**************************************************************************%**************************************************************************function indx=search_exterm(up,down,im)[m n]=size(im);t=1;thr=.004;indx=[0;0];for i=2:m-1for j=2:n-1if im(i,j)> thrwindow(1:3,1:3)=down(i-1:i+1,j-1:j+1);window(4:6,1:3)=im(i-1:i+1,j-1:j+1);window(7:9,1:3)=up(i-1:i+1,j-1:j+1);window(5,2)=-100;if im(i,j)>max(max(window))indx(:,t)=[i j]';t=t+1;endendif im(i,j)<-thrwindow(1:3,1:3)=down(i-1:i+1,j-1:j+1);window(4:6,1:3)=im(i-1:i+1,j-1:j+1);window(7:9,1:3)=up(i-1:i+1,j-1:j+1);window(5,2)=100;if im(i,j)<min(min(window))indx(:,t)=[i j]';t=t+1;endendendendend%**************************************************************************