This software is Copyright © 2007 The Regents of the University of California.
All Rights Reserved. Permission to use, copy, modify, and distribute this 
software and its documentation for educational, research and non-profit purposes, without fee, and without a written agreement is hereby granted, provided that the above copyright notice, this paragraph and the following four paragraphs  appear in all copies. Permission to incorporate this software into commercial products may be obtained by contacting:
 
Technology Transfer Office 9500 Gilman Drive, Mail Code 0910 University of 
California La Jolla, CA 92093-0910 (858) 534-5815
invent@ucsd.edu
 
This software and documentation are copyrighted by The Regents of the 
University of California. The software and documentation are supplied "as is",
without any accompanying services from The Regents.
 
IN NO EVENT SHALL THE UNIVERSITY OF CALIFORNIA BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THIS DATABASE AND ITS DOCUMENTATION, EVEN IF THE UNIVERSITY OF CALIFORNIA HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
THE UNIVERSITY OF CALIFORNIA SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING,
BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. THE DATABASE PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND
THE UNIVERSITY OF CALIFORNIA HAS NO OBLIGATIONS TO PROVIDE MAINTENANCE, SUPPORT,
UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
 
This software was created by Vincent Rabaud. If you find any bug, please contact
vrabaud-at-cs-dot-ucsd-dot-edu. If you use this code or software in a project or publication, we’d like to know. Please contact Vincent Rabaud as well as Amy Alexander ajalexander-at-ucsd-dot-edu.
 
 
============================================================
 
SVEN CV 
 
============================================================
 
Release Date - 22 Oct 07
 
 
SVEN CV is a software application for real-time tracking of pedestrians, built on OpenCV. It is designed to track several individuals at once and to cope with foreground and background occlusions. It also extracts subjective features in real time, such as face detection and hair color, and sends the data, including the coordinates of the person's outline (matte), over UDP to a specified IP address or to localhost. It accepts input from a live camera or from AVI files.
 
SVEN CV was developed as part of the SVEN public space digital media art project (http://deprogramming.us/sven).* As such, the features and interface were designed for the specific needs of that project. We are releasing SVEN CV and its source code in hopes that parts of it can be useful to others.  
 
 
I. Hardware and software requirements
----------------------------------
 
SVEN CV runs in real time (30 fps) at 320x240 on a Pentium 4 with 256 MB of RAM. (Real time at 640x480 has been achieved using an Intel Core 2 Duo processor with 1 GB RAM.) SVEN CV needs only 4 MB of hard disk space. It requires a FireWire or USB camera or digitizer to run in camera mode.
 
SVEN requires Windows XP with Service Pack 2. It has not been tested on Vista, and the current version is likely to not be fully functional with it. (Please contact the authors if you are trying to run it under Vista.)
 
 
II. Compilation instructions 
(for programmers - not needed for users just downloading the application)
------------------------
 
Install OpenCV from http://sourceforge.net/projects/opencvlibrary/
 
Put the project files included in "VC++Project" in some folder. Then, create a "src" folder in it. Add the sources to it.
 
The file "mainSven.cpp" is here for robustness in case it has to work in its original setup (with two computers, the second running MAX/MSP). If you just want to use SVEN to do tracking, you should ignore it.
 
Also, if you want to use this code for computer vision, you can safely ignore the UDP files. Be aware that Bramble tracking is also implemented (hence all the particle filtering files).
 
Compile, linking against OpenCV, and it should work!
 
 
 
III. Installation/Execution instructions
-----------------------------------
 
Install OpenCV from http://sourceforge.net/projects/opencvlibrary
Be sure to say yes when the OpenCV installer asks if you want to add its bin directory to your path.
 
Unzip the file and run the file sven.exe from the command prompt using the following syntax:
 
sven.exe <mode> <ip>
 
Command line parameters explained: 
<mode> 
Can be camera, imageSeq, or the name of an avi file.
If mode is the word "camera", then SVEN will look for a video capture device.
If mode is the word "imageSeq", then SVEN can process an image sequence and it expects the following parameters after:
      The beginning of the file name, including the path.
      The starting number of the image sequence
      The ending number of the image sequence
      The length of the number in the image sequence
      The end of the file name
      
      E.g.: sven.exe imageSeq ./path_to_folder/image 0 1000 4 .png
      This processes all the images from "image0000.png" to "image1000.png" in the folder "./path_to_folder/".
      
Else, SVEN CV assumes that a video file name is given. 
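
The way the five imageSeq parameters combine into file names can be sketched in Python. This is only an illustration based on the example above: it assumes the number is zero-padded to the given length and that the ending number is included. The function name image_sequence_names is made up for this sketch and is not part of SVEN CV.

```python
def image_sequence_names(prefix, start, end, digits, suffix):
    """Return the file names implied by the five imageSeq parameters."""
    # Zero-pad each image number to `digits` characters, as in "image0000.png".
    # The end number is assumed to be inclusive, matching the README example.
    return [f"{prefix}{n:0{digits}d}{suffix}" for n in range(start, end + 1)]


names = image_sequence_names("./path_to_folder/image", 0, 1000, 4, ".png")
print(names[0])    # ./path_to_folder/image0000.png
print(names[-1])   # ./path_to_folder/image1000.png
```

Checking the first and last names this way before launching SVEN CV is a quick way to confirm that the files on disk match what the parameters describe.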
 
<ip>
The IP address of the computer to which the data (in a format explained later) will be sent. Default: localhost.
 
Image resolution: 
 
You may need to change the default capture resolution. In the “data” folder, find a file called “useFireI.txt.”  For most cameras (using Windows generic capture device) set 320 on the top line of this file for 320x240 resolution; 640 if you want 640x480 resolution. However, if you are using a Unibrain Fire-I camera with Unibrain’s drivers (which you’ll need for best performance with that camera), then make this line “64” to choose 640x480 resolution; “32” for 320x240. Inelegant, but it will work!  
 
To quit: press ‘q’ (with the SVEN CV window selected).
 
 
IV. Documentation
-----------------
 
A. Camera and lighting setup: 
 
 Although SVEN CV is designed for use in public spaces where you may not have control over everything, as with any computer vision application, camera setup and lighting are very important. 
 
Set up the camera using manual exposure, black level, gain, etc.  Using “auto” features will confuse any computer vision software, including SVEN CV. You can often get away with auto white balance, depending on the camera.  If you are forced to accommodate significant lighting changes while SVEN CV is running, an auto iris *lens* usually will work out better than auto exposure electronic features in the camera. However, using both manual iris lens and manual exposure features is ideal.
 
Try to keep the lighting as flat as possible. Try to find a location without harsh shadows; this will improve the tracking performance immensely. Fairly constant lighting is also preferred – while SVEN CV will adapt to gradual lighting changes, more drastic changes will impede performance. This is especially true when running SVEN CV without an operator.
 
 
B. Using SVEN CV:
 
SVEN CV launches from the command line, and as such, can be run with or without an operator present. In the event SVEN CV should bog down or crash, it has a built-in wrapper application that will restart it after 5 seconds in limbo.  The documentation below mentions some keypress options that can be used if an operator is present.
 
 
1. Launching the software and letting it “build” the background. 
 
sven.exe <mode> <ip>
(See Installation/Execution Instructions above.)
 
SVEN CV needs an empty 4-second shot (assuming 15 fps) for learning the background model; it then starts tracking using the background model. SVEN CV continuously updates the background model. To force a relearn of the model at any time, press the 'b' key.
 
In other words, SVEN CV will “learn” the background in the first four seconds after it’s launched. So there should ideally be no people in the frame during the first four seconds. Since SVEN is designed to work in public spaces, it will continuously analyze the background while it’s running – so it will gradually adjust to subtle changes in lighting, objects moved into or out of the background, etc. However, for more drastic changes to the background or lighting, pressing the “b” key on an empty shot will force SVEN CV to immediately relearn the background.
 
2.    The onscreen display.
 
SVEN CV is designed to output UDP data to the network, to be used by a second application (the authors use it to send data to Max/MSP). However, SVEN CV also displays what it is doing onscreen. There are a few display modes, which you can select using the following keys:
 
 
      
o: toggles between multi-person and green-screen modes.
       
      Multi-Person mode: A semi green screen effect on the video with all the tracked subjects highlighted (not green) and all the relevant details marked. This is the default. The semi-green effect appears after the background is learned (see #1.) 
 
      Green-screen mode: Same as multi-person mode, but the background is solid green. May be useful if outputting to an application that uses chroma key. Most useful in conjunction with the “d” key.
 
d: cycles between three different modes which turn the display details (boxes, numbers – see below) on and off as well as the green effect. This is easier to understand by running the app and hitting the “d” key several times to see what it does. 
 
Details: 
 
      A red box around the Head
 
      White box around HeadShoulder
 
      Blue Box around Torso
 
      Gray Box around Total subject.
 
      Identity of the subject on top left of the Gray box - in blue if the subject is active, orange if inactive
 
      Haircolor on Bottom Left with the Identity
 
      Torso Color in similar sequence to hair color. 
 
 
Using “d” and “o” keys in combination you can get several different display effects, but the network output will remain unaffected. 
 
 
3.  Selecting a subject to track.
 
Regardless of display mode, a single subject can be selected using the number keys. (Select number 1, 2, or 3 based on the number labels on the display.) That person’s data is sent to <ip> over UDP in the format described in #5. Note that the default is #1, so in a situation without an operator, the system will always send data for whichever subject is assigned #1. If you want the system to select any person automatically in cases where there is no person assigned #1, then select #0 after launching SVEN CV (“automatic” pedestrian selection). This is mostly useful in situations where there is very sparse pedestrian traffic.
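
The selection rule described above can be summarized in a few lines of Python. This is a sketch of our reading of the rule, not code taken from SVEN CV: the function name subject_to_send is hypothetical, and choosing the lowest visible number in automatic mode is an arbitrary choice made for the example (the text only says "any person").

```python
def subject_to_send(selected_key, visible_ids):
    """Return the subject number whose data would be sent, or None."""
    if selected_key == 0:
        # "Automatic" selection: subject #1 if present, otherwise any subject.
        if 1 in visible_ids:
            return 1
        # Taking the lowest number is an assumption made for this sketch.
        return min(visible_ids) if visible_ids else None
    # A specific number key only yields data while that subject is in view.
    return selected_key if selected_key in visible_ids else None


print(subject_to_send(1, {2, 3}))  # None
print(subject_to_send(0, {2, 3}))  # 2
```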
 
4. Other keys:
 
t, +, and -: 
Useful for adjusting background tolerance. Not normally necessary. Experiment with them and see further info in the onscreen help (press ‘h’) if you want to see what they do. 
 
f: Face analysis disable/enable. In case the facial expression analysis causes you any problem, you can turn it off. Not normally necessary. Default is enabled. 
 
e: Force SVEN CV to send a pretend facial expression across the network; the “e” key cycles through all the different expressions. Useful only for testing, so you don’t have to make faces at the camera for hours.  Otherwise, leave it off, so it sends whatever expressions it detects (default.) 
 
c: Crash SVEN CV. Obviously, for testing only (or if you have a bad day!) Used for testing SVEN CV’s ability to restart after a crash using its built-in wrapper thread. In real-world crashes, SVEN CV tends to restart more quickly and smoothly, as a real freeze won’t normally leave dialog boxes onscreen.
 
q: Quit SVEN CV.
 
All the keys (except c) are explained in the help. Press 'h' in SVEN CV to toggle the help.
 
 
5. Interpreting the data that is output to the network. 
 
The data is output via UDP in three separate packets, using the format described in the next paragraph. Letters such as “z”, “s”, etc., function as headers, separating the types of data in each packet. Note that the software assumes it is seeing a full length view (or close to it) of the person. So if, for example, people are in frame only down to their waist, the software will be confused as to where the head, shirt and pants are (although it should still detect facial expressions.)
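
For readers writing their own receiving application, a minimal Python sketch follows. It rests on two assumptions that this README does not confirm: that each packet is plain text made of whitespace-separated tokens, and that the first token is the header letter (z, x, or y). The port number is not documented here, so the port argument is a placeholder you must supply. The names classify_packet and receive_packets are made up for this sketch.

```python
import socket

# Map the first header letter of a packet to the packet it identifies.
PACKET_KINDS = {"z": "packet1", "x": "packet2", "y": "packet3"}


def classify_packet(payload):
    """Guess which of the three packets a raw UDP payload is."""
    # Assumption: the payload is ASCII text with whitespace-separated tokens.
    tokens = payload.decode("ascii", errors="replace").split()
    if not tokens:
        return "unknown"
    return PACKET_KINDS.get(tokens[0], "unknown")


def receive_packets(port, count, host="127.0.0.1", timeout=5.0):
    """Collect `count` packets as (kind, payload) pairs.

    `port` is a placeholder: this README does not say which port is used.
    Raises socket.timeout if nothing arrives within `timeout` seconds.
    """
    received = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        sock.settimeout(timeout)
        while len(received) < count:
            payload, _sender = sock.recvfrom(65535)
            received.append((classify_packet(payload), payload))
    return received
```

If the packets turn out to be binary or use another delimiter, only classify_packet needs to change; the receive loop stays the same.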
 
Packet1:
z [frame#] [#blobs] [change in # blobs] [selected blob] [change in selected blob]
 
s (rectangles framing parts of body): [total1x total1y total2x total2y] [torso1x torso1y torso2x torso2y] [head1x head1y head2x head2y] [head+shoulders1x head+shoulders1y head+shoulders2x head+shoulders2y] 
c (color of hair, face, shirt and pants): [hairColorR hairColorG hairColorB] [faceColorR faceColorG faceColorB] [torsoColorR torsoColorG torsoColorB] [pantsColorR pantsColorG pantsColorB]
 
d (direction the person is moving): [directionVectorX(between -1 and 1) directionVectorY(between -1 and 1)]
 
e [# of the facial expression] – see section on Facial Expressions below.
 
b [baldness (0=no, 1=yes)] (experimental implementation – doesn’t work well)
 
g [sunglasses (0=no, 1=yes)] (experimental implementation – works a bit better than baldness but still not very reliable)
 
 
p (useful for parsing packets 2 and 3) [# of points in the matte outline of the person]
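
The Packet 1 layout above can be split into its fields with a short Python sketch. It assumes the packet arrives as whitespace-separated text, which this README does not confirm, and the sample string is invented purely to exercise the parser. The names PACKET1_LAYOUT, parse_packet1 and rectangles are hypothetical.

```python
# Number of values expected after each header letter in Packet 1.
PACKET1_LAYOUT = {"z": 5, "s": 16, "c": 12, "d": 2, "e": 1, "b": 1, "g": 1, "p": 1}


def parse_packet1(text):
    """Split Packet 1 into {header letter: list of numbers}."""
    fields = {}
    current = None
    for token in text.split():
        if token in PACKET1_LAYOUT:
            # A header letter starts a new group of values.
            current = token
            fields[current] = []
        elif current is not None:
            fields[current].append(float(token))
    for header, values in fields.items():
        if len(values) != PACKET1_LAYOUT[header]:
            raise ValueError("unexpected number of values after '%s'" % header)
    return fields


def rectangles(fields):
    """Name the four rectangles carried after the 's' header."""
    names = ["total", "torso", "head", "head_shoulders"]
    s = fields["s"]
    return {name: tuple(s[i * 4:(i + 1) * 4]) for i, name in enumerate(names)}


# Invented sample, for illustration only.
SAMPLE = ("z 120 2 0 1 0 "
          "s 10 20 110 220 30 80 90 150 45 20 75 50 35 20 85 70 "
          "c 40 30 20 200 170 150 180 20 20 30 30 90 "
          "d 0.5 -0.25 e 4 b 0 g 1 p 3")
fields = parse_packet1(SAMPLE)
print(rectangles(fields)["head"])  # (45.0, 20.0, 75.0, 50.0)
```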
 
Packets 2 and 3 contain the x/y coordinates of the points in the matte outline of the person. Because the outline is an irregular shape it requires a large number of points. Some receiving applications can’t handle this many items in one packet, so it’s split into two separate packets: one for x coordinates, the other for y. The coordinates are relative to the width of the picture and multiplied by 1000: 
 
 
Packet 2:
x [all x points in the matte: [x] [x]]
 
 
Packet 3:
y [all y points in the matte: [y] [y]]
 
 
 
The subjective data (colors, expressions, rectangles) and number of points will be zero when there are occlusions or the frame is empty or the selected subject is not there. 
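
Rebuilding the matte outline from Packets 2 and 3 can be sketched as follows. The sketch reads the description literally: both x and y values are taken to be relative to the picture width and multiplied by 1000. If y is in fact relative to the height, the y scaling needs to change. It also assumes whitespace-separated text packets; the function name matte_outline and the sample values are invented.

```python
def matte_outline(x_packet, y_packet, image_width):
    """Pair up Packets 2 and 3 and convert the values to pixel coordinates."""
    # Skip the leading "x" / "y" header token of each packet.
    xs = [float(t) for t in x_packet.split()[1:]]
    ys = [float(t) for t in y_packet.split()[1:]]
    if len(xs) != len(ys):
        raise ValueError("x and y packets carry different numbers of points")
    # Values are assumed to be (coordinate / image width) * 1000.
    return [(x * image_width / 1000.0, y * image_width / 1000.0)
            for x, y in zip(xs, ys)]


points = matte_outline("x 250 500 500", "y 100 100 600", image_width=320)
print(points)  # [(80.0, 32.0), (160.0, 32.0), (160.0, 192.0)]
```

An outline with no points (for example during an occlusion) simply yields an empty list.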
 
 
Facial Expressions: 
The expression numbers (see “e” above in Packet 1) are as follows. As SVEN CV is geared to the SVEN project, these expressions are based on common expressions we found on rock stars in music videos (but the algorithms were trained on “real people.”) Some of the expressions are quite similar, so considerable overlap and error are to be expected, especially in cases where expressions are quite close (e.g. #1 and #2.):
0 – none (no expression is being sent.)
1 - serious_normal (a fairly “straight face,” a serious expression.) 
2 - serious_focused (a serious expression but looking more intent.)
3 - serious_pout (a pout)
4 - happy_closed (happy with lips closed)
5 - happy_teeth (happy with teeth showing)
6 - happy_open  (happy with lips open)
7 - sunsquint (squinting, as though looking toward the sun or a bright light)
8 - intense_pain (that pained grimace stereotypically associated with rock stars on stage.) 
9 – speaking (person looks as though they might be speaking or singing. Since we’re only analyzing a one frame moment, they may not actually be speaking.)
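
A receiving application will usually want the names rather than the numbers. A small Python lookup table built from the list above might look like this; the names EXPRESSIONS and expression_name are our own, and returning "unknown" for other numbers is a choice made for this sketch.

```python
# Expression numbers and names, copied from the list above.
EXPRESSIONS = {
    0: "none",
    1: "serious_normal",
    2: "serious_focused",
    3: "serious_pout",
    4: "happy_closed",
    5: "happy_teeth",
    6: "happy_open",
    7: "sunsquint",
    8: "intense_pain",
    9: "speaking",
}


def expression_name(number):
    """Translate the value after the 'e' header into a readable name."""
    return EXPRESSIONS.get(int(number), "unknown")


print(expression_name(5))  # happy_teeth
```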
 
 
6. Heuristics for what SVEN CV tracks.
 
When SVEN CV detects an object (not part of the background) it makes some assumptions about whether or not it is likely to be a person. So objects that appear very large or very small in the frame are likely to be ignored, as will objects at the very edges of the frame or objects that don’t appear to be shaped approximately like a person. Objects that don’t move for a while will eventually be interpreted as part of the background. If a stationary object is removed from the background, the now-vacant space will initially be interpreted as a person, but will eventually be understood as background.
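
To give a feel for this kind of filtering, here is a Python sketch of a bounding-box plausibility test covering the size, edge and shape checks. It is not SVEN CV's actual code: the function name looks_like_person and every threshold in it are hypothetical values picked for illustration.

```python
def looks_like_person(box, frame_w, frame_h,
                      min_area_frac=0.01, max_area_frac=0.6,
                      edge_margin=2, min_aspect=1.2, max_aspect=5.0):
    """Return True if a box (x1, y1, x2, y2) passes simple person checks.

    All thresholds are made-up defaults, not values used by SVEN CV.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if w <= 0 or h <= 0:
        return False
    # Reject objects that are very small or very large in the frame.
    area_frac = (w * h) / float(frame_w * frame_h)
    if not (min_area_frac <= area_frac <= max_area_frac):
        return False
    # Reject objects touching the very edges of the frame.
    if (x1 < edge_margin or y1 < edge_margin
            or x2 > frame_w - edge_margin or y2 > frame_h - edge_margin):
        return False
    # Reject shapes that are not roughly upright, i.e. taller than wide.
    aspect = h / float(w)
    return min_aspect <= aspect <= max_aspect


print(looks_like_person((100, 40, 160, 200), 320, 240))  # True
print(looks_like_person((0, 40, 60, 200), 320, 240))     # False
```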
 
V. Feature Suggestions, Bug Reports, Questions, and even Flattering Comments to
-----------------------------------------------------------------------------
 
Vincent Rabaud vrabaud-at-cs-dot-ucsd-dot-edu (Questions and comments on the source code and algorithms)
 
Amy Alexander ajalexander-at-ucsd-dot-edu (Questions and comments on features, usage, hardware and documentation)
 
If in doubt – mail us both.
 
------------------------------------------------------------------------------
 
* SVEN CV was developed as part of the SVEN public space digital media art project (http://deprogramming.us/sven) by Amy Alexander, Vincent Rabaud, and Wojciech Kosma with additional contributions by Jesse Gilbert, Nikhil Rasiwasia, and Marilia Maschion.
 
 