Digital photos recovery
An article from Cédric Blancher's personal page, the free encyclopedia.
(http://sid.rstack.org/index.php/R%E9cup%E9ration_de_photos_num%E9riques) French version
How I managed to recover 1.2GB of accidentally deleted pictures with 98% success on an EXT3 file system...
Disclaimer
This article details the recovery of deleted data (digital photos, for instance) from an EXT3 filesystem on a Linux partition, and the aspects specific to it. Other filesystems, FAT32 and NTFS in particular, do not have these specifics, which makes data recovery much simpler. There are numerous tools, commercial or not, for recovering data from them. I particularly recommend photorec (http://www.cgsecurity.org/index.html?photorec.html), which is described later in this article.
The beginning: the big mistake
Planning an imminent DVD burn, I moved 1.2GB of vacation pictures into /tmp. Silly idea, isn't it? But at the time, it was the only partition with enough free space for that amount of data, /var/tmp being quite full. Then I went on to another task, then another, left my computer to cook, finally completely lost track of what I was doing and switched off the computer. And the next day, after booting, bang...
/tmp being cleaned at startup, everything was gone.
The situation
- First look: there's nothing left of my burn directory containing all the photos in /tmp
- First action: dump an image of the partition to a file so I can work on it calmly
~# dd if=/dev/hda13 of=tmp.img bs=4096
- First idea: see what I can do with the usual EXT2 recovery techniques and tools
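The imaging step above can be wrapped in a small helper. This is only a sketch: the function name `image_partition` is hypothetical, and the verification and write protection are my additions, not part of the original procedure.

```shell
#!/bin/bash
# image_partition DEVICE IMAGE   (hypothetical helper, not from the article)
# Copy DEVICE to IMAGE, check the copy, then make the image read-only.
image_partition() {
    local dev=$1 img=$2
    dd if="$dev" of="$img" bs=4096 2>/dev/null || return 1
    cmp "$dev" "$img" || return 1    # verify the copy byte for byte
    chmod a-w "$img"                 # guard against accidental writes
}
```

For the case described here, the call would be something like `image_partition /dev/hda13 tmp.img`, run with the partition unmounted or mounted read-only so that it does not change between the copy and the check.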
First moves with debugfs
Let's go!
~# debugfs tmp.img
debugfs 1.38 (30-Jun-2005)
debugfs: lsdel
Inode Owner Mode Size Blocks Time deleted
0 deleted inodes found.
Hmmm, that's bad.
debugfs: ls -d
[...]
140833 (3716) sva6l.tmp
<172129> (40) .exchange-sid
<219073> (16) v858323
<140833> (32) burn
<78244> (20) ati_blank
<172130> (20) orbit-root
<62596> (3604) gconfd-root
[...]
Better. We can see the burn directory and its deleted inode. Let's go further.
debugfs: imap <140833>
Inode 140833 is part of block group 9
located at block 294916, offset 0x0000
debugfs: cd <140833>
debugfs: ls
140833 (12) . 2 (4084) ..
debugfs: lsdel
Inode Owner Mode Size Blocks Time deleted
0 deleted inodes found.
debugfs: ls -d
140833 (12) . 2 (4084) ..
Dead end... There's nothing inside the directory that I can find with debugfs: no link, no unallocated inode...
EXT3 vs. EXT2
It's Googling time. I quickly learn that, although they are very similar, EXT3 is not simply a journaled EXT2. Among the differences is deletion. Whereas EXT2 only marks the file structures (directory entry, inode and blocks) as unallocated, EXT3 adds a pass: it zeroes the file size and the links to data blocks within the inode. So there's no way to find the data associated with a given deleted inode...
This explains why I can find the burn directory inode through its unallocated directory entry in /tmp, but cannot find anything inside it: all links to the blocks referring to files and subdirectories are lost. Thus, EXT2 data recovery tools such as e2undel (http://e2undel.sourceforge.net/) or recover (http://recover.sourceforge.net/linux/recover/) simply won't work.
During my search, I found a very good article on the subject (from which I took the pictures above) in Linux World Magazine (http://linux.sys-con.com/read/117909.htm).
Starting to work
Since the data I hope to recover is 99.9% JPEG files plus a few MPEG files, I will use pattern searching to find the data blocks. By finding the start and end markers of each file, I should be able to bring them back to life.
Assumptions
My /tmp partition almost never holds more than 200MB. This makes me fairly confident that most of the data was written sequentially to the disk, without fragmentation.
JPEG pictures
A JPEG picture file has a start marker (SOI, Start Of Image), which is 0xFFD8, and an end marker (EOI, End Of Image), which is 0xFFD9. One can find more detailed information on the web, here for example (http://www.media.mit.edu/pia/Research/deepview/exif.html).
Using hexedit, I could find some pictures quite easily.
~# hexedit tmp.img
[...]
023DAFF8 00 00 00 00 00 00 00 00 FF D8 FF E1 18 F9 45 78 69 66 00 00 49 49 2A 00 ..............Exif..II*.
023DB010 08 00 00 00 0B 00 0E 01 02 00 20 00 00 00 92 00 00 00 0F 01 02 00 05 00 .......... .............
023DB028 00 00 B2 00 00 00 10 01 02 00 08 00 00 00 B8 00 00 00 12 01 03 00 01 00 ........................
023DB040 00 00 01 00 00 00 1A 01 05 00 01 00 00 00 C0 00 00 00 1B 01 05 00 01 00 ........................
023DB058 00 00 C8 00 00 00 28 01 03 00 01 00 00 00 02 00 00 00 32 01 02 00 14 00 ......(...........2.....
023DB070 00 00 D0 00 00 00 13 02 03 00 01 00 00 00 02 00 00 00 69 87 04 00 01 00 ..................i.....
023DB088 00 00 00 01 00 00 A5 C4 07 00 1C 00 00 00 E4 00 00 00 A0 08 00 00 20 20 ......................
023DB0A0 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
023DB0B8 20 20 20 20 20 00 53 4F 4E 59 00 00 44 53 43 2D 50 31 30 00 48 00 00 00      .SONY..DSC-P10.H...
023DB0D0 01 00 00 00 48 00 00 00 01 00 00 00 32 30 30 35 3A 30 38 3A 30 35 20 30 ....H.......2005:08:05 0
023DB0E8 39 3A 33 32 3A 30 32 00 50 72 69 6E 74 49 4D 00 30 32 35 30 00 00 02 00 9:32:02.PrintIM.0250....
[...]
023DC870 53 7C 96 24 00 33 DB D2 A9 BB C5 21 5D 8D D9 9E B8 A4 6C 05 38 E7 E9 4A S|.$.3.....!].....l.8..J
023DC888 DD 09 EB 72 26 20 A8 F6 A6 ED 00 F4 19 A1 AD 4A 52 7C B6 1E B2 3C 5D 14 ...r& .........JR|...<].
023DC8A0 B1 63 80 3D 69 8F 3B CB F2 B8 2B B4 F2 B8 E4 1A 8F 66 AF CC 63 C9 1F 69 .c.=i.;...+......f..c..i
023DC8B8 7E A4 13 06 13 C5 B9 71 D4 F3 53 79 AC 54 A9 C6 3D 71 5A 5F B8 DC 2E D9 ~......q..Sy.T..=qZ_....
023DC8D0 56 DD 8A BB B7 5E C6 A5 79 B8 18 07 8A 52 8D DD CB A7 34 A1 66 45 CB 0F V....^..y....R....4.fE..
023DC8E8 98 0F CA 9E 93 CD 1F DD 72 47 A3 73 4E 51 4F 46 62 BB 9F FF D9 FF DB 00 ........rG.sNQOFb.......
023DC900 84 00 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 02 02 01 01 01 01 03 ........................
023DC918 02 02 02 02 03 03 04 04 03 03 03 03 04 04 06 05 04 04 05 04 03 03 05 07 ........................
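The manual search for start markers in the dump above can be sketched as a one-function script. It is an illustration under assumptions, not the method used in this article: the name `find_soi` is hypothetical, it relies on GNU grep built with `-P` support, and it keeps only markers that fall on a block boundary (4096 bytes by default), since a file's data starts at the beginning of a block.

```shell
#!/bin/bash
# find_soi IMAGE [BLOCK_SIZE]   (hypothetical helper, not from the article)
# Print the byte offset of every JPEG start marker (FF D8 FF) that sits on
# a block boundary. Assumes GNU grep with -P (PCRE) support.
find_soi() {
    local image=$1 bs=${2:-4096}
    # -o prints only the match, -b prefixes it with its byte offset,
    # -a and -U make grep treat the image as binary-safe text.
    LC_ALL=C grep -obUaP '\xff\xd8\xff' "$image" |
        LC_ALL=C awk -F: -v bs="$bs" '$1 % bs == 0 { print $1 }'
}
```

In the dump above, FF D8 sits 8 bytes into the row at 0x023DAFF8, that is at 0x023DB000, which is indeed a multiple of 4096. Only start markers are searched for because end markers are ambiguous: an EXIF header can embed a thumbnail with its own SOI/EOI pair. The FF D9 shown in the dump looks like such a case, since the header segment length (0x18F9) ends exactly there and a new FF DB segment follows. The alignment filter also drops those embedded thumbnails.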
I could extract it using dd. I just have to note the start and end offsets (here the markers are in the dump rows starting at 0x023DAFF8 and 0x023DC8E8 respectively) to work out which data blocks to extract from the image.
dd if=tmp.img of=toto.jpg bs=4096 skip=<start_block> count=<number_of_blocks>
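Turning byte offsets into the block numbers dd expects is simple integer arithmetic. The sketch below shows one way to do it; the function name `carve` and its argument order are hypothetical. It takes the offsets of the first and last byte of the file and copies every block they touch.

```shell
#!/bin/bash
# carve IMAGE FIRST_BYTE LAST_BYTE OUTPUT [BLOCK_SIZE]   (hypothetical helper)
# Offsets may be written in hex (0x...), as shell arithmetic accepts them.
carve() {
    local image=$1 first=$(( $2 )) last=$(( $3 )) out=$4 bs=${5:-4096}
    local skip=$(( first / bs ))             # block holding the first byte
    local count=$(( last / bs - skip + 1 ))  # blocks up to the last byte
    dd if="$image" of="$out" bs="$bs" skip="$skip" count="$count" 2>/dev/null
}
```

With the dump above, a start marker at 0x023DB000 gives skip = 0x023DB000 / 4096 = 9179. The output is always a whole number of blocks, so it may carry a few trailing bytes after the end marker.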
However, I have 1.2GB of data to recover and I don't have a lifetime to spend on this. Moreover, my first attempts on 2MB pictures gave corrupted files. I then decided to look for tools to automate the task.
Tested tools
These are the tools I was able to try:
- jpg-recover (http://www.clarity.net/~adam/jpg-recover/jpg-recover)
- magicrescue (http://jbj.rapanden.dk/magicrescue/)
- photorec (http://www.cgsecurity.org/index.html?photorec.html)
- photorecover (http://www.ee.oulu.fi/~iiska/projects/photorecover/)
- recoverjpeg (http://www.rfc1149.net/devel/recoverjpeg.html.en)
After numerous tries, it appeared I needed two main features:
- being able to select file types I want recovered so I don't lose time on .zip, .png, .doc or anything else I don't care about
- being able to grab corrupted files so I can work on them later
Among these tools, photorec fits the needs and appears very efficient. It found about 750 JPEG or MPEG files. The videos are clean (yes!), but most of the pictures are corrupted: some are truncated near the beginning, some look like puzzles, others have their color palette messed up. Obviously, I'm missing something.
Going deeper
Corrupted pictures analysis
A quick look shows that only files bigger than 50KB have this kind of corruption. Looking at the contents, I found a weird data block, mostly filled with zeros, between positions 0xC000 and 0xCFFF. That's exactly 4096 bytes of garbage, which is the size of a block. Hmmm...
0xC000 = 49152
= 12 * 4096
That's the 13th block, the first indirection block, which sits between the real data blocks and is thus responsible for the file corruption.
The tools I used seem to have been developed with data recovery on FAT filesystems in mind, and that kind of filesystem has no such data structure. Therefore, they ignore it and treat it as part of the data. As a result, the picture size given in the JPEG header does not match the file size, and the JPEG decoder finds unknown markers in the file.
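The arithmetic above extends to larger files. The sketch below lists where indirection blocks would be expected in a recovered file, under assumptions that are mine and not measured in this article: 12 direct blocks, 4-byte block pointers, and a file written sequentially so that each indirection block is allocated just before the data it maps. The name `indirect_positions` is hypothetical, and triple indirection is not handled.

```shell
#!/bin/bash
# indirect_positions RAW_BLOCKS [BLOCK_SIZE]   (hypothetical helper)
# Print the 0-based index of each block of a recovered file that is expected
# to be an indirection block, assuming sequential allocation.
indirect_positions() {
    local total=$1 bs=${2:-4096}
    local ppb=$(( bs / 4 ))   # 4-byte block pointers per indirection block
    local pos=12              # the 12 direct blocks come first
    if [ "$pos" -lt "$total" ]; then echo "$pos"; fi   # single indirect
    pos=$(( pos + 1 + ppb ))
    if [ "$pos" -lt "$total" ]; then echo "$pos"; fi   # double indirect
    pos=$(( pos + 1 ))
    while [ "$pos" -lt "$total" ]; do                  # its child blocks
        echo "$pos"
        pos=$(( pos + 1 + ppb ))
    done
}
```

With 4096-byte blocks this prints 12 for any file longer than 48KB, which matches the 0xC000 offset found above, and the next positions only show up once the file holds more than about 4MB.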
Indirection block removal
We can remove this block by hand using a hex editor. It works fine but is time consuming, so I decided to automate the process using dd in a shell script. Basically, we create a new file from the first 12 data blocks, then append the remaining data blocks starting from the 14th.
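for i in *.jpg; do
    # Copy the first 12 data blocks (the direct blocks)...
    dd if="$i" of="${i%.jpg}_new.jpg" bs=4096 count=12
    # ...then skip the 13th block (the indirection block) and append the rest.
    dd if="$i" bs=4096 skip=13 >> "${i%.jpg}_new.jpg"
done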
Results
The final result is very satisfying. From a set of about 750 recovered objects, I have 9 MPEGs, which are in fact the same 3 videos recovered 3 times each. Among the 740 pictures, 690 are perfect (some are unrelated to what I was looking for). The remaining pictures still show defects that fall basically into 3 groups:
- pictures with missing data, probably due to fragmentation
- pictures made of data blocks mixed from several other pictures, due to fragmentation as well
- pictures with defects very similar to the ones originally found
In the first group, I could recover the pictures that had very little data missing, using ImageMagick (http://www.imagemagick.org/) to correct the JPEG header and The Gimp (http://www.gimp.org/) to resize the picture.
Second group pictures are lost. I couldn't find a way to deal with them.
Third group pictures are big ones and show other indirection blocks. I didn't take the time needed to remove them.
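For the third group, the same dd approach could in principle be repeated for the further indirection blocks. The sketch below is untested on real recovered pictures and rests on assumptions of mine: sequential allocation, 4-byte block pointers, and indirection blocks sitting right before the data they map (so at indexes 12, 1037, 1038, 2063 and so on for 4096-byte blocks). The name `strip_indirect` is hypothetical, and triple indirection is ignored.

```shell
#!/bin/bash
# strip_indirect INPUT OUTPUT [BLOCK_SIZE]   (hypothetical helper)
# Copy INPUT to OUTPUT, leaving out the blocks expected to be single or
# double indirection blocks in a sequentially written EXT2/EXT3 file.
strip_indirect() {
    local src=$1 dst=$2 bs=${3:-4096}
    local ppb=$(( bs / 4 ))          # block pointers per indirection block
    local size total start=0 len=12 run=0 gap
    size=$(wc -c < "$src")
    total=$(( (size + bs - 1) / bs ))
    : > "$dst"
    while [ "$start" -lt "$total" ]; do
        # Copy one run of data blocks.
        dd if="$src" bs="$bs" skip="$start" count="$len" 2>/dev/null >> "$dst"
        run=$(( run + 1 ))
        # After the second run come two metadata blocks (double indirect
        # plus its first child); after every other run there is one.
        if [ "$run" -eq 2 ]; then gap=2; else gap=1; fi
        start=$(( start + len + gap ))
        len=$ppb
    done
}
```

Since the positions are assumed rather than detected, this would damage a fragmented file, so the original recovered files should be kept.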
Photorec improvements
I was in contact with Christophe Grenier (http://www.cgsecurity.org/), the author of photorec, so I kept him informed of my progress and of the problems that occurred. He then added code to detect and remove EXT2/EXT3 indirection blocks (thanks so much, Christophe). I ran the tool again and recovered 700 pictures and 3 videos perfectly in one shot.
The latest 5.9-WIP beta version (http://www.cgsecurity.org/testdisk-5.9-WIP.tar.gz) of photorec (now part of testdisk) includes this feature and gave me wonderful results.
Sorting the photos
We end up with a list of files numbered in the order photorec found them. We would definitely prefer to have them sorted by date. To achieve this, we rename the files according to the EXIF metadata from the JPEG header, using the exiv2 (http://home.arcor.de/ahuggel/exiv2/) tool.
~# exiv2 -p s 1.jpg
Filename        : 1.jpg
Filesize        : 1285238 Bytes
Camera make     : FUJIFILM
Camera model    : FinePix F601Z
Image timestamp : 2005:08:07 14:31:22
Image number    :
Exposure time   : 1/70 s
Aperture        : F3.5
Exposure bias   : 0
Flash           : No, auto
Flash bias      :
Focal length    : 6.1 mm
Subject distance:
ISO speed       : 200
Exposure mode   : Auto
Metering mode   : Matrix
Macro mode      : Off
Image quality   : NORMAL
Exif Resolution : 2736 x 1824
White balance   : Auto
Thumbnail       : JPEG, 9612 Bytes
Copyright       :
Exif comment    :
Thus, we grab the timestamp field and use it to rename the files.
Simple solution for simple situation
Supposing we are in the recovered pictures directory and want to copy them into another directory (../final/ for instance), we do:
for i in ./*.jpg; do cp "$i" "../final/$(exiv2 -p s "$i" | grep timestamp | awk '{ print $4 "-" $5 }').jpg"; done
Then we get a list of pictures sorted by the date they were taken.
In my case, the deleted folder held pictures from various people, so I wanted to sort them by author as well. I used the "Camera model" field as the basis for another rename:
for i in *.jpg; do mv "$i" "$(exiv2 -p s "$i" | grep model | awk '{ print $4 }')-$i"; done
And I'm done. Each file follows a MODEL-DATE-HOUR.jpg naming scheme and I just have to move the files into the appropriate directories based on their names. In fact, it would have been better to sort by camera model first, as pictures taken by different cameras can share the same timestamp.
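Building the whole name in a single pass avoids the ordering issue mentioned above. The sketch below parses the summary printed by `exiv2 -p s`; the function name `name_from_summary` is hypothetical, and unlike the one-liners above it keeps the full model string (spaces become underscores) and turns every colon of the timestamp into a dash.

```shell
#!/bin/bash
# name_from_summary   (hypothetical helper, not from the article)
# Read the output of `exiv2 -p s FILE` on stdin and print MODEL-DATE-TIME.
# Exits with status 1 if the model or the timestamp is missing.
name_from_summary() {
    awk '
        /^Camera model/    { sub(/^[^:]*: */, ""); sub(/[ \t]+$/, ""); model = $0 }
        /^Image timestamp/ { sub(/^[^:]*: */, ""); sub(/[ \t]+$/, ""); ts = $0 }
        END {
            if (model == "" || ts == "") exit 1
            gsub(/ /, "_", model)     # FinePix F601Z -> FinePix_F601Z
            gsub(/[: ]/, "-", ts)     # 2005:08:07 14:31:22 -> 2005-08-07-14-31-22
            print model "-" ts
        }'
}
```

A possible use, assuming exiv2 is installed: `new=$(exiv2 -p s "$f" | name_from_summary) && cp "$f" "../final/$new.jpg"`.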
Advanced solution for tricky situations
Jean-Cédric Chappelier (http://icwww.epfl.ch/~chappeli/), who had to deal with a massive deletion of pictures using Photorec, sent me an improved script that handles rare, but still possible, conflicts such as pictures shot within the same second in burst mode...
trash=trash$$
unknown=unknown$$
mkdir "$trash"
mkdir "$unknown"
for file in f*.jpg; do
    newname=$(exiv2 -p s "$file" | grep -a timestamp | awk '{ print $4 "-" $5 }' | sed 's/:/-/g')
    if [ $? -eq 0 ] && [ -n "$newname" ]; then
        if [ ! -f "${newname}.jpg" ]; then
            mv "$file" "${newname}.jpg"
        else
            cmp "$file" "${newname}.jpg"
            if [ $? -eq 0 ]; then
                echo "WARN: $file is a duplicate of ${newname}.jpg: moved to $trash"
                mv "$file" "$trash"
            else
                echo "ERR: ${newname}.jpg already exists but differs. $file unchanged"
            fi
        fi
    else
        echo "ERR: unable to get info from ${file}: moved to $unknown"
        mv "$file" "$unknown"
    fi
done
I warmly recommend reading his story (http://icwww.epfl.ch/~chappeli/perso/recup_photos.html), as it's a perfect example of how helpful free software can be in difficult situations.
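The script above leaves a file untouched when its timestamp name is already taken by a different picture. One possible variation, which is my own suggestion and not part of Jean-Cédric's script, is to pick the next free name by appending a counter. The function name `unique_name` is hypothetical.

```shell
#!/bin/bash
# unique_name BASE [EXT]   (hypothetical helper, not from the article)
# Print BASE.EXT if that name is free, else BASE_1.EXT, BASE_2.EXT, ...
unique_name() {
    local base=$1 ext=${2:-jpg} n=1
    local candidate="$base.$ext"
    while [ -e "$candidate" ]; do
        candidate="${base}_$n.$ext"
        n=$(( n + 1 ))
    done
    printf '%s\n' "$candidate"
}
```

In the conflict branch, something like `mv "$file" "$(unique_name "$newname")"` would then keep both burst-mode pictures side by side in the sorted listing.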
Conclusion
The first lesson I learned is that recovering deleted files on EXT3 is no simple task. Don't believe people claiming it's just like EXT2. It's just not. EXT3 does not delete files the way EXT2 does, so the techniques and tools for EXT2 just don't work.
The second lesson is that searching for markers is a valuable technique, as long as you're lucky with fragmentation and don't have too many file types to recover.
The third lesson is a reminder about the EXT2/EXT3 indirection blocks, which you have to take care of during data recovery.




