Digital photos recovery

From Cédric Blancher's personal page.



How I managed to recover 1.2GB of accidentally deleted pictures with 98% success on an EXT3 file system...


Disclaimer

This article describes how to recover deleted data (digital photos in this case) from an EXT3 filesystem on a Linux partition, and the difficulties specific to that filesystem. Other filesystems, FAT32 and NTFS in particular, do not share these specifics, which makes data recovery much simpler on them. Numerous tools, commercial or not, can recover data from them. I particularly recommend photorec, which is described later in this article.

The beginning: the big mistake

Planning an imminent DVD burn, I moved 1.2GB of vacation pictures into /tmp. Silly idea, isn't it? But at the time, it was the only partition with enough free space for that amount of data, /var/tmp being quite full. Then I moved on to another task, then another, left my computer to cook, completely forgot what I had been doing, and switched off the computer. The next day, after booting, bang...

/tmp being cleaned at startup, everything was gone.


The situation

  • First look: there's nothing left in /tmp of my burn directory, which contained all the photos
  • First action: dump an image of the partition to a file on another filesystem, so I can work in peace
~# dd if=/dev/hda13 of=tmp.img bs=4096
  • First idea: see what I can do with the usual EXT2 recovery techniques and tools
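Working on a copy only helps if the copy is faithful, so it can be worth checking the image right after taking it. The sketch below wraps the dd call above in a hypothetical helper, image_partition (my name, not a standard tool), which then compares the image to the source with cmp. It assumes the partition is unmounted or mounted read-only, so that nothing changes between the two reads.

```shell
# Hypothetical helper: copy a block device (or any file) to an image file,
# then check that the copy is byte-for-byte identical to the source.
image_partition() {
    src=$1
    img=$2
    dd if="$src" of="$img" bs=4096 2>/dev/null || return 1
    cmp -s "$src" "$img"
}

# Example (device name taken from this article):
#   image_partition /dev/hda13 tmp.img && echo "image verified"
```

From then on, every tool is pointed at the image and the original partition is left alone.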

First moves with debugfs

Let's go!

~# debugfs tmp.img
debugfs 1.38 (30-Jun-2005)
debugfs:  lsdel
 Inode  Owner  Mode    Size    Blocks   Time deleted
0 deleted inodes found.

Hmm, that's bad.

debugfs:  ls -d
[...]
 140833  (3716) sva6l.tmp   <172129> (40) .exchange-sid
<219073> (16) v858323   <140833> (32) burn   <78244> (20) ati_blank
<172130> (20) orbit-root   <62596> (3604) gconfd-root
[...]

Better. We can see the burn directory and its deleted inode. Let's go further.

debugfs:  imap <140833>
Inode 140833 is part of block group 9
        located at block 294916, offset 0x0000
debugfs:  cd <140833>
debugfs:  ls
 140833  (12) .    2  (4084) ..
debugfs:  lsdel
 Inode  Owner  Mode    Size    Blocks   Time deleted
0 deleted inodes found.
debugfs:  ls -d
 140833  (12) .    2  (4084) ..

Dead end... There's nothing inside the directory that I can find with debugfs: no link, no unallocated inode...

EXT3 vs. EXT2

It's Googling time. I quickly learn that, although the two are very similar, EXT3 is not simply a journaled EXT2. One of the differences is deletion. Where EXT2 only marks the file structures (directory entry, inode and blocks) as unallocated, EXT3 adds a pass: it also clears the file size and the pointers to data blocks within the inode. So there's no way to find the data associated with a given deleted inode...

This explains why I can find the burn directory's inode through its unallocated directory entry in /tmp, but cannot find anything inside it: all the block pointers leading to its files and subdirectories are lost. Thus, EXT2 data recovery tools such as e2undel or recover simply won't work.

During my search, I found a very good article on the subject in Linux World Magazine (from which I took the pictures above).

Starting to work

Since the data I hope to recover is 99.9% JPEG files plus a few MPEG files, I will use pattern searching to find the data blocks. By finding the start and end markers of each file, I should be able to bring them back to life.

Assumptions

My /tmp partition is almost never used beyond 200MB. This makes me fairly confident that most of the data was written sequentially to the disk, without fragmentation.

JPEG pictures

A JPEG picture file has a start marker (SOI, Start Of Image), which is 0xFFD8, and an end marker (EOI, End Of Image), which is 0xFFD9. More detailed information can be found on the web, here for example.
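Marker search can be sketched with standard tools. The helper below, find_soi (a hypothetical name), lists the byte offsets at which the sequence FF D8 FF appears, since SOI is immediately followed by another marker, and every marker starts with FF. Because a file always begins at a block boundary, it only reports hits aligned on the block size, assumed here to be 4096 bytes. It is a slow illustration built on od and awk, not a replacement for a real carving tool.

```shell
# Hypothetical helper: print the decimal offsets of block-aligned FF D8 FF
# sequences in a file. Usage: find_soi FILE [BLOCK_SIZE]
find_soi() {
    file=$1
    bs=${2:-4096}
    # od prints one hex byte per field; tr puts each byte on its own line
    od -An -v -tx1 "$file" | tr -s ' ' '\n' | awk -v bs="$bs" '
        NF == 0 { next }
        {
            # n is the offset of the current byte; the candidate starts at n - 2
            if (p2 == "ff" && p1 == "d8" && $1 == "ff" && (n - 2) % bs == 0)
                printf "%.0f\n", n - 2
            p2 = p1; p1 = $1; n++
        }'
}

# Example: find_soi tmp.img
```

Passing 1 as the block size disables the alignment filter, which also reveals embedded SOI markers such as the ones of EXIF thumbnails.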

Using hexedit, I could find some pictures quite easily.

~# hexedit tmp.img
[...]
023DAFF8   00 00 00 00  00 00 00 00  FF D8 FF E1  18 F9 45 78  69 66 00 00  49 49 2A 00  ..............Exif..II*.
023DB010   08 00 00 00  0B 00 0E 01  02 00 20 00  00 00 92 00  00 00 0F 01  02 00 05 00  .......... .............
023DB028   00 00 B2 00  00 00 10 01  02 00 08 00  00 00 B8 00  00 00 12 01  03 00 01 00  ........................
023DB040   00 00 01 00  00 00 1A 01  05 00 01 00  00 00 C0 00  00 00 1B 01  05 00 01 00  ........................
023DB058   00 00 C8 00  00 00 28 01  03 00 01 00  00 00 02 00  00 00 32 01  02 00 14 00  ......(...........2.....
023DB070   00 00 D0 00  00 00 13 02  03 00 01 00  00 00 02 00  00 00 69 87  04 00 01 00  ..................i.....
023DB088   00 00 00 01  00 00 A5 C4  07 00 1C 00  00 00 E4 00  00 00 A0 08  00 00 20 20  ......................
023DB0A0   20 20 20 20  20 20 20 20  20 20 20 20  20 20 20 20  20 20 20 20  20 20 20 20
023DB0B8   20 20 20 20  20 00 53 4F  4E 59 00 00  44 53 43 2D  50 31 30 00  48 00 00 00       .SONY..DSC-P10.H...
023DB0D0   01 00 00 00  48 00 00 00  01 00 00 00  32 30 30 35  3A 30 38 3A  30 35 20 30  ....H.......2005:08:05 0
023DB0E8   39 3A 33 32  3A 30 32 00  50 72 69 6E  74 49 4D 00  30 32 35 30  00 00 02 00  9:32:02.PrintIM.0250....
[...]
023DC870   53 7C 96 24  00 33 DB D2  A9 BB C5 21  5D 8D D9 9E  B8 A4 6C 05  38 E7 E9 4A  S|.$.3.....!].....l.8..J
023DC888   DD 09 EB 72  26 20 A8 F6  A6 ED 00 F4  19 A1 AD 4A  52 7C B6 1E  B2 3C 5D 14  ...r& .........JR|...<].
023DC8A0   B1 63 80 3D  69 8F 3B CB  F2 B8 2B B4  F2 B8 E4 1A  8F 66 AF CC  63 C9 1F 69  .c.=i.;...+......f..c..i
023DC8B8   7E A4 13 06  13 C5 B9 71  D4 F3 53 79  AC 54 A9 C6  3D 71 5A 5F  B8 DC 2E D9  ~......q..Sy.T..=qZ_....
023DC8D0   56 DD 8A BB  B7 5E C6 A5  79 B8 18 07  8A 52 8D DD  CB A7 34 A1  66 45 CB 0F  V....^..y....R....4.fE..
023DC8E8   98 0F CA 9E  93 CD 1F DD  72 47 A3 73  4E 51 4F 46  62 BB 9F FF  D9 FF DB 00  ........rG.sNQOFb.......
023DC900   84 00 01 01  01 01 01 01  01 01 01 01  01 01 01 01  01 02 02 01  01 01 01 03  ........................
023DC918   02 02 02 02  03 03 04 04  03 03 03 03  04 04 06 05  04 04 05 04  03 03 05 07  ........................

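I could extract it using dd. I just have to note the start and end offsets (the markers sit on the hexedit lines starting at 0x023DAFF8 and 0x023DC8E8 respectively) and convert them to 4096-byte block numbers to find out which data blocks to extract from the image. Note that dd counts skip in blocks of bs bytes, not in bytes.

dd if=tmp.img of=toto.jpg bs=4096 skip=<start_block> count=<block_count>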
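As a worked example based on the dump above: the SOI marker is at byte 0x023DB000 (line 0x023DAFF8 plus 8 bytes), which is the start of block 0x023DB000 / 4096 = 9179, and the EOI marker shown ends at byte 0x023DC8FC, in block 9180. The sketch below does that arithmetic before calling dd. extract_blocks is a hypothetical helper name, and the 4096-byte block size is the one used throughout this article.

```shell
# Hypothetical helper: extract the blocks covering bytes START..END (inclusive)
# of an image file. Offsets may be given in decimal or as 0x... hexadecimal.
# Usage: extract_blocks IMAGE OUTPUT START END
extract_blocks() {
    img=$1
    out=$2
    bs=4096
    first=$(( $3 / bs ))   # block holding the first byte
    last=$(( $4 / bs ))    # block holding the last byte
    dd if="$img" of="$out" bs="$bs" skip="$first" count=$(( last - first + 1 )) 2>/dev/null
}

# Example with the offsets from the dump above (blocks 9179 and 9180):
#   extract_blocks tmp.img toto.jpg 0x023DB000 0x023DC8FC
```

The output always ends on a block boundary, so it carries whatever follows the end marker in the last block.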

However, I have 1.2GB of data to recover and I don't have a lifetime to spend on this. Moreover, my first tries on 2MB pictures give corrupted files. So I decide to look for tools to automate the task.

Tested tools

These are the tools I experimented with:

After numerous tries, it appeared I needed two main features:

  • being able to select file types I want recovered so I don't lose time on .zip, .png, .doc or anything else I don't care about
  • being able to grab corrupted files so I can work on them later

Among these tools, photorec fits my needs and turns out to be very efficient. I could find about 750 JPEG or MPEG files. The videos are clean (yes!), but most of the pictures are corrupted: some are truncated near the beginning, some look like jigsaw puzzles, others have their color palette messed up. Obviously, I'm missing something.
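With several hundred recovered files, a rough first triage helps decide which ones need attention. The sketch below only checks that a file starts with the SOI marker and ends with the EOI marker; jpeg_framed is a hypothetical helper name. It catches truncated files, but a file can pass this check and still be damaged in the middle, and a carved file with trailing bytes after EOI will fail it, so the result is a hint and nothing more.

```shell
# Hypothetical helper: succeed if FILE starts with FF D8 and ends with FF D9.
jpeg_framed() {
    head_bytes=$(head -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    tail_bytes=$(tail -c 2 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$head_bytes" = "ffd8" ] && [ "$tail_bytes" = "ffd9" ]
}

# List the files that do not look complete:
#   for f in *.jpg; do jpeg_framed "$f" || echo "suspect: $f"; done
```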

Going deeper

Corrupted pictures analysis

A quick look shows that only files bigger than 50KB show this kind of corruption. Looking at their contents, I found a weird data block, mostly filled with zeros, between positions 0xC000 and 0xCFFF. That's exactly 4096 bytes of garbage, which is the size of a block. Hmm...

0xC000 = 49152
       = 12 * 4096

That's the 13th block: the inode addresses the first 12 data blocks directly, so the next block written is the first indirection block. It sits between real data blocks and is thus responsible for the file corruption.

The tools I used seem to have been developed with data recovery on FAT filesystems in mind, and that kind of filesystem has no such data structure. Therefore, they don't look for it and treat it as part of the data. As a result, the picture size given in the JPEG header does not match the file size, and the JPEG decoder finds unknown markers in the data.
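The same reasoning predicts where further indirection blocks would land in larger files. With 4096-byte blocks and 4-byte block pointers, one indirection block addresses 1024 data blocks, so after the first indirection block and its 1024 data blocks come a double indirection block, a second indirection block, 1024 more data blocks, and so on. The sketch below lists the expected positions (counted from 0) of these metadata blocks within a carved file. It assumes the filesystem wrote them inline, right before the data they address, as observed here for the 13th block; that layout is typical of sequential writes but not guaranteed, and triple indirection is ignored. indirect_positions is a hypothetical helper name.

```shell
# Hypothetical helper: print the 0-based positions of the expected indirection
# blocks in a carved file of TOTAL blocks. Usage: indirect_positions TOTAL [BLOCK_SIZE]
indirect_positions() {
    total=$1
    bs=${2:-4096}
    ptrs=$(( bs / 4 ))             # block pointers per indirection block
    pos=12                         # first indirection block, after 12 direct blocks
    [ "$pos" -lt "$total" ] && echo "$pos"
    pos=$(( pos + ptrs + 1 ))      # double indirection block, after ptrs data blocks
    [ "$pos" -lt "$total" ] || return 0
    echo "$pos"
    pos=$(( pos + 1 ))             # indirection block right behind it
    while [ "$pos" -lt "$total" ]; do
        echo "$pos"
        pos=$(( pos + ptrs + 1 ))  # next one comes after ptrs data blocks
    done
}

# Example: a carved file of 1040 blocks
#   indirect_positions 1040    # prints 12, 1037 and 1038, one per line
```

Under these assumptions, a 2MB picture (about 512 blocks) contains a single indirection block, at position 12.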

Indirection block removal

We can remove this block by hand with a hex editor. It works fine but is time consuming, so I decided to automate the process using dd in a shell script. Basically, we create a new file with the first 12 data blocks, then append the remaining data blocks, starting with the 14th.

for i in *.jpg; do
      # keep blocks 1 to 12, which hold real data
      dd if="$i" of="${i%.jpg}_new.jpg" bs=4096 count=12
      # skip the 13th block (the indirection block) and append the rest
      dd if="$i" bs=4096 skip=13 >> "${i%.jpg}_new.jpg"
done
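The loop above strips the 13th block from every file, including files that never had an indirection block there. A more cautious variant first checks that the block really looks like the garbage described earlier. The sketch below counts the non-zero bytes of the 13th block and accepts it as an indirection block only if fewer than half of them are non-zero. That threshold is my assumption, based on the "mostly filled with zeros" observation and on the fact that compressed picture data rarely contains long runs of zeros. block13_mostly_zero is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the 13th 4096-byte block of FILE exists
# in full and is mostly made of zero bytes.
block13_mostly_zero() {
    size=$(dd if="$1" bs=4096 skip=12 count=1 2>/dev/null | wc -c)
    nonzero=$(dd if="$1" bs=4096 skip=12 count=1 2>/dev/null | tr -d '\000' | wc -c)
    [ "$size" -eq 4096 ] && [ "$nonzero" -lt 2048 ]
}

# Possible use in the loop, before the two dd calls:
#   block13_mostly_zero "$i" || continue
```

Files that fail the check are simply left as they are, so they can be inspected by hand later.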

Results

The final result is very satisfying. From a set of about 750 recovered objects, I have 9 MPEGs, which are in fact the same 3 videos found 3 times each. Among the 740 pictures, 690 are perfect (some are not related to my particular search). The remaining pictures still show defects that can basically be classified into 3 groups:

  • pictures with missing data, probably due to fragmentation
  • pictures made of data blocks from several other pictures mixed together, due to fragmentation as well
  • pictures with defects very similar to the ones originally found

In the first group, I could recover the pictures with very little missing data, using ImageMagick to fix the JPEG header and The Gimp to resize the picture.

Second group pictures are lost. I couldn't find a way to deal with them.

Third group pictures are big ones and show other indirection blocks. I didn't take the time needed to remove them.

Photorec improvements

I was in contact with Christophe Grenier, photorec's author, so I kept him informed of my progress and of the problems that occurred. He then added code to detect and remove EXT2/EXT3 indirection blocks (thanks so much, Christophe). I ran the tool again and could recover 700 pictures and 3 videos perfectly in one shot.

The latest 5.9-WIP beta version of photorec (now part of testdisk) includes this feature and gave me wonderful results.

Sorting the photos

We end up with a list of files numbered in the order in which photorec found them. We would definitely prefer to have them sorted by date. To achieve this, we will rename the files according to the EXIF metadata from the JPEG header, using the exiv2 tool.

~# exiv2 -p s 1.jpg
Filename        : 1.jpg
Filesize        : 1285238 Bytes
Camera make     : FUJIFILM
Camera model    : FinePix F601Z
Image timestamp : 2005:08:07 14:31:22
Image number    :
Exposure time   : 1/70 s
Aperture        : F3.5
Exposure bias   : 0
Flash           : No, auto
Flash bias      :
Focal length    : 6.1 mm
Subject distance:
ISO speed       : 200
Exposure mode   : Auto
Metering mode   : Matrix
Macro mode      : Off
Image quality   : NORMAL
Exif Resolution : 2736 x 1824
White balance   : Auto
Thumbnail       : JPEG, 9612 Bytes
Copyright       :
Exif comment    :

So we grab the timestamp field and use it to rename the files.

Simple solution for simple situation

Assuming we are in the directory of recovered pictures and want to copy them into another directory (../final/ for instance), we do:

for i in ./*.jpg; do cp "$i" "../final/$(exiv2 -p s "$i" | grep timestamp | awk '{ print $4 "-" $5 }').jpg"; done

Then we get a picture list sorted by the date they were taken.

In my case, the deleted folder held pictures from various people, so I wanted to sort them by author as well. I used the "Camera model" field as the basis for another rename:

for i in *.jpg; do mv "$i" "$(exiv2 -p s "$i" | grep model | awk '{ print $4 }')-$i"; done

And I'm done. Each file follows a MODEL-DATE-HOUR.jpg naming scheme and I just have to move the files into the appropriate directories based on their names. In fact, it would have been better to sort by camera model first, as pictures taken by different cameras could share the same timestamp.
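One limit of the one-liners above is that awk's $4 keeps only the first word of the camera model ("FinePix" for "FinePix F601Z"), and that the colons of the timestamp end up in the file name. The sketch below builds the whole MODEL-DATE-HOUR name in one pass from the exiv2 summary, keeping the full model and replacing spaces and colons. It assumes the summary format shown earlier in this article; summary_field and photo_name are hypothetical helper names.

```shell
# Hypothetical helper: print the value of the field whose line starts with
# LABEL, reading `exiv2 -p s` output on standard input.
summary_field() {
    awk -v key="$1" '
        !found && index($0, key) == 1 {
            sub(/^[^:]*: */, "")   # drop the label and the ": " separator
            print
            found = 1
        }'
}

# Hypothetical helper: build MODEL-DATE_HOUR.jpg from a summary on standard
# input; fails if the model or the timestamp is missing.
photo_name() {
    summary=$(cat)
    model=$(printf '%s\n' "$summary" | summary_field 'Camera model' | tr ' :' '_-')
    stamp=$(printf '%s\n' "$summary" | summary_field 'Image timestamp' | tr ' :' '_-')
    [ -n "$model" ] && [ -n "$stamp" ] && printf '%s-%s.jpg\n' "$model" "$stamp"
}

# Possible use:
#   for i in *.jpg; do name=$(exiv2 -p s "$i" | photo_name) && mv "$i" "$name"; done
```

With the summary shown earlier, this gives FinePix_F601Z-2005-08-07_14-31-22.jpg, so sorting by name groups the pictures by camera first and by date second.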

Advanced solution for tricky situations

Jean-Cédric Chappelier, who had to deal with a huge picture deletion using Photorec, sent me an improved script that handles rare, but still possible, conflicts such as pictures shot within the same second in burst mode...

trash=trash$$
unknown=unknown$$
mkdir "$trash"
mkdir "$unknown"
for file in f*.jpg; do
  # build the new name from the EXIF timestamp, with colons replaced by dashes
  newname=$(exiv2 -p s "$file" | grep -a timestamp | awk '{ print $4 "-" $5 }' | sed 's/:/-/g')
  if [ $? -eq 0 ] && [ -n "$newname" ]; then
    if [ ! -f "${newname}.jpg" ]; then
      mv "$file" "${newname}.jpg"
    else
      # same timestamp: either a byte-for-byte duplicate or a different picture
      cmp "$file" "${newname}.jpg"
      if [ $? -eq 0 ]; then
        echo "WARN: $file is a duplicate of ${newname}.jpg: moved to $trash"
        mv "$file" "$trash"
      else
        echo "ERR: ${newname}.jpg already exists but differs. $file unchanged"
      fi
    fi
  else
      echo "ERR: unable to get info from ${file}: moved to $unknown"
      mv "$file" "$unknown"
  fi
done
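The script above leaves a file untouched when its target name already exists with different content, which is exactly the burst mode case. One possible extension, sketched below, is to pick the next free name by appending a counter. unique_name is a hypothetical helper and is not part of Jean-Cédric's script.

```shell
# Hypothetical helper: print BASE.jpg if that name is free, otherwise the
# first free name among BASE_1.jpg, BASE_2.jpg, and so on.
unique_name() {
    base=$1
    candidate="$base.jpg"
    n=1
    while [ -e "$candidate" ]; do
        candidate="${base}_$n.jpg"
        n=$(( n + 1 ))
    done
    printf '%s\n' "$candidate"
}

# Possible use in the "already exists but differs" branch:
#   mv "$file" "$(unique_name "$newname")"
```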

I warmly recommend reading his story, as it's a perfect example of how helpful free software can be in difficult situations.

Conclusion

The first lesson I learned is that recovering deleted files on EXT3 is no simple task. Don't believe people claiming it's just like EXT2. It's not. EXT3 does not delete files the way EXT2 does, so the techniques and tools for EXT2 just don't work.

The second lesson is that marker searching is a valuable technique, as long as you're lucky with fragmentation and don't have too many file types to recover.

The third lesson is a reminder that EXT2/EXT3 indirection blocks have to be taken care of during data recovery.
