What are "file hashes"?

Peter Jason

I have Win7 SP1, and I use CCleaner occasionally.

The output of CCleaner reports that it has erased
eMule "File Hashes" and I wonder what these are?

Peter
 
Paul

http://en.wikipedia.org/wiki/Hash_function

"A hash function is any algorithm or subroutine that
maps large data sets of variable length, called keys,
to smaller data sets of a fixed length."

So say I had a 1 megabyte file, and I ran MD5SUM on it.
I'd get a short, fixed-size number (128 bits for MD5) out of
the computation, one that is unique for practical purposes.

That computation is sensitive to byte order, so doing
MD5SUM(1,2,3,4) gives a different answer than MD5SUM(4,3,2,1).

Whereas with addition, ADD(1,2,3,4) and ADD(4,3,2,1) both give 10
as the answer. Addition would not be a very good way of
generating a small, unique number for each file.
It would create too many "hash collisions".

http://en.wikipedia.org/wiki/Collision_(computer_science)

With simple addition as my hash generator, it doesn't take
much tinkering to think of cases that are going to fail.
Whereas MD5SUM, SHA1SUM, SHA256SUM or the like are much
more likely to give a different number for each file
processed.
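
To make the comparison concrete, here is a minimal sketch in Python. The helper name additive_checksum is made up for the illustration, and hashlib.md5 from the standard library stands in for the MD5SUM tool.

```python
import hashlib

def additive_checksum(data: bytes) -> int:
    """Hypothetical 'ADD' hash: the sum of all byte values."""
    return sum(data)

a = bytes([1, 2, 3, 4])
b = bytes([4, 3, 2, 1])

# Addition ignores byte order, so the two inputs collide.
print(additive_checksum(a), additive_checksum(b))  # 10 10

# MD5 depends on byte order, so the digests differ.
print(hashlib.md5(a).hexdigest())
print(hashlib.md5(b).hexdigest())
print(hashlib.md5(a).digest() == hashlib.md5(b).digest())  # False
```

The two additive results match even though the inputs differ, which is exactly the kind of collision described above.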

The site "virustotal.com" stores MD5SUM values for each
file submitted. It's one of the ways they speed up the search
for previously submitted files, since each user could have
used a different file name for the file they submitted to be
checked. By storing a table of hashes, and searching through
that, there are reasonably good odds that if the file was
submitted before, the results of the virus scan are still
applicable.

So if eMule is storing a file hash, it would be for the
purpose of quickly determining whether two files are the
same thing or not.
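
As a sketch of that "same file or not" check, assuming SHA-256 from Python's hashlib (I am not claiming this is the algorithm eMule uses), with file_sha256 and same_content as made-up helper names:

```python
import hashlib
import os
import tempfile

def file_sha256(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """Treat two files as identical when their digests match."""
    return file_sha256(path_a) == file_sha256(path_b)

# Demo with throwaway files: two copies and one near-copy.
with tempfile.TemporaryDirectory() as tmp:
    paths = {}
    samples = [("a.bin", b"hello"), ("copy.bin", b"hello"), ("other.bin", b"hellp")]
    for name, payload in samples:
        paths[name] = os.path.join(tmp, name)
        with open(paths[name], "wb") as f:
            f.write(payload)
    same_ab = same_content(paths["a.bin"], paths["copy.bin"])
    same_ac = same_content(paths["a.bin"], paths["other.bin"])

print(same_ab, same_ac)  # True False
```

Comparing two short digests is much cheaper than comparing the files byte by byte, which is the point of keeping the hashes around.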

Paul
 
Fishface

HashTab is the coolest thing ever. I think you have
to agree or you can't use it. It's OK, you can just cross
your fingers and click.

http://www.implbits.com/hashtab.aspx
 
Peter Jason

I'll try to understand it, and now there's "iMule"
which is even more anonymous than eMule.
Peter
 
Peter Jason

A video conversion program "WTV to MPEG2" became
slow and eventually hung. I deleted the
program and reinstalled it after a cold boot, and now
all is well. Also, my 12 GB of RAM slowly fills
up to "40%" (desktop indicator), but this falls to
"20%" after a cold boot. Is it good practice to
cold boot before an image backup?
Peter
 
Yousuf Khan

A hash, in general, is a compact identifier for a large amount of data;
this identifier is a smaller representation of that larger data, much
like a thumbnail. A file hash would therefore be an identifier for
a file, unique for practical purposes. The simplest, earliest hash would
be the checksum, where you simply add all of the bytes of data together.
The checksum wasn't a good unique identifier because if any of the bytes
of data were transposed with each other, the checksum would still have
come out the same.

Yousuf Khan
 
