Originally posted by Neglacio at the Flox forum.
This is a small introduction to what Flox must achieve in my eyes:
Flox will be a Blocks Enhanced Network. That's why I named the pre-project after its abbreviation, BEN. It'll be a P2P network.
What? Blocks, I hear you say. Well, blocks of files are wonderful! You can have nearly identical files, like a song with only one second missing, where 99% of the data is still the same.
With the current hashing system, that one missing second results in a totally different hash, so classic networks will treat these two files, which differ by only 1 second, as two totally different files, each with its own bunch of sources.
This is bad! That way, many sources get lost, and speed will be cut in half.
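To make that concrete, here is a minimal sketch in Python. The "song" is just made-up bytes and SHA1 stands in for whatever hash a classic network uses; both are assumptions for illustration only.

```python
import hashlib

# Hypothetical stand-in for a song: 1024 bytes of "audio" data.
song = bytes(range(256)) * 4
# The same song with the last 10 bytes (our "one second") missing.
shorter = song[:-10]

full_hash = hashlib.sha1(song).hexdigest()
short_hash = hashlib.sha1(shorter).hexdigest()

print(full_hash)
print(short_hash)
# A classic network only compares these two values.
print("same file according to the network:", full_hash == short_hash)
```

The two files share 99% of their bytes, but nothing in the two hashes shows that.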
So, everyone wants to download the file they want, but with more sources. Introducing blocks will speed up the download that you want, with sources from other files.
Looks nice, doesn't it? But tech-geeks will say this: "Well, shift the data by one bit, and the whole range of blocks will change, and so will their hashes."
That's right. Let's say we split the files into blocks of 100KB and we cut the first 50KB off the file. Well, then every block boundary shifts, and 10 seconds less in a song will create a completely different 'HashChunkList'.
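A small sketch of that shifting problem, assuming fixed-size blocks hashed with SHA1. The block size is scaled down to 100 bytes, and the names `BLOCK_SIZE` and `fixed_blocks` are made up for this example.

```python
import hashlib
import random

BLOCK_SIZE = 100  # stands in for the 100KB blocks, scaled down to bytes

def fixed_blocks(data, size=BLOCK_SIZE):
    """Split data into consecutive blocks of `size` bytes and hash each one."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

rng = random.Random(42)
song = bytes(rng.randrange(256) for _ in range(1000))  # made-up "song"
trimmed = song[50:]  # the first half block is cut off

# Every boundary in the trimmed file now sits 50 bytes off.
shared = set(fixed_blocks(song)) & set(fixed_blocks(trimmed))
print("blocks in the original:", len(fixed_blocks(song)))
print("blocks in common:", len(shared))
```

Even though 95% of the bytes are identical, the block lists have nothing in common, because every block starts at a different place.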
This can be solved by a home-made technique called Palindrom File Splitting (PFS). 'Palindrome' is a term borrowed from biochemistry, where palindromic sequences are used to identify genes and molecules. Now, what is it? The application will search the file for so-called mirror sequences: a byte range that occurs again right after itself, but mirrored. An example: 101110011/110011101. The slash is the mirror line, and on that line a chunk will be cut off.
The nice thing is that a very similar file will have the same palindromes, for 98%. So all your blocks will also be 98% similar, and you can also use the sources that have that 98% of the file you want.
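Here is a rough sketch of how such a splitter could look. It is only my reading of the idea, not a finished PFS: the mirror length `k` and the function names `mirror_cuts` and `split_at_mirrors` are assumptions, and the data is a string of '0' and '1' characters so the example above can be used directly.

```python
import random

def mirror_cuts(data, k):
    """Return every position p where the k symbols before p are the
    mirror image of the k symbols after p."""
    cuts = []
    for p in range(k, len(data) - k + 1):
        if data[p - k:p] == data[p:p + k][::-1]:
            cuts.append(p)
    return cuts

def split_at_mirrors(data, k):
    """Cut data into chunks at every mirror line found by mirror_cuts."""
    chunks, start = [], 0
    for p in mirror_cuts(data, k):
        chunks.append(data[start:p])
        start = p
    chunks.append(data[start:])
    return chunks

# The example from the post: the mirror line sits after 9 symbols.
print(mirror_cuts("101110011110011101", 9))

# Cut the front off a made-up file and compare the chunks.
rng = random.Random(7)
original = "".join(rng.choice("01") for _ in range(400))
trimmed = original[5:]
orig_chunks = split_at_mirrors(original, 3)
trim_chunks = split_at_mirrors(trimmed, 3)
reusable = [c for c in trim_chunks if c in orig_chunks]
print(len(reusable), "of", len(trim_chunks), "chunks also exist in the original")
```

Because a cut depends only on the symbols right around it, the cuts move along with the data. After the damaged start, the trimmed file falls back onto the same chunks as the original. A real version would need a sensible `k` (and probably a minimum chunk size) to keep the number of chunks down.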
There's also a problem with the network structure (which will be a DHT): the pressure on it! If we are going to store 30 000 blocks for a 4 MB file inside the network, I think that may be a little bit too much.
That's why we have this idea: let's order the blocks by byte size. Now take the 10 biggest blocks and hash them all together (see below). That will be the Big Master Hash (BMH). Then do the same for the 10 smallest blocks. That will be the Tiny Master Hash (TMH). Now spread those two hashes as two separate files. Because of this way of working, similar files will have either the BMH or the TMH in common. Then you can compare the lists which contain all the hashes of a file. We'll call that list the Hash-Chunk List (HCL). If any chunks are the same, well, download them!
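A minimal sketch of the BMH/TMH idea. How the 10 blocks get "hashed together" is not decided yet, so here I simply assume we hash the concatenation of their individual hashes, with SHA1 as a placeholder. The names `master_hashes` and `shared_chunks` are made up.

```python
import hashlib

def chunk_hash(chunk):
    return hashlib.sha1(chunk).hexdigest()

def combine(chunks):
    """Assumption: 'hash them all together' = hash of the joined chunk hashes."""
    joined = "".join(chunk_hash(c) for c in chunks)
    return hashlib.sha1(joined.encode()).hexdigest()

def master_hashes(chunks, n=10):
    """Return (BMH, TMH, HCL) for a list of byte chunks."""
    # Sort by size; the hash breaks ties so the order is always the same.
    by_size = sorted(chunks, key=lambda c: (len(c), chunk_hash(c)))
    bmh = combine(by_size[-n:])   # the n biggest blocks
    tmh = combine(by_size[:n])    # the n smallest blocks
    hcl = [chunk_hash(c) for c in chunks]  # all hashes, in file order
    return bmh, tmh, hcl

def shared_chunks(hcl_a, hcl_b):
    """Chunk hashes that appear in both Hash-Chunk Lists."""
    return set(hcl_a) & set(hcl_b)

# Two made-up files of 30 chunks each; only the tiniest chunk differs.
file_a = [bytes([i]) * (i + 1) for i in range(30)]
file_b = [b"\xff"] + file_a[1:]

bmh_a, tmh_a, hcl_a = master_hashes(file_a)
bmh_b, tmh_b, hcl_b = master_hashes(file_b)
print("same BMH:", bmh_a == bmh_b)
print("same TMH:", tmh_a == tmh_b)
print("chunks to share:", len(shared_chunks(hcl_a, hcl_b)))
```

A client that finds a matching BMH or TMH in the DHT would then fetch the other file's HCL and grab every chunk the two lists have in common.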
And next, the hash used. Well, I'm hoping to use RadioGatún, a descendant of PANAMA. Why RG? Well, it's meant as a newer alternative to hashes like SHA1. It's kinda new, from 2006, and has a kinda nice extra: you can use it as a stream cipher. What's that? A stream cipher encrypts data as it streams by, byte by byte, which makes it handy for encrypting network traffic :p In this world of Anti-P2P this IS necessary. You don't want the feds to know that you're downloading some webcam video from your girlfriend, or do you?
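To show what "using a hash as a stream cipher" means, here is a toy sketch. RadioGatún is not in Python's standard library, so SHAKE-256 from `hashlib` is swapped in, since it can also produce output of any length. The key, the nonce and the function names are made up, and this is an illustration of the principle, not a vetted encryption scheme.

```python
import hashlib

def keystream(key, nonce, length):
    """Derive `length` pseudo-random bytes from key and nonce.
    SHAKE-256 is a stand-in here, NOT RadioGatún."""
    return hashlib.shake_256(key + nonce).digest(length)

def xor_stream(data, key, nonce):
    """Encrypt or decrypt: XOR the data with the keystream."""
    ks = keystream(key, nonce, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))

key = b"shared secret between two peers"  # hypothetical key
nonce = b"transfer-0001"                  # must never repeat for the same key
block = b"some block of a file travelling over the network"

ciphertext = xor_stream(block, key, nonce)
plaintext = xor_stream(ciphertext, key, nonce)  # the same call decrypts
print(ciphertext.hex())
print(plaintext == block)
```

Encrypting and decrypting are the same operation, which is what makes a stream cipher cheap to bolt onto a transfer.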