I have just released dnscheck, a command line tool to randomly query different nameservers spread across the world or per location (country or city).
To install dnscheck, simply run
1


Have fun,
I have just released dnscheck, a command line tool to randomly query different nameservers spread across the world or per location (country or city).
To install dnscheck, simply run
1


Have fun,
This post assumes you have a sound understanding of Quantum Mechanics, more precisely eigenstates, superpositions and entanglements
Shor’s algorithm allows us to find a factor of any composite number in for any random , which is important in this scope to understand, as cryptography’s security baseline is founded on the fact that factoring cannot happen in polynomial time  I have talked on this subject in my post Yes DiffieHellman is secure, quoting myself
Because the discrete logarithm is not NPcomplete, it is not possible to compute our in polynomial time and therefore generating a big prime number to be used in generating a cryptographic key renders exhaustive search attack completely useless.  Ali Abbas
Since Shor’s algorithm runs in polynomial time to the length , it solves the discrete logarithm problem and hence breaks the conjecture on which public key cryptography such as RSA are based upon.
Shor works because the factoring problem can be reduced in finding the period () which validates the cyclic function , in other words, if the (n and N are coprime integers), then given is even, and are not multiples of , but . This gives us, , since as we have just seen, both and share a common ‘nontrivial’ factor of . Finding is done by transforming the periodic sequence using a Quantum Fourier Transform.
QFT is simply a Discrete Fourier transformation applied to a quantum amplitude using a unitary operator, meaning, there exists such that , in other words, assuming a basis state, there exists a vector which is a superposition of all computational basis, defining the orthonormal basis states  it is unitary because the amplitude of the vector never changes as part of the transform.
For example, taking a set of complex numbers, , the orthonormal basis state acted upon is: defined by .
This means that the Fourier transform of an amplitude is a mapping of an arbitary state to since per the Discrete Fourier equation, QFT can then be modelled as follow: .
Having said that, and not wanting of course to go into a full lecture on QFT  what is most important to understand here is that the speed at which we can apply QFT  by properly aligning the Qubits in a proper Quantum circuit, QFT is computed in order of gates (assuming of course that all factors of N are )  an equivalent amplitude of would require simple gates.
To find the period of a random such as and , we create 2 entangled registers (input and output) of size , both initialized to . We then compute our periodic function f(x) to each number (remember, register 1 is loaded with an equally weighted superposition of integers 0 to N1) in our input register and store the value in the second register  due to the superposition state and quantum computation parallelism, this operation is done in one step. Once the quantum computation is done, we observe the second register for some random value , such that register 1 is collapsed into an equal superposition of each values ranging from to for each value; In other words, we have a set values of possible such as , which means we observe a lower integer , such as . Our superposition in register 1 being , we apply a Discrete Fourier Transform (see earlier) on the superposition, this will generate an amplitude of integers (spectrum) which are multiples of ; Since we no longer have an equal superposition of states and our superposition in register 1 was periodic in nature, we can observe a state, let’s call it , which has a high probability of being a multiple of . Having and , we can easily compute and verify whether . If our verification works, we have found our periodic value and can resume to calculate . If not, then we pretty much repeat the process with a random value .
Of course, Shor’s algorithm can be more elaborated into steps and in soft and hard cases whether divides or not and if you have followed the core logic of the algorithm which we had quickly undergo, you can only come to realize that Shor’s algorithm is highly probabilistic in nature due to the nature of DFT, hence the more Shor is run, the more it becomes accurate. Quantum computing parallelism provides a significant gains since all states of all values of are computed in one step. Hence the problem of solving the discrete logarithm problem is in finding the period , which in case of Shor’s is shown to be trivial and faster than classical computing using a Discrete Fourier transform.
When installing vmware player 4.0.4, I had the nice surprise of finding out that the vmnet kernel module would not properly compile while starting the vmware service.
1 2 3 4 

Manually directly compiling the module, we see a couple of incompatibility with the kernel header definitions
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 

I have created a small repo at my github account @alouche, containing some code changes to fix these issues  the repo will continuously updated if necessary against different kernel versions (only from 3.2 up). To fix it, simply replace vmnet.tar in /usr/lib/vmware/modules/source with my modified version
1


Start the vmware service and the vmnet module will automatically successfully compile.
Enjoy,
Alright… this is just a tiny hint on the process I used to nuke some committed files from all commit history
1 2 3 4 5 

And here comes the explanation:
git filterbranch –indexfilter “git rm rf –cached –ignoreunmatch my_files” HEAD
Our action is to rewrite our branch, hence we need to use the top level “filterbranch” command.
–indexfilter: there is no need to checkout the current branch, so we can move faster and simply filter in which we issue the “git rm rf …”
gitrm: quiet obvious
–cached: Only match the paths in the index  leaving modified matching files untouched –ignoreunmatch: result 0 status in any case if no match
HEAD: obviously, we are working on our last commit
rm rf .git/refs/original/
Even with our branch rewrite from earlier, we still have a backup in refs/original, so we need to delete it
git reflog expire –all
Here is where it gets interesting, you see, each action performed inside git is “backed up” in the reflog. Think of it as a safety net, which is an inventory hash of all the points you been at for each commit. So it is possible to restore the files commit from the reflog, hence “expire –all”.
git gc –aggressive –prune
Oh my… gc what? “garbage collector?”, well actually we have rewritten the branch, purged the reflog and we are left with a lot of unused objects, so time to save some disk space and clean up
git push origin +master
Well… I don’t want to merge and then push, that would be deafeating the whole purpose of my previous actions and since no one has yet pulled from this repo, so we need to force the nonfastforward since we are pretty much breaking the objects inheritance, hence the “+master”
For the last few days, there has been a buzzing news in the community, following the recent discovery of a socalled information leak in the skype SDK. [email protected], published a python code sample “exploiting this vulnerability” https://github.com/zhovner/Skypeiplookup/ using a deobfuscated SDK and published a demo site @http://skypeipfinder.tk/. More related information on the skypeopensource project can be found @ http://skypeopensource.blogspot.de/
So to sumpup, the “socalled leak” takes place by: 1. having “debug logging enabled” in the hijacked SDK 2. triggering a lookup information on a user who happens to be online, such as seeing his vcard and blam! both private and public ip addresses of this user are shown in the log.
Now, I say “socalled” leak, because truth be told, this isn’t leak nor a bug and here is why:
The skype protocol is merely at its core a P2P protocol  while it has never standardized and is fully proprietary, a minimum understanding on how P2P architecture operates clearly explain why the skype client passes such information, after all, there aren’t truly such things as relay servers in a P2P network, clients in this case are exchanging the VoIP traffic directly and doing all the processing. Skype’s communication architecture constitutes of 3 types of nodes, “login servers”, “supernodes” and “standard nodes”.
To put it as simple as possible, “login servers” are the servers you connect to authenticate, “standard nodes” are clients (skype clients) sitting behind a NAT firewall and “supernodes” (most interesting for this article) are simply the opposite of “standard nodes”, in other words, they are directly assigned a public IP address and no firewall rules are blocking a direct connection to the skype random port client.
For a call to take place between 2 “standard nodes”, they must have a direct non NATed connection, since “standard nodes” sit behind a NAT/firewall, a direct connection between the 2 hosts is therefore not possible, to overcome this skype uses a technique called “UDP holepunching” and this is where “supernodes” come into play.
UDP holepunching is a simple technique which persists in somewhat convincing the firewall that the incoming UDP packets are responses of an already established connection. Now I say “established”, but remember, UDP is not connection oriented, so we basically have no concept of sessions and all that nastySYNoverhead :P, but we have a NAT recorded session which we could exploit.
To clarify: assume we have 2 hosts A, B each with a NAT/firewall and a random server called X. UDP holepunching works as follow:
Now that you have a basic understanding of UDP holepunching, keep in mind that a supernode will act as a relay in the case that A and B are still unable to communicate in cases where the firewalls are randomizing the source IP of the clients when the NAT process takes place.
Finally, you may ask:
“Ok, I get it! A needs to know B’s public IP and viceversa, but why the private IP addresses, as seen in the socalled skype SDK IP leak”.
Well, you are right, why is Skype communicating the private IP address? Well, it is like answering, why would 2 hosts communicate over their NAT IPs if they could directly communicate through their privately assigned IP addresses? in other words, if 2 clients are located on 2 routeable LANs, it would be faster for them to exchange internally than externally and hence the reasons why during the holepunching process, both A and B reported not only their public IPs but also their private IPs which server X exchanged between the 2 clients, both ending up knowing the public and private IPs as well as source ports of the opposite peer.
So… there is no leak, just some guys who figured out how to enable “printf” in an SDK which hooks to your client :)
Needless to say, if you are curious as the kind of information which circumvent over skype  read more @ http://www.scribd.com/doc/69593950/Skype
Here are a few snaps from my visit at the Mercedes Museum this weekend in Stuttgart  for those visiting Germany, it is definitely worth the stop. It was both cultivating and a lot of fun :) [flickrgallery mode=”photoset” photoset=”72157629512062396”]
I have recently backported the rfc2988bis changes (initRTO=1 and fallack) to the redhat 2.6.32 kernel  find the patch on my github account at @ https://github.com/alouche/redhat2.6.32kernelpatches/blob/master/rfc2988bis.patch
On short lived connections with a lot of 3WHS, a lower initial RTO will improve 3WHS latency by 22000msX% (X% being the average of packet drops of a specific route). For further technical details, refer to https://www.ietf.org/proceedings/77/slides/tcpm1.pdf
Since the 2.6.23 kernel, the Linux kernel process scheduler previously O(1) was replaced by CFS  a Completely Fair Scheduler.
CFS uses a redblack tree as datastructure and unlike previous Unix process scheduler does not account a traditional time slice of process execution but accounts what is referred as the process virtual runtime, expressed in nanoseconds (as opposed to Hz or jiffies). The usage of a selfbalanced tree as the redblack tree allows for a lookup of time per the height of the tree, but more on this later.
The Virtual Runtime, declaration vruntime is a variable part of the process inherited C structure as defined in linux/sched.h which accounts for the time a process run in relation to the number of running processes. Each process holds a process descriptor “task_struct” dynamically allocated via the slab allocator.
1 2 3 4 5 6 7 8 9 10 11 12 13 

The sched_entity structure linked in “task_struct” is defined as
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 

While the other variables also play a role in CFQ decision’s algorithm, vruntime is by far the core variable which needs more attention as to understand the scheduling decision process. As stated earlier, CFQ does not account time slice as did previous schedulers, the vruntime is evaluated
The vruntime for each process is calculated as followed:
The update_curr function is responsible to calculate the running time spent by the process, which stores the value into an unsigned long variable delta_exec which is computed as followed:
1 2 3 4 

delta_exec is passed unto __update_curr
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 

calc_delta_fair will return the weighted value of the process’s calculated runtime delta_exec in relation to number of runnable processes. The sub function used for that calculation is calc_delta_mine.
To keep in mind:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 

Once the vruntime is computed, it is stored into the inherited sched_entity structure of the process by calling
1


in the previously seen __update_curr function, we also notice the call
1


What this does is to compute the smallest vruntime of all runnable processes and store it at the leftmode node of the redblack tree.
The CFS selection algorithm for process to be run is extremely simple, it will run its red black tree and search for the smallest vruntime value pointing to the next runnable process. In other words, “run the process which is represented by the leftmost node in the tree”, since the leftmost node constains the smallest vruntime, referred in the source code as min_vruntime.
To conclude this post, it is important to note that CFS does not walk the whole tree since min_vruntime is referenced by rb_leftmost in the cfs_rq struct (kernel/sched.c)
1 2 3 4 5 6 7 8 9 10 11 12 

as seen in this construct (kernel/sched_fair.c)
1 2 3 4 5 6 7 8 9 

DCB  Data Center Bridging is set of standard which defines 4 set of independent technologies/concepts to pretty much make Ethernet lossless, hence to support storage traffic. We will not go into a debate over FCoE, whether you should consider a single fabric for both storage and “standard/Ethernet” traffic in your data center design strategy or go a more traditional way.
As we said earlier, DCB is a set of standards, actually a set of 4 standards, which we will depict over the “DCB 101” posts series. DCB is compromised of:
While these standards are independent, they do correlate as dependent layers  remove one and the nice theory of maintaining a lossless transport fabric will start falling apart. Alright! I said no debate on FCoE.
You may sometimes come across some technical documents which refer to PFC as PerPriority Pause  for now, keep this in mind as it will shortly become clear as you read on.
Who says “Priority flow”, says “multiple flows/segments/divided lanes with different priorities” and at the end that’s naively what PFC is about. PFC’s goal is to merely segment traffic over the Ethernet fabric/medium and protocol into streamed marked priorities and define specific “guidance/action” for each of these streams. In other words, think of a highway which has a unique lane  PFC is virtually creating a secondary, third, fourth lane so that certain car types will be assigned to a specific lane  some cars can move faster, while others can be stuck in a traffic jam.
Moving away from the highway simplistic illustration  the PCP value in the 802.1Q Tag Control Info 2bytes header is used by PFC to determine the priority of the traffic. It is important to keep in mind that PCP is not defined by DCB but by the 802.1p task committee; another great example is QoS which makes use of PCP for traffic classification.
Because PCP is only 3bits, we can only define 8 priorities. While the priority positive integer is ascendant in terms of higher priority, keep in mind that, a PCP value of 1 is lower than a PCP value of 0, this is simply due to the Network Priority translation value, such as PCP0 = NP1 and PCP1 = NP0. I will not detail why this is as it is now, but perhaps in a future post.
We step away shortly from PCP and PFC to review some small concepts about Pause frames which will lay down the path to not only understand the importance of PCP but what PFC introduces to make Ethernet lossless.
Who talks about congestion, talks about the inability to not forward nor process traffic  whether that be frames or simply segments. Like on a highway, one way of getting rid of a traffic jam is to stop a specific flow of traffic or the whole traffic to dismantle the traffic jam. Pause Frames are like the highway patrol and yes they can also shoot you down ;)… for more insights on this analogy, you can refer to my following posts:
Without going into too much details, a Pause Frame is simply a MAC Frame which carries an OPCODE of 0x0001 in its Mac Control field and a quanta based time unit field.
The opcode value of 0x0001 simply says that this MAC Frame is a Pause Frame and the quanta time field expressed as a 2bytes unsigned integer refers to how many bytes should the sending of frames be stopped. Calculating in clocktime, how long the sender will stop sending frames is relative to the transmission rate; For example, one quanta is 512bytes, on a 10Ge link, that means that one quanta is roughly 51.2ns (512/10⁻⁸).
While Pause frames are essential to prevent congestion (at least that’s the goal), 802.3x frames cannot differentiate priority based traffic, in other words, assuming both LAN and storage traffic are processed by the same frabric, a Pause frame issued due to the beginning of a congestion on the LAN traffic would result as well in an interruption of the storage traffic.
Because of the limitation of current Pause Frames, PFC defines a new Pause frame standard, which is a reuse of the existing pause frame standard but with priority capabilities. As with the 802.3x frame, the PerPriority Pause carries the same OPCODE but at the difference of the following:
Once PPP is negotiated, 802.3x frames are no longer validated on the ports received PPP frames, hence it is not possible to use 802.3x on top of PPP.
In summary, PFC makes use of PPC as defined in 802.1p, introduces a new pause frame format designed to reduces the “probability” of packets drop, treating defined priority traffic as separate queues. Having said that, PFC isn’t without its downfall  without a Congestion protocol such 802.1Qau, PFC will results in a headofline blocking scenario.
Finally it is important to note when using PFC, it is important to carefully calculate the buffer size to prevent frame lost on the receiving size while a PFC pause frame is sent  another post will soon clarify the methodology to use to calculate the buffer size.
When it comes to the DiffieHellman algorithm, there seem to be many confusions from newbies in Cryptography as to whether an attacker could easily recompute the shared key by intercepting the prime numbers + public keys. While the answer is “no”, understanding why, requires an understanding the discrete logarithm problem. So here we go.
The discrete logarithm problem can be summarized as follow:
Given a prime, compute
In other words, is defined to be a primitive root of the finite group of order and we need to calculate the exponential such as resolving .
The logarithm is then calculated modulo the order of in the finite group such that if .
Because the discrete logarithm is not NPcomplete, it is not possible to compute our in polynomial time and therefore generating a big prime number to be used in generating a cryptographic key renders exhaustive search attack completely useless.
Before we dive further, here is a very short reminder of the DH algorithm:
User1 and User2 agree on a prime number and a generator such as there exist an exponent so that .
Both users secretly generate a random key, let’s call it ( for User1 and for User2); from there on, and are used by both users to generate their public keys.
The public key equation is: , let’s call it
After both users compute ( for User1 and for User2), they exchange their public keys. The shared secret key can then be generated from the opposite user’s public key.
The secret key equation is:
As you can see, to recompute the shared key, the attacker would need to resolve the discrete logarithm problem as discussed earlier, that is, detect and from and , since it is not possible to directly compute ; after all and not .
There are of course many algorithms that attempt to resolve the discrete logarithm, such as the “squarre root attack”, but this is the subject for another post.
I hope this clarifies to new comers why DH is secure enough to be used in key exchange.