|Abstract or Summary
- Network coding, as the next generation of data routing protocols, enables each intermediate node in a network to process and encode its received data before forwarding it to the next nodes. Hence, the core idea in network coding is to allow a network to encode the data that is being transmitted through it. This revolutionary idea of data routing results in dynamic change in the content of each data packet. That is, in a network coding setting, the original data symbols that are generated at the source nodes evolve hop-by-hop as they travel through the intermediate nodes. This property is clearly in stark contrast with the methods that are used in traditional data routing protocols, where every intermediate node acts as a plain relay. In other words, in the conventional data routing algorithms, every intermediate node solely replicates its incoming data on one or more of its outgoing channels. The criteria and the policies based on which an intermediate node makes decisions about the proper outgoing channels corresponding to each incoming packet depend on the employed routing protocol. Usually, each intermediate node utilizes a set of routing information (such as a routing table) in order to find the most cost effective path or paths to the final destinations. The cost criterion may be defined based on various parameters, but what is fixed is that the general goal is always to find the most optimum route that starts from the node and reaches the final destination at the lowest cost. Upon finding the best output channels, the intermediate node simply copies the pertinent data packet on the optimum channels without inflicting any change in the data payload. This common method of data routing in conventional routing protocols is indeed considered as a very special case in network coding theory. The fact that in network coding every node processes (encodes) its input data to create its outgoing symbols implies that the encoding operation at a given network node can be expressed as a multi-input multi-output function which intakes the node's incoming data symbols as its input arguments and generates the outgoing data symbols departing the node as its outputs. Since each node in the network has its own function, they are called "local encoding function". This way of looking at the network coding operation enables us to simply define linear and nonlinear network coding as the network codes with linear and nonlinear local encoding functions, respectively. Hence, in linear network coding, every node (including the source and the sink nodes) executes a linear function on its incoming data symbols in order to generate its output symbols, while in nonlinear network coding this function is nonlinear. The linearity indicates that every output symbol of a local encoding function can be stated as a unique linear combination of its input symbols. Therefore, in linear network coding, the encoding operations at the intermediate nodes can be stated as matrix multiplications. If linear network coding is applied then each individual network node will have a matrix with known entries and fixed dimensions that represents the network coding operation at that particular node. These matrices will be called "local encoding matrix" in this work, where linear network coding is considered as the employed data routing protocol. The main focus of this thesis is to thoroughly study the security aspects of linear network coding, and propose new ideas and superior solutions for various security challenges that are faced in this class of data routing protocols. In a broad sense, security attacks can be categorized into two major classes: passive attacks and active attacks. In passive attacks (also known as wiretapping or eavesdropping attacks), the attacker threatens the confidentiality aspect of the data; meaning his goal is to obtain illicit access to the content of the data symbols while he is unable or not interested to change (manipulate) the content of the original data symbols. On the other hand, in active attacks (also known as pollution or Byzantine attacks), the attacker threatens the integrity and authenticity of data. That is, while the content of each original data packet is open and visible to everyone (including the attacker), his goal is to change or corrupt the content of data symbols or interrupt the data transmission process by jamming and blocking the flow of data stream in the network. Under assumption of linear network coding being the applied data routing scheme, both of these two attack classes are comprehensively studied in this thesis. Since in linear network coding each and every node in the network mixes up its incoming data to generate its outputs, observing the content of a symbol flowing on a channel usually does not reveal useful information about the original data symbols. This means network coding (and more specifically, linear network coding in our case) offers the network users an inherent and intrinsic level of security which is normally not an option when traditional store-and-forward data routing protocols are employed. In some cases and for some basic applications, this inherent security is enough; however, for many security-aware and sensitive applications that demand tighter security provisions, additional modifications to the base coding scheme are necessary in order to provide higher levels of security. In Chapter 2, six innovative security protocols that are tailored to effectively and efficiently counteract passive attackers in linear network coding are proposed. The solutions proposed in this chapter may be categorized in three main groups. Two protocols (A and B) are designed solely based on the intrinsic security feature of linear network coding. Protocol (A) requires each global encoding vector that is assigned to one of the network channels to have more than one nonzero entry. This stipulation is shown to be satisfied probabilistically with a probability that drastically approaches "1" as the code field, network capacity, or attacker's limitations increase. Protocol (B) is based on re-ordering the original message vector before sending it through the network. The re-ordering process is designed to elaborately scramble the message vector content in such a way that no additional function is required, yet by sending the scrambled message vector, the eavesdropper will not be able to obtain any information about any of the original data symbols. The sink nodes are the only nodes that are able to de-scramble the data and recover the original message vector. Both of these two protocols are extremely light-weight with no throughput reduction. The second group of our security solutions includes two protocols (C and E), each of which utilizes a hash function and a noisy symbol at the source node in order to generate the required random symbols (masking symbols) that are used to conceal the data symbols in the secured message vector. These protocols reduce the data rate by only one unit while they assume very lax conditions on the attacker's ability in accessing the independent network channels. The offered independency between the number of attacked channels and the secure data rate, and also enabling the source node to independently create its own keys and to change them as often as it generates new data packets are two of the remarkable properties of these protocols. The two remaining protocols (D and F) constituting the third group are in fact two variations of the second group protocols. That is, in order to alleviate the computational complexity burden of the algorithms in the second group, the hash functions are substituted by simple light-weight bijective functions in these protocols. Hence, in addition to all the benefits of the algorithms in the second group, these two protocols are also very fast and easy to implement. The fact that in (linear) network coding data symbols are constantly mixed up and intermingled as they travel through the network is indeed a double-edged sword. That is, as mixing and combining the data symbols provides some degree of security against snooping attackers (and this is besides the other advantages of network coding), if a malicious node injects some bogus or invalid data into the main data stream, then the inflicted pollution propagates throughout the network at an enormously great rate and causes the final destination nodes to be unable to recover the original data symbols. This class of attacks in linear network coding (i.e., pollution attacks) is studied in Chapter 3, where we propose a very efficient hierarchical scheme that is able to accurately pinpoint any number of polluting nodes in the network with any locational distribution. Our protocol is also capable of isolating and disconnecting the violating nodes from the rest of the network and therefore fixing the pollution problem in a more fundamental way.