Containers/Network virtualization
Latest revision as of 16:47, 14 January 2010
There are several approaches to network virtualization, driven by the different requirements of different usages. This page summarizes them with the aim of arriving at a solution suitable for all.
Usages
Current known usages are:
- Virtual Environments - complete OS environment, with its own users, groups, filesystems and devices;
- Application Containers - partly isolated environment with application inside.
Approaches
Virtualization on the 2nd level (OpenVZ)
Requirements
The main requirement is that containers should have networking capabilities close to those of standalone servers. In detail:
- containers should have their own loopback;
- containers should be able to set up their own level 3 addresses;
- containers should be able to sniff their own traffic;
- containers should be able to set up their own routes;
- containers should be able to receive multicast/broadcast packets;
- containers should have their own netfilters;
- containers should have at least one level 2 device.
Current implementation
For input packets, the context switch is performed in netif_receive_skb(), and the context is inherited from the receiving device. For output packets, the context is inherited from the sending socket.
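
The rule above can be illustrated with a small user-space model. This is only a sketch, not kernel code: the `Context`, `Device` and `Socket` classes, their `owner` fields and the two helper functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """A container (or the host); hypothetical stand-in for a kernel context."""
    name: str


@dataclass
class Device:
    """A level 2 device that belongs to exactly one context (assumption)."""
    name: str
    owner: Context


@dataclass
class Socket:
    """A socket created inside some context (assumption)."""
    port: int
    owner: Context


def context_for_input(device: Device) -> Context:
    # Input path: the packet is processed in the context of the
    # device it arrived on.
    return device.owner


def context_for_output(sock: Socket) -> Context:
    # Output path: the packet is processed in the context of the
    # socket that sent it.
    return sock.owner


host = Context("host")
ct101 = Context("ct101")
eth0 = Device("eth0", owner=host)
veth101 = Device("veth101", owner=ct101)
web = Socket(80, owner=ct101)
```

In this model a packet arriving on `veth101` is handled in `ct101`, while one arriving on `eth0` is handled in the host context, regardless of its addresses.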
Virtualization on the 3rd level (IBM)
Requirements
- servers in several containers can listen on *:port without conflict and without forcing the bind to use the IP address assigned to the container;
- the source address is filled in with the container IP address;
- sockets are kept isolated by namespace;
- the loopback is isolated;
- performance is as close to native as possible;
- broadcast and multicast work.
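
The first three requirements can be pictured as a bind table keyed by namespace. The sketch below is a toy model under that assumption; `BindTable`, `CONTAINER_IP` and `source_address` are hypothetical names, and conflicts between a wildcard bind and a specific address inside one namespace are not modelled.

```python
WILDCARD = "*"

# Hypothetical mapping from namespace to the container's IP address.
CONTAINER_IP = {"ct1": "10.0.0.1", "ct2": "10.0.0.2"}


class BindTable:
    """Tracks bound sockets per namespace (toy model)."""

    def __init__(self):
        self._bound = set()  # entries are (namespace, address, port)

    def bind(self, namespace, address, port):
        key = (namespace, address, port)
        if key in self._bound:
            # Conflicts are only possible inside the same namespace.
            raise OSError("address already in use in " + namespace)
        self._bound.add(key)

    def is_bound(self, namespace, address, port):
        return (namespace, address, port) in self._bound


def source_address(namespace):
    # Outgoing packets carry the IP address of the sending container.
    return CONTAINER_IP[namespace]


table = BindTable()
table.bind("ct1", WILDCARD, 80)
table.bind("ct2", WILDCARD, 80)  # same *:80, different namespace: no conflict
```

A second `table.bind("ct1", WILDCARD, 80)` would raise `OSError`, because that bind collides inside a single namespace.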
Current implementation
For input packets, the context is inherited from the routing entry; for output packets, it is inherited from the socket.
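
As an illustration of the input side, the following sketch attaches a context name to each routing entry and picks the context of the longest matching prefix. It is a user-space model only: the `RoutingTable` class, the prefixes and the context names are assumptions made for the example, not the actual implementation.

```python
import ipaddress


class RoutingTable:
    """Toy routing table whose entries each carry a context name."""

    def __init__(self):
        self._entries = []  # list of (network, context name)

    def add(self, prefix, context):
        self._entries.append((ipaddress.ip_network(prefix), context))

    def context_for_input(self, destination):
        """Return the context of the most specific route matching destination."""
        addr = ipaddress.ip_address(destination)
        matches = [(net, ctx) for net, ctx in self._entries if addr in net]
        if not matches:
            return None
        # Longest-prefix match: the entry with the largest prefix length wins.
        return max(matches, key=lambda entry: entry[0].prefixlen)[1]


routes = RoutingTable()
routes.add("0.0.0.0/0", "host")
routes.add("10.0.0.101/32", "ct101")
routes.add("10.0.0.102/32", "ct102")
```

With these example routes, a packet addressed to 10.0.0.101 is handled in `ct101`, and a packet for any other destination falls through to the host context.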
Sockets isolation (Linux-VServer)
Requirements
- all interfaces and IPs are visible on the host
- routing and iptables are configured on the host
- guest has a subset of IPs assigned for 'binding'
- source IP (of guest packets) is within the assigned set
- 'local' guest traffic is isolated from other guests
- no measurable overhead on packet routing
- normal routing not impaired (same behaviour as without)
- Guest-Guest and Guest-Host traffic via Loopback
Current implementation
Each guest has a Network Context with an 'assigned' set of IPs, which is used for 'collision' checks at bind time, 'source' checks at send time and 'destination' checks at receive time. The first assigned IP is handled specially, as it is used for routing decisions outside the IP set. Loopback traffic isolation is done via IP 'remapping'.
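
The three checks can be sketched as follows. This is a simplified model, not Linux-VServer code: the `Host` and `NetworkContext` classes and their methods are hypothetical, a wildcard bind is assumed to be limited to the guest's assigned IPs, and loopback remapping is not modelled.

```python
WILDCARD = "0.0.0.0"


class Host:
    """Holds the bind table shared by all guests (toy model)."""

    def __init__(self):
        self.bound = {}  # (ip, port) -> name of the context that bound it


class NetworkContext:
    def __init__(self, host, name, ips):
        self.host = host
        self.name = name
        self.ips = list(ips)  # order matters: the first IP is special

    @property
    def first_ip(self):
        # The first assigned IP; per the text it is the one used for
        # routing decisions outside the IP set.
        return self.ips[0]

    def bind(self, ip, port):
        """'Collision' check at bind time."""
        if ip == WILDCARD:
            targets = self.ips  # assumption: wildcard means "all assigned IPs"
        elif ip in self.ips:
            targets = [ip]
        else:
            raise PermissionError(ip + " is not assigned to " + self.name)
        for target in targets:
            if (target, port) in self.host.bound:
                raise OSError("address already in use")
        for target in targets:
            self.host.bound[(target, port)] = self.name

    def may_send(self, source_ip):
        """'Source' check at send time."""
        return source_ip in self.ips

    def may_receive(self, destination_ip):
        """'Destination' check at receive time."""
        return destination_ip in self.ips


host = Host()
guest_a = NetworkContext(host, "guest-a", ["10.0.0.1", "10.0.0.2"])
guest_b = NetworkContext(host, "guest-b", ["10.0.0.3"])
guest_a.bind(WILDCARD, 80)
guest_b.bind(WILDCARD, 80)  # no collision: the assigned IP sets do not overlap
```

In this model `guest_b.bind("10.0.0.1", 80)` raises `PermissionError`, because the address belongs to another guest's set.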
Virtualization table
This summary table shows which core networking objects are virtualized or isolated in each of the above approaches and which are not.
| Virtualization approach | network devices | routing tables | network sockets | loopback | netfilters | 
|---|---|---|---|---|---|
| 2nd level virtualization | v | v/i | v | v | v | 
| 3rd level virtualization | - | i | i | i | - | 
| sockets isolation | - | - | i | - | - | 
Legend:
- 'v' - virtualized
- 'i' - isolated
- '-' - neither virtualized nor isolated
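
For scripted comparisons, the table can be encoded as a plain mapping. The sketch below copies the values from the table above; the `TABLE` name and the `approaches_with` helper are introduced here for illustration only.

```python
# Marks follow the legend: 'v' virtualized, 'i' isolated, '-' neither.
TABLE = {
    "2nd level virtualization": {
        "network devices": "v", "routing tables": "v/i",
        "network sockets": "v", "loopback": "v", "netfilters": "v",
    },
    "3rd level virtualization": {
        "network devices": "-", "routing tables": "i",
        "network sockets": "i", "loopback": "i", "netfilters": "-",
    },
    "sockets isolation": {
        "network devices": "-", "routing tables": "-",
        "network sockets": "i", "loopback": "-", "netfilters": "-",
    },
}


def approaches_with(obj, mark):
    """List the approaches whose entry for obj includes the given mark."""
    return [name for name, row in TABLE.items() if mark in row[obj].split("/")]
```

For example, `approaches_with("network sockets", "i")` lists the two approaches that isolate sockets without virtualizing them.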
