Difference between revisions of "Containers/Network virtualization"
Revision as of 17:13, 8 November 2006
There are a number of approaches to network virtualization, driven by the different requirements of different usages. This page summarizes them, with the aim of arriving at a solution suitable for all.
Usages
Current known usages are:
- Virtual Environments - complete OS environment, with its own users, groups, filesystems and devices;
- Application Containers - partly isolated environment with application inside.
Approaches
Virtualization on the 2nd level (OpenVZ)
Requirements:
The main requirement is that containers should have networking capabilities close to those of standalone servers. In detail:
- containers should have their own loopback;
- containers should be able to set up their own level 3 addresses;
- containers should be able to sniff their own traffic;
- containers should be able to set up their own routes;
- containers should be able to receive multicast/broadcast packets;
- containers should have their own netfilters;
- containers should have at least one level 2 device.
Current implementation:
For input packets, context switching is performed in netif_receive_skb(), and the context is inherited from the device. For output packets, the context is inherited from the socket.
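As a toy illustration of this rule (not kernel code), the sketch below models where the context comes from in each direction. The `Device`, `Socket`, `input_context` and `output_context` names are hypothetical and introduced only for this example; only `netif_receive_skb()` is taken from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    ctx: str  # container that owns this level 2 device (assumed field)


@dataclass(frozen=True)
class Socket:
    port: int
    ctx: str  # container whose process created the socket (assumed field)


def input_context(dev: Device) -> str:
    """Input path: the packet takes the context of the receiving device.

    In the kernel this decision would sit in netif_receive_skb().
    """
    return dev.ctx


def output_context(sock: Socket) -> str:
    """Output path: the packet takes the context of the sending socket."""
    return sock.ctx


veth = Device(name="veth101", ctx="ct101")
web = Socket(port=80, ctx="ct102")
print(input_context(veth))   # ct101
print(output_context(web))   # ct102
```

In this model a packet never needs a lookup to find its container: the device or the socket already carries the answer.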
Virtualization on the 3rd level (IBM)
Requirements:
- One can run servers in several containers listening on *:port without conflict and __without__ forcing the bind to use the IP address assigned to the container;
- The source address will be filled with the container IP address;
- Keep sockets isolated by namespace;
- have the loopback isolated;
- have performance as close to native as possible;
- have broadcast and multicast working.
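The first and third requirements can be pictured as a bind table in which conflicts are checked per namespace. This is a minimal sketch under that assumption, not the IBM implementation; `BindTable`, its `bind` method and the `ANY` constant are hypothetical names.

```python
ANY = "0.0.0.0"  # stands for the wildcard in "*:port"


class BindTable:
    """Toy bind table in which conflicts are only checked inside one namespace."""

    def __init__(self):
        self._bound = set()  # entries are (namespace, address, port)

    def bind(self, namespace, address, port):
        for ns, addr, p in self._bound:
            same_scope = ns == namespace and p == port
            overlap = addr == address or ANY in (addr, address)
            if same_scope and overlap:
                raise OSError("address already in use")
        self._bound.add((namespace, address, port))


table = BindTable()
table.bind("ct1", ANY, 80)
table.bind("ct2", ANY, 80)      # no conflict: different namespace
try:
    table.bind("ct1", ANY, 80)  # conflict: same namespace, same port
except OSError as exc:
    print(exc)                  # address already in use
```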
Current implementation:
For input packets, the context is inherited from the routing entry; for output packets, it is inherited from the socket.
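One way to picture the input side is a routing table whose entries carry a context. The sketch below is a toy model built on Python's standard `ipaddress` module; the `ContextRoutes` class, the longest-prefix rule and the addresses are assumptions made for illustration.

```python
import ipaddress


class ContextRoutes:
    """Toy routing table in which every entry is tagged with a context."""

    def __init__(self):
        self._routes = []  # list of (network, context) pairs

    def add(self, cidr, ctx):
        self._routes.append((ipaddress.ip_network(cidr), ctx))

    def input_context(self, dst):
        """Return the context of the most specific route matching dst."""
        addr = ipaddress.ip_address(dst)
        matches = [(net, ctx) for net, ctx in self._routes if addr in net]
        if not matches:
            return None  # no route, so no context can be inherited
        _, ctx = max(matches, key=lambda m: m[0].prefixlen)
        return ctx


routes = ContextRoutes()
routes.add("10.0.0.0/24", "host")
routes.add("10.0.0.1/32", "ct1")
print(routes.input_context("10.0.0.1"))  # ct1
print(routes.input_context("10.0.0.7"))  # host
```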
Socket virtualization
Requirements:
- implementation overhead for established TCP connections should be zero;
- FIXME
Current implementation:
There is no context switching for packets at all; checks are performed between the process context and the socket context.
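A rough sketch of such a check, assuming each process and each socket carries a context tag and that access is allowed only when the two tags match. The `Process`, `Socket`, `create_socket` and `may_use` names are hypothetical, and the equality rule is an assumption; the page does not spell out the exact check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Process:
    pid: int
    ctx: str  # context the process runs in (assumed field)


@dataclass(frozen=True)
class Socket:
    port: int
    ctx: str  # context recorded when the socket was created (assumed field)


def create_socket(proc: Process, port: int) -> Socket:
    """A new socket inherits the context of the creating process."""
    return Socket(port=port, ctx=proc.ctx)


def may_use(proc: Process, sock: Socket) -> bool:
    """Check performed when a process touches a socket, never per packet."""
    return proc.ctx == sock.ctx


owner = Process(pid=100, ctx="ct1")
other = Process(pid=200, ctx="ct2")
sock = create_socket(owner, 8080)
print(may_use(owner, sock))  # True
print(may_use(other, sock))  # False
```

In this model packets on an established connection are never inspected, which is consistent with the zero-overhead requirement above.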
Network Isolation (Linux-VServer)
- all interfaces and IPs are visible on the host
- routing and iptables are configured on the host
- guest has a subset of IPs assigned for 'binding'
- source IP (of guest packets) is within the assigned set
- 'local' guest traffic is isolated from other guests
- no measurable overhead on packet routing
- normal routing not impaired (same behaviour as without isolation)
- guest-guest and guest-host traffic goes via loopback
Current implementation:
A Network Context holds an 'assigned' set of IPs, which is used for 'collision' checks at bind time, 'source' checks at send time and 'destination' checks at receive time. The first assigned IP is handled specially, as it is used for routing decisions outside the IP set. Loopback traffic isolation is done via IP 'remapping'.
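The three checks can be sketched as follows. This is an illustrative model only, not Linux-VServer code: the `NetworkContext` class, its method names and the `collides` helper are hypothetical, and remapping 127.0.0.1 to the first assigned IP is an assumption chosen to make the 'remapping' idea concrete.

```python
from dataclasses import dataclass

ANY = "0.0.0.0"
LOOPBACK = "127.0.0.1"


@dataclass(frozen=True)
class NetworkContext:
    name: str
    ips: tuple  # assigned IPs; ips[0] is the "first" one

    def effective(self, bind_ip):
        """Addresses a bind really covers inside this context."""
        if bind_ip == ANY:
            return set(self.ips)      # wildcard is limited to the assigned set
        if bind_ip == LOOPBACK:
            return {self.ips[0]}      # assumption: loopback remapped to first IP
        return {bind_ip} if bind_ip in self.ips else set()

    def source_ok(self, src):
        """'source' check at send time."""
        return src in self.ips

    def dest_ok(self, dst):
        """'destination' check at receive time."""
        return dst in self.ips

    def default_source(self):
        """First assigned IP, used when routing leaves the IP set."""
        return self.ips[0]


def collides(ctx_a, bind_a, ctx_b, bind_b):
    """'collision' check at bind time for two binds on the same port."""
    return bool(ctx_a.effective(bind_a) & ctx_b.effective(bind_b))


g1 = NetworkContext("g1", ("10.0.0.1", "10.0.0.2"))
g2 = NetworkContext("g2", ("10.0.0.3",))
print(collides(g1, ANY, g2, ANY))         # False
print(collides(g1, ANY, g1, "10.0.0.2"))  # True
print(g1.source_ok("10.0.0.3"))           # False
```

Because a wildcard bind only covers the guest's own addresses, two guests with disjoint IP sets can both bind the same port without a collision.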
Virtualization table
This summary table shows which core networking objects are virtualized or isolated in each of the approaches above.
Virtualization approach | network devices | routing tables | network sockets | loopback | netfilters |
---|---|---|---|---|---|
2nd level virtualization | v | v/i | v | v | v |
3rd level virtualization | - | i | i | i | - |
bind filtering | - | - | i | - | - |
network isolation | i/m | i | i | i/m | - |
Legend:
- 'v' - virtualized
- 'i' - isolated
- 'm' - mapped
- '-' - neither virtualized nor isolated
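For scripted comparisons, the table can be encoded as a small lookup structure. The sketch below copies the cell values from the table; the `TABLE` and `approaches_with` names are hypothetical.

```python
OBJECTS = ("network devices", "routing tables", "network sockets",
           "loopback", "netfilters")

# One row per approach, cells in the same order as OBJECTS.
ROWS = {
    "2nd level virtualization": ("v", "v/i", "v", "v", "v"),
    "3rd level virtualization": ("-", "i", "i", "i", "-"),
    "bind filtering": ("-", "-", "i", "-", "-"),
    "network isolation": ("i/m", "i", "i", "i/m", "-"),
}

TABLE = {name: dict(zip(OBJECTS, cells)) for name, cells in ROWS.items()}


def approaches_with(obj, flag):
    """List approaches whose cell for obj contains flag ('v', 'i', 'm' or '-')."""
    return [name for name, row in TABLE.items() if flag in row[obj].split("/")]


print(approaches_with("netfilters", "v"))  # ['2nd level virtualization']
print(approaches_with("loopback", "i"))    # ['3rd level virtualization', 'network isolation']
```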