kube-scheduler notes: affinity scheduling
Overview
https://github.com/DavyJones2010/k8s-source-code-analysis/blob/master/core/scheduler/affinity.md
Categories
Affinity mechanisms fall into the following kinds:
NodeSelector
NodeAffinity
- preferredDuringSchedulingIgnoredDuringExecution: soft constraint
- requiredDuringSchedulingIgnoredDuringExecution: hard constraint
PodAffinity:
- preferredDuringSchedulingIgnoredDuringExecution: soft constraint
- requiredDuringSchedulingIgnoredDuringExecution: hard constraint
requiredDuringSchedulingIgnoredDuringExecution
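How does it differ from the plain nodeSelector field? Both are hard constraints, and both cause non-matching nodes to be filtered out.
The difference: nodeSelector can only test label equality (node.labels[key] == value), whereas the nodeSelectorTerms under requiredDuringSchedulingIgnoredDuringExecution support full match expressions with the operators In, NotIn, Exists, DoesNotExist, Gt and Lt, for example: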
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
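To make the contrast between equality-only matching and expression matching concrete, here is a minimal Go sketch. It uses only the standard library; the `Requirement` type and the helper names are hypothetical stand-ins for illustration, not the scheduler's own types, and the operator semantics are written down as I understand the upstream label-matching rules.

```go
package main

import (
	"fmt"
	"strconv"
)

// Requirement is a hypothetical, simplified stand-in for one matchExpressions entry.
type Requirement struct {
	Key      string
	Operator string // In, NotIn, Exists, DoesNotExist, Gt, Lt
	Values   []string
}

// matchesNodeSelector mimics the plain nodeSelector field: every key must be
// present on the node with exactly the requested value.
func matchesNodeSelector(selector, nodeLabels map[string]string) bool {
	for k, want := range selector {
		if got, ok := nodeLabels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func contains(values []string, v string) bool {
	for _, x := range values {
		if x == v {
			return true
		}
	}
	return false
}

// matchesRequirement evaluates one expression against the node labels.
func matchesRequirement(r Requirement, nodeLabels map[string]string) bool {
	got, present := nodeLabels[r.Key]
	switch r.Operator {
	case "In":
		return present && contains(r.Values, got)
	case "NotIn":
		// A missing key also counts as "not in the set".
		return !present || !contains(r.Values, got)
	case "Exists":
		return present
	case "DoesNotExist":
		return !present
	case "Gt", "Lt":
		// Both sides must parse as integers, and exactly one bound is expected.
		if !present || len(r.Values) != 1 {
			return false
		}
		have, err1 := strconv.ParseInt(got, 10, 64)
		bound, err2 := strconv.ParseInt(r.Values[0], 10, 64)
		if err1 != nil || err2 != nil {
			return false
		}
		if r.Operator == "Gt" {
			return have > bound
		}
		return have < bound
	}
	return false
}

func main() {
	node := map[string]string{"kubernetes.io/e2e-az-name": "e2e-az2"}

	// Equality can name only one acceptable zone, so this node is rejected.
	fmt.Println(matchesNodeSelector(map[string]string{"kubernetes.io/e2e-az-name": "e2e-az1"}, node))

	// An In expression can accept either zone, so the same node is accepted.
	fmt.Println(matchesRequirement(Requirement{
		Key:      "kubernetes.io/e2e-az-name",
		Operator: "In",
		Values:   []string{"e2e-az1", "e2e-az2"},
	}, node))
}
```

The first call prints `false` and the second prints `true`: the same node fails the equality selector but passes the set-based expression.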
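The "IgnoredDuringExecution" part means that, just as with nodeSelector, if a node's labels change at runtime so that the pod's affinity rules are no longer satisfied, the pod keeps running on that node. In other words, the rule is evaluated once: labels are checked only at scheduling time, and later label changes on the node do not trigger rescheduling.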
Implementation study
NodeAffinity
Sample manifest
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
RequiredDuringSchedulingIgnoredDuringExecution (hard constraint)
Takes effect in the Filter (predicates) phase.
pkg/scheduler/algorithm/predicates/predicates.go:848 podMatchesNodeSelectorAndAffinityTerms
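func podMatchesNodeSelectorAndAffinityTerms(pod *v1.Pod, node *v1.Node) bool {
	// Abridged excerpt. nodeSelector: every listed label must match the node exactly.
	selector := labels.SelectorFromSet(pod.Spec.NodeSelector)
	if !selector.Matches(labels.Set(node.Labels)) {
		return false
	}
	// NodeAffinity.requiredDuringSchedulingIgnoredDuringExecution: hard constraint.
	nodeAffinityMatches := true
	affinity := pod.Spec.Affinity
	if affinity != nil && affinity.NodeAffinity != nil {
		nodeAffinity := affinity.NodeAffinity
		if nodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution != nil {
			nodeSelectorTerms := nodeAffinity.RequiredDuringSchedulingIgnoredDuringExecution.NodeSelectorTerms
			nodeAffinityMatches = nodeAffinityMatches && nodeMatchesNodeSelectorTerms(node, nodeSelectorTerms)
		}
	}
	// A pod with no required node affinity matches every node that passed nodeSelector.
	return nodeAffinityMatches
}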
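The predicate above combines two checks: the nodeSelector must match, and at least one nodeSelectorTerm must match. The following standalone Go sketch illustrates that combination under simplifying assumptions: the `InExpr`, `Term` and `Node` types and the `filterNodes` helper are hypothetical, only the In operator is modelled, and matchFields is ignored. As I understand the upstream behaviour, terms are ORed, the expressions inside one term are ANDed, and an empty term matches nothing.

```go
package main

import "fmt"

// InExpr is a hypothetical expression that models only the In operator.
type InExpr struct {
	Key    string
	Values []string
}

// Term groups expressions that must all hold (logical AND).
type Term []InExpr

// Node is a hypothetical, minimal node record.
type Node struct {
	Name   string
	Labels map[string]string
}

func (e InExpr) matches(nodeLabels map[string]string) bool {
	got, present := nodeLabels[e.Key]
	if !present {
		return false
	}
	for _, v := range e.Values {
		if v == got {
			return true
		}
	}
	return false
}

// matchesAnyTerm ORs the terms together; an empty term matches nothing.
func matchesAnyTerm(terms []Term, nodeLabels map[string]string) bool {
	for _, term := range terms {
		if len(term) == 0 {
			continue
		}
		all := true
		for _, e := range term {
			if !e.matches(nodeLabels) {
				all = false
				break
			}
		}
		if all {
			return true
		}
	}
	return false
}

// filterNodes keeps the nodes that satisfy nodeSelector AND the required terms.
// A nil required slice stands for "no required node affinity" and filters nothing.
func filterNodes(nodes []Node, nodeSelector map[string]string, required []Term) []string {
	feasible := []string{}
	for _, n := range nodes {
		ok := true
		for k, want := range nodeSelector {
			if got, present := n.Labels[k]; !present || got != want {
				ok = false
				break
			}
		}
		if ok && required != nil && !matchesAnyTerm(required, n.Labels) {
			ok = false
		}
		if ok {
			feasible = append(feasible, n.Name)
		}
	}
	return feasible
}

func main() {
	nodes := []Node{
		{Name: "n1", Labels: map[string]string{"kubernetes.io/e2e-az-name": "e2e-az1"}},
		{Name: "n2", Labels: map[string]string{"kubernetes.io/e2e-az-name": "e2e-az3"}},
		{Name: "n3", Labels: map[string]string{"kubernetes.io/e2e-az-name": "e2e-az2"}},
	}
	required := []Term{{{Key: "kubernetes.io/e2e-az-name", Values: []string{"e2e-az1", "e2e-az2"}}}}
	fmt.Println(filterNodes(nodes, nil, required)) // prints [n1 n3]
}
```

Because the check is a pure yes/no per node, a node that fails it is removed from consideration entirely, which is what makes this constraint "hard".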
PreferredDuringSchedulingIgnoredDuringExecution (soft constraint)
Takes effect in the scoring (Priorities) phase, where each node is given a weight.
pkg/scheduler/algorithm/priorities/node_affinity.go:34 CalculateNodeAffinityPriorityMap
func CalculateNodeAffinityPriorityMap(pod *v1.Pod, meta interface{}, nodeInfo *schedulernodeinfo.NodeInfo) (schedulerapi.HostPriority, error) {
	// Abridged excerpt.
	node := nodeInfo.Node()
	affinity := pod.Spec.Affinity
	var count int32
	// Guard against pods that declare no node affinity at all.
	if affinity != nil && affinity.NodeAffinity != nil {
		// Walk every soft (preferred) term.
		for i := range affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution {
			preferredSchedulingTerm := &affinity.NodeAffinity.PreferredDuringSchedulingIgnoredDuringExecution[i]
			// Build a selector from the term's match expressions.
			nodeSelector, err := v1helper.NodeSelectorRequirementsAsSelector(preferredSchedulingTerm.Preference.MatchExpressions)
			if err != nil {
				return schedulerapi.HostPriority{}, err
			}
			// Add the term's weight when the node's labels satisfy the selector.
			if nodeSelector.Matches(labels.Set(node.Labels)) {
				count += preferredSchedulingTerm.Weight
			}
		}
	}
	// Return the accumulated weight as this node's raw score.
	return schedulerapi.HostPriority{
		Host:  node.Name,
		Score: int(count),
	}, nil
}
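The scoring loop above can be tried in isolation with the standard library alone. The sketch below is an illustration, not the scheduler's code: `PreferredTerm`, `rawScore` and `normalize` are hypothetical names, only the In operator is modelled, and the `disktype` preference is invented for the example. The rescaling step assumes that the raw sums are later normalized against the best node onto a 0 to 10 range, which is my understanding of the reduce step paired with this map function in schedulers of this vintage.

```go
package main

import "fmt"

// PreferredTerm is a hypothetical, simplified preferred term (In operator only).
type PreferredTerm struct {
	Weight int32
	Key    string
	Values []string
}

// rawScore sums the weights of all preferred terms that the node satisfies.
func rawScore(terms []PreferredTerm, nodeLabels map[string]string) int32 {
	var count int32
	for _, t := range terms {
		got, present := nodeLabels[t.Key]
		if !present {
			continue
		}
		for _, v := range t.Values {
			if v == got {
				count += t.Weight
				break
			}
		}
	}
	return count
}

// normalize rescales raw scores so the best node gets maxPriority
// (assumed behaviour of the reduce step; integer division truncates).
func normalize(raw map[string]int32, maxPriority int32) map[string]int32 {
	var maxCount int32
	for _, s := range raw {
		if s > maxCount {
			maxCount = s
		}
	}
	out := make(map[string]int32, len(raw))
	for name, s := range raw {
		if maxCount == 0 {
			out[name] = 0
			continue
		}
		out[name] = maxPriority * s / maxCount
	}
	return out
}

func main() {
	terms := []PreferredTerm{
		{Weight: 1, Key: "another-node-label-key", Values: []string{"another-node-label-value"}},
		{Weight: 3, Key: "disktype", Values: []string{"ssd"}}, // hypothetical second preference
	}
	nodes := map[string]map[string]string{
		"node-a": {"another-node-label-key": "another-node-label-value", "disktype": "ssd"},
		"node-b": {"disktype": "ssd"},
		"node-c": {},
	}
	raw := map[string]int32{}
	for name, nodeLabels := range nodes {
		raw[name] = rawScore(terms, nodeLabels)
	}
	scaled := normalize(raw, 10)
	for _, name := range []string{"node-a", "node-b", "node-c"} {
		fmt.Println(name, raw[name], scaled[name])
	}
}
```

Unlike the hard constraint, a node with a raw score of 0 (node-c here) is not removed; it simply ranks below nodes that satisfy more of the preferences.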
PodAffinity
YAML sample
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
  labels:
    app: pod-affinity-pod
spec:
  containers:
  - name: with-pod-affinity
    image: nginx
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - busybox-pod
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - node-affinity-pod
          topologyKey: kubernetes.io/hostname
RequiredDuringSchedulingIgnoredDuringExecution (hard constraint)
- Takes effect in the Filter phase
Open question: how is podAntiAffinity implemented, and can it give a strong guarantee?
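As a starting point for that question, here is a simplified Go sketch of how a required pod affinity or anti-affinity term can be evaluated against a candidate node using topologyKey. It is an assumption-laden illustration rather than the scheduler's implementation: the `Node`, `Pod` and `Term` types and all function names are hypothetical, the label selector is reduced to a single key/value equality, and namespaces as well as the anti-affinity rules declared by already-running pods are ignored.

```go
package main

import "fmt"

// Node and Pod are hypothetical, minimal records.
type Node struct {
	Name   string
	Labels map[string]string
}

type Pod struct {
	Name     string
	Labels   map[string]string
	NodeName string // node the pod is already running on
}

// Term is a simplified affinity term: one label equality plus a topology key.
type Term struct {
	LabelKey    string
	LabelValue  string
	TopologyKey string
}

// sameTopologyHasMatch reports whether some existing pod that matches the term
// runs on a node sharing the candidate's value for the term's topology key.
func sameTopologyHasMatch(term Term, candidate Node, nodes map[string]Node, pods []Pod) bool {
	domain, ok := candidate.Labels[term.TopologyKey]
	if !ok {
		return false
	}
	for _, p := range pods {
		if v, present := p.Labels[term.LabelKey]; !present || v != term.LabelValue {
			continue
		}
		host, found := nodes[p.NodeName]
		if !found {
			continue
		}
		if v, has := host.Labels[term.TopologyKey]; has && v == domain {
			return true
		}
	}
	return false
}

// affinityAllows: a required affinity term needs a matching pod in the same domain.
func affinityAllows(term Term, candidate Node, nodes map[string]Node, pods []Pod) bool {
	return sameTopologyHasMatch(term, candidate, nodes, pods)
}

// antiAffinityAllows: a required anti-affinity term forbids such a pod.
func antiAffinityAllows(term Term, candidate Node, nodes map[string]Node, pods []Pod) bool {
	return !sameTopologyHasMatch(term, candidate, nodes, pods)
}

func main() {
	nodes := map[string]Node{
		"n1": {Name: "n1", Labels: map[string]string{"kubernetes.io/hostname": "n1"}},
		"n2": {Name: "n2", Labels: map[string]string{"kubernetes.io/hostname": "n2"}},
	}
	pods := []Pod{{Name: "busybox", Labels: map[string]string{"app": "busybox-pod"}, NodeName: "n1"}}
	term := Term{LabelKey: "app", LabelValue: "busybox-pod", TopologyKey: "kubernetes.io/hostname"}

	for _, name := range []string{"n1", "n2"} {
		fmt.Println(name,
			"affinity:", affinityAllows(term, nodes[name], nodes, pods),
			"antiAffinity:", antiAffinityAllows(term, nodes[name], nodes, pods))
	}
}
```

In this sketch the decision depends only on the pods visible when the candidate is evaluated, which is consistent with the IgnoredDuringExecution suffix: the rule is checked at scheduling time and not re-enforced afterwards.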
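Service affinity
If the first Pod of a service is scheduled onto Nodes (a resource cluster) carrying the label "region=foo", then all subsequent Pods of that service are also scheduled onto Nodes labeled "region=foo".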
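The rule described above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the scheduler's predicate: the `Node` type and `feasibleForService` are hypothetical, and the sketch assumes that a service with no scheduled Pods yet imposes no restriction.

```go
package main

import "fmt"

// Node is a hypothetical, minimal node record.
type Node struct {
	Name   string
	Labels map[string]string
}

// feasibleForService returns the nodes a new pod of the service may use.
// peerNodes lists the nodes hosting already-scheduled pods of the same service,
// first pod first. The label value of the first peer's node pins the rest.
func feasibleForService(labelKey string, nodes []Node, peerNodes []string) []string {
	byName := make(map[string]Node, len(nodes))
	for _, n := range nodes {
		byName[n.Name] = n
	}

	pinned, havePin := "", false
	if len(peerNodes) > 0 {
		if first, ok := byName[peerNodes[0]]; ok {
			pinned, havePin = first.Labels[labelKey]
		}
	}

	feasible := []string{}
	for _, n := range nodes {
		if !havePin {
			// Assumption: nothing is pinned until the first pod has landed.
			feasible = append(feasible, n.Name)
			continue
		}
		if v, ok := n.Labels[labelKey]; ok && v == pinned {
			feasible = append(feasible, n.Name)
		}
	}
	return feasible
}

func main() {
	nodes := []Node{
		{Name: "a", Labels: map[string]string{"region": "foo"}},
		{Name: "b", Labels: map[string]string{"region": "bar"}},
		{Name: "c", Labels: map[string]string{"region": "foo"}},
	}
	fmt.Println(feasibleForService("region", nodes, nil))           // prints [a b c]
	fmt.Println(feasibleForService("region", nodes, []string{"a"})) // prints [a c]
}
```

Once the first Pod lands on node a, only nodes sharing its region value remain feasible for later Pods of the same service.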
References
https://www.qikqiak.com/post/understand-kubernetes-affinity/