Netflix's use of eBPF flow records at scale to gain network knowledge


Alok Tiagi, Hariharan Ananthakrishnan, Ivan Porto Carrero, and Keerti Lakshminarayan contributed to this article.
Netflix has created Flow Exporter, a network observability sidecar that captures TCP flows in near real time using eBPF tracepoints. This highly performant sidecar delivers flow data at scale for network insight while using less than 1% of the CPU and memory on the instance.

Difficulties

Netflix's cloud network infrastructure today consists of AWS services such as VPC, DirectConnect, VPC Peering, Transit Gateways, and NAT Gateways, as well as Netflix-owned devices. Netflix's software infrastructure is a large distributed ecosystem made up of specialised functional tiers that run on AWS and on Netflix-owned services. While we try to keep the ecosystem as simple as possible, relying on such a range of technologies inevitably brings challenges such as:

App Dependencies and Data Flow Mappings: As the number of microservices grows by the day, it is difficult for both service owners and centralised teams to discover systemic issues without understanding and insight into an application's dependencies and data flows.

Pathway Validation: Given Netflix's rate of change in the production streaming and studio environments, a change can leave services unable to communicate with other resources.

Service Segmentation: Due to the convenience of cloud deployments, different AWS accounts, deployment procedures, interconnection practises, and so on have grown organically. It is difficult to improve our dependability, security, and capacity posture without network visibility.

Network Availability: With our ecosystem's predicted continuing growth, it's tough to determine our network bottlenecks and potential restrictions.

Cloud Network Insight is a set of technologies that provides operational and analytical insight into the cloud network infrastructure in order to address the issues identified above. By collecting, accessing, and analysing network data from a range of sources such as VPC Flow Logs, ELB Access Logs, and eBPF flow logs on instances, we can deliver network insight to users and central teams through data visualisation tools such as Lumen and Atlas.

Flow Exporter

The Flow Exporter is a sidecar that captures TCP flows in near real time on instances that support the Netflix microservices architecture using eBPF tracepoints.

What exactly is BPF?

Berkeley Packet Filter (BPF) is an in-kernel execution engine that processes a virtual instruction set. It has since been extended, as eBPF, into a safe means of extending kernel functionality. In some ways, eBPF does for the kernel what JavaScript does for websites: it enables the creation of a wide range of new applications.

An eBPF flow log record represents one or more network flows with TCP/IP statistics occurring within a variable aggregation interval.
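As a rough illustration of what "one or more flows within an aggregation interval" can mean, here is a minimal Python sketch. The field names, the grouping key (source IP, destination IP, destination port), and the fixed-width window are all assumptions made for this example; the article does not describe the actual record layout or how the interval varies.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

# Hypothetical grouping key: (src_ip, dst_ip, dst_port).
FlowKey = Tuple[str, str, int]


@dataclass
class FlowEvent:
    """One observed TCP flow (hypothetical shape, not Netflix's schema)."""
    ts: float            # completion time, seconds since epoch
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    bytes_sent: int
    bytes_received: int


@dataclass
class FlowRecord:
    """Aggregate of all flows sharing a key within one interval."""
    window_start: float
    key: FlowKey
    flow_count: int = 0
    bytes_sent: int = 0
    bytes_received: int = 0


def aggregate(events: Iterable[FlowEvent], interval: float) -> List[FlowRecord]:
    """Group flow events into one record per (window, key)."""
    records: Dict[Tuple[float, FlowKey], FlowRecord] = {}
    for ev in events:
        # Align the event to the start of its aggregation window.
        window_start = (ev.ts // interval) * interval
        key: FlowKey = (ev.src_ip, ev.dst_ip, ev.dst_port)
        rec = records.setdefault((window_start, key), FlowRecord(window_start, key))
        rec.flow_count += 1
        rec.bytes_sent += ev.bytes_sent
        rec.bytes_received += ev.bytes_received
    return sorted(records.values(), key=lambda r: (r.window_start, r.key))
```

With a 10-second interval, two connections from the same source to the same destination port that finish within the same window collapse into a single record with `flow_count == 2`, which keeps the volume of exported data well below one record per connection.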

The sidecar combines the highly performant eBPF with carefully chosen transport protocols so that it consumes less than 1% of CPU and memory on any instance in our fleet. The transport protocol (gRPC, HTTPS, or UDP) is selected at runtime based on the characteristics of the instance placement.

The Flow Exporter's runtime behaviour can be adjusted dynamically through configuration changes using Fast Properties. In addition, the Flow Exporter sends numerous operational metrics to Atlas. Lumen, a self-service dashboarding platform, is used to visualise these metrics.

So, how do we scale up the intake and enrichment of these flows?

The Flow Collector service is a regional service that consumes and enriches flows. Within the cloud, IP addresses might shift from one EC2 instance or Titus container to another over time. We utilise Sonar to associate an IP address with a specific application at a specified moment. Sonar is a service that tracks the identity of IPv6 and IPv4 addresses.

Flow Collector consumes two streams: IP address change events, which Sonar publishes via Kafka, and eBPF flow log data from the Flow Exporter sidecars. It joins the flow data with application metadata from Sonar in real time. Keystone then routes the attributed flows to the Hive and Druid datastores.
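Because an IP address can belong to different applications at different times, attribution has to be a lookup by time as well as by address. The sketch below shows one way to do that lookup with a sorted per-IP history and a binary search. The class name, method names, and event shape are hypothetical and chosen for illustration; they are not Sonar's or Flow Collector's actual interfaces.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class IpOwnershipIndex:
    """Time-indexed map from IP address to owning application (illustrative)."""

    def __init__(self) -> None:
        # Per IP: parallel lists of change timestamps and application names,
        # kept sorted by timestamp.
        self._times: Dict[str, List[float]] = defaultdict(list)
        self._apps: Dict[str, List[str]] = defaultdict(list)

    def record_change(self, ip: str, ts: float, app: str) -> None:
        """Record that `ip` was assigned to `app` starting at time `ts`."""
        i = bisect.bisect_right(self._times[ip], ts)
        self._times[ip].insert(i, ts)
        self._apps[ip].insert(i, app)

    def owner_at(self, ip: str, ts: float) -> Optional[str]:
        """Return the application that owned `ip` at `ts`, if known."""
        times = self._times.get(ip)
        if not times:
            return None
        # Find the latest change event at or before `ts`.
        i = bisect.bisect_right(times, ts)
        return self._apps[ip][i - 1] if i > 0 else None


def attribute(
    index: IpOwnershipIndex, src_ip: str, dst_ip: str, ts: float
) -> Tuple[Optional[str], Optional[str]]:
    """Resolve both ends of a flow to applications at the flow's timestamp."""
    return index.owner_at(src_ip, ts), index.owner_at(dst_ip, ts)
```

Looking up the owner at the flow's own timestamp, rather than the current owner, matters when flow records arrive after the address has already been reassigned to another instance or container.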

The attributed flow data is used in a variety of Netflix use cases, including network monitoring and network consumption forecasts via Lumen dashboards and machine learning-based network segmentation. Security and other partner teams also use the data for insight and incident analysis.

In conclusion

Providing network insight into the cloud network infrastructure at scale is made possible by eBPF flow logs together with a highly scalable and efficient flow collection pipeline. After numerous iterations of the architecture and some tuning, the approach has proven scalable.

Currently, we are consuming and enriching billions of eBPF flow records every hour while offering insights into our cloud ecosystem. The enriched data enables us to examine networks across multiple dimensions (for example, availability, performance, and security) to ensure that apps can efficiently transport their data payload across a globally dispersed cloud-based ecosystem.
