Introducing Faust¶
- Web:
- Download:
- Source:
- Keywords:
distributed, stream, async, processing, data, queue, state management
Table of Contents
What can it do?¶
- Agents
Process infinite streams in a straightforward manner using asynchronous generators. The concept of “agents” comes from the actor model, and means the stream processor can execute concurrently on many CPU cores, and on hundreds of machines at the same time.
Use regular Python syntax to process streams and reuse your favorite libraries:
@app.agent() async def process(stream): async for value in stream: process(value)
- Tables
Tables are sharded dictionaries that enable stream processors to be stateful with persistent and durable data.
Streams are partitioned to keep relevant data close, and can be easily repartitioned to achieve the topology you need.
In this example we repartition an order stream by account id, to count orders in a distributed table:
import faust # this model describes how message values are serialized # in the Kafka "orders" topic. class Order(faust.Record, serializer='json'): account_id: str product_id: str amount: int price: float app = faust.App('hello-app', broker='kafka://localhost') orders_kafka_topic = app.topic('orders', value_type=Order) # our table is sharded amongst worker instances, and replicated # with standby copies to take over if one of the nodes fail. order_count_by_account = app.Table('order_count', default=int) @app.agent(orders_kafka_topic) async def process(orders: faust.Stream[Order]) -> None: async for order in orders.group_by(Order.account_id): order_count_by_account[order.account_id] += 1
If we start multiple instances of this Faust application on many machines, any order with the same account id will be received by the same stream processing agent, so the count updates correctly in the table.
Sharding/partitioning is an essential part of stateful stream processing applications, so take this into account when designing your system, but note that streams can also be processed in round-robin order so you can use Faust for event processing and as a task queue also.
- Asynchronous with
asyncio
Faust takes full advantage of
asyncio
and the newasync
/await
keywords in Python 3.6+ to run multiple stream processors in the same process, along with web servers and other network services.Thanks to Faust and
asyncio
you can now embed your stream processing topology into your existingasyncio
/ https://pypi.org/project/eventlet//https://pypi.org/project/Twisted//https://pypi.org/project/Tornado/ applications.- Faust is…
- Simple
Faust is extremely easy to use. To get started using other stream processing solutions you have complicated hello-world projects, and infrastructure requirements. Faust only requires Kafka, the rest is just Python, so If you know Python you can already use Faust to do stream processing, and it can integrate with just about anything.
Here’s one of the easier applications you can make:
import faust class Greeting(faust.Record): from_name: str to_name: str app = faust.App('hello-app', broker='kafka://localhost') topic = app.topic('hello-topic', value_type=Greeting) @app.agent(topic) async def hello(greetings): async for greeting in greetings: print(f'Hello from {greeting.from_name} to {greeting.to_name}') @app.timer(interval=1.0) async def example_sender(app): await hello.send( value=Greeting(from_name='Faust', to_name='you'), ) if __name__ == '__main__': app.main()
You’re probably a bit intimidated by the async and await keywords, but you don’t have to know how
asyncio
works to use Faust: just mimic the examples, and you’ll be fine.The example application starts two tasks: one is processing a stream, the other is a background thread sending events to that stream. In a real-life application, your system will publish events to Kafka topics that your processors can consume from, and the background thread is only needed to feed data into our example.
- Highly Available
Faust is highly available and can survive network problems and server crashes. In the case of node failure, it can automatically recover, and tables have standby nodes that will take over.
- Distributed
Start more instances of your application as needed.
- Fast
A single-core Faust worker instance can already process tens of thousands of events every second, and we are reasonably confident that throughput will increase once we can support a more optimized Kafka client.
- Flexible
Faust is just Python, and a stream is an infinite asynchronous iterator. If you know how to use Python, you already know how to use Faust, and it works with your favorite Python libraries like Django, Flask, SQLAlchemy, NLTK, NumPy, SciPy, TensorFlow, etc.
How do I use it?¶
What do I need?¶
Faust requires Python 3.8 or later, and a running Kafka broker.
There’s no plan to support earlier Python versions. Please get in touch if this is something you want to work on.
Extensions¶
Name |
Version |
Bundle |
|
5.0 |
|
|
aredis 1.1 |
|
|
0.20.0 |
|
|
3.2.1 |
|
|
0.8.1 |
|
|
1.16.0 |
|
|
5.1.0 |
|
Optimizations¶
These can be all installed using pip install faust-streaming[fast]
:
Name |
Version |
Bundle |
|
1.1.0 |
|
|
1.1.0 |
|
|
2.1.0 |
|
|
0.9.26 |
|
|
2.0.0 |
|
|
1.1.0 |
|
Debugging extras¶
These can be all installed using pip install faust-streaming[debug]
:
Name |
Version |
Bundle |
|
0.3 |
|
|
1.1.0 |
|
Note
See bundles in the Installation instructions section of this document for a list of supported https://pypi.org/project/setuptools/ extensions.
To specify multiple extensions at the same time
separate extensions with the comma:
$ pip install faust-streaming[uvloop,fast,rocksdb,datadog,redis]
RocksDB On MacOS Sierra
To install https://pypi.org/project/python-rocksdb/ on MacOS Sierra you need to specify some additional compiler flags:
$ CFLAGS='-std=c++11 -stdlib=libc++ -mmacosx-version-min=10.10' \
pip install -U --no-cache python-rocksdb
Design considerations¶
- Modern Python
Faust uses current Python 3 features such as
async
/await
and type annotations. It’s statically typed and verified by the mypy type checker. You can take advantage of type annotations when writing Faust applications, but this is not mandatory.- Library
Faust is designed to be used as a library, and embeds into any existing Python program, while also including helpers that make it easy to deploy applications without boilerplate.
- Supervised
The Faust worker is built up by many different services that start and stop in a certain order. These services can be managed by supervisors, but if encountering an irrecoverable error such as not recovering from a lost Kafka connections, Faust is designed to crash.
For this reason Faust is designed to run inside a process supervisor tool such as supervisord, Circus, or one provided by your Operating System.
- Extensible
Faust abstracts away storages, serializers, and even message transports, to make it easy for developers to extend Faust with new capabilities, and integrate into your existing systems.
- Lean
The source code is short and readable and serves as a good starting point for anyone who wants to learn how Kafka stream processing systems work.
Getting Help¶
Slack¶
For discussions about the usage, development, and future of Faust, please join the faust-streaming Slack at https://fauststream.slack.com.
Resources¶
Bug tracker¶
If you have any suggestions, bug reports, or annoyances please report them to our issue tracker at https://github.com/faust-streaming/faust/issues/
License¶
This software is licensed under the New BSD License. See the LICENSE
file in the top distribution directory for the full license text.