Configuration Reference¶
Required Settings¶
id
¶
- type:
str
A string uniquely identifying the app, shared across all instances such that two app instances with the same id are considered to be in the same “group”.
This parameter is required.
The id and Kafka
When using Kafka, the id is used to generate app-local topics, and names for consumer groups.
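For example, a minimal app passing the required id (the broker URL shown is illustrative):
import faust

# All instances sharing this id are considered part of the same group,
# and the id is used to name app-local topics and consumer groups.
app = faust.App('orders-app', broker='kafka://localhost:9092')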
Commonly Used Settings¶
autodiscover
¶
- type:
Any
- default:
False
Automatic discovery of agents, tasks, timers, views and commands.
Faust has an API to add different asyncio
services and other
user extensions, such as “Agents”, HTTP web views,
command-line commands, and timers to your Faust workers.
These can be defined in any module, so to discover them at startup,
the worker needs to traverse packages looking for them.
Warning
The autodiscovery functionality uses the Venusian library (https://pypi.org/project/Venusian/) to scan the wanted packages for @app.agent, @app.page, @app.command, @app.task and @app.timer decorators, but to do so it has to traverse the package path and import every module in it.
Importing random modules like this can be dangerous, so make sure you follow Python programming best practices: do not start threads, perform network I/O, or monkey-patch for mocks as a side effect of importing a module. If you encounter such a case, find a way to perform the action lazily instead.
Warning
If the above warning is something you cannot fix, or if it’s out
of your control, then please set autodiscover=False
and make
sure the worker imports all modules where your
decorators are defined.
The value for this argument can be:
bool
If App(autodiscover=True) is set, autodiscovery will scan the package name described in the origin attribute.
The origin attribute is automatically set when you start a worker using the faust command-line program, for example:
faust -A example.simple worker
The -A option specifies the app, but you can also create a shortcut entry point by calling app.main():
if __name__ == '__main__':
    app.main()
Then you can start the faust program by executing, for example, python myscript.py worker --loglevel=INFO, and it will use the correct application.
Sequence[str]
The argument can also be a list of packages to scan:
app = App(..., autodiscover=['proj_orders', 'proj_accounts'])
Callable[[], Sequence[str]]
The argument can also be a function returning a list of packages to scan:
def get_all_packages_to_scan():
    return ['proj_orders', 'proj_accounts']

app = App(..., autodiscover=get_all_packages_to_scan)
False
If everything you need is in a self-contained module, or you import the stuff you need manually, just set autodiscover to False and don't worry about it :-)
Django
When using Django (https://pypi.org/project/Django/) and the DJANGO_SETTINGS_MODULE
environment variable is set, the Faust app will scan all packages
found in the INSTALLED_APPS
setting.
If you’re using Django you can use this to scan for
agents/pages/commands in all packages
defined in INSTALLED_APPS
.
Faust will automatically detect that you’re using Django and do the right thing if you do:
app = App(..., autodiscover=True)
It will find agents and other decorators in all of the reusable Django applications. If you want to manually control what packages are traversed, then provide a list:
app = App(..., autodiscover=['package1', 'package2'])
or if you want no packages to be traversed, then pass False:
app = App(..., autodiscover=False)
which is the default, so you can simply omit the argument.
Tip
For manual control over autodiscovery, you can also call the
app.discover()
method manually.
datadir
¶
- type:
- default:
'{conf.name}-data'
- environment:
APP_DATADIR
- related-command-options:
faust --datadir
Application data directory.
The directory in which this instance stores the data used by local tables, etc.
See also
The data directory can also be set using the
faust --datadir
option, from the command-line, so there is usually no reason to provide a default value when creating the app.
tabledir
¶
Application table data directory.
The directory in which this instance stores local table data.
Usually you will want to configure the datadir
setting,
but if you want to store tables separately you can configure this one.
If the path provided is relative (it has no leading slash), then the
path will be considered to be relative to the datadir
setting.
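For example, a sketch that keeps table data in a separate directory (the paths are illustrative):
app = faust.App(
    'myapp',
    datadir='myapp-data',
    tabledir='tables',  # relative path: resolved under datadir ('myapp-data/tables')
)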
debug
¶
- type:
- default:
False
- environment:
APP_DEBUG
- related-command-options:
Use in development to expose sensor information endpoint.
Tip
If you want to enable the sensor statistics endpoint in production,
without enabling the debug
setting, you can do so
by adding the following code:
app.web.blueprints.add(
'/stats/', 'faust.web.apps.stats:blueprint')
env_prefix
¶
New in version 1.11.
- type:
- default:
None
- environment:
APP_ENV_PREFIX
Environment variable prefix.
When configuring Faust by environment variables, this adds a common prefix to all Faust environment variable names.
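For example, a minimal sketch, assuming the prefix is prepended verbatim to the documented environment variable names:
app = faust.App('myapp', env_prefix='MYAPP_')
# The worker would then read e.g. MYAPP_WEB_PORT instead of WEB_PORT.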
id_format
¶
- type:
- default:
'{id}-v{self.version}'
- environment:
APP_ID_FORMAT
Application ID format template.
The format string used to generate the final id
value
by combining it with the version
parameter.
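For example, with the default template the final id combines the configured id and version (a minimal sketch):
app = faust.App('orders', version=2)
# With the default id_format '{id}-v{self.version}',
# the final application id becomes 'orders-v2'.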
origin
¶
- type:
- default:
None
The reverse path used to find the app.
For example if the app is located in:
from myproj.app import app
Then the origin
should be "myproj.app"
.
The faust worker program will try to automatically set the origin, but if you are having problems with auto generated names then you can set origin manually.
timezone
¶
New in version 1.4.
- type:
- default:
datetime.timezone.utc
- environment:
TIMEZONE
Project timezone.
The timezone used for date-related functionality such as cronjobs.
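For example, to make date-related functionality use UTC explicitly:
import datetime

import faust

app = faust.App('myapp', timezone=datetime.timezone.utc)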
version
¶
- type:
- default:
1
- environment:
APP_VERSION
App version.
The version of the app: changing it creates a new isolated instance of the application. The first version is 1, the second version is 2, and so on.
Source topics will not be affected by a version change.
Faust applications will use two kinds of topics: source topics, and
internally managed topics. The source topics are declared by the
producer, and we do not have the opportunity to modify any
configuration settings, like number of partitions for a source
topic; we may only consume from them. To mark a topic as internal,
use: app.topic(..., internal=True)
.
blocking_timeout
¶
- type:
- default:
None
- environment:
BLOCKING_TIMEOUT
- related-command-options:
faust --blocking-timeout
Blocking timeout (in seconds).
When specified the worker will start a periodic signal based timer that only triggers when the loop has been blocked for a time exceeding this timeout.
This is the safest way to detect blocking, but it could have adverse effects on libraries that do not automatically retry interrupted system calls.
Python itself does retry all interrupted system calls since version 3.5 (see PEP 475), but this might not be the case with C extensions added to the worker by the user.
The blocking detector is a background thread that periodically wakes up to either arm a timer, or cancel an already armed timer. In pseudocode:
while True:
    # cancel previous alarm and arm new alarm
    signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, blocking_timeout)
    # sleep to wakeup just before the timeout
    await asyncio.sleep(blocking_timeout * 0.96)

def on_alarm(signum, frame):
    logger.warning('Blocking detected: ...')
If the sleep does not wake up in time the alarm signal will be sent to the process and a traceback will be logged.
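For example, to enable the blocking detector with a 10-second threshold (the value is illustrative):
app = faust.App('myapp', blocking_timeout=10.0)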
broker
¶
Broker URL, or a list of alternative broker URLs.
Faust needs the URL of a “transport” to send and receive messages.
Currently, the only supported production transport is kafka://
.
This uses the aiokafka client (https://pypi.org/project/aiokafka/) under the hood, for consuming and producing messages.
You can specify multiple hosts at the same time by separating them with semicolons:
kafka://kafka1.example.com:9092;kafka2.example.com:9092
Which in actual code looks like this:
BROKERS = 'kafka://kafka1.example.com:9092;kafka2.example.com:9092'
app = faust.App(
'id',
broker=BROKERS,
)
You can also pass a list of URLs:
app = faust.App(
'id',
broker=['kafka://kafka1.example.com:9092',
'kafka://kafka2.example.com:9092'],
)
See also
You can configure the transport used for consuming and producing
separately, by setting the broker_consumer
and
broker_producer
settings.
This setting is used as the default.
Available Transports
kafka://
Alias to
aiokafka://
aiokafka://
The recommended transport, using the aiokafka client (https://pypi.org/project/aiokafka/).
Limitations: None
broker_credentials
¶
New in version 1.5.
- type:
- default:
None
- environment:
BROKER_CREDENTIALS
Broker authentication mechanism.
Specify the authentication mechanism to use when connecting to the broker.
The default is to not use any authentication.
- SASL Authentication
You can enable SASL authentication via plain text:
app = faust.App(
    broker_credentials=faust.SASLCredentials(
        username='x',
        password='y',
    ),
)
Warning
Do not use literal strings when specifying passwords in production, as they can remain visible in stack traces.
Instead the best practice is to get the password from a configuration file, or from the environment:
import os

BROKER_USERNAME = os.environ.get('BROKER_USERNAME')
BROKER_PASSWORD = os.environ.get('BROKER_PASSWORD')

app = faust.App(
    broker_credentials=faust.SASLCredentials(
        username=BROKER_USERNAME,
        password=BROKER_PASSWORD,
    ),
)
- OAuth2 Authentication
You can enable SASL authentication via OAuth2 Bearer tokens:
import faust
from asyncio import get_running_loop

from aiokafka.helpers import create_ssl_context
from aiokafka.conn import AbstractTokenProvider

class TokenProvider(AbstractTokenProvider):
    async def token(self):
        return await get_running_loop().run_in_executor(
            None, self.get_token)

    def get_token(self):
        return 'token'

app = faust.App(
    broker_credentials=faust.OAuthCredentials(
        oauth_cb=TokenProvider(),
        ssl_context=create_ssl_context(),
    ),
)
Note
The implementation should ensure token reuse so that multiple calls at connect time do not create multiple tokens. The implementation should also periodically refresh the token to guarantee that each call returns an unexpired token. Token providers must implement the token() method.
- GSSAPI Authentication
GSSAPI authentication over plain text:
app = faust.App(
    broker_credentials=faust.GSSAPICredentials(
        kerberos_service_name='faust',
        kerberos_domain_name='example.com',
    ),
)
GSSAPI authentication over SSL:
import ssl

ssl_context = ssl.create_default_context(
    purpose=ssl.Purpose.SERVER_AUTH, cafile='ca.pem')
ssl_context.load_cert_chain(
    'client.cert', keyfile='client.key')

app = faust.App(
    broker_credentials=faust.GSSAPICredentials(
        kerberos_service_name='faust',
        kerberos_domain_name='example.com',
        ssl_context=ssl_context,
    ),
)
- SSL Authentication
Provide an SSL context for the Kafka broker connections.
This allows Faust to use a secure SSL/TLS connection for the Kafka connections and enables certificate-based authentication.
import ssl

ssl_context = ssl.create_default_context(
    purpose=ssl.Purpose.SERVER_AUTH, cafile='ca.pem')
ssl_context.load_cert_chain(
    'client.cert', keyfile='client.key')

app = faust.App(..., broker_credentials=ssl_context)
ssl_context
¶
- type:
- default:
None
SSL configuration.
See credentials
.
logging_config
¶
New in version 1.5.
- type:
- default:
None
Logging dictionary configuration.
Optional dictionary for logging configuration, as supported
by logging.config.dictConfig()
.
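For example, a minimal sketch of a dictConfig-style dictionary (the handler and level choices are illustrative):
import faust

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'root': {'handlers': ['console'], 'level': 'INFO'},
}

app = faust.App('myapp', logging_config=LOGGING)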
loghandlers
¶
- type:
List[Handler]
- default:
None
List of custom logging handlers.
Specify a list of custom log handlers to use in worker instances.
processing_guarantee
¶
New in version 1.5.
- type:
- default:
<ProcessingGuarantee.AT_LEAST_ONCE: 'at_least_once'>
- environment:
PROCESSING_GUARANTEE
The processing guarantee that should be used.
Possible values are “at_least_once” (default) and “exactly_once”.
Note that if exactly-once processing is enabled, consumers are configured with isolation.level="read_committed" and producers are configured with retries=Integer.MAX_VALUE and enable.idempotence=true by default.
Note that by default exactly-once processing requires a cluster of at least three brokers, which is the recommended setting for production. For development you can change this by adjusting the broker setting transaction.state.log.replication.factor to the number of brokers you want to use.
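For example, to opt in to exactly-once processing:
app = faust.App('myapp', processing_guarantee='exactly_once')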
store
¶
Table storage backend URL.
The backend used for table storage.
Tables are stored in-memory by default, but you should
not use the memory://
store in production.
In production, a persistent table store, such as rocksdb://
is
preferred.
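For example, to use the RocksDB store in production (the broker URL is illustrative):
app = faust.App(
    'myapp',
    broker='kafka://localhost:9092',
    store='rocksdb://',
)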
cache
¶
New in version 1.2.
Cache backend URL.
Optional backend used for Memcached-style caching. The URL can be one of:
redis://host
rediscluster://host
memory://
Advanced Agent Settings¶
agent_supervisor
¶
Default agent supervisor type.
An agent may start multiple instances (actors) when
the concurrency setting is higher than one (e.g.
@app.agent(concurrency=2)
).
Multiple instances of the same agent are considered to be in the same supervisor group.
The default supervisor is the mode.OneForOneSupervisor
:
if an instance in the group crashes, we restart that instance only.
These are the supervisors supported:
mode.OneForOneSupervisor
If an instance in the group crashes we restart only that instance.
mode.OneForAllSupervisor
If an instance in the group crashes we restart the whole group.
mode.CrashingSupervisor
If an instance in the group crashes we stop the whole application, and exit so that the Operating System supervisor can restart us.
mode.ForfeitOneForOneSupervisor
If an instance in the group crashes we give up on that instance and never restart it again (until the program is restarted).
mode.ForfeitOneForAllSupervisor
If an instance in the group crashes we stop all instances in the group and never restart them again (until the program is restarted).
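For example, a sketch that restarts the whole group when a single actor crashes, assuming the supervisor classes are importable from the mode package as shown:
import faust
from mode import OneForAllSupervisor

app = faust.App('myapp', agent_supervisor=OneForAllSupervisor)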
Advanced Broker Settings¶
broker_consumer
¶
New in version 1.7.
Consumer broker URL.
You can use this setting to configure the transport used for producing and consuming separately.
If not set the value found in broker
will be used.
broker_producer
¶
New in version 1.7.
Producer broker URL.
You can use this setting to configure the transport used for producing and consuming separately.
If not set the value found in broker
will be used.
broker_api_version
¶
New in version 1.10.
- type:
- default:
'auto'
- environment:
BROKER_API_VERSION
Broker API version.
This setting is also the default for consumer_api_version
,
and producer_api_version
.
Negotiate producer protocol version.
The default value - “auto” means use the latest version supported by both client and server.
Any other version set means you are requesting a specific version of the protocol.
Example use: disabling headers for all messages produced.
Kafka headers support was added in Kafka 0.11, so you can specify broker_api_version="0.10" to remove the headers from messages.
broker_check_crcs
¶
- type:
- default:
True
- environment:
BROKER_CHECK_CRCS
Broker CRC check.
Automatically check the CRC32 of the records consumed.
broker_client_id
¶
- type:
- default:
'faust-0.8.9'
- environment:
BROKER_CLIENT_ID
Broker client ID.
There is rarely any reason to configure this setting.
The client id is used to identify the software used, and is not usually configured by the user.
broker_commit_every
¶
- type:
- default:
10000
- environment:
BROKER_COMMIT_EVERY
Broker commit message frequency.
Commit offset every n messages.
See also broker_commit_interval
, which is how frequently
we commit on a timer when there are few messages being received.
broker_commit_interval
¶
Broker commit time frequency.
How often we commit messages that have been fully processed (acked).
broker_commit_livelock_soft_timeout
¶
Commit livelock timeout.
How long it takes before we warn that the Kafka commit offset has not advanced (only when processing messages).
broker_heartbeat_interval
¶
New in version 1.0.11.
Broker heartbeat interval.
How often we send heartbeats to the broker, and also how often we expect to receive heartbeats from the broker.
If any of these time out, you should increase this setting.
broker_max_poll_interval
¶
New in version 1.7.
Broker max poll interval.
The maximum allowed time (in seconds) between calls to consume messages. If this interval is exceeded the consumer is considered failed and the group will rebalance in order to reassign the partitions to another consumer group member. If API methods block waiting for messages, that time does not count against this timeout.
See KIP-62 for technical details.
broker_max_poll_records
¶
New in version 1.4.
- type:
- default:
None
- environment:
BROKER_MAX_POLL_RECORDS
Broker max poll records.
The maximum number of records returned in a single call to poll()
.
If you find that your application needs more time to process
messages you may want to adjust broker_max_poll_records
to tune the number of records that must be handled on every
loop iteration.
broker_rebalance_timeout
¶
New in version 1.10.
Broker rebalance timeout.
How long to wait for a node to finish rebalancing before the broker will consider it dysfunctional and remove it from the cluster.
Increase this if you experience the cluster being in a state of
constantly rebalancing, but make sure you also increase the
broker_heartbeat_interval
at the same time.
Note
The session timeout must not be greater than the
broker_request_timeout
.
broker_request_timeout
¶
New in version 1.4.
Kafka client request timeout.
Note
The request timeout must not be less than the
broker_session_timeout
.
broker_session_timeout
¶
New in version 1.0.11.
Broker session timeout.
How long to wait for a node to finish rebalancing before the broker will consider it dysfunctional and remove it from the cluster.
Increase this if you experience the cluster being in a state of
constantly rebalancing, but make sure you also increase the
broker_heartbeat_interval
at the same time.
Note
The session timeout must not be greater than the
broker_request_timeout
.
Advanced Consumer Settings¶
consumer_api_version
¶
New in version 1.10.
- type:
- default (alias to setting):
- environment:
CONSUMER_API_VERSION
Consumer API version.
Configures the broker API version to use for consumers.
See broker_api_version
for more information.
consumer_max_fetch_size
¶
New in version 1.4.
- type:
- default:
1048576
- environment:
CONSUMER_MAX_FETCH_SIZE
Consumer max fetch size.
The maximum amount of data per-partition the server will return. This size must be at least as large as the maximum message size.
Note: This is PER PARTITION, so a limit of 1MB when your workers consume from 10 topics having 100 partitions each means a fetch request can be up to a gigabyte (10 * 100 * 1MB). A limit this generous may cause rebalancing issues if the time required to flush pending data stuck in socket buffers exceeds the rebalancing timeout.
You must keep this limit low enough to account for many partitions being assigned to a single node.
consumer_auto_offset_reset
¶
New in version 1.5.
- type:
- default:
'earliest'
- environment:
CONSUMER_AUTO_OFFSET_RESET
Consumer auto offset reset.
Where the consumer should start reading messages from when there is no initial offset, or the stored offset no longer exists, e.g. when starting a new consumer for the first time.
Options include ‘earliest’, ‘latest’, ‘none’.
consumer_group_instance_id
¶
New in version 2.1.
- type:
- default:
None
- environment:
CONSUMER_GROUP_INSTANCE_ID
Consumer group instance id.
The group_instance_id for static partition assignment.
If not set, the default assignment strategy is used. Otherwise, each consumer instance must have a unique id.
consumer_metadata_max_age_ms
¶
New in version 0.8.5.
- type:
- default:
300000
- environment:
CONSUMER_METADATA_MAX_AGE_MS
Consumer metadata max age in milliseconds.
The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions.
Default: 300000
consumer_connections_max_idle_ms
¶
New in version 0.8.5.
- type:
- default:
540000
- environment:
CONSUMER_CONNECTIONS_MAX_IDLE_MS
Consumer connections max idle milliseconds.
Close idle connections after the number of milliseconds specified by this config.
Default: 540000 (9 minutes).
ConsumerScheduler
¶
New in version 1.5.
Consumer scheduler class.
A strategy which dictates the priority of topics and partitions for incoming records. The default strategy does first round-robin over topics and then round-robin over partitions.
Example using a class:
class MySchedulingStrategy(DefaultSchedulingStrategy):
...
app = App(..., ConsumerScheduler=MySchedulingStrategy)
Example using the string path to a class:
app = App(..., ConsumerScheduler='myproj.MySchedulingStrategy')
Serialization Settings¶
key_serializer
¶
Default key serializer.
Serializer used for keys by default when no serializer is specified, or a model is not being used.
This can be the name of a serializer/codec, or an actual
faust.serializers.codecs.Codec
instance.
See also
The Codecs section in the model guide – for more information about codecs.
value_serializer
¶
Default value serializer.
Serializer used for values by default when no serializer is specified, or a model is not being used.
This can be a string (the name of a serializer/codec), or an actual
faust.serializers.codecs.Codec
instance.
See also
The Codecs section in the model guide – for more information about codecs.
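For example, to default to raw keys and JSON values (illustrative choices):
app = faust.App(
    'myapp',
    key_serializer='raw',
    value_serializer='json',
)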
Advanced Producer Settings¶
producer_acks
¶
- type:
- default:
-1
- environment:
PRODUCER_ACKS
Producer Acks.
The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. The following settings are common:
0: The producer will not wait for any acknowledgment from the server at all. The message will immediately be considered sent (not recommended).
1: The broker leader will write the record to its local log but will respond without awaiting full acknowledgment from all followers. In this case, should the leader fail immediately after acknowledging the record but before the followers have replicated it, then the record will be lost.
-1: The broker leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee.
producer_api_version
¶
New in version 1.5.3.
- type:
- default (alias to setting):
- environment:
PRODUCER_API_VERSION
Producer API version.
Configures the broker API version to use for producers.
See broker_api_version
for more information.
producer_compression_type
¶
- type:
- default:
None
- environment:
PRODUCER_COMPRESSION_TYPE
Producer compression type.
The compression type for all data generated by the producer.
Valid values are gzip, snappy, lz4, or None
.
producer_linger
¶
Producer batch linger configuration.
Minimum time to batch before sending out messages from the producer.
Should rarely have to change this.
producer_max_batch_size
¶
- type:
- default:
16384
- environment:
PRODUCER_MAX_BATCH_SIZE
Producer max batch size.
Max size of each producer batch, in bytes.
producer_max_request_size
¶
- type:
- default:
1000000
- environment:
PRODUCER_MAX_REQUEST_SIZE
Producer maximum request size.
Maximum size of a request in bytes in the producer.
Should rarely have to change this.
producer_partitioner
¶
New in version 1.2.
Producer partitioning strategy.
The Kafka producer can be configured with a custom partitioner to change how keys are partitioned when producing to topics.
The default partitioner for Kafka is implemented as follows, and can be used as a template for your own partitioner:
import random
from typing import List

from kafka.partitioner.hashed import murmur2

def partition(key: bytes,
              all_partitions: List[int],
              available: List[int]) -> int:
    '''Default partitioner.

    Hashes key to partition using murmur2 hashing
    (from the Java client).  If key is None, selects a partition
    randomly from available, or from all partitions if none
    are currently available.

    Arguments:
        key: partitioning key
        all_partitions: list of all partitions sorted by
            partition ID.
        available: list of available partitions
            in no particular order

    Returns:
        int: one of the values from ``all_partitions``
        or ``available``.
    '''
    if key is None:
        source = available if available else all_partitions
        return random.choice(source)
    index: int = murmur2(key)
    index &= 0x7fffffff
    index %= len(all_partitions)
    return all_partitions[index]
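To use a custom partitioner such as the one above, pass it to the app; a minimal sketch:
app = faust.App('myapp', producer_partitioner=partition)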
producer_request_timeout
¶
New in version 1.4.
Producer request timeout.
Timeout for producer operations. This is set high by default, as this is also the time when producer batches expire and will no longer be retried.
producer_threaded
¶
New in version 0.4.5.
- type:
- default:
False
- environment:
PRODUCER_THREADED
Thread separate producer for send_soon.
If True, spin up a separate producer in a different thread, to be used for messages buffered up for producing via the send_soon function.
producer_metadata_max_age_ms
¶
New in version 0.8.5.
- type:
- default:
300000
- environment:
PRODUCER_METADATA_MAX_AGE_MS
Producer metadata max age in milliseconds.
The period of time in milliseconds after which we force a refresh of metadata even if we haven’t seen any partition leadership changes to proactively discover any new brokers or partitions.
Default: 300000
producer_connections_max_idle_ms
¶
New in version 0.8.5.
- type:
- default:
540000
- environment:
PRODUCER_CONNECTIONS_MAX_IDLE_MS
Producer connections max idle milliseconds.
Close idle connections after the number of milliseconds specified by this config.
Default: 540000 (9 minutes).
Advanced Stream Settings¶
recovery_consistency_check
¶
New in version 0.4.7.
- type:
- default:
True
- environment:
RECOVERY_CONSISTENCY_CHECK
Check Kafka and local offsets for consistency.
If True, assert that the Kafka highwater offsets are >= the local offsets in the RocksDB state store.
store_check_exists
¶
New in version 0.6.0.
- type:
- default:
True
- environment:
STORE_CHECK_EXISTS
Execute exists on the underlying store.
If True, execute exists on the underlying store. If False, the client has to catch KeyError.
crash_app_on_aerospike_exception
¶
New in version 0.6.3.
- type:
- default:
True
- environment:
CRASH_APP_ON_AEROSPIKE_EXCEPTION
Crash the app on Aerospike exceptions.
If True, crash the app and prevent the offset commit from progressing. If False, the client has to catch the error and implement a dead-letter queue.
aerospike_retries_on_exception
¶
New in version 0.6.10.
- type:
- default:
60
- environment:
AEROSPIKE_RETRIES_ON_EXCEPTION
Number of Aerospike retries on a runtime error from the Aerospike client.
Set this to the number of times to retry an Aerospike operation when the client throws a runtime exception.
aerospike_sleep_seconds_between_retries_on_exception
¶
New in version 0.6.10.
- type:
- default:
1
- environment:
AEROSPIKE_SLEEP_SECONDS_BETWEEN_RETRIES_ON_EXCEPTION
Seconds to sleep between Aerospike retries on a runtime error from the Aerospike client.
Set this to the number of seconds to sleep between retries when the Aerospike client throws a runtime exception.
stream_buffer_maxsize
¶
- type:
- default:
4096
- environment:
STREAM_BUFFER_MAXSIZE
Stream buffer maximum size.
This setting controls backpressure to streams and agents reading from streams.
If set to 4096 (default) this means that an agent can only keep at most 4096 unprocessed items in the stream buffer.
Essentially this will limit the number of messages a stream can “prefetch”.
Higher numbers give better throughput, but note that your agent may also send messages or update tables (which sends changelog messages).
This means that if the buffer size is large, the
broker_commit_interval
or broker_commit_every
settings must be set to commit frequently, avoiding back pressure
from building up.
A buffer size of 131_072 may let you process over 30,000 events a second as a baseline, but be careful with a buffer size that large when you also send messages or update tables.
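For example, a sketch that raises the buffer size while also committing more frequently (the values are illustrative, not recommendations):
app = faust.App(
    'myapp',
    stream_buffer_maxsize=131_072,
    broker_commit_every=1_000,
    broker_commit_interval=1.0,
)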
stream_processing_timeout
¶
New in version 1.10.
Stream processing timeout.
Timeout (in seconds) for processing events in the stream. If processing of a single event exceeds this time we log an error, but do not stop processing.
If you are seeing a warning like this you should either
increase this timeout to allow agents to spend more time on a single event, or
add a timeout to the operation in the agent, so stream processing always completes before the timeout.
The latter is preferred for network operations such as web requests. If a network service you depend on is temporarily offline you should consider doing retries (send to separate topic):
import asyncio
from typing import Optional

# assuming an existing ``app = faust.App(...)`` instance
main_topic = app.topic('main')
deadletter_topic = app.topic('main_deadletter')

async def send_request(value, timeout: Optional[float] = None) -> None:
    await app.http_client.get('http://foo.com', timeout=timeout)

@app.agent(main_topic)
async def main(stream):
    async for value in stream:
        try:
            await send_request(value, timeout=5)
        except asyncio.TimeoutError:
            await deadletter_topic.send(value=value)

@app.agent(deadletter_topic)
async def main_deadletter(stream):
    async for value in stream:
        # wait for 30 seconds before retrying.
        await stream.sleep(30)
        await send_request(value)
stream_publish_on_commit
¶
- type:
- default:
False
Stream delay producing until commit time.
If enabled we buffer up sending messages until the
source topic offset related to that processing is committed.
This means when we do commit, we may have buffered up a LOT of messages
so commit needs to happen frequently (make sure to decrease
broker_commit_every
).
stream_recovery_delay
¶
New in version 1.3.
Changed in version 1.5.3: Disabled by default.
Stream recovery delay.
Number of seconds to sleep before continuing after a rebalance. We wait for a bit to allow more nodes to join/leave before starting to recover tables and then process streams. This minimizes the chance of errors and rebalancing loops.
stream_wait_empty
¶
- type:
- default:
True
- environment:
STREAM_WAIT_EMPTY
Stream wait empty.
This setting controls whether the worker should wait for the currently processing task in an agent to complete before rebalancing or shutting down.
On rebalance/shut down we clear the stream buffers. Those events will be reprocessed after the rebalance anyway, but we may have already started processing one event in every agent, and if we rebalance we will process that event again.
By default we will wait for the currently active tasks, but if your streams are idempotent you can disable it using this setting.
Agent RPC Settings¶
Advanced Table Settings¶
table_cleanup_interval
¶
Table cleanup interval.
How often we cleanup tables to remove expired entries.
table_key_index_size
¶
New in version 1.7.
- type:
- default:
1000
- environment:
TABLE_KEY_INDEX_SIZE
Table key index size.
Tables keep a cache of key to partition number to speed up table lookups.
This setting configures the maximum size of that cache.
table_standby_replicas
¶
- type:
- default:
1
- environment:
TABLE_STANDBY_REPLICAS
Table standby replicas.
The number of standby replicas for each table.
Topic Settings¶
topic_allow_declare
¶
New in version 1.5.
- type:
- default:
True
- environment:
TOPIC_ALLOW_DECLARE
Allow creating new topics.
Disabling this setting prevents the app from creating internal topics.
Faust will only create topics that it considers to be fully owned and managed, such as intermediate repartition topics, table changelog topics, etc.
Some Kafka managers do not allow services to create topics; in that case you should set this to False.
topic_disable_leader
¶
New in version 1.7.
- type:
- default:
False
- environment:
TOPIC_DISABLE_LEADER
Disable leader election topic.
This setting disables the creation of the leader election topic.
If you're not using the on_leader=True argument to the task/timer/etc. decorators, you can use this setting to disable creation of the topic.
topic_partitions
¶
- type:
- default:
8
- environment:
TOPIC_PARTITIONS
Topic partitions.
Default number of partitions for new topics.
Note
This defines the maximum number of workers we can distribute the workload of the application to (also sometimes referred to as the sharding factor of the application).
topic_replication_factor
¶
- type:
- default:
1
- environment:
TOPIC_REPLICATION_FACTOR
Topic replication factor.
The default replication factor for topics created by the application.
Note
Generally this should be the same as the configured replication factor for your Kafka cluster.
Advanced Web Server Settings¶
web
¶
New in version 1.2.
Web server driver to use.
web_bind
¶
New in version 1.2.
- type:
- default:
'0.0.0.0'
- environment:
WEB_BIND
- related-command-options:
Web network interface binding mask.
The IP network address mask that decides what interfaces the web server will bind to.
By default this will bind to all interfaces.
This option is usually set by faust worker --web-bind
,
not by passing it as a keyword argument to app
.
web_cors_options
¶
New in version 1.5.
- type:
- default:
None
Cross Origin Resource Sharing options.
Enable Cross-Origin Resource Sharing options for all web views in the internal web server.
This should be specified as a dictionary of
URLs to ResourceOptions
:
app = App(..., web_cors_options={
    'http://foo.example.com': ResourceOptions(
        allow_credentials=True,
        allow_methods='*',
    ),
})
Individual views may override the CORS options used as
arguments to @app.page
and blueprint.route
.
web_enabled
¶
New in version 1.2.
- type:
- default:
True
- environment:
APP_WEB_ENABLED
- related-command-options:
faust worker --with-web
Enable/disable internal web server.
Enable web server and other web components.
This option can also be set using faust worker --without-web
.
web_host
¶
New in version 1.2.
- type:
- default (template):
'{conf.NODE_HOSTNAME}'
- environment:
WEB_HOST
- related-command-options:
Web server host name.
Hostname used to access this web server, used for generating
the canonical_url
setting.
This option is usually set by faust worker --web-host
,
not by passing it as a keyword argument to app
.
web_in_thread
¶
New in version 1.5.
- type:
- default:
False
Run the web server in a separate thread.
Use this if you have a large value for
stream_buffer_maxsize
and want the web server
to be responsive when the worker is otherwise busy processing streams.
Note
Running the web server in a separate thread means web views and agents will not share the same event loop.
web_port
¶
New in version 1.2.
- type:
- default:
6066
- environment:
WEB_PORT
- related-command-options:
Web server port.
A port number between 1024 and 65535 to use for the web server.
This option is usually set by faust worker --web-port
,
not by passing it as a keyword argument to app
.
web_ssl_context
¶
New in version 0.5.0.
- type:
- default:
None
Web server SSL configuration.
See credentials
.
web_transport
¶
New in version 1.2.
Network transport used for the web server.
Default is to use TCP, but this setting also enables you to use Unix domain sockets. To use domain sockets, specify a URL including the path to the file you want to create, like this:
unix:///tmp/server.sock
This will create a new domain socket available
in /tmp/server.sock
.
canonical_url
¶
- type:
- default (template):
'http://{conf.web_host}:{conf.web_port}'
- environment:
NODE_CANONICAL_URL
- related-command-options:
- related-settings:
Node specific canonical URL.
You shouldn’t have to set this manually.
The canonical URL defines how to reach the web server on a running
worker node, and is usually set by combining the
web_host
and web_port
settings.
Advanced Worker Settings¶
worker_redirect_stdouts
¶
- type:
- default:
True
- environment:
WORKER_REDIRECT_STDOUTS
Redirecting standard outputs.
Enable to have the worker redirect output sent to sys.stdout and sys.stderr to the Python logging system.
Enabled by default.
worker_redirect_stdouts_level
¶
Level used when redirecting standard outputs.
The logging level to use when redirecting STDOUT/STDERR to logging.
Extension Settings¶
Agent
¶
Agent class type.
The Agent
class to use for agents, or the
fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
class MyAgent(faust.Agent):
...
app = App(..., Agent=MyAgent)
Example using the string path to a class:
app = App(..., Agent='myproj.agents.Agent')
Event
¶
Event class type.
The Event
class to use for creating new event objects,
or the fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
class MyBaseEvent(faust.Event):
...
app = App(..., Event=MyBaseEvent)
Example using the string path to a class:
app = App(..., Event='myproj.events.Event')
Schema
¶
Schema class type.
The Schema
class to use as the default
schema type when no schema is specified, or the fully-qualified
path to one (supported by symbol_by_name()
).
Example using a class:
class MyBaseSchema(faust.Schema):
...
app = App(..., Schema=MyBaseSchema)
Example using the string path to a class:
app = App(..., Schema='myproj.schemas.Schema')
Stream
¶
Stream class type.
The Stream
class to use for streams, or the
fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
class MyBaseStream(faust.Stream):
...
app = App(..., Stream=MyBaseStream)
Example using the string path to a class:
app = App(..., Stream='myproj.streams.Stream')
Table
¶
Table class type.
The Table
class to use for tables, or the
fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
class MyBaseTable(faust.Table):
...
app = App(..., Table=MyBaseTable)
Example using the string path to a class:
app = App(..., Table='myproj.tables.Table')
SetTable
¶
SetTable extension table.
The SetTable
class to use for table-of-set tables,
or the fully-qualified path to one (supported
by symbol_by_name()
).
Example using a class:
class MySetTable(faust.SetTable):
...
app = App(..., SetTable=MySetTable)
Example using the string path to a class:
app = App(..., SetTable='myproj.tables.MySetTable')
GlobalTable
¶
GlobalTable class type.
The GlobalTable
class to use for tables,
or the fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
class MyBaseGlobalTable(faust.GlobalTable):
...
app = App(..., GlobalTable=MyBaseGlobalTable)
Example using the string path to a class:
app = App(..., GlobalTable='myproj.tables.GlobalTable')
SetGlobalTable
¶
SetGlobalTable class type.
The SetGlobalTable
class to use for tables, or the
fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
class MyBaseSetGlobalTable(faust.SetGlobalTable):
...
app = App(..., SetGlobalTable=MyBaseSetGlobalTable)
Example using the string path to a class:
app = App(..., SetGlobalTable='myproj.tables.SetGlobalTable')
TableManager
¶
Table manager class type.
The TableManager
used for managing tables,
or the fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
from faust.tables import TableManager
class MyTableManager(TableManager):
...
app = App(..., TableManager=MyTableManager)
Example using the string path to a class:
app = App(..., TableManager='myproj.tables.TableManager')
Serializers
¶
Serializer registry class type.
The Registry
class used for
serializing/deserializing messages; or the fully-qualified path
to one (supported by symbol_by_name()
).
Example using a class:
from faust.serializers import Registry
class MyRegistry(Registry):
...
app = App(..., Serializers=MyRegistry)
Example using the string path to a class:
app = App(..., Serializers='myproj.serializers.Registry')
Worker
¶
Worker class type.
The Worker
class used for starting a worker
for this app; or the fully-qualified path
to one (supported by symbol_by_name()
).
Example using a class:
import faust
class MyWorker(faust.Worker):
...
app = faust.App(..., Worker=MyWorker)
Example using the string path to a class:
app = faust.App(..., Worker='myproj.workers.Worker')
PartitionAssignor
¶
Partition assignor class type.
The PartitionAssignor
class used for assigning
topic partitions to worker instances; or the fully-qualified path
to one (supported by symbol_by_name()
).
Example using a class:
from faust.assignor import PartitionAssignor
class MyPartitionAssignor(PartitionAssignor):
...
app = App(..., PartitionAssignor=MyPartitionAssignor)
Example using the string path to a class:
app = App(..., PartitionAssignor='myproj.assignor.PartitionAssignor')
LeaderAssignor
¶
Leader assignor class type.
The LeaderAssignor
class used for assigning
a master Faust instance for the app; or the fully-qualified path
to one (supported by symbol_by_name()
).
Example using a class:
from faust.assignor import LeaderAssignor
class MyLeaderAssignor(LeaderAssignor):
...
app = App(..., LeaderAssignor=MyLeaderAssignor)
Example using the string path to a class:
app = App(..., LeaderAssignor='myproj.assignor.LeaderAssignor')
Router
¶
Router class type.
The Router
class used for routing requests
to a worker instance having the partition for a specific
key (e.g. table key); or the fully-qualified path to one
(supported by symbol_by_name()
).
Example using a class:
from faust.router import Router
class MyRouter(Router):
...
app = App(..., Router=MyRouter)
Example using the string path to a class:
app = App(..., Router='myproj.routers.Router')
Topic
¶
Topic class type.
The Topic
class used for defining new topics; or the
fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
import faust
class MyTopic(faust.Topic):
...
app = faust.App(..., Topic=MyTopic)
Example using the string path to a class:
app = faust.App(..., Topic='myproj.topics.Topic')
HttpClient
¶
HTTP client class type.
The aiohttp.client.ClientSession
class used as
an HTTP client; or the fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
import faust
from aiohttp.client import ClientSession
class HttpClient(ClientSession):
...
app = faust.App(..., HttpClient=HttpClient)
Example using the string path to a class:
app = faust.App(..., HttpClient='myproj.http.HttpClient')
Monitor
¶
Monitor sensor class type.
The Monitor
class used as the main sensor
gathering statistics for the application; or the
fully-qualified path to one (supported by
symbol_by_name()
).
Example using a class:
import faust
from faust.sensors import Monitor
class MyMonitor(Monitor):
...
app = faust.App(..., Monitor=MyMonitor)
Example using the string path to a class:
app = faust.App(..., Monitor='myproj.monitors.Monitor')