
apache_beam + beam integration not sending exceptions in GCP #4203


Open
gshoultz42 opened this issue Mar 26, 2025 · 14 comments

Comments

@gshoultz42

How do you use Sentry?

Sentry Saas (sentry.io)

Version

2.23.1

Steps to Reproduce

  1. Add Sentry to a Python Dataflow job
  2. Add a try/except to the process method of a DoFn
  3. Inside the except block, call capture_exception
  4. Set the Dataflow job to streaming in GCP
  5. Deploy the Dataflow job to the cloud runner
  6. Send an event that will trigger the exception

Expected Result

Error exception is sent to Sentry

Actual Result

Sentry does not get an exception.

@antonpirker
Member

Hey @gshoultz42 !
Can you show me your sentry_sdk.init() and also the call to capture_exception() including the try/except?

What you also can do:

  • Enable debug output (sentry_sdk.init(debug=True)); you will then see console output of what the Sentry SDK is doing. (Look for log messages that contain Sending envelope.)
  • Make sure that wherever the Sentry SDK is running, it can make an HTTP connection to ingest.sentry.io.
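
A minimal sketch of the connectivity check from the second bullet, using only the standard library. The helper name `can_reach` and the default timeout are illustrative assumptions, not part of the Sentry SDK. It only tests whether a TCP connection can be opened, so it says nothing about proxies or TLS inspection that might sit in between.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False


# Example call (run it in the same environment the SDK runs in):
#   can_reach("ingest.sentry.io")
# If your DSN points at a different host, pass that host instead.
```

Running the check from the place where the exception actually happens matters more than running it from a developer machine, since that is where the SDK has to open the connection.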

Hope this helps

@antonpirker
Member

I just fed your question into the bot on our Discord server, and it has these suggestions: https://discord.com/channels/621778831602221064/1354782809020960779/1354782809020960779 (it's not super helpful, I have to admit, but it is a nice experiment)

@gshoultz42
Author

gshoultz42 commented Mar 27, 2025

Sentry Init

sentry_sdk.init(
    dsn=sentry_dsn,
    environment=env,
    integrations=[BeamIntegration()],
)

Exception call

class ParseEvent(beam.DoFn):
    def process(self, element: bytes, *args, **kwargs):
        try:
            event_json = json.loads(element)
            event_name = event_json["message"]["payload"]["eventName"]
            model = get_model(event_name)
            event = model.model_validate_json(element)
            yield event
        except (ValidationError, KeyError, JSONDecodeError, TypeError) as error:
            logging.exception("Error parsing event: %r", element)
            error_output = {
                "error": repr(error),
                "payload": (
                    event_json
                    if not isinstance(error, JSONDecodeError)
                    else element.decode()
                ),
            }
            yield TaggedOutput("parse_errors", json.dumps(error_output).encode())

@antonpirker
Member

Thanks for the follow up!

The line

logging.exception("Error parsing event: %r", element)

should send an error if the logging integration is enabled (which the Sentry SDK enables by default).

If you want your error to show up in Sentry, you can add a sentry_sdk.capture_exception(error) call in your except block.

Because you handle the exception yourself, it never bubbles up for Sentry to capture.

You could also add a simple raise after your last yield, so the exception is raised again and eventually captured by Sentry.
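
A minimal sketch of the two options described above. It uses a plain generator in place of DoFn.process so it runs without apache_beam or sentry_sdk installed. The names `parse_event`, `capture`, and `reraise` are hypothetical; in a real pipeline, `capture` would be `sentry_sdk.capture_exception`.

```python
import json
from typing import Callable, Iterator, Optional, Tuple


def parse_event(
    element: bytes,
    capture: Optional[Callable[[BaseException], object]] = None,
    reraise: bool = False,
) -> Iterator[Tuple[str, object]]:
    """Yield ("ok", event_name) or ("parse_errors", details) for one element."""
    try:
        event_json = json.loads(element)
        yield "ok", event_json["message"]["payload"]["eventName"]
    except (KeyError, json.JSONDecodeError, TypeError) as error:
        if capture is not None:
            # Option 1: report the handled exception explicitly.
            capture(error)
        yield "parse_errors", repr(error)
        if reraise:
            # Option 2: re-raise after the last yield so the exception
            # propagates to whatever is consuming the generator.
            raise
```

With option 1 the element still flows to the error output and processing continues. With option 2 the consumer sees the exception after it has received the error output, so the two options are not equivalent in how the pipeline behaves afterwards.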

@gshoultz42
Author

Wanted to provide an update. I am still looking into this. I enabled debug output in Sentry but did not get the log. I am working with our admin to make sure logging is set up correctly in our GCP project.

@gshoultz42
Author

gshoultz42 commented Apr 23, 2025

I see this on the main job, but I don't see any Sentry output in the worker logs, where the exception is happening:
`[sentry] DEBUG: Setting up integrations (with default = True)
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.aiohttp.AioHttpIntegration: AIOHTTP not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.anthropic.AnthropicIntegration: Anthropic not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.ariadne.AriadneIntegration: ariadne is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.arq.ArqIntegration: Arq is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.asyncpg.AsyncPGIntegration: asyncpg not installed.
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.boto3.Boto3Integration: botocore is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.bottle.BottleIntegration: Bottle not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.celery.CeleryIntegration: Celery not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.chalice.ChaliceIntegration: Chalice is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.clickhouse_driver.ClickhouseDriverIntegration: clickhouse-driver not installed.
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.cohere.CohereIntegration: Cohere not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.django.DjangoIntegration: Django not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.falcon.FalconIntegration: Falcon not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.fastapi.FastApiIntegration: Starlette is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.flask.FlaskIntegration: Flask is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.gql.GQLIntegration: gql is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.graphene.GrapheneIntegration: graphene is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.httpx.HttpxIntegration: httpx is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.huey.HueyIntegration: Huey is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.huggingface_hub.HuggingfaceHubIntegration: Huggingface not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.langchain.LangchainIntegration: langchain not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.litestar.LitestarIntegration: Litestar is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.loguru.LoguruIntegration: LOGURU is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.openai.OpenAIIntegration: OpenAI not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.pyramid.PyramidIntegration: Pyramid not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.quart.QuartIntegration: Quart is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.rq.RqIntegration: RQ not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.sanic.SanicIntegration: Sanic not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.sqlalchemy.SqlalchemyIntegration: SQLAlchemy not installed.
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.starlette.StarletteIntegration: Starlette is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.starlite.StarliteIntegration: Starlite is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.strawberry.StrawberryIntegration: strawberry-graphql is not installed
[sentry] DEBUG: Did not import default integration sentry_sdk.integrations.tornado.TornadoIntegration: Tornado not installed
[sentry] DEBUG: Setting up previously not enabled integration beam
[sentry] DEBUG: Setting up previously not enabled integration argv
[sentry] DEBUG: Setting up previously not enabled integration atexit
[sentry] DEBUG: Setting up previously not enabled integration dedupe
[sentry] DEBUG: Setting up previously not enabled integration excepthook
[sentry] DEBUG: Setting up previously not enabled integration logging
[sentry] DEBUG: Setting up previously not enabled integration modules
[sentry] DEBUG: Setting up previously not enabled integration stdlib
[sentry] DEBUG: Setting up previously not enabled integration threading
[sentry] DEBUG: Setting up previously not enabled integration pymongo
[sentry] DEBUG: Setting up previously not enabled integration redis
[sentry] DEBUG: Enabling integration beam
[sentry] DEBUG: Enabling integration argv
[sentry] DEBUG: Enabling integration atexit
[sentry] DEBUG: Enabling integration dedupe
[sentry] DEBUG: Enabling integration excepthook
[sentry] DEBUG: Enabling integration logging
[sentry] DEBUG: Enabling integration modules
[sentry] DEBUG: Enabling integration stdlib
[sentry] DEBUG: Enabling integration threading
[sentry] DEBUG: Enabling integration pymongo
[sentry] DEBUG: Enabling integration redis
[sentry] DEBUG: Setting SDK name to 'sentry.python.beam'
[sentry] DEBUG: [Profiling] Setting up continuous profiler in thread mode

[sentry] DEBUG: Sending envelope [envelope with 1 items (error)] project:xxx host:o58632.ingest.us.sentry.io
DEBUG:sentry_sdk.errors:Sending envelope [envelope with 1 items (error)] project:xxx host:o58632.ingest.us.sentry.io
DEBUG:urllib3.connectionpool:Starting new HTTPS connection (1): o58632.ingest.us.sentry.io:443
assert duration or terminated, (
AssertionError: Job did not reach to a terminal state after waiting indefinitely. Console URL: https://console.cloud.google.com/dataflow/jobs//2025-04-23_13_00_24-3193749921814308417?project=
[sentry] DEBUG: atexit: got shutdown signal
DEBUG:sentry_sdk.errors:atexit: got shutdown signal
[sentry] DEBUG: atexit: shutting down client
DEBUG:sentry_sdk.errors:atexit: shutting down client
[sentry] DEBUG: Flushing HTTP transport
DEBUG:sentry_sdk.errors:Flushing HTTP transport
[sentry] DEBUG: background worker got flush request
DEBUG:sentry_sdk.errors:background worker got flush request
[sentry] DEBUG: 2 event(s) pending on flush
DEBUG:sentry_sdk.errors:2 event(s) pending on flush
Sentry is attempting to send 2 pending events
Waiting up to 2 seconds
Press Ctrl-C to quit
DEBUG:urllib3.connectionpool:https://xx.ingest.us.sentry.io:443 "POST /api/xx/envelope/ HTTP/1.1" 200 0
[sentry] DEBUG: background worker flushed
DEBUG:sentry_sdk.errors:background worker flushed
[sentry] DEBUG: Killing HTTP transport
DEBUG:sentry_sdk.errors:Killing HTTP transport
[sentry] DEBUG: background worker got kill request
DEBUG:sentry_sdk.errors:background worker got kill request
`

@antonpirker
Member

Hey @gshoultz42 thanks for the logs.

From the log you posted I see:
The SDK sends one error to Sentry. (envelope is the generic format we created to send all kinds of data types to Sentry)

[sentry] DEBUG: Sending envelope [envelope with 1 items (error)] project:xxx host:o58632.ingest.us.sentry.io

There is also a debug from urllib3 that the post request succeeded:

DEBUG:urllib3.connectionpool:https://xx.ingest.us.sentry.io:443 "POST /api/xx/envelope/ HTTP/1.1" 200 0

So everything looks normal and the error should end up in Sentry. As far as I can tell, the SDK is not the problem.

You can have a look at https://sentry.io/stats/ to see whether received errors have been dropped. (Errors can be dropped by "inbound data filters" that you can configure, because you are over quota, or by rate limiting if you send lots of errors at the same time.)
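
The log reading described in this comment can be sketched as a small script, which may help when the same check has to be repeated against worker logs. The function name `summarize_sdk_log` and the regular expression are assumptions based only on the log lines quoted in this thread; other SDK or urllib3 versions may format these lines differently.

```python
import re
from typing import Dict, List

ENVELOPE_MARKER = "Sending envelope"
# Matches urllib3 debug lines such as: "POST /api/xx/envelope/ HTTP/1.1" 200 0
POST_STATUS = re.compile(r'"POST \S*/envelope/ HTTP/[\d.]+" (\d{3})')


def summarize_sdk_log(lines: List[str]) -> Dict[str, object]:
    """Count SDK send attempts and collect envelope POST status codes."""
    # In the log above each SDK message appears twice (once with the
    # "[sentry] DEBUG" prefix, once via the logging module), so only the
    # prefixed form is counted.
    sent = sum(
        1 for line in lines
        if "[sentry] DEBUG" in line and ENVELOPE_MARKER in line
    )
    statuses = [
        int(match.group(1))
        for line in lines
        if (match := POST_STATUS.search(line))
    ]
    return {"envelopes_sent": sent, "post_statuses": statuses}
```

An empty result for a worker log would suggest the SDK never attempted to send anything from that process, which is a different problem from events being dropped after a successful POST.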

@gshoultz42
Author

gshoultz42 commented Apr 24, 2025 via email

@antonpirker
Member

The line logging.exception("Error parsing event: %r", element) is sending one error to Sentry. But the error you are catching in your except (ValidationError, KeyError, JSONDecodeError, TypeError) as error: block is handled by your code, so it is NOT sent to Sentry.

You can call sentry_sdk.capture_exception(error) in your except block to send it to Sentry.

In general, Sentry only captures unhandled errors from your application. If you catch and handle them, they will not be sent to Sentry.
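
To illustrate why the logging.exception line can still produce an event even though the exception is handled, here is a standard-library sketch. The `ExcInfoRecorder` handler and the logger name are hypothetical and only mimic the idea; this is not how the Sentry logging integration is implemented.

```python
import logging


class ExcInfoRecorder(logging.Handler):
    """Collect the exceptions attached to error-level log records."""

    def __init__(self) -> None:
        super().__init__(level=logging.ERROR)
        self.seen = []

    def emit(self, record: logging.LogRecord) -> None:
        if record.exc_info:
            # exc_info is a (type, value, traceback) tuple.
            self.seen.append(record.exc_info[1])


logger = logging.getLogger("parse_demo")
logger.propagate = False  # keep the demo output out of the root logger
recorder = ExcInfoRecorder()
logger.addHandler(recorder)

try:
    {}["message"]
except KeyError:
    # The exception is handled here, yet the log record still carries it,
    # so anything listening to the logging module can see it.
    logger.exception("Error parsing event: %r", b"{}")
```

After this runs, `recorder.seen` holds the KeyError, even though it never propagated out of the except block.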

@gshoultz42
Author

gshoultz42 commented Apr 25, 2025

Current code block that is not working

        except (ValidationError, KeyError, JSONDecodeError, TypeError) as error:
            with sentry_sdk.push_scope() as scope:
                # Add any additional context
                scope.set_extra("exception", str(error))
                sentry_sdk.capture_exception(error)
                exc_info = exc_info_from_error(error)
                exceptions = exceptions_from_error_tuple(exc_info)
                sentry_sdk.capture_event(
                    {
                        "message": str(error),
                        "level": "error",
                        "exception": {"values": exceptions},
                    }
                )
            logging.exception("Error parsing event: %r", element)
            error_output = {
                "error": repr(error),
                "payload": (
                    event_json
                    if not isinstance(error, JSONDecodeError)
                    else element.decode()
                ),
            }

I should have posted this sooner. I made some changes mentioned in the discord channel as well.

@gshoultz42
Author

I did the following actions as well.

  • I updated to a newer version, 2.27.0, because I saw some changes that may have been related to this issue.
  • I created two custom transports, one based on HttpTransport and one based on the plain Transport. Both were meant to log entries and print the information to standard output. I did not get any logs.
  • I confirmed that the workers had the correct sentry_sdk. At one point I removed the SDK from the requirements file, which resulted in the expected error.

The Sentry apache_beam demo is not a streaming Dataflow job, and I am curious whether the problem is related to that, or to the GCP-specific parts of the apache_beam SDK (apache_beam[gcp]).
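
As an aside, a lighter-weight probe than a custom transport is the SDK's before_send hook, which is called with each error event before it is handed to the transport. This is an alternative sketch, not what was tried above; the function name `log_before_send` is hypothetical, and the init call is shown only as a comment so the block runs without sentry_sdk installed.

```python
import sys
from typing import Any, Dict, Optional


def log_before_send(event: Dict[str, Any], hint: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Print a one-line trace for each event, then pass it through unchanged."""
    print(
        f"before_send: event_id={event.get('event_id')} level={event.get('level')}",
        file=sys.stderr,
        flush=True,
    )
    # Returning the event keeps it; returning None would drop it.
    return event


# Assumed wiring, reusing the variable names from the init call earlier in
# this thread:
# sentry_sdk.init(dsn=sentry_dsn, debug=True, before_send=log_before_send)
```

If this hook never prints in the worker logs, the event is not reaching the SDK's send path in that process at all, which narrows the search to initialization rather than transport.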

@szokeasaurusrex
Member

Hi @gshoultz42, to send the exception to Sentry, all you should need is the sentry_sdk.capture_exception line. Please try modifying your snippet like so:

     except (ValidationError, KeyError, JSONDecodeError, TypeError) as error:
+           sentry_sdk.capture_exception(error)
-           with sentry_sdk.push_scope() as scope:
-               # Add any additional context
-               scope.set_extra("exception", str(error))
-               sentry_sdk.capture_exception(error)
-               exc_info = exc_info_from_error(error)
-               exceptions = exceptions_from_error_tuple(exc_info)
-               sentry_sdk.capture_event(
-                   {
-                       "message": str(error),
-                       "level": "error",
-                       "exception": {"values": exceptions},
-                   }
-               )
            logging.exception("Error parsing event: %r", element)
            error_output = {
                "error": repr(error),
                "payload": (
                    event_json
                    if not isinstance(error, JSONDecodeError)
                    else element.decode()
                ),
            }

Note that the default issue view in Sentry only shows issues with medium or high issue priority. Manually captured exceptions typically have low priority, and would be filtered out by the default filter.

You can see the filter in the "Issues" page search bar:

[Image: the issue priority filter in the "Issues" page search bar]

To see the manually-captured exception, you need to remove the issue priority filter.

@gshoultz42
Author

That was what I had in the beginning, but I updated to the suggested code just in case and got the same result.
I still can't see the debug log.

@szokeasaurusrex
Member

> That was what I had in the beginning, but I updated to the suggested code just in case and got the same result.
> I still can't see the debug log.

@gshoultz42 I understand, that must be quite frustrating.

In any case, you should use the simplified code snippet I sent you, with only the sentry_sdk.capture_exception call. The other code you added should not have any effect on the logs, and it may cause unexpected behavior, since what you wrote essentially sends the exception twice.

But, just to confirm, which Python SDK version were you using just now? Are you seeing any log output from sources other than the Sentry Python SDK? Also, is the error event still not showing up in Sentry?

It would be super helpful if you could also provide the full logs from your latest attempt, and also the code you are running (especially the sentry_sdk.init call, and the lines where the exception is raised).
