Failed to load ingestion sources #13181

Open
michaelkrieg opened this issue Apr 11, 2025 · 4 comments
Labels
bug Bug report

Comments

@michaelkrieg

Describe the bug
After updating to the recent 1.0.0 version, ingestion sources do not work anymore.

To Reproduce
Go to "Admin / Data Sources" and notice a red error message at the top and an empty list of sources.

Expected behavior
Both existing data sources and newly added ones should appear.

Additional context
The following errors show in container logs:

2025-04-11 06:58:26,277 [ForkJoinPool.commonPool-worker-1779] ERROR c.l.m.s.e.query.ESSearchDAO:165 - Search query failed
java.lang.NullPointerException: null
2025-04-11 06:58:26,277 [ForkJoinPool.commonPool-worker-1779] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:45 - Failed to execute
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to list ingestion sources
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1760)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
Caused by: java.lang.RuntimeException: Failed to list ingestion sources
    at com.linkedin.datahub.graphql.resolvers.ingest.source.ListIngestionSourcesResolver.lambda$get$1(ListIngestionSourcesResolver.java:119)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    ... 6 common frames omitted
Caused by: com.datahub.util.exception.ESQueryException: Search query failed:
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.lambda$executeAndExtract$1(ESSearchDAO.java:166)
    at io.datahubproject.metadata.context.TraceContext.withSpan(TraceContext.java:110)
    at io.datahubproject.metadata.context.OperationContext.withSpan(OperationContext.java:391)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:147)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:338)
    at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:173)
    at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:155)
    at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:458)
    at com.linkedin.datahub.graphql.resolvers.ingest.source.ListIngestionSourcesResolver.lambda$get$1(ListIngestionSourcesResolver.java:83)
    ... 7 common frames omitted
Caused by: java.lang.NullPointerException: null
2025-04-11 06:58:26,278 [ForkJoinPool.commonPool-worker-1773] ERROR c.datahub.graphql.GraphQLController:153 - Errors while executing query: query listIngestionSources($input: ListIngestionSourcesInput!) { listIngestionSources(input: $input) { start count total ingestionSources { urn name type config { recipe version executorId debugMode extraArgs { key value __typename } __typename } schedule { interval timezone __typename } platform { urn __typename ..., result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[listIngestionSources], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data={listIngestionSources=null}, extensions={tracing={version=1, startTime=2025-04-11T06:58:26.265025322Z, endTime=2025-04-11T06:58:26.278222119Z, duration=13199038, parsing={startOffset=622316, duration=594755}, validation={startOffset=1183528, duration=514351}, execution={resolvers=[{path=[listIngestionSources], parentType=Query, returnType=ListIngestionSourcesResult, fieldName=listIngestionSources, startOffset=1879409, duration=10967547}]}}}}, errors: [DataHubGraphQLError{path=[listIngestionSources], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]

@michaelkrieg michaelkrieg added the bug Bug report label Apr 11, 2025
@FatRunner

Hi, I have a similar problem.
After updating to the recent 1.0.0 version, ingestion sources do not work.

Logs from the datahub-gms pod:

2025-04-15 16:04:43,809 [Thread-27267] INFO c.datahub.graphql.GraphQLController:143 - Executing operation listIngestionSources for qtp1093181064-8195
2025-04-15 16:04:43,812 [Thread-27267] INFO c.l.d.g.i.DataHubFieldComplexityCalculator:38 - Query complexity for query: listIngestionSources is 38
2025-04-15 16:04:43,834 [Thread-27268] ERROR c.l.m.s.e.query.ESSearchDAO:165 - Search query failed
java.lang.NullPointerException: Cannot invoke "Object.toString()" because the return value of "java.util.Map.get(Object)" is null
    at com.linkedin.metadata.search.elasticsearch.query.request.SearchRequestHandler.getUrnFromSearchHit(SearchRequestHandler.java:536)
    at com.linkedin.metadata.search.elasticsearch.query.request.SearchRequestHandler.getResult(SearchRequestHandler.java:511)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
    at com.linkedin.metadata.search.elasticsearch.query.request.SearchRequestHandler.getRestrictedResults(SearchRequestHandler.java:530)
    at com.linkedin.metadata.search.elasticsearch.query.request.SearchRequestHandler.extractResult(SearchRequestHandler.java:406)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.lambda$executeAndExtract$1(ESSearchDAO.java:163)
    at io.datahubproject.metadata.context.TraceContext.withSpan(TraceContext.java:110)
    at io.datahubproject.metadata.context.OperationContext.withSpan(OperationContext.java:391)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:147)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:338)
    at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:173)
    at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:155)
    at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:458)
    at com.linkedin.datahub.graphql.resolvers.ingest.source.ListIngestionSourcesResolver.lambda$get$1(ListIngestionSourcesResolver.java:83)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    at java.base/java.lang.Thread.run(Thread.java:840)
2025-04-15 16:04:43,835 [Thread-27268] ERROR c.l.d.g.e.DataHubDataFetcherExceptionHandler:45 - Failed to execute
java.util.concurrent.CompletionException: java.lang.RuntimeException: Failed to list ingestion sources
    at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:315)
    at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:320)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1770)
    at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.lang.RuntimeException: Failed to list ingestion sources
    at com.linkedin.datahub.graphql.resolvers.ingest.source.ListIngestionSourcesResolver.lambda$get$1(ListIngestionSourcesResolver.java:119)
    at java.base/java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1768)
    ... 1 common frames omitted
Caused by: com.datahub.util.exception.ESQueryException: Search query failed:
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.lambda$executeAndExtract$1(ESSearchDAO.java:166)
    at io.datahubproject.metadata.context.TraceContext.withSpan(TraceContext.java:110)
    at io.datahubproject.metadata.context.OperationContext.withSpan(OperationContext.java:391)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.executeAndExtract(ESSearchDAO.java:147)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.search(ESSearchDAO.java:338)
    at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:173)
    at com.linkedin.metadata.search.elasticsearch.ElasticSearchService.search(ElasticSearchService.java:155)
    at com.linkedin.metadata.client.JavaEntityClient.search(JavaEntityClient.java:458)
    at com.linkedin.datahub.graphql.resolvers.ingest.source.ListIngestionSourcesResolver.lambda$get$1(ListIngestionSourcesResolver.java:83)
    ... 2 common frames omitted
Caused by: java.lang.NullPointerException: Cannot invoke "Object.toString()" because the return value of "java.util.Map.get(Object)" is null
    at com.linkedin.metadata.search.elasticsearch.query.request.SearchRequestHandler.getUrnFromSearchHit(SearchRequestHandler.java:536)
    at com.linkedin.metadata.search.elasticsearch.query.request.SearchRequestHandler.getResult(SearchRequestHandler.java:511)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:682)
    at com.linkedin.metadata.search.elasticsearch.query.request.SearchRequestHandler.getRestrictedResults(SearchRequestHandler.java:530)
    at com.linkedin.metadata.search.elasticsearch.query.request.SearchRequestHandler.extractResult(SearchRequestHandler.java:406)
    at com.linkedin.metadata.search.elasticsearch.query.ESSearchDAO.lambda$executeAndExtract$1(ESSearchDAO.java:163)
    ... 10 common frames omitted
2025-04-15 16:04:43,836 [Thread-27267] ERROR c.datahub.graphql.GraphQLController:153 - Errors while executing query: query listIngestionSources($input: ListIngestionSourcesInput!) { listIngestionSources(input: $input) { start count total ingestionSources { urn name type config { recipe version executorId debugMode extraArgs { key value __typename } __typename } schedule { interval timezone __typename } platform { urn __typename ..., result: {errors=[{message=An unknown error occurred., locations=[{line=2, column=3}], path=[listIngestionSources], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data={listIngestionSources=null}, extensions={tracing={version=1, startTime=2025-04-15T16:04:43.810372225Z, endTime=2025-04-15T16:04:43.835651725Z, duration=25323791, parsing={startOffset=674702, duration=540421}, validation={startOffset=1365884, duration=662052}, execution={resolvers=[{path=[listIngestionSources], parentType=Query, returnType=ListIngestionSourcesResult, fieldName=listIngestionSources, startOffset=2466446, duration=22261874}]}}}}, errors: [DataHubGraphQLError{path=[listIngestionSources], code=SERVER_ERROR, locations=[SourceLocation{line=2, column=3}]}]
2025-04-15 16:04:43,836 [Thread-27267] INFO c.datahub.graphql.GraphQLController:166 - Executed operation listIngestionSources in 25 ms
2025-04-15 16:04:43,837 [Thread-27267] INFO c.datahub.graphql.GraphQLController:171 - Operation listIngestionSources execution result size: 263

@Lion2me

Lion2me commented Apr 16, 2025

Hi, me too.

I'm using DataHub 1.0.0 (es_prefix = datahub_).
I looked for the index datahub__datahubingestionsourceindex_v2 in Elasticsearch, and it exists!

In the DataHub Slack channel, I asked RunLLM.

#9775

query listIngestionSources($input: ListIngestionSourcesInput!) {
  listIngestionSources(input: $input) {
    start
    count
    total
    ingestionSources {
      urn
      name
      schedule {
        interval
        timezone
      }
      platform {
        name
      }
      type
      config {
        version
        executorId
        recipe
      }
    }
  }
}

This query works on DataHub 0.12.0 but not on DataHub 1.0.0.

Here is my error log:

prod-datahub-v1-datahub-gms-67894d5bff-mps69 Caused by: java.lang.NullPointerException: null
prod-datahub-v1-datahub-gms-67894d5bff-mps69 2025-04-16 04:31:52,475 [Thread-2233] ERROR c.datahub.graphql.GraphQLController:153 - Errors while executing query: query
prod-datahub-v1-datahub-gms-67894d5bff-mps69 {
prod-datahub-v1-datahub-gms-67894d5bff-mps69 listIngestionSources(input: { start: 0, count: 10 }) {
prod-datahub-v1-datahub-gms-67894d5bff-mps69 count
prod-datahub-v1-datahub-gms-67894d5bff-mps69 }
prod-datahub-v1-datahub-gms-67894d5bff-mps69 }
prod-datahub-v1-datahub-gms-67894d5bff-mps69
prod-datahub-v1-datahub-gms-67894d5bff-mps69 , result: {errors=[{message=An unknown error occurred., locations=[{line=3, column=3}], path=[listIngestionSources], extensions={code=500, type=SERVER_ERROR, classification=DataFetchingException}}], data={listIngestionSources=null}, extensions={tracing={version=1, startTime=2025-04-16T04:31:52.458978104Z, endTime=2025-04-16T04:31:52.475495892Z, duration=16520747, parsing={startOffset=919917, duration=477427}, validation={startOffset=1237993, duration=294066}, execution={resolvers=[{path=[listIngestionSources], parentType=Query, returnType=ListIngestionSourcesResult, fieldName=listIngestionSources, startOffset=1718867, duration=14438314}]}}}}, errors: [DataHubGraphQLError{path=[listIngestionSources], code=SERVER_ERROR, locations=[SourceLocation{line=3, column=3}]}]
prod-datahub-v1-datahub-gms-67894d5bff-mps69 2025-04-16 04:31:52,476 [Thread-2233] INFO c.datahub.graphql.GraphQLController:166 - Executed operation graphql in 16 ms
prod-datahub-v1-datahub-gms-67894d5bff-mps69 2025-04-16 04:31:52,476 [Thread-2233] INFO c.datahub.graphql.GraphQLController:171 - Operation graphql execution result size: 263
prod-datahub-v1-datahub-gms-67894d5bff-mps69 2025-04-16 04:32:05,397 [pool-5-thread-1] INFO c.d.m.d.throttle.KafkaThrottleSensor:99 - MCL medianLag: {MCL_VERSIONED_LAG=1005, MCL_TIMESERIES_LAG=47}
prod-datahub-v1-datahub-gms-67894d5bff-mps69 2025-04-16 04:32:05,397 [pool-5-thread-1] INFO c.d.m.d.throttle.KafkaThrottleSensor:157 - Throttle exponential backoff reset.
prod-datahub-v1-datahub-gms-67894d5bff-mps69 2025-04-16 04:32:05,398 [pool-5-thread-1] INFO c.d.m.d.throttle.KafkaThrottleSensor:157 - Throttle exponential backoff reset.

@hyungryuk

Hello! I figured out what causes this issue.
When you install DataHub, a "bootstrap-ingestion-datahub-gc" ingestion job is created by default,
but it is missing a few JSON fields (and GraphQL tries to read those fields!).

If you remove the "bootstrap-ingestion-datahub-gc" ingestion job from your IngestionSource index in Elasticsearch, it will work fine. (This is not the proper solution, BTW.)
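
Before deleting anything, it may help to confirm which documents in the ingestion source index are actually incomplete. The sketch below is only an illustration, not DataHub code: it assumes the missing field is `urn` (inferred from `getUrnFromSearchHit` in the stack trace earlier in this thread, not confirmed by anyone here), and both `hits_missing_field` and the sample data are hypothetical. It takes a list shaped like the `hits.hits` array of an Elasticsearch search response and returns the IDs of documents that lack the field.

```python
from typing import Any


def hits_missing_field(hits: list[dict[str, Any]], field: str = "urn") -> list[str]:
    """Return the _id of every hit whose _source lacks `field` (or holds None)."""
    missing = []
    for hit in hits:
        # A hit without a _source is treated the same as an empty document.
        source = hit.get("_source") or {}
        if source.get(field) is None:
            missing.append(hit["_id"])
    return missing


# Hypothetical sample shaped like the "hits.hits" array of a search response.
# The assumption that "urn" is the missing field is a guess from the stack trace.
sample_hits = [
    {
        "_id": "urn%3Ali%3AdataHubIngestionSource%3Amy-source",
        "_source": {"urn": "urn:li:dataHubIngestionSource:my-source"},
    },
    {
        "_id": "urn%3Ali%3AdataHubIngestionSource%3Adatahub-gc",
        "_source": {},
    },
]

print(hits_missing_field(sample_hits))
```

Feeding it the parsed JSON from a search against the real index would list the candidate documents, so only the broken ones need to be touched.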

@FatRunner

@hyungryuk you're right. If I remove the bootstrap-ingestion-datahub-gc document (_doc) from datahubingestionsourceindex_v2, ingestion works fine.

curl -XGET "http://localhost:9200/datahubingestionsourceindex_v2_1715768050160/_doc/_search?pretty"

curl -X DELETE "http://localhost:9200/datahubingestionsourceindex_v2_1715768050160/_doc/urn%253Ali%253AdataHubIngestionSource%253Adatahub-gc?pretty"
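
The `%253A` in the DELETE URL is a colon that has been percent-encoded twice. Judging only from that command (an assumption, not something confirmed in this thread), the document ID stored in Elasticsearch is the URL-encoded URN, so it has to be encoded once more to be sent as a path segment. A minimal Python sketch of that encoding, with hypothetical helper names:

```python
from urllib.parse import quote, unquote


def doc_id_for_urn(urn: str) -> str:
    """Percent-encode a URN once; assumed here to be the stored document ID."""
    return quote(urn, safe="")


def path_segment_for_doc_id(doc_id: str) -> str:
    """Encode the document ID again so that it survives as a URL path segment."""
    return quote(doc_id, safe="")


urn = "urn:li:dataHubIngestionSource:datahub-gc"
doc_id = doc_id_for_urn(urn)
segment = path_segment_for_doc_id(doc_id)

print(doc_id)   # urn%3Ali%3AdataHubIngestionSource%3Adatahub-gc
print(segment)  # urn%253Ali%253AdataHubIngestionSource%253Adatahub-gc

# Decoding the path segment once gives back the document ID.
assert unquote(segment) == doc_id
```

The second printed value matches the last path segment of the DELETE command, which is why a singly encoded `%3A` in the URL would likely address a different (nonexistent) document.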
