Skip to content

Conversation

@ntnn
Copy link
Member

@ntnn ntnn commented Jul 23, 2025

Summary

Fix workspace deletion leaks as found in #3350

What Type of PR Is This?

/kind bug

Related Issue(s)

Fixes #3350

Release Notes

Prevent goroutine leaks when deleting workspaces

@kcp-ci-bot kcp-ci-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/bug Categorizes issue or PR as related to a bug. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 23, 2025
@ntnn
Copy link
Member Author

ntnn commented Jul 23, 2025

/hold

Until after 0.28 released

@kcp-ci-bot kcp-ci-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2025
@ntnn ntnn force-pushed the kcp3350-clusterwithcontext branch 3 times, most recently from 631efea to 61a0632 Compare July 23, 2025 14:57
Comment on lines 27 to 29
// KnownGoroutineLeaks are leaks from just running KCP collected
// with TestGoleakWithDefaults and run through:
// grep 'on top of the stack' output.log | cut -d, -f2- | cut -d' ' -f3 | sort | uniq
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of the leaks were just KCP not shutting down properly because it was stopped too fast after reporting ready, as noted in #3488
The leftover are probably similar things as in the webhooks - goroutines or informer hooks that were started but never bound by a context.

@ntnn ntnn force-pushed the kcp3350-clusterwithcontext branch from 88d1cc0 to 3cecee0 Compare July 23, 2025 15:59
@ntnn ntnn changed the title [WIP] Prevent workspace deletion leaks Prevent workspace deletion leaks Jul 23, 2025
@kcp-ci-bot kcp-ci-bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 23, 2025
@ntnn ntnn force-pushed the kcp3350-clusterwithcontext branch from 3cecee0 to 555f612 Compare July 23, 2025 16:17
Comment on lines -104 to -114
// GoleakWithDefaults verifies that there are no goroutine leaks.
// Goleak tests cannot be run in parallelized tests.
func GoleakWithDefaults(tb testing.TB, in ...goleak.Option) {
tb.Helper()

tb.Logf("waiting %v for goroutines to finish", WaitTime)
time.Sleep(WaitTime)

opts := append(KnownGoroutineLeaks, in...)
goleak.VerifyNone(tb, opts...)
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the idea here was to use IgnoreCreatedBy to ignore any goroutines started by KCP when testing a component for leaks. However that requires the PR to be merged: uber-go/goleak#136

Since that doesn't seem to happen anytime soon I removed the entire helper function. The leak test for KCP itself is in framework/leak_test for now, the workspace leak test in workspace/leak_test.

@ntnn ntnn force-pushed the kcp3350-clusterwithcontext branch from 1bdfbaa to cbac814 Compare July 23, 2025 17:22
ntnn added 6 commits July 28, 2025 13:09
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
In large traces some frames are elided to show the top and bottom of the
stack trace. This wasn't handled in goleak until this PR:
uber-go/goleak#120

But the maintainers haven't tagged a new version since this PR.

Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
@ntnn ntnn force-pushed the kcp3350-clusterwithcontext branch from cbac814 to c00bc30 Compare July 28, 2025 11:11
@ntnn
Copy link
Member Author

ntnn commented Jul 28, 2025

/hold cancel

@ntnn ntnn force-pushed the kcp3350-clusterwithcontext branch 3 times, most recently from 39e5fe7 to 414435f Compare July 28, 2025 12:19
@ntnn
Copy link
Member Author

ntnn commented Jul 28, 2025

I'm not entirely sure why but the leak test always failed immediately with kcptesting.Eventually. Using require.EventuallyWitht seems stable:

image

Signed-off-by: Nelo-T. Wallus <n.wallus@sap.com>
@ntnn ntnn force-pushed the kcp3350-clusterwithcontext branch from 414435f to 032f189 Compare July 28, 2025 12:28
@ntnn
Copy link
Member Author

ntnn commented Jul 28, 2025

/retest

@ntnn
Copy link
Member Author

ntnn commented Jul 28, 2025

/hold cancel

@kcp-ci-bot kcp-ci-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 28, 2025
Copy link
Member

@embik embik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

🚀

@kcp-ci-bot kcp-ci-bot added the lgtm Indicates that a PR is ready to be merged. label Jul 29, 2025
@kcp-ci-bot
Copy link
Contributor

LGTM label has been added.

DetailsGit tree hash: bd25b0627061a35360ef7ce2246a18acb7f15cef

@kcp-ci-bot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: embik

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kcp-ci-bot kcp-ci-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 29, 2025
@kcp-ci-bot kcp-ci-bot merged commit 4393038 into kcp-dev:main Jul 29, 2025
14 checks passed
@ntnn ntnn deleted the kcp3350-clusterwithcontext branch July 29, 2025 07:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. dco-signoff: yes Indicates the PR's author has signed the DCO. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: Creating and then deleting workspaces causes leaked goroutines

3 participants