Creating a connector for Bitbucket

nejdet · 31 July 2024 11:18

Hi,
I need to monitor Bitbucket activity and save everything to S3. I am looking for a Bitbucket connector that does everything the GitHub connector does.

Any idea where to start or any known solution?
Any suggestions?

I searched for Bitbucket connectors, but it seems that Bitbucket is not a mainstream tool.

Thanks in advance for any kind of advice.

dtroiano · 31 July 2024 15:37

I’m not seeing any open source options for this.

Are you considering writing a connector? If so, this dev guide is a good place to start. There are a bunch of design considerations to work through. As a starting point:

1. Start with a single-task connector (you can scale out later if needed).
2. Take inventory of which BitBucket objects you need, i.e., which specific APIs you need to scrape.
3. How will you call the BitBucket API? I am not seeing a Java client. I am assuming you are using BitBucket Cloud? I found this client but it’s for server only, so you might have to hit the BitBucket Cloud REST API directly.
4. Figure out how you’re going to map poll calls to BitBucket API calls.
5. Plan how you’re going to track offsets.
6. Plan for failures (dealing with dupes and/or skipping events).

Re: #4: For mapping poll calls to API calls, you’ll likely have to throttle requests to the BitBucket API to avoid being rate limited. I see that most API responses have created_on and updated_on fields that you can filter on. I’d consider throttling by splitting the work into start/end time windows and calling the BitBucket API with these as parameters. I.e., for each call to poll, query for objects from the previous call’s end time to “now”. Then take a nap for a configurable nap time.
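To make the windowing idea concrete, here is a minimal sketch in plain Java with no Kafka dependency. WindowPlanner and updatedOnFilter are hypothetical names made up for illustration, and the filter string is only my assumption about the query syntax, so check it against the BitBucket Cloud docs before relying on it.

```java
import java.time.Instant;

// Hypothetical helper (not part of Kafka Connect or any BitBucket client).
// Each poll covers the window from the previous call's end time up to "now".
final class WindowPlanner {
    private Instant lastEnd;

    WindowPlanner(Instant initialStart) {
        this.lastEnd = initialStart;
    }

    // Returns {start, end} for the next poll and advances the cursor.
    Instant[] next(Instant now) {
        Instant start = lastEnd;
        // Guard against clock skew: never let the window run backwards.
        Instant end = now.isBefore(start) ? start : now;
        lastEnd = end;
        return new Instant[] {start, end};
    }

    // Assumed filter syntax; verify against the BitBucket Cloud filtering docs.
    static String updatedOnFilter(Instant start, Instant end) {
        return "updated_on >= " + start + " AND updated_on < " + end;
    }
}
```

In the task's poll method you would call next(Instant.now()), issue the API request with the resulting filter, and then sleep for the configured nap time (for example Thread.sleep with a value read from a connector setting you define yourself).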

Re: #5: For tracking offsets, the simplest thing I can think of would be to store the last start and end times queried per API call (source partition is the repository and object type / API call, and offset is the start/end time pair). I.e., basically do the simplest analog of what the dev guide does with file names and offsets. On startup, the first API call should use these same start and end times in case something went wrong during the previous call.
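A rough sketch of what those partition and offset maps could look like, again in plain Java. The key names (repository, object_type, window_start, window_end) and the WindowOffsets class are assumptions for illustration only. In a real task, the stored map would come from context.offsetStorageReader().offset(partition), which returns null when nothing has been committed yet.

```java
import java.time.Instant;
import java.util.Map;

// Hypothetical helper; the map keys are illustrative, not a Kafka Connect convention.
final class WindowOffsets {
    // Source partition: one per repository and object type / API call.
    static Map<String, String> partition(String repository, String objectType) {
        return Map.of("repository", repository, "object_type", objectType);
    }

    // Source offset: the start/end pair of the last window queried.
    static Map<String, String> offset(Instant start, Instant end) {
        return Map.of("window_start", start.toString(), "window_end", end.toString());
    }

    // First window after startup: replay the stored window if there is one,
    // otherwise begin at a configured fallback start time.
    static Instant[] firstWindow(Map<String, ?> stored, Instant fallbackStart, Instant now) {
        if (stored == null) {
            return new Instant[] {fallbackStart, now};
        }
        return new Instant[] {
            Instant.parse((String) stored.get("window_start")),
            Instant.parse((String) stored.get("window_end"))
        };
    }
}
```

Replaying the stored window on startup is what produces the possible duplicates discussed under #6.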

Re: #6: Using the final point in the previous item (reusing offsets when recovering from a failure), you can either live with duplicates or consider landing data in a compacted topic. If you go down the compacted topic route, it might take some playing with the BitBucket API to figure out the key to use. E.g., for pull requests, will just using the id suffice? Or do you want to log changes across calls (id + updated_on?).
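The two key choices might look something like this. RecordKeys and the separator characters are made up for illustration, and I am assuming pull request ids are only unique within a repository, which is why the repository is part of the key.

```java
// Hypothetical key builders for a compacted topic; names and format are illustrative.
final class RecordKeys {
    // One key per pull request: compaction keeps only the latest state,
    // so replayed windows overwrite rather than duplicate.
    static String latestStateKey(String repository, long pullRequestId) {
        return repository + "#" + pullRequestId;
    }

    // One key per pull request version: every distinct updated_on survives
    // compaction, while an exact replay of the same version still dedupes.
    static String changeLogKey(String repository, long pullRequestId, String updatedOn) {
        return latestStateKey(repository, pullRequestId) + "@" + updatedOn;
    }
}
```

With the first key you end up with a current snapshot per pull request; with the second you keep a history of changes at the cost of a topic that grows with every update.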
