Hi,
I need to monitor Bitbucket activity and save everything to S3. I am looking for a Bitbucket connector that does everything the GitHub connector does.
Any idea where to start or any known solution?
Any suggestions?
I searched for Bitbucket connectors, but it seems that Bitbucket is not a mainstream tool.
Thanks in advance for any kind of advice.
I’m not seeing any open source options for this.
Are you considering writing a connector? If so, this dev guide is a good place to start. There are a bunch of design considerations to work through. As a starting point:
- Start with a single-task connector (you can scale out later if needed)
- Take inventory of which BitBucket objects you need, i.e., which specific APIs here you need to scrape
- How will you call the BitBucket API? I am not seeing a Java client. I am assuming you are using BitBucket Cloud? I found this client but it’s for server only, so you might have to hit the BitBucket Cloud REST API directly.
- Figure out how you’re going to map poll calls to BitBucket API calls
- Plan how you’re going to track offsets.
- Plan for failures (dealing with dupes and/or skipping events)
Re: #4: For mapping poll calls to API calls, you’ll likely have to throttle requests to the BitBucket API to avoid being rate limited. I see that most API calls have created_on and updated_on fields that you can filter on. I’d consider bounding each API call with a start and end time and passing those to the BitBucket API as filter parameters. I.e., for each call to poll, query for objects from the previous call’s end time to “now”. Then take a nap for a configurable nap time.
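Here is a rough sketch of that windowing idea in plain Java, with no Kafka Connect or BitBucket dependencies. `PollWindow`, `toFilter` and `remainingNap` are names I made up for illustration, and the filter string is only my assumption of what the BitBucket Cloud query syntax looks like, so check it against the API docs before relying on it.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for illustration; not part of Kafka Connect or any BitBucket client.
final class PollWindow {
    final Instant start; // exclusive lower bound on updated_on
    final Instant end;   // inclusive upper bound on updated_on

    PollWindow(Instant start, Instant end) {
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end must not be before start");
        }
        this.start = start;
        this.end = end;
    }

    // The next window starts where this one ended and runs up to "now".
    // If the clock went backwards, the window is empty rather than invalid.
    PollWindow next(Instant now) {
        return new PollWindow(end, now.isBefore(end) ? end : now);
    }

    // Assumed filter syntax; verify it against the BitBucket Cloud documentation.
    String toFilter() {
        return "updated_on > " + start + " AND updated_on <= " + end;
    }

    // Time left to sleep so that consecutive API calls are at least napTime apart.
    static Duration remainingNap(Instant lastCall, Instant now, Duration napTime) {
        Duration left = napTime.minus(Duration.between(lastCall, now));
        return left.isNegative() ? Duration.ZERO : left;
    }
}
```

Inside poll you would sleep for `remainingNap(...)`, build `window.next(Instant.now())`, and issue one API call per window. Making the lower bound exclusive and the upper bound inclusive keeps adjacent windows from returning the same object twice.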
Re: #5: For tracking offsets, the simplest thing I can think of would be to store the last start and end times queried per API call (source partition is the repository and object type / API call, and offset is the start/end time pair). I.e., basically do the simplest analog of what the dev guide does with file names and offsets. On startup, the first API call should use these same start and end times in case something went wrong during the previous call.
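To make the partition/offset shape concrete, here is a minimal sketch using plain Java maps. The class name and the map keys (`repository`, `object_type`, `window_start`, `window_end`) are my own choices, not anything Kafka Connect prescribes. In a real connector I would expect these maps to be passed to the `SourceRecord` constructor and read back on startup through `context.offsetStorageReader().offset(partition)`, but treat that wiring as an assumption to verify against the dev guide.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; the key names are arbitrary.
final class WindowOffsets {
    // One source partition per repository and object type (i.e. per API call).
    static Map<String, String> sourcePartition(String repository, String objectType) {
        Map<String, String> partition = new HashMap<>();
        partition.put("repository", repository);
        partition.put("object_type", objectType);
        return partition;
    }

    // The offset is the start/end pair of the window that was queried.
    // Stored as ISO-8601 strings so the values stay simple to serialize.
    static Map<String, String> sourceOffset(Instant start, Instant end) {
        Map<String, String> offset = new HashMap<>();
        offset.put("window_start", start.toString());
        offset.put("window_end", end.toString());
        return offset;
    }

    // Returns {start, end} for the first API call after startup.
    // With a stored offset, the same window is replayed; otherwise start fresh.
    static Instant[] firstWindow(Map<String, ?> stored, Instant defaultStart, Instant now) {
        if (stored == null
                || stored.get("window_start") == null
                || stored.get("window_end") == null) {
            return new Instant[] { defaultStart, now };
        }
        return new Instant[] {
            Instant.parse(stored.get("window_start").toString()),
            Instant.parse(stored.get("window_end").toString())
        };
    }
}
```

Replaying the stored window on startup is what gives you at-least-once behavior: nothing from the last window is skipped, at the cost of possibly emitting some objects twice.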
Re: #6: Building on the final point above (reusing offsets when recovering from a failure), you can either live with duplicates or consider landing the data in a compacted topic. If you go down the compacted topic route, it might take some playing with the BitBucket API to figure out which key to use. E.g., for pull requests, will just using the id suffice? Or do you want to log changes across calls (id + updated_on)?
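A small sketch of how the two key choices behave. Everything here is hypothetical: the key formats are invented, I include the repository in the key on the assumption that pull request ids are only unique within a repository, and the `compact` method just imitates compaction semantics (last value per key wins) with a map rather than talking to Kafka.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical key builders for illustration; the formats are arbitrary.
final class PullRequestKeys {
    // Keep only the latest state of each pull request.
    static String latestStateKey(String repository, long id) {
        return repository + "/pullrequests/" + id;
    }

    // Keep one record per observed change of each pull request.
    static String changeLogKey(String repository, long id, String updatedOn) {
        return latestStateKey(repository, id) + "@" + updatedOn;
    }

    // Imitates log compaction: for each key, the last value written wins.
    static Map<String, String> compact(String[][] keyValuePairs) {
        Map<String, String> compacted = new LinkedHashMap<>();
        for (String[] pair : keyValuePairs) {
            compacted.put(pair[0], pair[1]);
        }
        return compacted;
    }
}
```

With the id-only key, a replayed window and a genuine update both collapse into one record per pull request. With id + updated_on, exact replays still collapse, but each distinct update survives as its own record.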