This year’s CodeGeist Unleashed is a nice twist on Atlassian’s annual hackathon. The developers are challenged with creating an AI app for either collaboration, data-driven insights, or elevated developer experience.
Our team had several ideas for AI-powered apps for Jira. CodeGeist seemed like a perfect opportunity for bringing them into reality.
Let us walk you through our journey of building Smart AI Insight, from idea to MVP to Hackathon submission.
Step 1: Come up with a “killer” app idea
Easier said than done, eh?
There were dozens of ideas regarding generative AI uses in Jira flowing in our heads. Even more were actively discussed by the Atlassian Community. All of them have potential and offer great value to the end user. We needed to come up with something that could:
- Change the way companies are managed
- Improve our own processes
- Be feasibly delivered in a timely manner
Our team typically relies on BRIDGeS, a flexible decision-making framework at Railsware. The BRIDGeS framework offers a holistic approach to product discovery as it shifts the perspective from a single subject, like the end-user or product, to multiple tactical and strategic goals at once.
A typical BRIDGeS session can take between two and six hours, depending on the complexity of the context. It consists of four primary steps.
- Describing the problem. This is a brainstorming session that’s aimed at defining the subjects and their descriptors.
- Prioritization. The limitations of a Hackathon aside, you wouldn’t have the time and resources to tackle everything at once, even in a perfect real-world scenario. That’s why it is crucial that your team prioritizes the benefits, risks, and issues before moving to the solution stage. We tend to use the MoSCoW framework to identify Musts, Shoulds, Coulds, and Won’ts.
- Move to the Solution Space and start creating potential high-level solutions that satisfy the Benefits, Risks, and Issues mentioned in the Problem Space.
- Describe the high-level solution through epics and the nested tasks. Epics and tasks are a helpful way to split high-level solutions into smaller blocks. Later, they will transform into the implementation plan or the roadmap.
This session got us thinking about the potential use cases for generative AI that has access to the data from your Jira instance. This is how we discovered the use case for summarizing the scope based on the content and contexts that are already in Jira. More importantly, we understood why we should make the Smart AI Insight app:
- The concept is simple and deliverable. The app doesn’t need a lot of data. The users can simply upload the issues they want to be summarized.
- Summarization is something GPT excels at.
- There is clearly a need for Smart AI Insight. Team/progress updates and notes are something everyone at every company discusses every day in one way or another.
Step 2: Create a very basic PoC
Now that we have focused on one particular use case – summarization – the next problem we needed to solve was a basic Proof of Concept prototype. Our goals for this stage were:
- Validate if the summarization app can be done
- Identify the best prompts to feed the AI
- Compare GPT-3.5 and GPT-4 models
- Validate the usability and usefulness of our solution
Obviously, we could get straight to developing the app. Yet we chose to take a step back and ask ourselves if we can really afford to invest time and resources into an app before being 100% sure it can work like we want it to.
The BRIDGeS session highlighted a very important asset we have on our hands: the OpenAI API, which, much like Jira, can be integrated with a lot of things. Google Sheets, for example, turned out to be a perfect fit for building the first PoC prototype.
Google Sheets gave us:
- A way to deliver a prototype quickly, with minimal coding
- A very basic yet easily understandable UI for the user
- The ability to import issues from Jira
- Access to the OpenAI API
We used a third-party tool called GPT for Sheets and Docs to access the OpenAI API. This tool proved quite handy, as it offers an interface for making requests to OpenAI’s servers and receiving responses. We simply needed to set up our API keys.
Thanks to this, we had a working prototype that could help us achieve all of the aforementioned goals.
- The users can now import their Jira tickets with the necessary data like Project, Issue key, Summary, Description, Status, Assignee, etc.
- Then, they can use a simple interface to specify their use case, like:
  - Generate release notes,
  - Show progress on an Epic,
  - Explain what the Epic is about,
  - Generate a feature announcement.
- And they can access a selection of filters, like a filter by date or issue type.
Long story short, we had a lightweight, quick, no-code solution we could use to explore and validate Smart AI Insight further.
Step 3: A viability study
It’s hard to overestimate the benefits of a working prototype in the earliest stages of the development cycle.
For starters, we had a “new shiny toy” our own Product Managers could explore. We relied on their feedback to validate our idea.
They have confirmed our hypothesis. There is, in fact, a need for a summarization solution.
This early user testing has also revealed several problems we needed to solve.
First, we needed to perfect our prompt engineering. The prototype had a very basic prompt explaining to the AI what needed to be done. Unfortunately, the results were not consistent. ChatGPT was often confused and sometimes left certain tickets out of its summary.
This gave us a new goal. We needed to identify the best prompt that could reliably prepare team updates or release notes based on the content inside of Jira tickets. Our approach was to start with the basic prompt we used and enhance it with certain prompt engineering techniques. Then we needed to evaluate the new prompts using the diverse input examples from various user stories. And lastly, we needed to compare the GPT-3.5 and GPT-4 models.
We used the following prompt engineering techniques:
- Role assignment: This did not yield any significant results for GPT-3.5 yet significantly improved the output of GPT-4.
- Providing Project Context: This has proven to be quite effective for both GPT models.
- Showing examples: This worked great for GPT-4 as the AI could mimic various formatting and styling examples. However, this prompt was a bit too complex for GPT-3.5.
- Giving instructions: This has significantly improved the output of GPT-3.5, yet GPT-4 was disrupting the output without additional context.
- Prompt Syntax: This also did not yield any significant results.
The primary finding was that GPT-3.5 is not a reliable model for our app if the prompt is overcomplicated with details. The results we want to achieve are pushing the model to its limits, and more complex prompts lead to many inconsistencies with results. However, GPT-3.5 offers great results with simpler prompts and less context.
On the other hand, GPT-4 has more than enough capacity to handle more complex tasks. And we’ve discovered that it performs much better when provided with detailed instructions and context within the prompt.
This understanding has helped us design the most fitting prompts for both models.
- GPT 3.5: Base prompt + Project context + Instructions + Basic output example
- GPT 4: Role + Base prompt + Project context + Instructions + Basic output example
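The two recipes above can be sketched as a small prompt builder. This is only an illustration: the `PromptParts` type, the `buildPrompt` function, and all sample texts are hypothetical and are not taken from the app itself.

```typescript
// Hypothetical building blocks; the wording of each part is illustrative only.
interface PromptParts {
  role: string;
  basePrompt: string;
  projectContext: string;
  instructions: string;
  outputExample: string;
}

type Model = "gpt-3.5-turbo" | "gpt-4";

// Assemble the parts in the order listed above.
// Following the findings, the role is only included for GPT-4.
function buildPrompt(model: Model, parts: PromptParts): string {
  const sections: string[] = [];
  if (model === "gpt-4") {
    sections.push(parts.role);
  }
  sections.push(
    parts.basePrompt,
    `Project context: ${parts.projectContext}`,
    `Instructions: ${parts.instructions}`,
    `Output example:\n${parts.outputExample}`,
  );
  return sections.join("\n\n");
}

const parts: PromptParts = {
  role: "You are a product manager preparing a team update.",
  basePrompt: "Summarize the Jira issues provided below.",
  projectContext: "A checklist app for Jira Cloud.",
  instructions: "Mention every issue key exactly once.",
  outputExample: "- ABC-1: Added checklist templates",
};

console.log(buildPrompt("gpt-4", parts));
```

Keeping the parts separate makes it easy to evaluate each technique on its own against the same set of input issues.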
The next steps were to compare the models with each other, compare our solution with the offerings of our competitors, and fine-tune the model parameters to optimize the results.
Despite AI’s best efforts to slowly and gradually take over the world, it’s not quite there yet. Many of the Product Managers who tested the app were not entirely sure if they could trust the generated results. Primarily, they questioned whether the AI model really summarized all of the issues. They were also worried that ChatGPT could simply lie straight to their face 😱
That’s why we made sure that the Smart AI Insight app had a dedicated screen showing both the JQL parameters and the tickets that were selected. Only after the users themselves have validated the input does Smart AI Insight let OpenAI do its magic.
Overcomplicated user flow
The current flow with filters was complex and confusing. Users wanted a simpler solution as well as the ability to receive results faster.
This taught us an important lesson about user experience: we can’t afford to overload users with dozens of filters. That’s why the final app offers one filter. It is based on JQL, allowing users to specify their needs.
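To illustrate how a single JQL filter can cover what separate date and issue-type filters did in the prototype, here is a minimal sketch. The `IssueFilter` shape, the `buildJql` helper, and the project key are hypothetical examples; only the JQL syntax itself is standard.

```typescript
// Hypothetical filter shape, mirroring the filters from the Sheets prototype.
interface IssueFilter {
  project: string;
  issueTypes?: string[];
  updatedSince?: string; // date in yyyy-MM-dd form
}

// Quote a value for JQL, escaping backslashes and embedded double quotes.
function quote(value: string): string {
  return `"${value.replace(/\\/g, "\\\\").replace(/"/g, '\\"')}"`;
}

// Combine the optional filters into one JQL string.
function buildJql(filter: IssueFilter): string {
  const clauses = [`project = ${quote(filter.project)}`];
  if (filter.issueTypes && filter.issueTypes.length > 0) {
    clauses.push(`issuetype in (${filter.issueTypes.map(quote).join(", ")})`);
  }
  if (filter.updatedSince) {
    clauses.push(`updated >= ${quote(filter.updatedSince)}`);
  }
  return `${clauses.join(" AND ")} ORDER BY updated DESC`;
}

const jql = buildJql({
  project: "SAI",
  issueTypes: ["Story", "Bug"],
  updatedSince: "2023-10-01",
});
console.log(jql);
// prints: project = "SAI" AND issuetype in ("Story", "Bug") AND updated >= "2023-10-01" ORDER BY updated DESC
```

Because the result is plain JQL, the same string can be shown to the user for validation before anything is sent to the model.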
Lacking context on impact
Our Product Managers have noticed that the summary does not clearly show the importance and impact of issues. It did not have a way to highlight high-priority tasks.
The solution to this lies in prompt engineering. We had to specify that the AI model needed to pay attention to the potential value of a story.
A lightweight prototype has helped us validate our ideas and identify previously unexplored risks and challenges. But, most importantly, it gave us the freedom and flexibility to implement the solutions even before the first line of code was written.
Step 4: It’s time to Forge the app
We have a prototype. Our theories have been validated. We’ve also discovered some of the potential pitfalls as well as certain ways of mitigating them. It should be smooth sailing from here on out, right?
Well, not exactly. There’s still an app we need to make and submit. And, while the development journey has been simplified as much as possible thanks to a backbone of an early prototype, there were still interesting challenges we had to overcome.
Let’s start by answering an interesting question: why did we choose Atlassian’s Forge over developing a Connect app? What were the pros and cons of both approaches?
Forge pros:
- The setup is quite simple.
- The solution is serverless. It also offers us a ready-to-use back-end.
- There are several UI options. Custom UI offers us almost the same flexibility as we would have had with a Connect app.
- Advanced security out of the box.
- CSRF protection is provided by Atlassian.
Forge cons:
- The back-end doesn’t support streaming.
- Requests have a hard timeout of 25 seconds (55 seconds for background jobs).
- The advanced security that comes out of the box is nice, but we will still need to configure a lot of permissions.
- You can’t create an app for Jira DC.
Connect pros:
- More flexibility when building apps.
- Gives us an option to release an app for both Jira Cloud and Jira DC.
- Has the ability to call Jira’s API directly from the UI.
Connect cons:
- Might result in certain trust issues. Sure, they can be overcome by becoming an Atlassian Marketplace Platinum Partner like we have with Smart Checklist 😉 but it is still quite a journey.
- Nothing is secured by Atlassian. CSRF, CSP, DDoS protection: everything needs to be handled manually. You’ll also be reviewed for GDPR compliance separately.
With the pros and cons of both approaches in mind, we went with Forge. The pros of being able to start quickly and with security covered out of the box (with custom configurations) were simply too hard to beat. In addition to that, we didn’t need to worry about supporting the infrastructure on our end, as the back-end is managed by Atlassian.
Sure, a Connect app can offer more flexibility in certain cases. For us, however, it would have been overkill, especially given that we could have access to the functionality we needed with a UI kit.
Which UI kit to choose?
Our choice lay between the UI-Kit, UI-Kit 2, and Custom UI.
- UI-Kit: This is the first option we tried. And, to be completely honest, it is a nice, lightweight option for creating a basic UI. But it simply isn’t what we need for Smart AI Insight. The challenges we discovered during user testing call for a responsive, intuitive front-end. The server-rendered HTML of the UI-Kit is not enough to develop a dynamic UI. The limited set of components and the inability to add custom elements were the final nail in the coffin. The components are also hard to support. Sure, their architecture is React-like, but the first impression may be misleading. The hooks you are familiar with are present, but they run in a completely different manner.
- UI-Kit 2: This option has an expanded list of components. It also supports React, which was a boon. On the downside, this is an unstable feature that is still in the preview stage, so there is no guarantee of its reliability.
- Custom UI: This is the solution we went with, as it allows us to render our app in a provided iframe. Its back-end is on Lambda, which gives us an option to connect our front-end. We were free to use React, the libraries we needed, as well as the standard Atlaskit we are using in our Connect apps. This option lets our front-end call the back-end or Jira. As for the downsides, the requests are limited: we can call either our Lambda or Jira, and that’s it. We can’t make a request to ChatGPT directly, as this interaction would violate the CSP.
All in all, the availability of options is quite nice, and all of them certainly have their uses. And they also make an excellent case for the importance of early user testing. Our first thought was to go with the UI-Kit, and we would have gone with it for the sake of simplicity and saving time if we didn’t know that the users could get lost and require certain specific UI elements in order to have a satisfactory experience.
Vite offers a simple solution for front-end development that works out of the box. It allows you to avoid setting up the builder, bundler, and so on. Vite has a nice selection of boilerplates. We used the TypeScript + React boilerplate and put it inside the Custom UI.
We’d like to point out the excellent support for theming that Forge app developers get thanks to Atlassian’s design tokens. Sure, this isn’t necessarily a challenge we had to overcome, but it is still an improvement worth mentioning, given the amount of trouble we had implementing a dark theme for Smart Checklist (an app we developed some time ago).
CSP for inline styles
Atlassian takes security seriously. As a result, inline styles were not available by default in the iframe offered by Custom UI. There is an option to allow them, though, by adding a separate entry for inline styles to the permissions in the app manifest.
permissions:
  scopes:
    - storage:app
    - read:jira-work
  content:
    styles:
      - unsafe-inline
Without this option, the development process would have been much more painful and troublesome.
No streaming from Jira’s side
When developing the prototype, we realized that there is streaming from ChatGPT but not from Jira’s side.
The issue here is with Lambda itself. The front-end could stream but could not make a direct request to either ChatGPT or our own back-end due to CSP restrictions. It can only make a request to Lambda, and Lambda itself can’t stream.
Request time limitations
To add insult to injury, regular (non-background) requests are limited to 25 seconds. Any unfinished work is terminated after that time. This meant that most of the responses from OpenAI that required the generation of more than 100 tokens, or 50-100 words, would fail due to timeout.
As we were trying to figure out a solution, Atlassian announced an increased time limit for the Async events API. Still not perfect. Implementation would require a bit of work. But this was a solid start.
The increased time limit enabled us to add tasks that make a request to OpenAI to a queue. The results could then be stored with the Storage API. Additionally, we make requests from our Custom UI front-end to check if the job has finally received a response.
So, instead of simply making a request to the desired API, we need to:
- Put a payload for the request into storage
- Put an event into the queue
- Process the event and make the request to OpenAI with the saved payload
- Save the response from OpenAI in the storage if it does not exceed the 55-second limit (we received it before the process died)
- Take the saved response and display it in our front-end (if we have a response)
- Show an error message if we haven’t received a response from OpenAI in time
Note that the `fetch` provided by `@forge/api` doesn’t allow the use of streams, so we are limited to either a full response or nothing. We can’t show intermediate values or progress.
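The flow above can be sketched in a self-contained way. This is a simplified simulation, not the app's code: the in-memory `storage` map and `queue` array stand in for Forge's Storage API and Async events API, `callModel` stands in for the OpenAI request, and every name here is hypothetical. In real Forge the process is simply killed at the limit, whereas this sketch records the failure explicitly.

```typescript
// In-memory stand-ins (assumptions): a Map instead of the Forge Storage API,
// an array instead of the Async events queue.
type Job =
  | { status: "pending"; payload: string }
  | { status: "done"; response: string }
  | { status: "failed"; error: string };

const storage = new Map<string, Job>();
const queue: string[] = [];

// Steps 1-2: save the payload, then enqueue an event that references it.
function enqueueJob(jobId: string, payload: string): void {
  storage.set(jobId, { status: "pending", payload });
  queue.push(jobId);
}

// Reject if the wrapped promise does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("timed out")), ms);
    promise.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); },
    );
  });
}

// Steps 3-4: process one event and store either the full response or a failure.
async function processNext(
  callModel: (payload: string) => Promise<string>,
  limitMs: number,
): Promise<void> {
  const jobId = queue.shift();
  if (jobId === undefined) return;
  const job = storage.get(jobId);
  if (!job || job.status !== "pending") return;
  try {
    const response = await withTimeout(callModel(job.payload), limitMs);
    storage.set(jobId, { status: "done", response });
  } catch (error) {
    storage.set(jobId, { status: "failed", error: String(error) });
  }
}

// Steps 5-6: what the front-end would show when it polls for the job.
function pollJob(jobId: string): string {
  const job = storage.get(jobId);
  if (!job) return "Unknown job";
  if (job.status === "done") return job.response;
  if (job.status === "failed") return "Failed to get a response from the model";
  return "Still working...";
}

// Usage with a fake model call that answers after 10 ms.
const fakeModel = (payload: string) =>
  new Promise<string>((resolve) =>
    setTimeout(() => resolve(`Summary of: ${payload}`), 10));

async function demo(): Promise<string> {
  enqueueJob("job-1", "3 issues from the sprint");
  await processNext(fakeModel, 55_000);
  return pollJob("job-1");
}

demo().then(console.log);
```

The front-end only ever sees one of three states (pending, full response, or failure), which mirrors the all-or-nothing limitation described above.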
While this approach has proven to be helpful, it is still far from perfect. It is rather complicated and still doesn’t allow us to receive a part of the streamed response from OpenAI if the response takes too much time.
ChatGPT limitations
Fortunately, OpenAI reports the actual number of tokens used after each request, so we can store data on how many tokens a user spends. This can give us a rough estimate of the average token usage.
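As a rough sketch of that bookkeeping: OpenAI chat completion responses include a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. The `usageByUser` map and the two helper functions below are hypothetical, and an in-memory map stands in for whatever storage the app actually uses.

```typescript
// Shape of the `usage` object returned with an OpenAI chat completion.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
}

// In-memory stand-in (assumption) for wherever per-user usage would be stored.
const usageByUser = new Map<string, number[]>();

// Record the total tokens spent by one request.
function recordUsage(userId: string, usage: Usage): void {
  const history = usageByUser.get(userId) ?? [];
  history.push(usage.total_tokens);
  usageByUser.set(userId, history);
}

// Average tokens per request for a user, or 0 if nothing was recorded.
function averageTokens(userId: string): number {
  const history = usageByUser.get(userId) ?? [];
  if (history.length === 0) return 0;
  return history.reduce((sum, n) => sum + n, 0) / history.length;
}

recordUsage("user-1", { prompt_tokens: 900, completion_tokens: 300, total_tokens: 1200 });
recordUsage("user-1", { prompt_tokens: 600, completion_tokens: 200, total_tokens: 800 });
console.log(averageTokens("user-1")); // prints 1000
```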
Typically, these kinds of articles end with a brief summary. This article, however, is focused on the process of developing an AI app for summarization, so it’s only fitting that the summary would be provided by Smart AI Insight.
“In conclusion, the journey from ideation to MVP to hackathon submission for our Smart AI Insight was filled with challenges, insights, and revelations. It taught us the importance of proof of concept, early user testing, prompt engineering, and how to navigate through limitations. We experienced firsthand how a decision-making framework like BRIDGeS can lead to impactful product discoveries and how AI technology can be leveraged to improve company processes. While we faced obstacles along the way, from prompt inconsistencies to trust issues and overcomplicated user flow, we found solutions that allowed us to improve our product. We learnt to work within the constraints of our chosen platforms, and found workarounds for their limitations. As we move forward, we are confident that the lessons we learned from this journey will be invaluable in guiding our future product development endeavors. We’re excited to continue refining and expanding Smart AI Insight, bringing valuable and efficient AI-powered solutions to Jira users.”