ChatGPT is the latest OpenAI solution, available in preview for two weeks as of this writing. ChatGPT is a fine-tuned model in the GPT-3.5 series, which finished training in early 2022; most of its knowledge is limited to data available in 2021.
Can ChatGPT be used to answer functional questions from Business Central end users? And how does it compare with a trained Q-A type chatbot built on Microsoft’s Bot Framework in the same field? Dmitry has tested ChatGPT’s usefulness for AL developers in generating code and concluded that, because public AL code repos are limited, it’s not good enough yet. Since there are many more publicly available resources on BC functionality, let’s see how it does with a sample of questions.
There are certain limitations and differences that are to be expected between the two bots:
- The main difference between the two chatbots is in the way they are trained. ChatGPT is trained with a huge dataset of publicly available information; our model is trained with only Business Central relevant information.
- Since ChatGPT cannot distinguish between Business Central and, for example, SAP, we may expect some answers to merge information from both ERPs on the assumption that they are similar, producing nonsensical answers.
- ChatGPT generates answers instead of copy-pasting publicly available information. It sometimes creates answers that seem correct on the surface, but a single wrongly generated word can change the meaning completely. It is possible to fine-tune GPT-3.5 models to get better-quality answers, though that is expensive and outside the scope of this test.
- ChatGPT does not have a confidence score defined – it will attempt to answer any question you provide, even if it does not have a valid answer. Our trained model has a confidence score, so it will not show an answer if it thinks it does not fit, but it may sometimes show an answer from another Q-A pair.
- While a trained chatbot does not have any of the above limitations, it takes much more effort to train it.
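The confidence-score behaviour described in the list above can be illustrated with a minimal sketch. This is not how the Bot Framework or our chatbot is implemented; it assumes a hypothetical two-entry knowledge base (`QA_PAIRS`), a toy word-overlap measure as the confidence score, and an arbitrary threshold of 0.5.

```python
import re

# Hypothetical Q-A pairs with placeholder answers (assumption for illustration).
QA_PAIRS = [
    ("How to set the default dimension for G/L Account?",
     "placeholder answer about default dimensions"),
    ("How to reverse G/L entries?",
     "placeholder answer about reversing entries"),
]

CONFIDENCE_THRESHOLD = 0.5  # assumed value, not taken from any real bot


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def confidence(question, stored_question):
    """Toy score: Jaccard overlap of the two word sets, from 0.0 to 1.0."""
    a, b = tokenize(question), tokenize(stored_question)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def answer(question):
    """Return the best-matching stored answer, or None if confidence is low."""
    best_question, best_answer = max(
        QA_PAIRS, key=lambda pair: confidence(question, pair[0])
    )
    if confidence(question, best_question) < CONFIDENCE_THRESHOLD:
        return None  # the bot stays silent instead of guessing
    return best_answer
```

With this toy scoring, an upgrade question such as "How to upgrade code from C/AL to AL?" falls below the threshold and gets no answer, while "How to reverse default dimension for G/L Account?" overlaps heavily with the first stored question and receives its answer, which is the "answer from another Q-A pair" failure mode.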
Testing the chatbots
I randomly selected 8 questions about Business Central functionality, plus 1 bonus question, and ran them all at once in both chatbots without any prior tests. Below I review the answers I received:
1. How to set the default dimension for G/L Account?
Is ChatGPT’s answer acceptable: No. It generated a nonsensical answer, suggesting a dimension field in the General FastTab of the G/L Account card.
Which chatbot’s answer I like better: trained chatbot.
ChatGPT:
Trained MS Bot Framework Chatbot:
2. Which fields in G/L account define Tax Posting?
Is ChatGPT’s answer acceptable: Yes. While I expected a different answer, the one provided makes sense.
Which chatbot’s answer I like better: trained chatbot.
ChatGPT:
Trained Chatbot:
3. How to register a sales invoice?
Is ChatGPT’s answer acceptable: No. While it got most of the instructions right, it suggested selecting the “OK” button to “Save the invoice”, implying that the invoice will then appear as a posted invoice. In fact, the invoice is saved automatically, and you need to post it at the end of the process. It also generated a suggestion to select an “Exchange rate” field.
Which chatbot’s answer I like better: trained chatbot.
ChatGPT:
Trained chatbot:
4. How to renumber lines in General Journal?
Is ChatGPT’s answer acceptable: No, the answer does not make sense.
Which chatbot’s answer I like better: trained chatbot.
ChatGPT:
Trained Chatbot:
5. How to reverse G/L entries?
Is ChatGPT’s answer acceptable: No, it does not make sense.
Which chatbot’s answer I like better: trained chatbot.
ChatGPT:
Trained Chatbot:
6. Which report shows customer balance?
Is ChatGPT’s answer acceptable: Yes. While it missed an action in the first answer and generated false instructions for the “Overview” tab and “Balance” section, it still points the user in the right direction.
Which chatbot’s answer I like better: trained chatbot; it generated prompts to check each option individually.
ChatGPT:
Trained Chatbot:
7. Can I adjust costs manually?
Is ChatGPT’s answer acceptable: No. It answered how to adjust inventory levels instead. That inventory adjustment answer mentions some non-existent fields, but it’s close.
Which chatbot’s answer I like better: trained chatbot.
ChatGPT:
Trained Chatbot:
8. How to Pick Items for Warehouse Shipment?
Is ChatGPT’s answer acceptable: No.
Which chatbot’s answer I like better: trained chatbot; see below.
ChatGPT:
Trained Chatbot (couldn’t fit the answer):
9. Bonus question: How to upgrade code from C/AL to AL?
Is ChatGPT’s answer acceptable: No, the answer does not make sense. I then made the question more specific, asking “How to upgrade code from C/AL to AL with the Txt2AL tool?”, only to receive another nonsensical answer.
Which chatbot’s answer I like better: trained chatbot. The functional bot did not come up with any answer, since it’s not trained to answer upgrade-related questions. However, we entered the same question into Simplanova’s Upgrade Chatbot (part of the Simplanova AL Tools platform) and got the following prompts to dive further into it.
ChatGPT:
Trained Chatbot:
Conclusion
2 out of 9 answers on Business Central functionality from ChatGPT were acceptable, and 1 could be considered acceptable. Most of the answers could have made more sense, and in every case I preferred the trained chatbot’s answers. While it’s a lot easier to start working with ChatGPT, I would not trust it enough to offer it to end users at this stage. We should wait and see whether the GPT-4.0 model produces better answers, or whether fine-tuning a GPT-3.5 model would.
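The headline count can be reproduced from the verdicts recorded in the test section. The short sketch below only counts the explicit Yes/No verdicts for ChatGPT; it does not try to model the borderline answer mentioned above, and the variable names are illustrative.

```python
# Yes/No verdicts for ChatGPT, copied from the nine questions above.
verdicts = {
    1: "No", 2: "Yes", 3: "No", 4: "No", 5: "No",
    6: "Yes", 7: "No", 8: "No", 9: "No",
}

# Questions whose ChatGPT answer was judged acceptable.
acceptable = [number for number, verdict in verdicts.items() if verdict == "Yes"]

print(f"{len(acceptable)} of {len(verdicts)} acceptable: questions {acceptable}")
# prints "2 of 9 acceptable: questions [2, 6]"
```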
For anyone who wants to test ChatGPT, it’s available in preview for free here.
For anyone who wants to test or start using the trained chatbot – Business Central Chatbot by Simplanova – you can test it or offer it to your users today as a free Teams bot, simply using this link.
If you are interested in your own reseller-branded chatbot connected to your own knowledge bases and support system, please contact us using the form below.