Expert Thinking and AI (Part 2)
Cover Image by cottonbro studio from Pexels
By Althea Need Kaminske
Note: To the best of my knowledge I did not use generative AI to write this post. Any mistakes or insights are my own.
The first post in this series gave an overview of Artificial Intelligence - a broad field that seeks both to better understand human cognition through computer models and to improve task-based computer models - and some of the different AI tools that have been developed. These tools have different pros and cons that make them more or less suited to certain tasks. One of the things that struck me as I was researching was that many AI tools are purpose-built to solve specific problems. This overview article from MIT’s Sloan School of Management gives a handy flow-chart to help you decide what kind of AI tool to use for your specific problem (1). This approach to AI - working with an AI expert to develop a tool specific to your needs - looks very different from how I typically see AI used and talked about. Increasingly, AI is used as shorthand for the generative AI found in chatbots. Unlike other applications of machine learning, chatbots are used as a general-purpose tool. Someone might turn to ChatGPT to ask for a summary of ancient Greek philosophy, a treatment plan for their child’s flu, or help with a speeding ticket. The chatbot can quickly return a few sentences that are clear and concise, making it appear to be an expert at all manner of tasks.
In this post I want to explore how generative AI, specifically chatbots, is used and how it affects our thinking and our development of expertise.
What Is An Expert Anyway? (Is AI coming for my job?)
One of the most striking aspects of chatbots is how human and expert-like their responses are. In a recent paper, Imundo and colleagues examined how chatbots can support (or harm) human expertise (2). To determine how, and when, a chatbot might be able to support human expertise, or replace it altogether, they first break down what it means to have expertise.
Expertise is about more than simply knowing a lot. Imundo and colleagues describe four characteristics of human expertise: general, cognitive, social, and physical (2).
General
At a general level, experts routinely produce exceptional performance in specialized domains. At this point in my career I have likely given thousands of presentations (as a professor at a small liberal arts college with a 12-credit-hour teaching load I taught 4 classes 2-3x a week for 15 weeks, twice a year, which works out to about 300 presentations a year, not counting invited talks, conference presentations, or the occasional summer class). I am very good at talking about memory and learning in 45-minute chunks. I am less good at talking about other things.
Cognitive
At a cognitive level experts can do all manner of impressive feats:
Hold a large quantity of domain knowledge - both facts and strategies for problem solving
Access said knowledge quickly and efficiently
Rapidly integrate new knowledge
Process information relevant to their domains in large “chunks”
Identify patterns in problems
Categorize problems by deep, rather than surface-level features
Monitor their own thinking and performance (metacognitive monitoring) (3)
And more!
In short, experts don’t just know a lot of information, they organize and interact with that information in ways that are categorically different from how novices do. These differences allow experts to think and solve problems more efficiently than novices.
Social
At a social level experts perform better than their non-expert peers, know how to talk to their peers to share knowledge effectively (e.g., using specific terminology, turns of phrase, paper formats, etc.), leverage their social networks for support, and demonstrate responsibility for their actions. In other words, part of being an expert in a field is being a member of a community and understanding how to interact within that community. Being part of a community of experts also means that you are held accountable by that community. In many fields credentials are important because they signal that you have met a standard set by that community and, if you were to violate the trust that community puts in you, you might have those credentials taken away.
Physical
At a physical level experts can use their bodies to intuit and solve problems, as well as to communicate through physical gestures. For example, a physical exam - using sight, touch, and even smell - is one of the tools that expert physicians use to diagnose and treat patients (4). Expert teachers use physical gestures when they recognize that material is unfamiliar to students (5).
With this more nuanced understanding of expertise, is AI expertise the same as human expertise? Not really. Certainly, some AI tools have an impressive ability to mimic cognitive expertise. They can store, access, and integrate new information in increasingly complex ways. How they learn and store that knowledge, however, differs in fundamental ways from how human experts do. Human expertise is built hierarchically. We learn foundational concepts and, over time, develop more and more complex associative networks around those concepts. For example, it would be hard to understand how diabetes works without first understanding what insulin is (3). Generative chatbots, however, do not build on foundational concepts. Instead, they simply detect how frequently pieces of information co-occur. There is no fundamental understanding of a topic, just a web of associations. This means that ChatGPT is prone to some odd errors, like being able to solve undergraduate math problems while still failing at simple addition (3). Or not knowing how many ‘r’s are in “strawberry”.
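To make the “strawberry” example concrete, here is a minimal Python sketch. Counting letters is trivial for a program that works with the actual characters; the point is that a chatbot does not work over characters in that way - it works over statistical associations between subword tokens. The token split shown below is purely illustrative (an assumption for the example, not the output of any particular tokenizer).

```python
# Counting characters directly: trivial, and always correct.
word = "strawberry"
print(word.count("r"))  # prints 3

# A chatbot does not operate on characters like this. An illustrative
# (made-up) subword split, just to show the kind of units a model "sees":
tokens = ["straw", "berry"]
# The model learns statistical associations between tokens like these;
# nothing in that representation guarantees a correct letter count,
# which is why "how many r's are in strawberry?" can go wrong.
```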
Another way in which human cognition differs from chatbots is our ability to detect errors. Experts, being human, are prone to overconfidence and errors. However, experts are better at detecting, and correcting, errors than novices (3). One of the ways that we monitor our errors is through fluency. Fluency describes how easy it is to engage with a task - how easily we remember a piece of information or perceive relevant stimuli, or how familiar something feels (3). Experts, given their vast experience in an area, are likely to feel a relatively high degree of fluency when they engage in tasks in their areas of expertise. When they experience disfluency - when a task feels slightly off - it can be a sign that something has gone awry. As highlighted above, chatbots are capable of making bizarre errors, but they do not possess an ability to detect that errors have been made.
Image by cottonbro from Pexels
Generative chatbots like ChatGPT also have a remarkable ability to pass for human-like performance in some limited social contexts, scoring well on standardized assessments typically used to measure aptitude and performance in a field (2). However, the lack of agency in chatbots means that they are unable to take responsibility for their actions. They cannot fully be members of the community if they operate outside of the ethics and morality of that community. If a generative chatbot makes up data we call it a “hallucination”; if a professional makes up or misrepresents their knowledge on a topic, they can be stripped of their credentials. Researchers who fabricate data are often stripped of their funding, titles, and degrees, medical doctors can have their licenses taken away, and lawyers can be disbarred.
AI also currently lacks the ability to demonstrate physical expertise in a robust way, though there are plenty of companies working to make AI robots that unironically look like the machines that either turn against you or teach you how to love in sci-fi movies.
Can Chatbots Help Develop Expertise?
While chatbots may not have all of the dimensions of human expertise, they are often seen as potential tools to either give laypeople or novices access to expertise, or to help them develop expertise. A common refrain that I see in conversations around students using ChatGPT is that we should be teaching students how to use these tools. The logic here is that the horse is well out of the barn, so rather than ignore the widespread use of this tool, we should embrace it. It is worth exploring, then, how chatbots might be used to support learning.
One potential area where chatbots could be used is in intelligent tutoring systems (ITS), which are designed to support student learning by providing a simulated learning environment and/or a responsive chatbot to coach students through the learning process. Imundo et al. (2024) provide a brief overview of several of these systems, and while some of them show promise, they are generally in very early stages (2). For example, Betty’s Brain is an ITS built around a computer agent that students teach material to and then test to see how well it learned the information. Early tests showed promise: students who taught Betty made more complete concept maps than those who did not (6). Classroom implementation, however, presented some challenges, as there was wide variation in how well students were able to use the program (7). It is worth noting that these tutoring systems are specifically designed to help students with specific strategies or within specific domains.
Within medical education, one of the use cases for chatbots is generating clinical practice scenarios (8). There is a large demand for clinical practice questions within medical education because these form the basis of how students are assessed on their licensing exams (see: USMLE). A common study practice for students preparing for these exams is to work through as many practice questions as they can - potentially thousands. Access to these practice questions typically comes through third-party resources that can cost hundreds of dollars (a one-month subscription to UWorld costs $319; a base subscription to AMBOSS is $19/month, but it costs $149 to have full access to their library of practice questions; TrueLearn starts at $149 for a month of access to its question bank). In this context, it makes sense why students might turn to ChatGPT to generate practice questions. A recent news article from the AAMC reports on an AI tool that was developed to create questions for a course about the blood system. They report that 85% of the questions it created met their criteria and, after human review, 75% of the questions were given to students as study material. While AI might be a useful aid in question creation, it is important to note that it is still prone to errors and biases and thus needs a fair degree of human oversight to ensure that medical students are not being taught inaccurate information (8). Again, I note that a generative tool created and trained for a specific purpose might be useful with human oversight.
Concerns About Using ChatGPT as a Learning Tool
As noted above, chatbots have the ability to assist with learning. However, these tools have the most utility when they are designed for a specific purpose and used with direct oversight from experts. Experts can help develop an AI tool to solve a specific problem, determine the appropriate training data, and check the quality of the output. That last part is incredibly important because even a small error rate can have disastrous consequences if the tool is used at a large scale. It is difficult to estimate the error rate of ChatGPT given the range of prompts and requests it handles, and estimates vary with the complexity of the task. In the case of a complex task like replicating the results of a systematic review, generative chatbots like ChatGPT and Bard produce misleading and incorrect information 28.6% to 91.4% of the time (9). OpenAI estimates that its most accurate ChatGPT model is misleading or wrong only 0.8% of the time. Even if we go with that conservative estimate of 0.8%, OpenAI also reports that 200 million people use ChatGPT each week. Assuming each of those people received just one response, that works out to 1,600,000 misleading or flat-out incorrect responses each week. How do you, as a non-expert in a field, know whether or not you’ve gotten one of the million or so misleading or incorrect responses?
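For transparency, here is that back-of-the-envelope arithmetic spelled out. It is a deliberately conservative sketch: it assumes the 0.8% figure is accurate and that each weekly user receives only a single response, so the true number of erroneous responses would be higher.

```python
# Back-of-the-envelope estimate, under two conservative assumptions:
# (1) the 0.8% error rate reported by OpenAI is accurate, and
# (2) each weekly user receives only one response.
weekly_users = 200_000_000   # OpenAI's reported weekly users
error_rate = 0.008           # 0.8%, the most optimistic estimate cited above

misleading_responses = weekly_users * error_rate
print(f"{misleading_responses:,.0f} misleading or incorrect responses per week")
# 1,600,000
```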
AI errors, particularly errors from generative chatbots, are especially concerning because these tools are particularly good at getting us to trust them. Garry, Henkel, and Foster (2024) outlined how we decide how real or true something is, a process called reality monitoring (10). When we decide how real or true a piece of information is, we tend to rely on heuristics - does that sound familiar? Was the source confident? We can also rely on more effortful processing to determine the truthfulness of information: analyzing the logic, checking sources, and so on. Garry et al. highlight several ways in which chatbots exploit reality monitoring to seem more trustworthy. First, the conversational way in which people engage with chatbots helps to imbue them with person-like characteristics. Second, chatbots often pause while the model is processing the request. ChatGPT will explain that it is “thinking”, “translating the problem”, “defining variables”, “figuring out equations”, and then “adjusting the calculations”. All of this gives the impression that ChatGPT is giving you a slow and deliberate answer. Third, unlike experts, who tend to focus on the nuance of their field, chatbots give precise and definitive answers, which people tend to read as the confidence of a credible and accurate source. All of this makes it feel like we are interacting with a trustworthy source, our own personal assistant. It can be tempting, then, to think of AI as more objective and perhaps even more credible than expert sources (9).
Perhaps the biggest concern I have about using AI as a tool for learning is that it has the potential to remove deliberate practice for learners. In the main article that I’ve covered here, Imundo and colleagues largely assumed good-faith engagement with AI (2). They highlighted ways in which purpose-built AI tools might be used, with supervision, to improve learning or practice. As I noted above, this is very different from how I hear AI being used in education. My friends who teach in K-12 and my colleagues who teach at the university level are not dealing with an influx of targeted, specific AI models that students are using. They’re dealing with students using ChatGPT to outline, refine, and sometimes just wholesale write papers and complete assignments. I’ve noted at the top of each of these posts that, to the best of my knowledge, I have not used generative AI to write this post. As I edit this in Squarespace, an AI tool in the editing pane (next to where I can choose my font and heading style) is constantly flashing. When I searched for articles to explain AI, an AI overview was the first thing to appear. I do most of my writing in Google Docs, and it occasionally gives me a little popup asking if I want to try their AI tool, Gemini. It is, frankly, almost harder to not use generative AI at this point. If students are using AI for everything, it’s at least in some part because AI is everywhere.
“For a Student Who Used AI to Write A Paper
Now I let it fall back
in the grasses.
I hear you. I know
this life is hard now.
I know your days are precious
on this earth.
But what are you trying to be free of?
The living? The miraculous
task of it?
Love is for the ones who love the work.”
- Joseph Fasano
The use of AI to complete papers and assignments may be more or less harmful to learning depending on how it was used and what the ultimate goal of the assignment is. I cannot speak to every specific circumstance, but I can speak to how I designed and assigned papers and projects for my classes: the point of these assignments was the process, not the product. While it is always nice to read a well-written paper, that was not the ultimate goal of the assignment. The goal was for students to learn how to identify a research topic, how to search for and find information within a specific field of research, how to think critically about a topic, and how to design a research project with foreknowledge of how the data will be analyzed. The goal of the assignment was to help students develop their own expertise so they can ask their own questions and come to their own conclusions. Generative AI potentially deprives students of the opportunity to practice building that knowledge for themselves (2).
As a tool, AI is neither inherently good nor bad for learning. However, when we talk about AI in educational settings there is often a lack of clarity around what kind of AI tool is being used and how best to use it. AI can be a knife that we use to cut through large swaths of information, but without a better understanding of how to use AI in education we risk harming the very processes that we wish to support, like grabbing the knife by the blade.
References
Brown, S. (2021, April 21). Machine learning, explained. MIT Sloan School of Management. https://mitsloan.mit.edu/ideas-made-to-matter/machine-learning-explained
Imundo, M. N., Watanabe, M., Potter, A. H., Gong, J., Arner, T., & McNamara, D. S. (2024). Expert Thinking with Generative Chatbots. Journal of Applied Research in Memory and Cognition, 13(4), 465-484. https://doi.org/10.1037/mac0000199
Cash, T. N., & Oppenheimer, D. M. (2024). Generative chatbots AIn’t experts: Exploring cognitive and metacognitive limitations that hinder expertise in generative chatbots. Journal of Applied Research in Memory and Cognition, 13(4), 490-494. https://doi.org/10.1037/mac0000202
Kelly, M., Ellaway, R., Scherpbier, A., King, N., & Dornan, T. (2019). Body pedagogics: Embodied learning for the health professions. Medical Education, 53(10), 967-977. https://doi.org/10.1111/medu.13916
Alibali, M. W., Nathan, M. J., Wolfram, M. S., Church, R. B., Jacobs, S. A., Johnson Martinez, C., & Knuth, E. J. (2014). How teachers link ideas in mathematics instruction using speech and gesture: A corpus analysis. Cognition and Instruction, 32(1), 65-100. https://doi.org/10.1080/07370008.2013.858161
Biswas, G., Leelawong, K., Schwartz, D., Vye, N., & the Teachable Agents Group at Vanderbilt. (2005). Learning by teaching: A new agent paradigm for educational software. Applied Artificial Intelligence, 19(3-4), 363-392. https://doi.org/10.1080/08839510590910200
Biswas, G., Segedy, J.R. & Bunchongchit, K. (2016). From Design to Implementation to Practice a Learning by Teaching System: Betty’s Brain. International Journal of Artificial Intelligence in Education, 26, 350–364. https://doi.org/10.1007/s40593-015-0057-9
Meng, X., Yan, X., Zhang, K., Liu, D., Cui, X., Yang, Y., Zhang, M., Cao, C., Wang, J., Wang, X., Gao, J., Wang, Y. G., Ji, J. M., Qiu, Z., Li, M., Qian, C., Guo, T., Ma, S., Wang, Z., Guo, Z., … Tang, Y. D. (2024). The application of large language models in medicine: A scoping review. iScience, 27(5), 109713. https://doi.org/10.1016/j.isci.2024.109713
Agnoli, S., & Rapp, D. N. (2024). Understanding and supporting thinking and learning with generative artificial intelligence. Journal of Applied Research in Memory and Cognition, 13(4), 495-499. https://doi.org/10.1037/mac0000203
Garry, M., Henkel, L. A., & Foster, J. L. (2024). Wires crossed? On chatbots as threats to reality monitoring. Journal of Applied Research in Memory and Cognition, 13(4), 485-489. https://doi.org/10.1037/mac0000208