
Enhancing AI Responses in Chemistry: Integrating Text Generation, Image Creation, and Image Interpretation through Different Levels of Prompts


Journal of Chemical Education

Cite this: J. Chem. Educ. 2024, 101, 9, 3767–3779
https://doi.org/10.1021/acs.jchemed.4c00230
Published August 19, 2024

Copyright © 2024 The Authors. Published by American Chemical Society and Division of Chemical Education, Inc. This publication is licensed under

CC-BY 4.0 .

Abstract


Generative Artificial Intelligence (GAI) technologies can potentially transform education, benefiting teachers and students. This study evaluated various GAIs, including ChatGPT 3.5, ChatGPT 4.0, Google Bard, Bing Chat, Adobe Firefly, Leonardo.AI, and DALL-E, focusing on textual and imagery content. Utilizing beginner, intermediate, and advanced prompts, we simulated GAI responses tailored to users with varying levels of knowledge and investigated the possibilities of integrating the generated content into Chemistry Teaching. For text generation, the systems presented responses appropriate to the scientific consensus but also revealed alternative conceptions of chemical content. Regarding the interpretation of representations of chemical systems, only ChatGPT 4.0 accurately identified the content in all of the images. Regarding image production, even with more advanced prompts and subprompts, Generative Artificial Intelligence still has difficulty producing adequate content. The use of prompts involving the Python language improved the images produced. In general, the generated content can support chemistry teaching, but only with more advanced prompts do the answers tend to present fewer errors. We note the importance of a prior understanding of both chemistry concepts and how these systems function.


Special Issue

Published as part of Journal of Chemical Education special issue “Investigating the Uses and Impacts of Generative Artificial Intelligence in Chemistry Education”.

Introduction


Artificial Intelligence (AI), in simple terms, refers to the ability of machines to exhibit intelligent behaviors, including learning, making decisions based on data, and making predictions. A specific segment within this broad spectrum comprises AI capable of creating media content on demand, employing human-like language. This category is known as Generative Artificial Intelligence (GAI). (1)
AI has shown potential applications in Chemistry, (2) such as predicting the three-dimensional structure of proteins based on amino acid residues, (3) conducting screenings to aid in discovering new drugs, (4) assisting in synthesis planning, (5) and even contributing to the resolution of complex wave functions. (6) In Chemistry Education, there has been an increasing interest in AI, specifically in GAI, notably due to advancements in the simulation of human language, which facilitates content creation across various topics.
However, it is relevant to underscore that integrating digital technologies into Chemistry Education is not a recent practice. Teachers and students have long explored a variety of resources within Digital Information and Communication Technologies (ICT), (7) such as software, (8) applications, (9) videos, (10,11) digital games, (12) and the Internet. (13) In addition, norms and guidelines (14−16) emphasize the importance of developing activities that enhance students’ digital competencies. Even so, the parties involved often question the integration of these technologies. For example, the introduction of Wikipedia sparked vigorous debates within academic institutions regarding how this tool could foster intellectual passivity among students.
When considering the adoption of technologies in the educational realm, it is critical to think beyond merely utilizing the available resources. We are faced with an array of tools that, although not initially intended for educational purposes, are being incorporated into this context. Such integration requires meticulous analysis to ensure that their deployment explores their pedagogical potential. As a result, it is critical not only to recognize the functionalities of these tools for general use but also to adapt them to the educational context, considering didactic planning and the ongoing process of evaluation and reflection on their use. (17,18)
In this emerging scenario, it is also essential to consider the specificities of different resources and the digital competencies their educational use requires. It is important to acknowledge that GAI tools demand digital knowledge and skills different from those required by a simple search engine. Concurrently, testing these resources by simulating users with varied competencies can yield data that support teaching and learning strategies. In this realm, coordination with studies on developing pedagogical and technological content knowledge in teacher education for using technologies is also a vital field of study. (15,19,20)
Clark and colleagues (21) compared the responses of general chemistry students and those of ChatGPT 3.5 to a similar task. The results indicated that ChatGPT achieved a success rate of only 44% in solving problems on General Chemistry exams, lower than the class average of 69%. In open-ended questions, ChatGPT stood out for its language processing ability, showing better performance in problems that could be addressed with more general information. Interestingly, even in incorrect responses and flawed explanations, ChatGPT often exhibited consistent logical reasoning, which could be persuasive to someone new to the subject.
Other studies identified possibilities and limitations of the frameworks available in distinct GAI systems. Emenike and Emenike (22) explored the impact of text-generating GAI systems, such as ChatGPT 3.5, on higher education and research in chemistry, highlighting opportunities for assistance in teaching and learning while raising concerns about academic integrity and information accuracy. Regarding use by students, a study by Tassoti (23) examined the use of GAI by chemistry students, with an emphasis on prompting strategies and prompt engineering. Tassoti identified a tendency of students to copy and paste questions, evidencing a lack of the refined prompting skills that are crucial to effectively harnessing the capabilities of generative AI.
Humphry and Fuller (24) examined the use of ChatGPT 3.5 in undergraduate chemistry laboratories, highlighting its potential to assist in report writing and analysis but noting limitations in accuracy and chemical analysis. In a General Chemistry question test, ChatGPT 3.5 exhibited limited performance, correctly answering only 32.43% of the questions. This underscores the necessity of a solid understanding of chemistry fundamentals to utilize GAI as an effective learning tool. Arajo and Saudé (25) investigated the application of ChatGPT in chemistry laboratory activities, stating that ChatGPT effectively interprets and reproduces chemistry’s symbolic language. However, they highlight significant limitations in the scientific and pedagogical accuracy of the suggested activities, emphasizing safety and sustainability issues and a lack of detail.
Talanquer (26) explored the reasoning of ChatGPT 3.5 and Google Bard while posing a series of conceptual chemistry questions. The results revealed that chatbots, like novice chemistry students, tend to exhibit patterns of reasoning and alternative conceptions. Even when attempting to refine their responses, chatbots often do not entirely correct errors or explain concepts accurately, opting for plausibility over accuracy.
One of the main points of existing studies is their exclusive focus on text-GAI. Notably, to the best of our knowledge, no research to date has explored the potential of image-GAI in chemical education, specifically in creating didactic content such as chemical bonding representations and Lewis structures. This gap represents a significant opportunity to enhance chemical education by exploring GAI’s visual capabilities and working with symbolic levels in chemistry language. Another observation pertains to adapting nontextual information within questions for the chatbot. Some studies were confined to queries devoid of Lewis structures, (21,26) whereas others incorporated structures without proper adaptation. (27) This approach hinders artificial intelligence’s ability to recognize the chemical structure of the compounds mentioned in the questions.
Another critical point is the importance of research in chemistry teaching on alternative conceptions. In this context, it becomes central to investigate the behavior of artificial intelligence in response to questions that may elicit such conceptions. The importance of this investigation lies in the GAI’s potential ability to identify, interpret, and adapt its responses to these alternative conceptions. This approach would allow for an assessment of how GAI interprets and responds to questions that may not follow the conventional understanding of chemical concepts. If the responses reinforce alternative conceptions, then understanding how GAI deals with concepts at different levels of complexity would enrich our knowledge about GAI and its applications in an educational context.
Furthermore, when using images, schemes, or representations, it is essential to know how aspects related to the chemistry triplet (macroscopic, submicroscopic, and symbolic) are expressed. (28) Such analysis would allow inferences about the GAI image construction process, evaluating its effectiveness and accuracy. Understanding these mechanisms would not only enrich our knowledge about the capabilities of GAI but also indicate how its representations can be used in the educational process to facilitate the understanding of abstract chemical concepts.
By simulating three distinct kinds of prompts (beginner, intermediate, and advanced), this study investigates how changes to prompts refine Generative AI responses in creating chemical content. It seeks to discuss factors that may enhance or limit the use of these systems by diverse users in the field of Chemistry Education.
Therefore, the research questions can be articulated as follows:
(1) How does prompt engineering influence the refinement of Generative AI responses in generating chemical content, both in text and in images, relevant to the content and formalism of chemistry? (2) What is the capability of visual Generative Artificial Intelligence tools in recognizing and generating images relevant to the content and formalism of chemistry?

Methods


The tests were conducted between November 2023 and May 2024, using the Windows 11 operating system and the Google Chrome browser. Text GAI was employed to address general and organic chemistry exercises, challenging chatbots to define concepts such as chemical bonds.

Selection of GAI Tools and Types of Tasks Performed

To analyze the capabilities of text and image GAI in producing and interpreting chemical content, with emphasis on the precision and format of questions relevant to the subject matter, we employed prompts combining text and images to assess the functionalities of seven specific text and image GAI systems: ChatGPT 3.5, ChatGPT 4.0, Google Bard, Bing Chat, Adobe Firefly, Leonardo.AI, and DALL-E. These were selected based on their ability to generate text and/or images and their availability as either free or paid services. The tests were divided into three types of tasks that the AIs can perform: generation of text from textual commands, generation of images from textual descriptions, and prompts using computer vision to identify pictures and produce text. Figure 1 charts the Artificial Intelligences discussed in this study, sorted by the kind of task they perform (text or image) and by whether they are free to use or require a paid subscription.


Figure 1. Flowchart illustrating the segmentation of GAI based on access type (free or paid) and capability (text or image generation).

For the first type of task, text generation, we asked the different AI systems to define the various types of chemical bonds using a systematic approach in triplicate. Thus, in separate conversations, we queried each AI three times for definitions of covalent, ionic, and metallic bonds. The prompts were tailored to the simulated user profile. Four text-generative AIs, three types of bonds, three prompt levels, and three identical prompts per level resulted in 108 prompts exclusively on chemical bonds.
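The factorial design described above (4 text-generative AIs, 3 bond types, 3 prompt levels, 3 replicates) can be enumerated directly. The sketch below is an illustrative reconstruction of that prompt matrix, not the authors' tooling; the study itself was run through each chatbot's own interface:

```python
from itertools import product

# Illustrative reconstruction of the prompt matrix described in the text:
# 4 text-generative AIs x 3 bond types x 3 prompt levels x 3 replicates.
ais = ["ChatGPT 3.5", "ChatGPT 4.0", "Google Bard", "Bing Chat"]
bonds = ["covalent", "ionic", "metallic"]
levels = ["beginner", "intermediate", "advanced"]
replicates = [1, 2, 3]  # each prompt was issued three times, in separate conversations

prompt_jobs = [
    {"ai": ai, "bond": bond, "level": level, "replicate": rep}
    for ai, bond, level, rep in product(ais, bonds, levels, replicates)
]

print(len(prompt_jobs))  # 4 * 3 * 3 * 3 = 108 prompts on chemical bonds
```

Enumerating the design this way makes the total of 108 prompts auditable and keeps every (system, content, level, replicate) combination explicit.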
The second type of task required the interpretation of images related to organic chemistry content such as resonance, energy diagrams, and reaction mechanisms. The evaluation of these images focused on the resolution of chemistry exercises, covering both the development of text descriptions and the formulation of solutions to the challenges presented. The images used in the computer vision identification parts were specifically created for this research to prevent identification based on matches with images available on the Internet, thereby ensuring the originality of the analyzed material and the validity of the results obtained in the identification by the AI. Additionally, the instruction “Could you identify what is happening in this image?” was employed immediately before presenting the corresponding image. The descriptive textual content of the images was intentionally omitted to avoid influencing the responses, aiming to assess the accuracy with which the AI identified chemical information in the images.
For the third type of task, image generation, the four previously referenced AI image generators were tasked with producing visual representations of three concepts: chemical bonds, Lewis structures, and atomic models. This resulted in 12 images of chemical bonds, 8 of Lewis structures, and 16 of atomic models.

Prompt Development

This sequence of prompts has two objectives: first, to recognize that different levels of prompting can lead to varied Artificial Intelligence responses, influenced by the level of contextualization of the specific content and the methodology of the prompt formulation, encompassing both the specifications regarding the technology and the chemistry content embedded in the question; second, to evaluate direct and sequential prompts for refining the responses in alignment with the principles of prompt engineering. (29−31)
The beginner category comprises basic prompts designed to simulate a user with little to no experience in GAI technology and limited knowledge of chemistry. The intermediate category addresses simulated prompts reflecting more developed digital skills yet still limited proficiency in chemistry; therefore, no iterative process is simulated for this user. Finally, the advanced category reflects both digital competence and chemistry proficiency. For this last category, specific subprompts were developed to refine the previously provided response when necessary.
Table 1 displays the prompts related to covalent bonds, exemplifying the prompt levels. For further information on other prompts, it is recommended to consult the Supporting Information.
Table 1. Prompts Developed for the Content on Covalent Bonding, Categorized According to the Type of Prompt, Reflecting Users’ Level of Digital Competence and Proficiency in Chemistry(a)

Prompt Level | Description | Prompt
Beginner | Prompt developed to simulate a user with low levels of competence in both digital skills and chemistry. | Define covalent bond (SI, pg. 2)
Intermediate | Prompt developed to simulate a user with digital proficiency but lacking skills in chemistry. | I need a well-structured definition of covalent bonding that is suitable for a higher level. It should be noted that I do not have a refined knowledge of chemistry, therefore, please make your answer consulting reliable sources. (SI, pg. 6)
Advanced | Prompt developed to simulate a user with high digital proficiency and competence in chemistry. | As a general chemistry professor in higher education, when requesting a definition of chemical bonding, I encountered a response that presented alternative conceptions. In the provided explanation, covalent bonds were inaccurately compared to intermolecular interactions, which does not accurately reflect the true nature of these bonds. Additionally, there was a tendency to state that bonds occur for the purpose of achieving a noble gas configuration, which is an incorrect notion, as this configuration is a consequence of the formation of the chemical bond, not its primary objective. Faced with these issues, I would like you to produce a definition of covalent bonding aligned with the expectations of a higher education course. This definition should precisely and concisely address the fundamental principles of covalent bonds, avoiding inappropriate comparisons and clarifying the true reason why chemical bonds occur. The emphasis should be on understanding electronic interactions and the formation of stable molecular structures. (SI, pg. 15)(a)

(a) For the advanced prompts, aimed at simulated users proficient in chemistry, subprompts were created to correct potential conceptual errors in the initial response, striving for improvement. This practice is not applied to beginner and intermediate users, as they would not be able to identify issues related to chemistry content.

Data Analysis

The choice of content was based on three main aspects, which also guided the analysis of the results. First, the complexity or level of knowledge requested through the questions asked of the AI and GAI, in terms of knowledge typologies based on Bloom’s Revised Taxonomy (TBR). (32) In this case, we evaluated prompts that required low cognitive levels from the GAI, such as defining, describing, and explaining, associated with the verbs “memorize” and “understand” from the TBR. The second aspect relates to concepts with alternative conceptions or associated elements of common sense. The third aspect is the production and interpretation of images related to representations and chemical language from the point of view of the macroscopic, symbolic, and submicroscopic levels. The aspects related to alternative conceptions and representation levels are detailed in the analysis of each prompt.
The responses were analyzed using the MaxQDA (33) software and categorized according to their origin: textual responses derived from text prompts (prefix Ct) and textual responses resulting from image interpretation (prefix Ci). We employed thematic content analysis, as outlined by Bardin, (34) exclusively to evaluate textual responses.
According to Bardin’s framework, the units of analysis are fragments of meaning associated with a specific category. These can vary in length from a short passage to a longer one that, within a specific context, provides relevant meaning for the assigned category. (34) We adopted this framework in our analysis, taking sentences as the units; read within the context of the full text (available in the Supporting Information), each sentence conveys the meaning of its assigned category. Furthermore, a single segment can be classified into multiple categories. We applied specific categories to the generated images to evaluate the presence of elements, symbolism, and macroscopic aspects. The categories were developed based on the data analysis (Table 2); the categories Ct1, Ct2, and Ci1 were created exclusively to differentiate the prompt (which simulates the user’s expertise through prompt engineering) from the response to the prompt.
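As a toy illustration of this style of segment coding (our sketch, not the authors' MaxQDA workflow, and with an invented keyword codebook standing in for the real, manually applied categories), sentences can be matched against per-category patterns and tallied, with a single segment allowed to receive multiple categories:

```python
import re
from collections import Counter

# Hypothetical keyword rules standing in for the Ct codebook; the actual
# coding was done manually in MaxQDA following Bardin's content analysis.
RULES = {
    "Ct3": [r"sharing of (an|one or more) electron pairs?"],
    "Ct4": [r"\bseeks?\b", r"achieve\b.*\bstable", r"in order to\b"],
}

def code_sentence(sentence):
    """Return every category whose patterns match; Ct8 = uncategorized."""
    hits = [cat for cat, pats in RULES.items()
            if any(re.search(p, sentence, re.IGNORECASE) for p in pats)]
    return hits or ["Ct8"]

def tally(sentences):
    """Count coded segments per category across a list of sentences."""
    counts = Counter()
    for s in sentences:
        counts.update(code_sentence(s))
    return counts
```

For example, `code_sentence("Atoms share electrons to achieve a more stable configuration.")` is coded Ct4, while a consensus definition mentioning the sharing of an electron pair is coded Ct3.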
Table 2. Categories for Analyzing Responses to Text Generation, Image Interpretation, and Image Creation Prompts

Task | Category of Content Analysis | Description | Example
Textual response from textual prompt | Ct1 | Initial Prompt | Define Covalent bond (SI, pg. 2)
| Ct2 | Subprompt | Subprompts only exist for advanced prompts and vary from conversation to conversation.
| Ct3 | Content appropriate to scientific consensus | A covalent bond is a chemical bond that arises from the sharing of an electron pair between two atoms. (SI, pg. 4)
| Ct4 | Alternative conceptions and common student difficulties | [...] share one or more pairs of electrons to achieve a more stable electron configuration... (SI, pg. 2)
| Ct5 | Aspects related to the macroscopic, submicroscopic, and symbolic | table salt (NaCl)... (SI, pg. 31)
| Ct6 | Presence of references | For more detailed information, you can refer to the article on Britannica. (1) (SI, pg. 52)
| Ct7 | Content related to the prompt | Certainly! Let us provide a precise and concise definition of ionic bonding... (SI, pg. 50)
| Ct8 | Uncategorized | If you have any other questions, please do not hesitate to ask. (SI, pg. 74)
Textual response from image prompt | Ci1 | Initial Prompt | An image for identification (SI, pg. 104)
| Ci2 | Accurate Identification of Chemical Content in the Image | Marked if the chatbot correctly identified the image content. (SI, pg. 104)
| Ci3 | Inaccurate Identification of Chemical Content in the Image | Marked if the chatbot incorrectly identified the image content. (SI, pg. 106)
| Ci4 | Correct and Prompt-Related Chemical Content | Text content correctly generated, both in relation to the image and the associated chemistry content. (SI, pg. 104)
| Ci5 | Incorrect but Prompt-Related Chemical Content | Incorrect chemistry content but related to the image prompt. (SI, pg. 109)
| Ci6 | Presence of Elements Unrelated to the Prompt Content | Content unrelated to the prompt. (SI, pg. 106)
| Ci7 | Presence of Descriptive Elements of Prompts | Descriptive text of the prompt, such as recognizing and transcribing phrases from images. (SI, pg. 108)
Generated images | Ig1 | Correct General Appearance According to Prompt Request | Although the image may be erroneous, its general appearance matches the prompt request and models found on the Internet. (SI, pg. 89)
| Ig2 | Incorrect General Appearance According to Prompt Request | The image does not match the prompt requests or models found on the Internet. (SI, pg. 86)
| Ig3 | Symbolic Level Representation | Contains some type of chemical symbolism, for example: Na. (SI, pg. 88)
| Ig4 | Material that Refers to the Macroscopic Level | Contains some aspect related to the macroscopic level, whether material or object. Examples include shadows, metallic shines, and books, among others. (SI, pg. 85)
| Ig5 | Same Particle in Different Sizes | Particles that represent the same entity, for example, electrons, are in different sizes. (SI, pg. 85)
| Ig6 | Same Particle in Different Colors | Particles that represent the same entity, for example, electrons, are in different colors. (SI, pg. 85)
| Ig7 | Energy Representation | Contains something that can be related to energy representation. Example: electric rays. (SI, pg. 88)
| Ig8 | Entity Connecting Particles | Presence of some object that connects particles, for example: sticks. (SI, pg. 87)
| Ig9 | Unrecognizable Text | Presence of undecipherable text. (SI, pg. 88)
Finally, two alternative prompts were developed to evaluate the use of the DALL-E image generation engine within ChatGPT, as described in Table 3:
Table 3. Alternative Prompts Developed for Image Generation Using DALL-E

Prompt Type | Description
Simple | “Using Dall-e, create a representation of the Lewis structure of methane CH4.”
Advanced | “Using Dall-e, develop a visual 3D representation of the Lewis structure for the methane molecule, CH4. The drawing should display the carbon atom at the center, surrounded by four hydrogen atoms, indicating a tetrahedral geometry. Each bond between carbon and hydrogen should be represented by a single line, symbolizing the single covalent bond. It is crucial that the drawing highlights the valence electrons of carbon and hydrogen, using dots or small circles around the atoms to represent the electrons. The aim is to create a didactic and accurate representation that can be used for educational purposes, facilitating the understanding of the basic molecular structure of methane according to the Lewis model.”
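The abstract notes that prompts involving Python improved the generated images, which is consistent with symbolic chemical diagrams being better produced deterministically than by a diffusion model. As a hedged, stdlib-only sketch of that idea (our illustration, not the code produced in the study), a flat Lewis-style drawing of CH4 can be emitted directly as SVG:

```python
import math

def lewis_ch4_svg(size=300):
    """Minimal SVG sketch of a 2D Lewis-style structure for methane (CH4):
    central C, four H atoms at right angles (a flat schematic, not the 3D
    tetrahedral geometry), one line per C-H bond, and two dots per bond
    marking the shared electron pair. Illustrative only."""
    cx = cy = size / 2
    r = size * 0.3  # distance from C to each H label
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    parts.append(f'<text x="{cx}" y="{cy}" text-anchor="middle">C</text>')
    for i in range(4):  # four C-H bonds at 90-degree spacing
        angle = math.radians(90 * i)
        hx, hy = cx + r * math.cos(angle), cy + r * math.sin(angle)
        # bond line drawn from near the C label to near the H label
        parts.append(
            f'<line x1="{cx + 0.25 * r * math.cos(angle):.1f}" y1="{cy + 0.25 * r * math.sin(angle):.1f}" '
            f'x2="{hx - 0.25 * r * math.cos(angle):.1f}" y2="{hy - 0.25 * r * math.sin(angle):.1f}" stroke="black"/>')
        # two dots for the shared electron pair, offset perpendicular to the bond
        for sign in (-1, 1):
            dx, dy = -math.sin(angle) * 5 * sign, math.cos(angle) * 5 * sign
            mx, my = (cx + hx) / 2 + dx, (cy + hy) / 2 + dy
            parts.append(f'<circle cx="{mx:.1f}" cy="{my:.1f}" r="2"/>')
        parts.append(f'<text x="{hx:.1f}" y="{hy:.1f}" text-anchor="middle">H</text>')
    parts.append("</svg>")
    return "\n".join(parts)
```

Because every atom label, bond line, and electron dot is placed by explicit coordinates, the symbolic formalism comes out exactly as specified, unlike the diffusion-generated images discussed in the Results.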

Results and Discussion


Textual Responses

We present the outcomes and evaluations of text production from four distinct GAI systems, considering prompts of varying complexity across the three categories of simulated prompts. Figure 2 displays graphs representing the average number of encoded segments in the analysis of responses to text generation prompts about the various types of chemical bonds. In total, 2,896 segments were coded and distributed according to the categories mentioned above. The following discussion summarizes the main alternative conceptions about chemical bonding found in the generative AI’s responses.


Figure 2. Graphs quantifying the proportional average of categorized segments relative to the content of the chatbots’ individual responses, for prompts on three types of chemical bonds: covalent, ionic, and metallic.

Before the results are discussed, it is crucial to note the variation in response length among the chatbots. For instance, Google Bard’s responses were notably more extended, resulting in more encoded segments (ES); however, this does not necessarily indicate superior quality. Furthermore, a more significant presence of alternative conceptions in a chatbot does not automatically equate to lower response quality, as the type of conception varies among them. For a more accurate analysis, we compared the graphs of each chatbot individually, assessing how different kinds of prompts influence responses on the same chemical topic.
The prompts for defining chemical bonds were conducted in triplicate; as a result, Figure 2 presents the average number of encoded segments (ES) for each of the established categories. Analyzing the categorization of the results, an increase in correct content is noticeable: the number of segments with alternative conceptions decreases for the intermediate and advanced prompts when compared proportionally to the increase in the number of sentences categorized as Ct3. This means that a prompt reflecting greater digital proficiency and knowledge of chemistry can generate quality material for teaching chemistry, whether used by the teacher or with the student. However, it remains necessary to analyze the mistakes made and their possible impacts.
The analysis demonstrated that segments categorized under Ct3 are predominant in all responses (854 ES), compared to Ct4 (168 ES). However, this predominance does not ensure the quality of the response, as a single inaccurate sentence can compromise the outcome. For instance, the statement “Covalent bonds typically form between non-metal atoms, as these elements tend to gain, lose, or share electrons to achieve a stable electron configuration” (Beginner Prompt – ChatGPT 3.5) could lead to the alternative conceptions that covalent bond formation involves the gain and loss of electrons, causing students to confuse covalent with ionic bonds. Yik and Dood (35) documented similar findings, noting that although prompt engineering enhances the sophistication of ChatGPT’s explanations, it does not necessarily ensure the accuracy of the responses.
In the Ct4 category, the responses from all chatbots exhibited a teleological approach, (36) suggesting that phenomena, such as chemical bonds, aim at specific objectives like reducing the system’s energy level or achieving a complete octet. Terms like “atoms seek”, “in order to achieve”, and “desire of atoms” for a lower energy level personify atoms, adding an animistic aspect to the responses.
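Teleological and animistic markers of the kind quoted above can also be caught mechanically. The sketch below is illustrative only: the pattern list is our assumption, not the study's coding scheme.

```python
import re

# Flags teleological/animistic phrasing of the kind quoted in the text
# ("atoms seek", "in order to achieve", "desire of atoms").
# The pattern list is illustrative, not the study's codebook.
TELEOLOGICAL_PATTERNS = [
    r"\batoms?\s+(seek|strive|want|desire)",
    r"\bin order to achieve\b",
    r"\bwith the aim of\b",
    r"\bdesire of atoms\b",
]

def flag_teleology(text):
    """Return the patterns that match the text (empty list if none)."""
    return [p for p in TELEOLOGICAL_PATTERNS
            if re.search(p, text, re.IGNORECASE)]
```

Such a filter could serve as a coarse first pass before manual coding, since teleological phrasing is lexically stereotyped even when the surrounding chemistry is correct.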
We noticed that as the prompts grew more intricate, from intermediate to advanced levels, the chatbots tried to align more closely with the prompts. This was particularly evident in the responses to advanced prompts, where segments categorized as Ct7 appeared in prominent parts of the responses to subprompts. However, this did not necessarily result in improved responses. Next, we present a discussion of common alternative conceptions among students about chemical bonds, which also surfaced in the texts produced by Generative AI.
Various alternative conceptions about chemical bonds are common among students. One such belief is that bonds form to complete the atoms’ octet. (37) Another common alternative conception is that in covalent bonding, atoms share only one electron. (38) There is also frequent confusion between electron transfer and ionic bonding, with many students believing that chemical bonds are limited to situations of electron donation or acceptance. (39) The notion that intermolecular interactions are equivalent to chemical bonds, such as covalent ones, is also a recurring error. (40) There is also an alternative conception that metallic bonding is not a chemical bond and would be weaker than covalent and ionic bonds. (37) Finally, it is common among students to believe that breaking chemical bonds releases energy, (41) among other incorrect conceptions. In this context, it is evident that the GAI textual responses reflected a variety of alternative conceptions that are frequently exhibited by students, albeit in a more structured and suggestive manner. Table 4 illustrates segments that are classified as alternative conceptions and student difficulties. The remaining responses are found in Supporting Information (SI).
Table 4. Examples of Segments Categorized as “Alternative Conceptions and Common Student Difficulties (Ct4)”, Organized by Type of Prompt and GAI

GAI | Prompt Level | Segment
Google Bard | Beginner | Covalent bonds are formed when atoms of different elements share electrons to achieve a stable electron configuration. (Ct4) (SI, pg. 5)
| Intermediate | Atoms strive for a stable electronic configuration, typically an octet (eight electrons) in their outermost shell (valence shell). (Ct4) (SI, pg. 12)
| Advanced(a) | The formation of the bond occurs as a result of seeking a lower energy state, where atoms achieve a more stable electronic configuration. This increased stability is the main driving force behind the formation of covalent bonds. (Ct4) (SI, pg. 24)
Bing Chat | Beginner | It is formed when two atoms have similar electronegativity and share one or more pairs of electrons to achieve a more stable configuration. (Ct4) (SI, pg. 5)
| Intermediate | Imagine two atoms holding hands, each contributing an electron to the bond. (Ct4) (SI, pg. 10)
| Advanced(a) | However, it is important to note that the formation of the covalent bond is not motivated by the goal of achieving a noble gas configuration, but rather by the need to stabilize the molecular structure. (Ct4) (SI, pg. 21)
ChatGPT 3.5 | Beginner | Covalent bonds typically form between nonmetal atoms, as these elements have a tendency to gain, lose, or share electrons to achieve a stable electron configuration. (Ct4) (SI, pg. 2)
| Intermediate | A covalent bond is a type of chemical bond in which two atoms share pairs of electrons to achieve a more stable electronic configuration. (Ct4) (SI, pg. 6)
| Advanced(a) | Contrary to the misconception that atoms seek to attain the noble gas configuration, the formation of covalent bonds is a natural consequence of the pursuit of greater electronic stability. (Ct4) (SI, pg. 16)
ChatGPT 4.0 | Beginner | This sharing allows each atom to attain the electron configuration of a noble gas, achieving greater stability. (Ct4) (SI, pg. 2)
| Intermediate | Covalent bonding is a fundamental chemical interaction where atoms share pairs of electrons to form molecules. (Ct4) (SI, pg. 11)
| Advanced(a) | A covalent bond is a type of chemical bond that occurs when two atoms share one or more pairs of electrons in their valence shells, with the aim of achieving a more stable electronic configuration. (Ct4) (SI, pg. 18)

(a) For the advanced prompts aimed at users proficient in chemistry, subprompts were created to correct potential conceptual errors in the initial response, aiming for improvement. This practice is not applied to beginner and intermediate users, as they would not be able to identify issues related to chemistry content.

Regarding students’ perceptions of representational levels in chemistry, it is common for students to have difficulties transitioning between different levels, (29) often attributing macroscopic characteristics, such as color and brightness, to submicroscopic entities. Furthermore, students face challenges in relating the representations of chemical equations to the molecular system, often manipulating only the symbols without forming an adequate mental model of the represented process. (42) The responses from the GAI integrate elements from various representational levels of chemistry. Notably, the symbolic level was predominant in the responses concerning covalent bonds, whereas the macroscopic and submicroscopic levels were more accentuated in the ionic and metallic bond responses. For example, the description of metallic bonding often included characteristics such as luster, conductivity, and malleability (see Google Bard in the SI, pg. 59). In ionic bonding, macroscopic characteristics like high melting and boiling points were common (see ChatGPT 3.5 in the SI, pg. 45). For covalent bonding, such material characteristics were less mentioned, with the responses focusing more on the symbolic level of knowledge (see Google Bard in the SI, pg. 13).
Although the results indicate an increased trend toward prompt conformity, alternative conceptions, especially teleological biases, remained present in advanced prompts, suggesting a persistence of errors in GAI. The lack of proportional improvement in responses relative to the detail of prompts indicates variability that requires meticulous user analysis, regardless of the initial request’s specificity and quality. These findings emphasize the need to develop digital and chemical competencies that enable users, whether students or teachers, to identify explanatory biases, analogy issues, and conceptual errors in the responses.
Despite chatbots exhibiting alternative conceptions and explanatory biases in their textual responses, their use in educational settings is notably beneficial. They enable students to evaluate and critique the responses provided by chatbots, fostering active learning. (43,44) Furthermore, analyzing the textual construction and the strategies employed to address questions can afford students profound insights, enhancing their critical thinking skills and content understanding.

Image Interpretation by the Machine and Generation of Textual Response

The images selected for the study encompass organic chemistry topics: acid–base reaction mechanisms, resonance of the tosylate ion, coordinate graphs for bimolecular nucleophilic substitution reactions, diagrams of these reactions, and exercises in classifying acidity and basicity. These subjects were chosen for their chemical formalism, including Lewis structures and electron movement indicated by arrows, among others. (45,46) The objective was to assess the capability of the GAIs’ computer vision to recognize these formalisms and generate responses aligned with the content presented in the prompts. Figure 3 presents the set of imagery prompts used in the analysis.

Figure 3

Figure 3. Series of visual prompts for detection by chatbots, including (1) an acid–base reaction mechanism; (2) resonance of the tosylate ion; (3) reaction coordinate diagram of the SN2 reaction; (4) SN2 reaction mechanism involving methyl bromide and hydroxide; (5) comparative analysis of acidity and basicity. The numbers correspond to those represented in Figure 4.

Among the five prompts provided, only ChatGPT 4.0 accurately identified the content in all images, despite some inaccuracies in the generated text. Both Bing Chat and Google Bard correctly responded to two of the five prompts, displaying inferior performance in identification and text generation. The outcomes of this analysis are depicted in Figure 4.

Figure 4

Figure 4. Portrait grids, each consisting of 40 × 40 squares, for (i) ChatGPT 4.0; (ii) Google Bard; (iii) Bing Chat. The circled numbers denote the prompts, with green indicating the number of prompts each GAI correctly identified and red denoting those identified incorrectly. The colors of the squares in the portraits correlate with the categories listed at the bottom of the figure into which the text segments of the responses have been classified. The black line rectangle highlights the correct interpretation of the image made available to the machine. The numbers correspond to those image prompts represented in Figure 3, with green for correct identification and red for incorrect identification.

Figure 4 presents a graphical comparison of the accuracy in image content identification by three AI assistants: ChatGPT 4 Vision, Bing Chat/Copilot, and Google Bard/Gemini. Divided into three panels, one per assistant, the figure displays rows of colored squares, where each color indicates the proportion of a specific category in the response. Below each panel, a numerical sequence from 1 to 5 categorizes the questions: in red, those incorrectly identified, and in green, those correctly identified; the black rectangles were added to mark the start of each AI’s response regarding image interpretation. The questions refer to Figure 3.
This portrait graph allows for the perception of the whole and, primarily, the discrepancy between the responses of the different computer vision systems analyzed. The dominance of “Accurate Identification of Chemical Content in the Image, Ci2” in ChatGPT 4.0 Vision highlights its proficiency in interpreting image prompts and providing accurate chemical responses, including the correct identification of molecules and arrow types, except for the inaccuracies noted in Table 5. Alasadi and Baiz (47) reported a similar result, observing that ChatGPT 4.0 demonstrated proficiency in identifying and interpreting images with various chemical contents, including diagrams, equations, and digital and handwritten text, even with variations in the quality of the image presented as a prompt.
Table 5. Examples of Segments Classified as “Incorrect but Prompt-Related Chemical Content” from Image Interpretation Prompts
| GAI | Prompt | Segment |
| --- | --- | --- |
| ChatGPT 4.0 | Coordinate graph for SN2 reaction involving methyl bromide and hydroxide ion | The diagram specifically illustrates a chemical process where a hydroxide ion (OH) is attacking a bromine molecule where one of the hydrogens is partially positive (indicating it is susceptible to nucleophilic attack), leading to the formation of water (H–OH) and a bromide ion (Br). (SI, pg. 108) |
| Google Bard/Gemini | Diagram of SN2 reaction involving methyl bromide and hydroxide ion | The reaction can be written as follows: HO + H2Br → HO2 + Br (SI, pg. 111) |
| Bing Chat/Copilot | An acid–base reaction mechanism involving H2O and OH | The equation is OH + H3O+ → H2O + OH. This is a neutralization reaction. (SI, pg. 104) |
In contrast, Google Bard/Gemini and Bing Chat/Copilot showed a higher frequency in category Ci6, which includes elements irrelevant to the prompt, reflecting their failure to identify the content in three of the five prompts. Even in the prompts where they correctly identified the context, the responses were imprecise, with a predominance of categories Ci5 and Ci6. Table 5 shows a segment from each chatbot classified as “Incorrect but Prompt-Related Chemical Content”.
The error illustrated in the first segment (Table 5) relates to ChatGPT 4.0’s failure to accurately recognize the Lewis structure as specified in the prompt, which explicitly showed hydrogen atoms but not carbon atoms (refer to the Supporting Information image). When the image was modified to make the carbon atom explicit, the chatbot correctly identified the partial positive charge on the carbon and its ensuing reaction with the bromide ion. This result suggests that ChatGPT 4.0 encountered difficulties like those of novice organic chemistry students, (43) who often overlook implicit atoms in structures. It is essential to develop the skill of creating prompts suited to the capabilities and limitations of the artificial intelligence in question; this competence will become increasingly crucial to teach to students.
The analysis of the second segment (Google Bard) indicates that in contrast to ChatGPT 4.0, Google Bard failed to recognize that the reaction substrate was an alkyl bromide. It produced a response including H2Br in a completely unbalanced reaction diagram devoid of chemical relevance to the context. Such a response suggests that the chatbot generated a response with fabricated information due to its inability to identify the correct structure. The segment from Bing Chat’s response reveals that the chatbot incorrectly replaced a water molecule with a hydronium ion and identified H2O and OH as reaction products, interpreting the process as a neutralization reaction. It is important to note that the clear and neutral language used in the responses from Google Bard and Bing Chat could lead users without a chemistry background to misinterpret the information, accepting it as accurate, because of the confidence that the text conveys.

Image-GAI for Atomic Models, Chemical Bonds, and Lewis Structures

The GAIs do not present accurate responses when directly asked to construct representations of chemical content. Furthermore, some elements presented in the images may reinforce conceptual errors or alternative conceptions, as discussed below. Figure 5 presents a selection of images produced by generative image artificial intelligence, focusing on covalent bonds. (The other generated and analyzed images are available in the Supporting Information, pages 83 to 92.)

Figure 5

Figure 5. Images generated by AI in response to the prompt “Draw a representation of the concept of covalent bonding”, produced respectively by ChatGPT 4.0, Bing Chat, Leonardo.AI, and Adobe Firefly.

The AI-generated representations of covalent bonding tended to depict closely positioned nuclei with particles of varied sizes and colors. Electrons, represented as smaller points with distinct colors, are sometimes shown with trajectories, suggesting shared orbits. These representations can lead to confusion due to the unclear distinction between nuclei and electrons. The choice of material in the illustration, imparting a metallic sheen to the particles, may incorrectly imply macroscopic characteristics at the submicroscopic level. Figure 6 displays graphs quantifying the image segments assigned to each category. In this analysis, we used the MAXQDA (33) software to code parts of the images instead of sentences; for example, regions of the images showing representations of electrons of different sizes were identified and classified under category Ig5.

Figure 6

Figure 6. Graphs quantifying the number of categorized segments in relation to the content of the chatbots’ individual responses for generated images, pertaining to three types of prompts: atomic models, chemical bonds and Lewis structures. (Additional images are available in the Supporting Information, pg. 96).

The results indicate significant variation in how different aspects of the content are represented in the images, resulting in visual representations that are often inaccurate. This suggests that although some images contain correct elements (as indicated by the green bars), there are inconsistencies in the representation of content across all generated images. This pattern in the graphs highlights the need for adjustments or improvements in image generation technology to ensure that representations are accurate and consistent with the specifications outlined in the prompts before they can be used for educational purposes. Until then, it is advisable to rely on traditional search engines such as Google and Bing rather than on AI-generated images.
For instance, in representing ionic bonds, the AIs attempted to illustrate electron transfer. As in the textual responses, there was a tendency to reduce ionic bonding to the phenomenon of electron transfer (SI, pg. 87). For metallic bonding, the illustrations depicted metallic characteristics such as color and luster, again applying macroscopic features to submicroscopic entities (SI, pg. 88). None of these representations were considered accurate. We observed significant hallucinations on several occasions, exemplified by Adobe Firefly’s improper representation of metallic bonding as metal armor (SI, pg. 88).
OpenAI, the developer of ChatGPT, classifies content related to chemical, biological, radiological, and nuclear (CBRN) risks, as well as some general scientific knowledge, as sensitive topics in its DALL-E API, (48) indicating that the system is intentionally designed not to represent these themes accurately. Bearing this in mind, we asked DALL-E to represent simple molecules such as H2O, H2SO4, and CH4 to assess its representational capability. It remains unclear, however, whether this restriction applies to all molecules or only to those potentially usable in chemical weapons production.
To explore the capabilities and limitations in image generation, we asked ChatGPT 4.0 to create images using its Python code generator, replacing “Using DALL-e...” with “Using Python Code...” at the beginning of both prompts. The results are shown in Box 1.
Box 1

Differences in the Quality of Lewis Structures Generated with Prompts in DALL-E and Python: (i) beginner prompt using DALL-E; (ii) advanced prompt using DALL-E; (iii) beginner prompt using Python; (iv) advanced prompt using Python.
Replacing DALL-E with Python code resulted in representations more aligned with chemical formalism. We categorized the generated Lewis structures by whether atom connectivity and structure geometry were correct. Atom connectivity was correct in all cases, from the simplest to the most detailed prompts; however, only the more detailed prompts yielded geometric accuracy. For example, the CH4 generated by the first Python prompt correctly displayed atom connectivity and the number of bonds, matching a canonical Lewis structure, while the CH4 from the advanced prompt showed correct connectivity, valence, and the requested three-dimensional tetrahedral geometry. Other tested molecules repeated this pattern, indicating that ChatGPT 4.0 does not produce accurate representations with DALL-E, but requesting Python for image generation overcomes this limitation.
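The scripts generated by ChatGPT 4.0 are not reproduced here, but the approach can be illustrated. As a rough, hypothetical sketch (our own illustration, not the chatbot’s actual output), the following standard-library Python function draws a 2D Lewis-style diagram of CH4 as an SVG string, with atom labels and one bond line per shared electron pair:

```python
import math

def lewis_ch4_svg(size=300):
    """Emit a minimal SVG drawing of a 2D CH4 Lewis structure:
    a central C with four H atoms at 90-degree intervals and one
    bond line to each (the true molecular geometry is tetrahedral)."""
    cx = cy = size / 2          # center of the canvas, where C sits
    r = size / 3                # distance from C to each H label
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">']
    parts.append(f'<text x="{cx}" y="{cy}" text-anchor="middle">C</text>')
    for k in range(4):
        angle = math.radians(90 * k)
        hx = cx + r * math.cos(angle)
        hy = cy + r * math.sin(angle)
        # bond line: start just outside the C label, stop just short of the H label
        x1 = cx + 0.25 * r * math.cos(angle)
        y1 = cy + 0.25 * r * math.sin(angle)
        x2 = cx + 0.85 * r * math.cos(angle)
        y2 = cy + 0.85 * r * math.sin(angle)
        parts.append(f'<line x1="{x1:.1f}" y1="{y1:.1f}" '
                     f'x2="{x2:.1f}" y2="{y2:.1f}" stroke="black"/>')
        parts.append(f'<text x="{hx:.1f}" y="{hy:.1f}" text-anchor="middle">H</text>')
    parts.append('</svg>')
    return "\n".join(parts)

svg = lewis_ch4_svg()
```

Because the output is plain geometry rather than a diffusion-model image, connectivity and bond counts are correct by construction, which is consistent with the pattern observed in Box 1.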
We emphasize that our aim was not to circumvent the chatbot’s image generation policy but to explore its limitations, capabilities, and potential educational applications. Given the increasing integration of artificial intelligence in educational settings, activities that encourage students to create molecular representations using the chatbot can be beneficial for teaching Lewis structures and VSEPR (Valence Shell Electron Pair Repulsion) theory.

Conclusion


The findings indicate that the GAIs perform adequately when addressing chemical content in textual prompts, with all analyses highlighting the category Ct3 (appropriate to scientific consensus) as the most prominent. However, incorrect segments were identified in the responses. A trend was observed in the approach to chemical bonding concepts, particularly a teleological bias: in various responses from different chatbots, terms suggesting a purpose in chemical bonds, such as the pursuit of a lower energy state or the completion of an octet, were recurrent. Furthermore, the chatbots maintained the teleological bias even after being cautioned about this inclination.
Chatbots tend to simplify the description of ionic bonding, mainly focusing on electron transfer and using the “sea of electrons” analogy for metallic bonds. Although such simplifications pose issues, it is feasible for educators to use them as starting points for classroom discussions, thereby nurturing critical thinking in students concerning the responses provided by artificial intelligence. This skill is essential as there is no assurance that the generated responses, although predominantly correct, will be free from conceptual errors.
The results for image interpretation prompts using computer vision revealed significant variance among the artificial intelligences, in contrast with the textual prompts. ChatGPT 4.0, the sole paid version, exhibited proficiency in identifying chemical content, Lewis structure formalisms, and arrow orientations, generating textual responses that aligned with the image content despite minor inaccuracies, especially in molecule identification, which did not significantly impact response quality. Conversely, Google Bard and Bing Chat performed poorly, correctly identifying the image context in less than 50% of the prompts and, even when accurate, including conceptual errors in their responses. Therefore, caution is advised when students or teachers use computer vision for immediate assistance or feedback on specific content or exercises.
In image generation, the evaluated artificial intelligences could not create chemical content without resorting to “hallucinations”, instances in which the AIs fail to provide the appropriate response and generate fanciful elements based on the available information. However, when the AIs are asked to create Python code that plots the desired chemical structure, the results show greater alignment with chemical formalisms.
Content generated by Artificial Intelligence should not be viewed as a finished product but rather as a raw material akin to clay, necessitating human manipulation and refinement. In this context, AI can significantly reduce labor hours in routine tasks by providing a starting point for texts, drawings, or other media that do not need to be initiated from scratch but are enhanced and tailored to specific requirements.
Attention should be paid to the fact that, as these systems are still under development and have only recently been accessed by many users, their use requires knowledge and digital competence. The data indicate that artificial intelligences (AIs) adapt the content of their responses to the formulation of the question, without ensuring that the responses are free from alternative conceptions or explanatory biases. It is therefore crucial to analyze responses critically and to develop this skill in students during their training.
We consider that GAI systems have the potential to be used as complementary educational resources to those already existing. The data from this study illustrate the feasibility of creating teaching materials if teachers critically assess the content and are willing to make modifications. Although high-quality educational materials are available, creating structures that consider specific educational contexts can support personalized learning.
Given these aspects, we can consider some direct implications for both teachers and students. Users of GAI systems should understand that they must analyze the generated content before using it. For example, when using GAI to assist in the production of content for educational activities, teachers can use these systems with the goal of producing content embedded in a teaching sequence. However, conceptual aspects must be evaluated to avoid potential errors or ideas that propagate alternative conceptions. It is noted that textual content and images have different standards, and recognizing these aspects can be useful for teachers in their planning and use of such content as well as for students when they turn to these systems to seek information.
Another important aspect concerns the interpretation of the generated responses and the learning of how to use the tools. Understanding the essential knowledge about technology and the specific content of a scientific field is crucial for the effective use of GAI systems. As the data show, there are errors and alternative conceptions in the material produced by GAI. Evaluating these productions can be a valuable educational exercise in critical analysis, in terms of both content and resource use. This can be applied in training courses for chemistry teachers as well as in the training of other chemistry professionals or related fields who will use GAI in their professional practice. In this way, by sharpening students’ critical thinking about the use of technology, even the conceptual errors of AI can be beneficial.
Another implication we can highlight concerns the tools themselves. Chemical content analysis is one aspect to consider, but it is also important for teachers and students to have a basic understanding of how the technological tool operates, the different levels of prompt generation, and “how to ask a question”. In the specific context of GAI systems, secondary and higher education, particularly teacher training courses, must incorporate the appropriate question formats and the dialogue necessary to train the machine, thereby establishing GAI as a valuable educational resource.
At present, we consider these systems an additional available technological resource. Given their novelty, we emphasize the need to investigate their different possibilities and limitations, as presented in this study, and we stress that the use of GAI systems should be mediated by teachers and integrated into an appropriate, structured educational plan.

Supporting Information


The Supporting Information is available at https://pubs.acs.org/doi/10.1021/acs.jchemed.4c00230.

  • Methodologies and results of a study on the performance of artificial intelligence in generating and interpreting chemical content; Four main sections focusing on different AI applications: Generative Text AI, Generative Image AI, Generative Image AI using Python codes, and AI capable of analyzing images and generating text (PDF)


Author Information


  • Corresponding Author
  • Authors
    • Carla Morais - CIQUP, IMS, Science Teaching Unity, Department of Chemistry and Biochemistry, Faculty of Sciences, University of Porto, 4169-007 Porto, PortugalOrcidhttps://orcid.org/0000-0002-2136-0019
    • Gildo Girotto Júnior - IQ-UNICAMP, University of Campinas, Institute of Chemistry, Department of Analytical Chemistry, 13083-97 Campinas, BrazilOrcidhttps://orcid.org/0000-0001-9933-100X
  • Funding

    The Article Processing Charge for the publication of this research was funded by the Coordination for the Improvement of Higher Education Personnel - CAPES (ROR identifier: 00x0ma614).

  • Notes
    The authors declare no competing financial interest.

Acknowledgments


The authors would like to thank the University of Campinas (UNICAMP), the São Paulo State Research Support Foundation (FAPESP), the Faculty of Sciences of the University of Porto, and members of the Research Group of Education in Science (Grupo de Pesquisa em Educação em Ciências – PEmCiE).

References


This article references 48 other publications.

  1. Alpaydın, E. Introduction to Machine Learning, 2nd ed.; MIT Press: Cambridge, MA, 2020.
  2. Baum, Z. J.; Yu, X.; Ayala, P. Y.; Zhao, Y.; Watkins, S. P.; Zhou, Q. Artificial Intelligence in Chemistry: Current Trends and Future Directions. J. Chem. Inf. Model. 2021, 61 (7), 3197–3212. DOI: 10.1021/acs.jcim.1c00619
  3. Senior, A. W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Hassabis, D. Improved Protein Structure Prediction Using Potentials from Deep Learning. Nature 2020, 577 (7792), 706–710. DOI: 10.1038/s41586-019-1923-7
  4. Segler, M. H.; Kogej, T.; Tyrchan, C.; Waller, M. P. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Cent. Sci. 2018, 4 (1), 120–131. DOI: 10.1021/acscentsci.7b00512
  5. Segler, M. H.; Preuss, M.; Waller, M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555 (7698), 604–610. DOI: 10.1038/nature25978
  6. Carleo, G.; Troyer, M. Solving the Quantum Many-Body Problem with Artificial Neural Networks. Science 2017, 355 (6325), 602–606. DOI: 10.1126/science.aag2302
  7. Scott, K. C. A Review of Faculty Self-Assessment TPACK Instruments (January 2006–March 2020). Int. J. Inf. Commun. Technol. Educ. 2021, 17 (2), 118–137. DOI: 10.4018/IJICTE.2021040108
  8. Hedtrich, S.; Graulich, N. Using Software Tools To Provide Students in Large Classes with Individualized Formative Feedback. J. Chem. Educ. 2018, 95 (12), 2263–2267. DOI: 10.1021/acs.jchemed.8b00173
  9. Grando, J. W.; Cleophas, M. G. Aprendizagem Móvel no Ensino de Química: Apontamentos Sobre a Realidade Aumentada. Quím. Nova Esc. 2021, 43 (2), 148–154. DOI: 10.21577/0104-8899.20160236
  10. Kelly, R. M. The Effect That Comparing Molecular Animations of Varying Accuracy Has on Students’ Submicroscopic Explanations. Chem. Educ. Res. Pract. 2017, 18 (4), 582–600. DOI: 10.1039/C6RP00240D
  11. Lopes, A. C. C. B.; Chaves, E. V. Animação Como Recurso Didático no Ensino da Química: Capacitando Futuros Professores. Educitec-Rev. Estud. Pesq. Sobre Ensino Tecnol. 2018. DOI: 10.31417/educitec.v4i07.256
  12. Tauber, A. L.; Levonis, S. M.; Schweiker, S. S. Gamified Virtual Laboratory Experience for In-Person and Distance Students. J. Chem. Educ. 2022, 99 (3), 1183–1189. DOI: 10.1021/acs.jchemed.1c00642
  13. Leite, B. S. Tecnologias Digitais na Educação: Da Formação à Aplicação; Livraria da Física: São Paulo, 2022.
  14. Lucas, M.; Moreira, A.; Costa, N. Quadro Europeu de Referência para a Competência Digital: Subsídios para a Sua Compreensão e Desenvolvimento. Observatorio (Obs) 2017. DOI: 10.15847/obsOBS11420171172
  15. Redecker, C.; Punie, Y. Digital Competence of Educators; Punie, Y., Ed.; 2017.
  16. INTEF. Marco Común de Competencia Digital Docente. Ministerio de Educación, Cultura y Deporte, del Gobierno de España, 2017. https://bit.ly/2jqkssz (accessed August 5, 2024).
  17. Paiva, J. C.; Da Costa, L. A. Exploration Guides as a Strategy to Improve the Effectiveness of Educational Software in Chemistry. J. Chem. Educ. 2010, 87 (6), 589–591. DOI: 10.1021/ed1001637
  18. Paiva, J.; Morais, C.; Costa, L.; Pinheiro, A. The Shift from “e-Learning” to “Learning”: Invisible Technology and the Dropping of the “e”. Br. J. Educ. Technol. 2016, 47 (2), 226–238. DOI: 10.1111/bjet.12242
  19. Wohlfart, O.; Wagner, I. The TPACK Model – A Promising Approach to Modeling the Digital Competences of (Prospective) Teachers? A Systematic Umbrella Review. Z. Pädagogik 2022, 68 (6), 846–868. DOI: 10.3262/ZP0000007
  20. Mishra, P.; Koehler, M. J. Technological Pedagogical Content Knowledge: A Framework for Teacher Knowledge. Teach. Coll. Rec. 2006, 108 (6), 1017–1054. DOI: 10.1111/j.1467-9620.2006.00684.x
  21. Clark, T. M. Investigating the Use of an Artificial Intelligence Chatbot with General Chemistry Exam Questions. J. Chem. Educ. 2023, 100 (5), 1905–1916. DOI: 10.1021/acs.jchemed.3c00027
  22. Emenike, M. E.; Emenike, B. U. Was This Title Generated by ChatGPT? Considerations for Artificial Intelligence Text-Generation Software Programs for Chemists and Chemistry Educators. J. Chem. Educ. 2023, 100 (4), 1413–1418. DOI: 10.1021/acs.jchemed.3c00063
  23. Tassoti, S. Assessment of Students’ Use of Generative Artificial Intelligence: Prompting Strategies and Prompt Engineering in Chemistry Education. J. Chem. Educ. 2024, 101 (7), 2475–2482. DOI: 10.1021/acs.jchemed.4c00212
  24. Humphry, T.; Fuller, A. L. Potential ChatGPT Use in Undergraduate Chemistry Laboratories. J. Chem. Educ. 2023, 100 (4), 1434–1436. DOI: 10.1021/acs.jchemed.3c00006
  25. Araújo, J. L.; Saudé, I. Can ChatGPT Enhance Chemistry Laboratory Teaching? Using Prompt Engineering to Enable AI in Generating Laboratory Activities. J. Chem. Educ. 2024, 101 (7), 1858–1864. DOI: 10.1021/acs.jchemed.3c00745
  26. Talanquer, V. Interview with the Chatbot: How Does It Reason? J. Chem. Educ. 2023, 100 (8), 2821–2824. DOI: 10.1021/acs.jchemed.3c00472
  27. Fergus, S.; Botha, M.; Ostovar, M. Evaluating Academic Answers Generated Using ChatGPT. J. Chem. Educ. 2023, 100 (4), 1672–1675. DOI: 10.1021/acs.jchemed.3c00087
  28. Johnstone, A. H. The Development of Chemistry Teaching: A Changing Response to Changing Demand. J. Chem. Educ. 1993, 70 (9), 701. DOI: 10.1021/ed070p701
  29. White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Schmidt, D. C. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv, February 21, 2023. DOI: 10.48550/arXiv.2302.11382
  30. Hatakeyama-Sato, K. Prompt Engineering of GPT-4 for Chemical Research: What Can/Cannot Be Done? Sci. Technol. Adv. Mater.: Methods 2023, 3 (1), 2260300. DOI: 10.1080/27660400.2023.2260300
  31. Korzynski, P. Artificial Intelligence Prompt Engineering as a New Digital Competence: Analysis of Generative AI Technologies Such as ChatGPT. Entrepreneurial Bus. Econ. Rev. 2023, 11 (3), 25–37. DOI: 10.15678/EBER.2023.110302
  32. Ferraz, A. P. do C. M.; Belhot, R. V. Taxonomia de Bloom: Revisão Teórica e Apresentação das Adequações do Instrumento para Definição de Objetivos Instrucionais. Gest. Prod. 2010, 17, 421–431. DOI: 10.1590/S0104-530X2010000200015
  33. VERBI Software. MAXQDA 2022 [Computer Software]; VERBI Software: Berlin, Germany, 2021. https://www.maxqda.com (accessed August 5, 2024).
  34. Cavalcante, R. B.; Calixto, P.; Pinheiro, M. M. K. Análise de Conteúdo: Considerações Gerais, Relações com a Pergunta de Pesquisa, Possibilidades e Limitações do Método. Inform. Soc.: Estud. 2014, 24 (1). https://periodicos.ufpb.br/index.php/ies/article/view/10000.
  35. Yik, B. J.; Dood, A. J. ChatGPT Convincingly Explains Organic Chemistry Reaction Mechanisms Slightly Inaccurately with High Levels of Explanation Sophistication. J. Chem. Educ. 2024, 101 (7), 1836–1846. DOI: 10.1021/acs.jchemed.4c00235
  36. Talanquer, V. When Atoms Want. J. Chem. Educ. 2013, 90 (11), 1419–1424. DOI: 10.1021/ed400311x
  37. Coll, R. K.; Treagust, D. F. Learners’ Mental Models of Chemical Bonding. Res. Sci. Educ. 2001, 31 (3), 357–382. DOI: 10.1023/A:1013159927352
  38. Boo, H. K. Students’ Understandings of Chemical Bonds and the Energetics of Chemical Reactions. J. Res. Sci. Teach. 1998, 35 (5), 569–581. DOI: 10.1002/(SICI)1098-2736(199805)35:5<569::AID-TEA6>3.0.CO;2-N
  39. Taber, K. S. Student Understanding of Ionic Bonding: Molecular Versus Electrostatic Thinking. Sch. Sci. Rev. 1997, 78 (285), 85–95.
  40. Barker, V.; Millar, R. Students’ Reasoning About Basic Chemical Thermodynamics and Chemical Bonding: What Changes Occur During a Context-Based Post-16 Chemistry Course? Int. J. Sci. Educ. 2000, 22 (11), 1171–1200. DOI: 10.1080/09500690050166742
  41. Hapkiewicz, A. Clarifying Chemical Bonding. Sci. Teach. 1991, 58 (3), 24.
  42. Treagust, D.; Chittleborough, G.; Mamiala, T. The Role of Submicroscopic and Symbolic Representations in Chemical Explanations. Int. J. Sci. Educ. 2003, 25 (11), 1353–1368. DOI: 10.1080/0950069032000070306
  43. Exintaris, B.; Karunaratne, N.; Yuriev, E. Metacognition and Critical Thinking: Using ChatGPT-Generated Responses as Prompts for Critique in a Problem-Solving Workshop (SMARTCHEMPer). J. Chem. Educ. 2023, 100 (8), 2972–2980. DOI: 10.1021/acs.jchemed.3c00481
  44. Watts, F. M. Comparing Student and Generative Artificial Intelligence Chatbot Responses to Organic Chemistry Writing-to-Learn Assignments. J. Chem. Educ. 2023, 100 (10), 3806–3817. DOI: 10.1021/acs.jchemed.3c00664
  45. Shen, Y. Investigating Students’ Use of Multiple Representations of the Arrow-Pushing Formalism. Dissertation, Clemson University, 2015.
  46. Gilbert, J. K. Multiple Representations in Chemical Education; Springer: Dordrecht, 2009.
  47. Alasadi, E. A.; Baiz, C. R. Multimodal Generative Artificial Intelligence Tackles Visual Problems in Chemistry. J. Chem. Educ. 2024, 101 (7), 2716–2729. DOI: 10.1021/acs.jchemed.4c00138
  48. OpenAI. DALL·E 3 System Card. https://openai.com/index/dall-e-3-system-card/ (accessed August 5, 2024).

Cited By


This article has not yet been cited by other publications.



  • Figures

    Figure 1. Flowchart illustrating the segmentation of GAI based on access type (free or paid) and capability (text or image generation).

    Figure 2. Graphs quantifying the proportional average of categorized segments in the chatbots’ individual responses, pertaining to three types of prompts on chemical bonds: covalent, ionic, and metallic.

    Figure 3. Series of visual prompts for detection by chatbots, including (1) an acid–base reaction mechanism; (2) resonance of the tosylate ion; (3) reaction coordinate diagram of the SN2 reaction; (4) SN2 reaction mechanism involving methyl bromide and hydroxide; (5) comparative analysis of acidity and basicity. The numbers correspond to those represented in Figure 4.

    Figure 4. Portrait grids, each consisting of 40 × 40 squares, for (i) ChatGPT 4.0; (ii) Google Bard; (iii) Bing Chat. The circled numbers correspond to the image prompts shown in Figure 3, with green indicating prompts the GAI identified correctly and red indicating those identified incorrectly. The colors of the squares correlate with the categories listed at the bottom of the figure, into which the text segments of the responses have been classified. The black-line rectangle highlights the correct interpretation of the image made available to the machine.

    Figure 5. Images generated by AI in response to the prompt “Draw a representation of the concept of covalent bonding”, produced respectively by ChatGPT 4.0, Bing Chat, Leonardo.AI, and Adobe Firefly.

    Figure 6. Graphs quantifying the number of categorized segments in the chatbots’ individual responses for generated images, pertaining to three types of prompts: atomic models, chemical bonds, and Lewis structures. (Additional images are available in the Supporting Information, p 96.)

  • References


    This article references 48 other publications.

    1. Alpaydın, E. Introduction to Machine Learning, 2nd ed.; MIT Press: Cambridge, MA, 2020.
    2. Baum, Z. J.; Yu, X.; Ayala, P. Y.; Zhao, Y.; Watkins, S. P.; Zhou, Q. Artificial Intelligence in Chemistry: Current Trends and Future Directions. J. Chem. Inf. Model. 2021, 61 (7), 3197–3212, DOI: 10.1021/acs.jcim.1c00619
    3. Senior, A. W.; Evans, R.; Jumper, J.; Kirkpatrick, J.; Sifre, L.; Green, T.; Hassabis, D. Improved Protein Structure Prediction Using Potentials from Deep Learning. Nature 2020, 577 (7792), 706–710, DOI: 10.1038/s41586-019-1923-7
    4. Segler, M. H.; Kogej, T.; Tyrchan, C.; Waller, M. P. Generating Focused Molecule Libraries for Drug Discovery with Recurrent Neural Networks. ACS Cent. Sci. 2018, 4 (1), 120–131, DOI: 10.1021/acscentsci.7b00512
    5. Segler, M. H.; Preuss, M.; Waller, M. P. Planning Chemical Syntheses with Deep Neural Networks and Symbolic AI. Nature 2018, 555 (7698), 604–610, DOI: 10.1038/nature25978
    6. Carleo, G.; Troyer, M. Solving the Quantum Many-Body Problem with Artificial Neural Networks. Science 2017, 355 (6325), 602–606, DOI: 10.1126/science.aag2302
    7. Scott, K. C. A Review of Faculty Self-Assessment TPACK Instruments (January 2006–March 2020). Int. J. Inf. Commun. Technol. Educ. 2021, 17 (2), 118–137, DOI: 10.4018/IJICTE.2021040108
    8. Hedtrich, S.; Graulich, N. Using Software Tools To Provide Students in Large Classes with Individualized Formative Feedback. J. Chem. Educ. 2018, 95 (12), 2263–2267, DOI: 10.1021/acs.jchemed.8b00173
    9. Grando, J. W.; Cleophas, M. G. Aprendizagem Móvel no Ensino de Química: Apontamentos Sobre a Realidade Aumentada. Quím. Nova Esc. 2021, 43 (2), 148–154, DOI: 10.21577/0104-8899.20160236
    10. Kelly, R. M. The Effect That Comparing Molecular Animations of Varying Accuracy Has on Students’ Submicroscopic Explanations. Chem. Educ. Res. Pract. 2017, 18 (4), 582–600, DOI: 10.1039/C6RP00240D
    11. Lopes, A. C. C. B.; Chaves, E. V. Animação Como Recurso Didático no Ensino da Química: Capacitando Futuros Professores. Educitec-Rev. Estud. Pesq. Sobre Ensino Tecnol. 2018, DOI: 10.31417/educitec.v4i07.256
    12. Tauber, A. L.; Levonis, S. M.; Schweiker, S. S. Gamified Virtual Laboratory Experience for In-Person and Distance Students. J. Chem. Educ. 2022, 99 (3), 1183–1189, DOI: 10.1021/acs.jchemed.1c00642
    13. Leite, B. S. Tecnologias Digitais na Educação: Da Formação à Aplicação; Livraria da Física: São Paulo, 2022.
    14. Lucas, M.; Moreira, A.; Costa, N. Quadro Europeu de Referência para a Competência Digital: Subsídios para a Sua Compreensão e Desenvolvimento. Observatorio (Obs) 2017, DOI: 10.15847/obsOBS11420171172
    15. Redecker, C.; Punie, Y. Digital Competence of Educators; Punie, Y., Ed.; 2017.
    16. INTEF. Marco Común de Competencia Digital Docente. Ministerio de Educación, Cultura y Deporte, del Gobierno de España, 2017. https://bit.ly/2jqkssz (accessed August 5, 2024).
    17. Paiva, J. C.; Da Costa, L. A. Exploration Guides as a Strategy to Improve the Effectiveness of Educational Software in Chemistry. J. Chem. Educ. 2010, 87 (6), 589–591, DOI: 10.1021/ed1001637
    18. Paiva, J.; Morais, C.; Costa, L.; Pinheiro, A. The Shift from “e-Learning” to “Learning”: Invisible Technology and the Dropping of the “e”. Br. J. Educ. Technol. 2016, 47 (2), 226–238, DOI: 10.1111/bjet.12242
    19. Wohlfart, O.; Wagner, I. The TPACK Model - A Promising Approach to Modeling the Digital Competences of (Prospective) Teachers? A Systematic Umbrella Review. Z. Pädagogik 2022, 68 (6), 846–868, DOI: 10.3262/ZP0000007
    20. Mishra, P.; Koehler, M. J. Technological Pedagogical Content Knowledge: A Framework for Teacher Knowledge. Teach. Coll. Rec. 2006, 108 (6), 1017–1054, DOI: 10.1111/j.1467-9620.2006.00684.x
    21. Clark, T. M. Investigating the Use of an Artificial Intelligence Chatbot with General Chemistry Exam Questions. J. Chem. Educ. 2023, 100 (5), 1905–1916, DOI: 10.1021/acs.jchemed.3c00027
    22. Emenike, M. E.; Emenike, B. U. Was This Title Generated by ChatGPT? Considerations for Artificial Intelligence Text-Generation Software Programs for Chemists and Chemistry Educators. J. Chem. Educ. 2023, 100 (4), 1413–1418, DOI: 10.1021/acs.jchemed.3c00063
    23. Tassoti, S. Assessment of Students’ Use of Generative Artificial Intelligence: Prompting Strategies and Prompt Engineering in Chemistry Education. J. Chem. Educ. 2024, 101 (7), 2475–2482, DOI: 10.1021/acs.jchemed.4c00212
    24. Humphry, T.; Fuller, A. L. Potential ChatGPT Use in Undergraduate Chemistry Laboratories. J. Chem. Educ. 2023, 100 (4), 1434–1436, DOI: 10.1021/acs.jchemed.3c00006
    25. Araújo, J. L.; Saudé, I. Can ChatGPT Enhance Chemistry Laboratory Teaching? Using Prompt Engineering to Enable AI in Generating Laboratory Activities. J. Chem. Educ. 2024, 101 (7), 1858–1864, DOI: 10.1021/acs.jchemed.3c00745
    26. Talanquer, V. Interview with the Chatbot: How Does It Reason? J. Chem. Educ. 2023, 100 (8), 2821–2824, DOI: 10.1021/acs.jchemed.3c00472
    27. Fergus, S.; Botha, M.; Ostovar, M. Evaluating Academic Answers Generated Using ChatGPT. J. Chem. Educ. 2023, 100 (4), 1672–1675, DOI: 10.1021/acs.jchemed.3c00087
    28. Johnstone, A. H. The Development of Chemistry Teaching: A Changing Response to Changing Demand. J. Chem. Educ. 1993, 70 (9), 701, DOI: 10.1021/ed070p701
    29. White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Schmidt, D. C. A Prompt Pattern Catalog to Enhance Prompt Engineering with ChatGPT. arXiv, February 21, 2023. DOI: 10.48550/arXiv.2302.11382
    30. Hatakeyama-Sato, K. Prompt Engineering of GPT-4 for Chemical Research: What Can/Cannot Be Done? Sci. Technol. Adv. Mater.: Methods 2023, 3 (1), 2260300, DOI: 10.1080/27660400.2023.2260300
    31. Korzynski, P. Artificial Intelligence Prompt Engineering as a New Digital Competence: Analysis of Generative AI Technologies Such as ChatGPT. Entrepreneurial Bus. Econ. Rev. 2023, 11 (3), 25–37, DOI: 10.15678/EBER.2023.110302
    32. Ferraz, A. P. do C. M.; Belhot, R. V. Taxonomia de Bloom: Revisão Teórica e Apresentação das Adequações do Instrumento para Definição de Objetivos Instrucionais. Gest. Prod. 2010, 17, 421–431, DOI: 10.1590/S0104-530X2010000200015
    33. VERBI Software. MAXQDA 2022 [Computer Software]. VERBI Software: Berlin, Germany, 2021. https://www.maxqda.com (accessed August 5, 2024).
    34. Cavalcante, R. B.; Calixto, P.; Pinheiro, M. M. K. Análise de Conteúdo: Considerações Gerais, Relações com a Pergunta de Pesquisa, Possibilidades e Limitações do Método. Inform. Soc.: Estud. 2014, 24 (1). https://periodicos.ufpb.br/index.php/ies/article/view/10000.
    35. Yik, B. J.; Dood, A. J. ChatGPT Convincingly Explains Organic Chemistry Reaction Mechanisms Slightly Inaccurately with High Levels of Explanation Sophistication. J. Chem. Educ. 2024, 101 (7), 1836–1846, DOI: 10.1021/acs.jchemed.4c00235
    36. Talanquer, V. When Atoms Want. J. Chem. Educ. 2013, 90 (11), 1419–1424, DOI: 10.1021/ed400311x
    37. Coll, R. K.; Treagust, D. F. Learners’ Mental Models of Chemical Bonding. Res. Sci. Educ. 2001, 31 (3), 357–382, DOI: 10.1023/A:1013159927352
    38. Boo, H. K. Students’ Understandings of Chemical Bonds and the Energetics of Chemical Reactions. J. Res. Sci. Teach. 1998, 35 (5), 569–581, DOI: 10.1002/(SICI)1098-2736(199805)35:5<569::AID-TEA6>3.0.CO;2-N
    39. Taber, K. S. Student Understanding of Ionic Bonding: Molecular Versus Electrostatic Thinking. Sch. Sci. Rev. 1997, 78 (285), 85–95
    40. Barker, V.; Millar, R. Students’ Reasoning About Basic Chemical Thermodynamics and Chemical Bonding: What Changes Occur During a Context-Based Post-16 Chemistry Course? Int. J. Sci. Educ. 2000, 22 (11), 1171–1200, DOI: 10.1080/09500690050166742
    41. Hapkiewicz, A. Clarifying Chemical Bonding. Sci. Teach. 1991, 58 (3), 24
    42. Treagust, D.; Chittleborough, G.; Mamiala, T. The Role of Submicroscopic and Symbolic Representations in Chemical Explanations. Int. J. Sci. Educ. 2003, 25 (11), 1353–1368, DOI: 10.1080/0950069032000070306
    43. Exintaris, B.; Karunaratne, N.; Yuriev, E. Metacognition and Critical Thinking: Using ChatGPT-Generated Responses as Prompts for Critique in a Problem-Solving Workshop (SMARTCHEMPer). J. Chem. Educ. 2023, 100 (8), 2972–2980, DOI: 10.1021/acs.jchemed.3c00481
    44. Watts, F. M. Comparing Student and Generative Artificial Intelligence Chatbot Responses to Organic Chemistry Writing-to-Learn Assignments. J. Chem. Educ. 2023, 100 (10), 3806–3817, DOI: 10.1021/acs.jchemed.3c00664
    45. Shen, Y. Investigating Students’ Use of Multiple Representations of the Arrow-Pushing Formalism. Dissertation, Clemson University, 2015.
    46. Gilbert, J. K. Multiple Representations in Chemical Education; Springer: Dordrecht, 2009.
    47. Alasadi, E. A.; Baiz, C. R. Multimodal Generative Artificial Intelligence Tackles Visual Problems in Chemistry. J. Chem. Educ. 2024, 101 (7), 2716–2729, DOI: 10.1021/acs.jchemed.4c00138
    48. OpenAI. DALL·E 3 System Card. https://openai.com/index/dall-e-3-system-card/ (accessed August 5, 2024).
  • Supporting Information

    Supporting Information


    The Supporting Information is available at https://pubs.acs.org/doi/10.1021/acs.jchemed.4c00230.

    • Methodologies and results of a study on the performance of artificial intelligence in generating and interpreting chemical content; four main sections focusing on different AI applications: Generative Text AI, Generative Image AI, Generative Image AI using Python codes, and AI capable of analyzing images and generating text (PDF)
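    The “Generative Image AI using Python codes” approach listed above asks the chatbot for a drawing script instead of a rendered bitmap. A minimal sketch of that idea (the function name and the plain-text rendering are illustrative assumptions, not taken from the article):

    ```python
    # Illustrative sketch (not from the article): instead of asking a GAI for a
    # rendered image, one can prompt it for a Python script that constructs the
    # representation itself. Here the "drawing" is a plain-text Lewis-style
    # diagram of water, so every bonding pair and lone pair can be inspected
    # line by line for conceptual errors.

    def lewis_water() -> str:
        """Return an ASCII Lewis-style sketch of H2O: two O-H bonding pairs
        shown as colons, two lone pairs shown as dot rows above and below O."""
        lines = [
            "  ..",    # lone pair above the oxygen atom
            "H:O:H",   # bonding electron pairs between each H and O
            "  ..",    # lone pair below the oxygen atom
        ]
        return "\n".join(lines)

    if __name__ == "__main__":
        print(lewis_water())
    ```

    Because the representation comes from explicit code rather than an image model, a mistake such as a missing lone pair is visible in the script and directly correctable, which is consistent with the article’s observation that Python-based prompts improved the images produced.
    
    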

