Questions for Methods of AI

Exam questions hub — Methods of AI (Uni Osnabrück, WiSe 2024/25 + SoSe 2026).

Contents: 100 basic Q&A · 20 complex Q&A · 6 MC sets (NotebookLM) · 2026 expansion (35 deep / exam-trap questions, covering all topics).

Companion files for exam day

Algorithm Decision Tree — MoAI — “which algorithm for which problem?” flowchart

Methods of AI — Exam Cheat Sheet — A4-printable formulas + traps

Lernzettel — per topic

Quizzes — per topic

Atomic Notes — by topic

Local Search
Search Algorithms · Hill Climbing · Gradient Descent · Local Beam Search · Simulated Annealing · Temperature in neighbour selection · Genetic Algorithms · Paper Review Genetic Algorithms · Stochastic Diffusion Search · cul-de-sac obstacles

Constraint Satisfaction
Constraint Satisfaction Problems

Planning & MDP / RL
Why an AI struggles with Planning · Frame Problem · Situation Calculus · STRIPS · PDDL and STRIPS · Markov Decision Process (MDP) · Bellman Equation · Q-Function · Reinforcement Learning (RL) · Temporal Difference Reinforcement Learning · You Only Live Once, Single Life Reinforcement Learning

Machine Learning
Machine Learning · Bias-Variance Tradeoff · Decision Trees and ID3 · Random Forest · Perceptron · Support Vector Machines · Kernel Trick · Machine Learning for Cognitive Computational Neuroscience

Neural Networks & Deep Learning
Hopfield Networks · Gradient Backpropagation · Gradient Descent · Deep Neural Networks · Deep Neural Networks in Computational Neuroscience · Implementing Artificial Neural Networks with TensorFlow · The neuroconnectionist research programme

Transformers & Attention
Transformers · Self-Attention · Attention is All You Need · Attention Systems · YT - Attention in Transformers vs. human attention

Knowledge Representation
Description Logics · Allen’s Tense Logic · RCC-8

Vagueness & Uncertainty
Fuzzy Logic · Dempster-Shafer Theory

NeuroAI bridges (Max’s writing angle)
Complementary Learning Systems · Neural Aspects of Prediction · High-performing neural network models of visual cortex benefit from high latent dimensionality

Course MOC

Methods of AI Lecture · Methods of AI Seminar · 611 📠Machine Learning · 610 🤖Artificial Intelligence, Künstliche Intelligenz

Questions from NotebookLM answered

The following 100 exam-style questions with answers were generated by NotebookLM from the course sources.

Local Search

What are the main differences between classical search algorithms and local search algorithms?
- Answer: Classical search algorithms look for a path to a goal state, whereas local search algorithms find a goal state without considering the path.
Name three variants of the Hill-Climbing algorithm.
- Answer: Stochastic Hill Climbing, First-Choice Hill Climbing, and Random-Restart Hill Climbing (Parallel Hill Climbing is a further variant).
What is the main idea behind the Simulated Annealing Algorithm?
- Answer: Accepting worse neighbors with a probability that shrinks the worse the neighbor is and the lower the temperature T is, so the search can escape local maxima.
How does neighborhood selection work in Simulated Annealing?
- Answer: The naive implementation enumerates all neighbors and picks one at random. The uniform implementation directly generates a single random neighbor.
What is the difference between Parallel Hill Climbing and Local Beam Search?
- Answer: Parallel Hill Climbing retains the best neighbor of each current state, while Local Beam Search selects the k best neighbors from all neighbors of the current states.
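A minimal Python sketch of local beam search. It assumes hashable states, a `neighbours(state)` function, and an objective `f` to maximise; all of these names are hypothetical. Unlike parallel hill climbing, the k states for the next step are taken from the pooled neighbours of all current states. As a result, the beam can abandon poor starting points.

```python
import heapq

def local_beam_search(starts, neighbours, f, steps=100):
    """Keep the k best states from the pooled neighbours of all k current states."""
    k = len(starts)
    beam = list(starts)
    best = max(beam, key=f)
    for _ in range(steps):
        pool = {n for state in beam for n in neighbours(state)}
        beam = heapq.nlargest(k, pool, key=f)  # best k overall, not best per state
        best = max(best, beam[0], key=f)       # remember the best state seen so far
    return best

# Hypothetical toy problem: maximise f(x) = -(x - 42)^2 over integers, neighbours x - 1, x + 1
result = local_beam_search([0, 10, 100], lambda x: (x - 1, x + 1), lambda x: -(x - 42) ** 2)
print(result)  # 42
```

After the first step, the start at 100 is already dropped because its neighbours are worse than those of 0 and 10.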
How is chromosome selection done in genetic algorithms?
- Answer: The selection probability increases with fitness.
What is the difference between fitness-proportional and fitness-antiproportional selection in genetic algorithms?
- Answer: Fitness-proportional selection chooses chromosomes with high fitness, while fitness-antiproportional selects chromosomes with low fitness.
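Roulette-wheel selection can be sketched in Python as follows. Fitness-proportional selection uses p_i = f_i / Σ f. For the antiproportional variant, weights of 1 / f_i are assumed here; this is only one possible choice, and fitness values must be positive. The names are hypothetical.

```python
import random

def selection_probabilities(fitnesses, antiproportional=False):
    """Proportional: p_i = f_i / sum(f). Antiproportional (assumed): p_i ~ 1 / f_i."""
    weights = [1.0 / f for f in fitnesses] if antiproportional else list(fitnesses)
    total = sum(weights)
    return [w / total for w in weights]

def roulette_wheel(population, fitnesses, antiproportional=False, rng=random):
    """Spin the wheel once and return the selected chromosome."""
    probs = selection_probabilities(fitnesses, antiproportional)
    return rng.choices(population, weights=probs, k=1)[0]

population = ["0101", "1100", "1111"]
fitnesses = [1.0, 2.0, 5.0]
print(selection_probabilities(fitnesses))        # [0.125, 0.25, 0.625]
print(selection_probabilities(fitnesses, True))  # the weakest chromosome is now most likely
```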
Describe reproduction by crossover in genetic algorithms.
- Answer: Parent chromosomes are split at a random point, and segments are combined to create offspring.
What is mutation in genetic algorithms?
- Answer: A random change of a gene in a chromosome.
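Crossover and mutation are sketched below for bit-string chromosomes (Python lists of 0/1). This encoding is an assumption; the function names and the mutation rate are illustrative.

```python
import random

def single_point_crossover(parent_a, parent_b, rng=random):
    """Cut both parents at the same random point and exchange the tails."""
    point = rng.randint(1, len(parent_a) - 1)   # cut strictly inside the chromosome
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2

def mutate(chromosome, rate=0.05, rng=random):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]

rng = random.Random(1)
child_1, child_2 = single_point_crossover([0] * 8, [1] * 8, rng)
print(child_1, child_2)
print(mutate(child_1, rate=0.2, rng=rng))
```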
What is the importance of cooling schedule in Simulated Annealing?
- Answer: The cooling schedule (schedule(t)) controls how the temperature T decreases. A slower cooling schedule leads to more exploration and increases runtime.
What is the difference between stochastic Hill Climbing and First-Choice Hill Climbing?
- Answer: Stochastic Hill Climbing chooses at random among the better (uphill) neighbors. First-Choice Hill Climbing generates random neighbors and accepts the first one that is better, without evaluating all neighbors.
What is the termination condition in local search?
- Answer: The termination condition can limit the maximum number of search steps or search time.
What is the goal in optimization with local search?
- Answer: To maximize (or minimize, by maximizing -f(x)) the objective function.
What is a local maximum in local search?
- Answer: A state that is better than all its neighbors but not necessarily the best state in the entire search space.
How is the neighborhood defined in the 8-queens problem?
- Answer: Configurations can be reached by moving a single queen to another square in the same column.
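A minimal Python sketch of this neighbourhood and steepest-ascent hill climbing on 8-queens. It assumes that `state[c]` is the row of the queen in column c and that the objective is the number of attacking pairs, which is minimised (equivalently, its negative is maximised). The function names are hypothetical.

```python
import random

def conflicts(state):
    """Number of attacking queen pairs (same row or same diagonal)."""
    n = len(state)
    count = 0
    for c1 in range(n):
        for c2 in range(c1 + 1, n):
            same_row = state[c1] == state[c2]
            same_diag = abs(state[c1] - state[c2]) == c2 - c1
            if same_row or same_diag:
                count += 1
    return count

def neighbours(state):
    """All states reachable by moving one queen within its own column (8 * 7 = 56)."""
    n = len(state)
    for col in range(n):
        for row in range(n):
            if row != state[col]:
                yield state[:col] + (row,) + state[col + 1:]

def hill_climb(state):
    """Move to the best neighbour until no neighbour is better."""
    while True:
        best = min(neighbours(state), key=conflicts)
        if conflicts(best) >= conflicts(state):
            return state  # local optimum: possibly, but not necessarily, a solution
        state = best

random.seed(0)
start = tuple(random.randrange(8) for _ in range(8))
result = hill_climb(start)
print(result, conflicts(result))
```

If `conflicts(result)` is still above 0, the search is stuck in a local optimum. Random-restart hill climbing would simply call `hill_climb` again from a new random start.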
What is the difference between the naive and uniform implementation in neighbor selection?
- Answer: The naive implementation lists all neighbors and selects randomly, while the uniform implementation creates a random neighbor.

Constraint Satisfaction Problems (CSP)

What is a Constraint Satisfaction Problem (CSP)?
- Answer: A CSP consists of variables, associated domains, and constraints that restrict the possible values of these variables.
What are the three main components of a CSP?
- Answer: Variables, domains, and constraints.
What is a consistent assignment in a CSP?
- Answer: An assignment that does not violate any constraints.
What is the difference between a partial and complete assignment in a CSP?
- Answer: A partial assignment assigns values to some variables, whereas a complete assignment assigns values to all variables.
What is a solution to a CSP?
- Answer: A complete, consistent assignment.
What does node consistency mean in a CSP?
- Answer: The domain of each variable satisfies its unary constraints (i.e., constraints on the variable itself).
How can a CSP be made node consistent?
- Answer: By removing values from the domains of variables that violate unary constraints.
What does arc consistency mean in a CSP?
- Answer: A variable is arc consistent with another if, for every value in its domain, there is a value in the other variable’s domain that satisfies the binary constraints.
Is arc consistency commutative?
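- Answer: No. If X is arc consistent with respect to Y, it does not follow that Y is arc consistent with respect to X, so each direction of an arc must be checked separately.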
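The asymmetry can be illustrated with a small Python `revise` step, as used by arc-consistency algorithms such as AC-3. The dictionary-of-sets representation and the function name are assumptions for illustration.

```python
def revise(domains, x, y, constraint):
    """Make x arc consistent with respect to y: drop every value of x that has no
    supporting value in the domain of y. Returns True if D(x) changed."""
    supported = {vx for vx in domains[x] if any(constraint(vx, vy) for vy in domains[y])}
    changed = supported != domains[x]
    domains[x] = supported
    return changed

# Constraint X < Y with D(X) = D(Y) = {1, 2, 3}
domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
revise(domains, "X", "Y", lambda vx, vy: vx < vy)   # removes X = 3
print(domains)  # X is now {1, 2}; Y is unchanged
# Y is not yet arc consistent w.r.t. X: Y = 1 has no smaller X
revise(domains, "Y", "X", lambda vy, vx: vx < vy)   # removes Y = 1
print(domains)  # X = {1, 2}, Y = {2, 3}
```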
What is the importance of node and arc consistency when solving CSPs?
- Answer: They are preprocessing steps to optimize the search for solutions and simplify the CSP.
What is the difference between an explicit and implicit representation of constraints?
- Answer: An explicit representation stores all valid tuples, while an implicit representation uses a function to test validity.
Which representation of constraints is generally more efficient?
- Answer: The implicit representation is more efficient when constraints are easily computable.
Describe the naive approach to solving CSPs.
- Answer: Iterating through all possible variable assignments and checking for consistency.
How does the Backtracking Search algorithm for CSPs work?
- Answer: Variables are assigned values incrementally, and backtracking is performed when constraints are violated.
What is the difference between node, arc, and path consistency in CSPs?
- Answer: Node consistency involves individual variables, arc consistency involves pairs, and path consistency involves triplets (not comprehensively defined in the sources).
What are unary, binary, and n-ary constraints?
- Answer: Unary constraints involve one variable, binary constraints involve two, and n-ary constraints involve more than two variables.
Name two strategies for variable selection when solving Sudoku as a CSP.
- Answer: First Fail (Minimum Remaining Values) and Degree (Most Constraining Variables).
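Below is a Python sketch of backtracking search with First Fail / MRV variable ordering. To keep it short, it assumes a map-colouring CSP in which every constraint is "adjacent regions get different colours". The degree heuristic could be added as a tie-breaker but is omitted here. Function and variable names are illustrative.

```python
def solve(domains, neighbours, assignment=None):
    """Backtracking search for map colouring, ordering variables by MRV (first fail)."""
    if assignment is None:
        assignment = {}
    if len(assignment) == len(domains):
        return dict(assignment)                 # complete and consistent: a solution

    def legal_values(var):
        # values not already taken by an assigned neighbour
        return [v for v in domains[var]
                if all(assignment.get(n) != v for n in neighbours[var])]

    unassigned = [var for var in domains if var not in assignment]
    var = min(unassigned, key=lambda u: len(legal_values(u)))   # fewest remaining values
    for value in legal_values(var):
        assignment[var] = value
        result = solve(domains, neighbours, assignment)
        if result is not None:
            return result
        del assignment[var]                     # undo and try the next value
    return None

AUSTRALIA = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
colours = {region: ["red", "green", "blue"] for region in AUSTRALIA}
print(solve(colours, AUSTRALIA))
```

If some variable has no legal value left, MRV selects it immediately. The loop body then never runs, and the search backtracks at once, which is the point of "first fail".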
What is the purpose of value selection when solving Sudoku as a CSP?
- Answer: To choose the value that rules out the fewest values in the domains of neighboring variables (Least Constraining Value). This leaves the most freedom for the remaining assignments and reduces backtracking.
What can be concluded if a local search algorithm finds a globally optimal solution for a CSP?
- Answer: If the global optimum violates no constraints, the CSP is consistent and that state is a solution.
What can be concluded if a local search algorithm finds a locally optimal solution for a CSP?
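- Answer: The state still violates some constraints, so it is not a solution. Nothing follows about whether the CSP is consistent, because a solution may exist elsewhere in the search space.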
Is a combination of node and arc consistency sufficient to test a CSP for global consistency?
- Answer: No, node and arc consistency are not sufficient to guarantee global consistency.

Planning

What is the goal of classical planning?
- Answer: To find a sequence of actions that transform a given state into a goal state.
What are the three main problems in classical planning?
- Answer: The Frame Problem, the Qualification Problem, and the Ramification Problem.
What is the Frame Problem in planning?
- Answer: The difficulty in determining what changes and what remains the same after an action.
What is the Qualification Problem in planning?
- Answer: The difficulty in knowing all the preconditions for an action.
What is the Ramification Problem in planning?
- Answer: The difficulty in accounting for unintended side effects of actions.
What are fluents in Situation Calculus?
- Answer: Predicates that can change over time.
What is the difference between atomic and complex situations in Situation Calculus?
- Answer: Atomic situations are constants, while complex situations are constructed from atomic situations using operators.
What are effect axioms in Situation Calculus?
- Answer: Axioms that describe how the world changes after an action is performed.
What are frame axioms in Situation Calculus?
- Answer: Axioms that describe what aspects of the world remain unchanged after an action is performed.
How is a planning problem solved with a theorem prover in Situation Calculus?
- Answer: By defining the initial state as logical axioms, the goal as a theorem, and applying the theorem prover.
What are the three main elements of a STRIPS operator?
- Answer: Preconditions, additions, and deletions.
What is the significance of PRE, ADD, and DEL in STRIPS?
- Answer: PRE are preconditions, ADD are added facts, and DEL are deleted facts.
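The STRIPS update rule can be sketched in Python with facts as strings and states as frozensets. The blocks-world operator below is a hypothetical example.

```python
from typing import FrozenSet, NamedTuple

class Operator(NamedTuple):
    name: str
    pre: FrozenSet[str]      # PRE: facts that must hold
    add: FrozenSet[str]      # ADD: facts made true
    delete: FrozenSet[str]   # DEL: facts made false ("del" is a Python keyword)

def apply(op, state):
    """STRIPS update: (state - DEL) | ADD. Facts not mentioned persist, and facts
    absent from the state are false (closed-world assumption)."""
    if not op.pre <= state:
        raise ValueError(f"{op.name}: preconditions not satisfied")
    return (state - op.delete) | op.add

unstack_a_b = Operator(
    name="unstack(A,B)",
    pre=frozenset({"on(A,B)", "clear(A)", "handempty"}),
    add=frozenset({"holding(A)", "clear(B)"}),
    delete=frozenset({"on(A,B)", "clear(A)", "handempty"}),
)
state = frozenset({"on(A,B)", "clear(A)", "ontable(B)", "handempty"})
print(sorted(apply(unstack_a_b, state)))  # ['clear(B)', 'holding(A)', 'ontable(B)']
```

The fact `ontable(B)` survives without any frame axiom. This is how the STRIPS representation sidesteps the frame problem.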
What does the Closed-World Assumption mean in the context of planning?
- Answer: All facts not known to be true are assumed to be false.
What is PDDL?
- Answer: Planning Domain Definition Language, used to describe planning domains and problems.
What are the main components of a PDDL representation?
- Answer: A domain definition (types, actions) and a problem definition (initial state, goal).
How is the syntax of PDDL different from the syntax of modern logical languages?
- Answer: Instead of f(x, y, z), it uses (f x y z).
What is the goal of probabilistic planning?
- Answer: To handle uncertainty and long-term consequences in planning.
What is a Markov Decision Process (MDP)?
- Answer: A model for decision-making under uncertainty, involving states, actions, transition probabilities, and rewards.
What are transition probabilities in an MDP?
- Answer: Probabilities that an action in a given state will lead to a particular successor state.
What is a policy in an MDP?
- Answer: A mapping from states to actions (π: S → A) that specifies which action to take in each state.
What is the difference between direct and indirect training experience in ML?
- Answer: Direct training experience provides clear examples and labels, while indirect training experience provides feedback or rewards.
What does the term “Teacher” mean regarding training experience in ML?
- Answer: Training with a “teacher” means the data is labeled (supervised learning).
What is the importance of the discount factor (γ) in probabilistic planning?
- Answer: A value 0 ≤ γ ≤ 1 that devalues future rewards. With γ < 1 (and bounded rewards), the infinite sum of rewards is guaranteed to converge.
How is the expected return of a policy calculated?
- Answer: As an expected value: the (discounted) return of each possible outcome, weighted by its probability under the policy, i.e., E[Σ_t γ^t r_t].
What does policy iteration describe?
- Answer: A method to optimize a policy through iterative evaluation and improvement.
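A compact Python sketch of policy iteration on a hypothetical two-state MDP. Transitions are stored as `P[s][a] = [(probability, next_state, reward), ...]`; this representation and all names are assumptions. The sketch alternates policy evaluation and greedy policy improvement until the policy stops changing.

```python
def q_value(P, V, s, a, gamma):
    """Expected return of taking a in s, then continuing with the values V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

def policy_evaluation(P, policy, gamma, tol=1e-10):
    """Iteratively compute V^pi until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration(P, gamma=0.9):
    policy = {s: next(iter(P[s])) for s in P}   # arbitrary initial policy
    while True:
        V = policy_evaluation(P, policy, gamma)            # evaluation step
        stable = True
        for s in P:                                        # greedy improvement step
            best = max(P[s], key=lambda a: q_value(P, V, s, a, gamma))
            if q_value(P, V, s, best, gamma) > q_value(P, V, s, policy[s], gamma) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V

P = {
    "A": {"stay": [(1.0, "A", 0.0)], "move": [(0.8, "B", 0.0), (0.2, "A", 0.0)]},
    "B": {"stay": [(1.0, "B", 1.0)], "move": [(1.0, "A", 0.0)]},
}
policy, V = policy_iteration(P)
print(policy)  # {'A': 'move', 'B': 'stay'}
```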
What are the components of an MDP?
- Answer: States (S), actions (A), transition probabilities (p), and rewards (r).

Knowledge Representation

What is an ontology in knowledge representation?
- Answer: A formal representation of knowledge as hierarchies of concepts.
What is the difference between domain-specific and general-purpose ontologies?
- Answer: Domain-specific ontologies relate to a particular part of the world, while general-purpose ontologies represent general concepts.
What are Description Logics (DLs)?
- Answer: A family of knowledge representation languages used to define and describe categories.
What are roles and concepts in Description Logics?
- Answer: Concepts are classes, i.e., sets of individuals (e.g., Person). Roles are binary relations between individuals (e.g., hasChild).
What are the two main parts of a DL knowledge base?
- Answer: The TBox (terminology) and the ABox (assertions).
What are General Concept Inclusions (GCIs) in the TBox?
- Answer: Axioms of the form C ⊑ D stating that every instance of concept C is also an instance of concept D (i.e., C is a subset of D).
What is a tableau-based algorithm?
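- Answer: An algorithm that decides whether a concept is satisfiable by trying to construct a model for it. Subsumption is reduced to unsatisfiability: C ⊑ D holds exactly when C ⊓ ¬D is unsatisfiable.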
What does subsumption mean in DL?
- Answer: The relationship where one concept is more general than another.
What is the difference between TBox and ABox in a DL knowledge base?
- Answer: The TBox contains terminological knowledge (concept definitions, schemas), while the ABox contains individual facts (assertions).
How are concepts formed with concept constructors?
- Answer: Concept constructors like negation, conjunction, disjunction, existential quantification, and universal quantification form complex concepts from basic ones.
What is the difference between concept negation and class complement?
- Answer: They are the same; both refer to the complement of a concept (the set of all things not belonging to the concept).
What are nominals in DL?
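- Answer: Concepts defined by enumerating named individuals, e.g., {Alice, Bob}. A single nominal {a} denotes the set containing only the individual a. Nominals also allow value restrictions such as ∃teaches.{CS600}.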

Vagueness and Uncertainty

What are fuzzy logic and probabilistic logic?
- Answer: Fuzzy logic models vagueness, and probabilistic logic models uncertainty.
What is a characteristic function?
- Answer: A function that assigns a value of 1 for elements of a set and 0 for elements outside the set.
What is a membership function in fuzzy set theory?
- Answer: A function that indicates the degree of belonging of an element to a fuzzy set.
What are t-norms and s-norms in fuzzy logic?
- Answer: T-norms are for conjunction (AND) and s-norms for disjunction (OR) in fuzzy logic.
Give two examples each of t-norms and s-norms.
- Answer: T-norms are min and product, and s-norms are max and algebraic sum.
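These four operators are one-liners in Python. The membership degrees in the usage lines are hypothetical. Each s-norm is the dual of the t-norm listed next to it under the standard negation 1 - x.

```python
def t_min(a, b):            # minimum t-norm (fuzzy AND)
    return min(a, b)

def t_product(a, b):        # product t-norm (fuzzy AND)
    return a * b

def s_max(a, b):            # maximum s-norm (fuzzy OR), dual of t_min
    return max(a, b)

def s_algebraic_sum(a, b):  # algebraic (probabilistic) sum, dual of t_product
    return a + b - a * b

tall, heavy = 0.7, 0.4      # hypothetical membership degrees
print(t_min(tall, heavy), t_product(tall, heavy))        # AND: 0.4 and about 0.28
print(s_max(tall, heavy), s_algebraic_sum(tall, heavy))  # OR: 0.7 and about 0.82
```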
What are the axioms of probability measures (Kolmogorov’s axioms)?
- Answer: Non-negativity, normalization, and (finite) additivity.
What is an event in probability theory?
- Answer: A set of possible worlds that satisfy a particular statement.
What is the difference between partial truth and uncertainty?
- Answer: Partial truth relates to the vagueness of statements (e.g., “somewhat reddish”), while uncertainty relates to the occurrence of events (e.g., “likely”).
What is a neutral element for a t-norm and an s-norm?
- Answer: The neutral element for a t-norm is 1, and for an s-norm, it is 0.
What is the importance of commutativity in fuzzy logic operators?
- Answer: It means the order of the operands does not matter: T(a, b) = T(b, a) for conjunction and S(a, b) = S(b, a) for disjunction.
What is the sum rule in probability theory?
- Answer: P(A or B) = P(A) + P(B) - P(A and B).
What is the formula for conditional probability P(A|B)?
- Answer: P(A|B) = P(A and B) / P(B).
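A worked check of the sum rule and of conditional probability, treating events as sets of possible worlds. The fair six-sided die (uniform distribution) is an assumed example.

```python
from fractions import Fraction

worlds = set(range(1, 7))            # possible worlds: outcomes of one fair die

def P(event):
    """Probability of an event (a set of worlds) under the uniform distribution."""
    return Fraction(len(event & worlds), len(worlds))

A = {2, 4, 6}                        # "the roll is even"
B = {4, 5, 6}                        # "the roll is greater than 3"

p_A_or_B = P(A.union(B))                           # {2, 4, 5, 6} -> 2/3
assert p_A_or_B == P(A) + P(B) - P(A.intersection(B))   # sum rule: 1/2 + 1/2 - 1/3
p_A_given_B = P(A.intersection(B)) / P(B)          # (1/3) / (1/2) = 2/3
print(p_A_or_B, p_A_given_B)                       # 2/3 2/3
```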

Machine Learning

What is the goal of machine learning?
- Answer: To learn patterns and relationships in data to make predictions or support decisions.
What does overfitting mean in machine learning?
- Answer: A model fits the training data too well, impairing generalization to unseen test data.
What is the significance of Occam’s Razor in relation to overfitting?
- Answer: The simpler hypothesis is preferred because it is more likely to generalize than complex hypotheses.
What is a hypothesis in machine learning?
- Answer: An approximation to a target function.
What is the difference between completeness and correctness of a hypothesis?
- Answer: Completeness means the hypothesis covers all positive examples, while correctness means it does not cover any negative examples.
What are precision and recall?
- Answer: Precision = TP / (TP + FP) is the fraction of examples classified as positive that are actually positive. Recall = TP / (TP + FN) is the fraction of actual positive examples that are classified as positive.
What is accuracy in the context of machine learning?
- Answer: The proportion of correctly classified examples among all examples.
What is the F1 score?
- Answer: The harmonic mean of precision and recall.
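The four metrics can be computed from the confusion-matrix counts, as in the Python sketch below. Returning 0.0 when a ratio is undefined (division by zero) is an assumed convention, and the labels in the example are made up.

```python
def classification_metrics(y_true, y_pred):
    """Precision, recall, accuracy and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0   # assumed: 0 if undefined
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / len(pairs)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # TP = 2, FN = 1, FP = 1, TN = 4
print(classification_metrics(y_true, y_pred))
```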
What are similarity measures and distance measures in machine learning?
- Answer: Similarity measures quantify similarity between objects, while distance measures quantify dissimilarity.
How is cosine similarity calculated?
- Answer: It calculates the dot product of two vectors divided by the product of their lengths.
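As a formula, cos(u, v) = (u · v) / (‖u‖ ‖v‖). A direct Python version for plain lists follows; nonzero vectors are assumed.

```python
import math

def cosine_similarity(u, v):
    """Dot product divided by the product of the Euclidean lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1, 0], [0, 1]))   # 0.0: orthogonal vectors
print(cosine_similarity([1, 2], [2, 4]))   # about 1: same direction, different length
```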
What are entropy and information gain in decision trees?
- Answer: Entropy measures the disorder or uncertainty of data, and information gain measures how well an attribute classifies the data.
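In formulas, H(S) = -Σ p_c log2 p_c, and Gain(S, A) = H(S) - Σ_v |S_v|/|S| · H(S_v). A Python sketch follows. The representation (examples as dicts, separate label list) and the toy data are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum over classes of p * log2(p)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Entropy before the split minus the weighted entropy of the subsets."""
    n = len(labels)
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

examples = [
    {"outlook": "sunny", "windy": True}, {"outlook": "sunny", "windy": False},
    {"outlook": "rain", "windy": True}, {"outlook": "rain", "windy": False},
]
labels = ["-", "-", "+", "+"]
print(information_gain(examples, labels, "outlook"))  # 1.0: a perfect split
print(information_gain(examples, labels, "windy"))    # 0.0: an uninformative split
```

ID3 would choose `outlook` as the root attribute here, because it has the highest information gain.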
What is the task of Support Vector Machines (SVM)?
- Answer: To find an optimal separating hyperplane that maximizes the margin between classes.
What is a perceptron?
- Answer: A simple neural network model with weighted inputs, an activation function, and a threshold.

Complex Questions from NotebookLM

The following 20 deeper exam-style questions, with detailed answers, were generated by NotebookLM from the course sources:

Questions and Answers

Question: Explain the difference between classical search and local search and provide an example problem for each where these search methods would be particularly suitable.
* Answer: Classical search aims to find an optimal sequence of actions from a start state to a goal state. It explores the search space systematically and keeps track of paths, so it suits problems where the path to the solution matters, such as route planning. In contrast, local search focuses on finding a “nearly” optimal or feasible state; the sequence of actions is irrelevant. Local search algorithms are often better suited to optimization and constraint satisfaction problems, where finding a good solution matters more than the specific path. An example is the N-Queens problem.
Question: What are the three main problems in planning (Frame Problem, Qualification Problem, Ramification Problem)? Describe each and provide an example.
* Answer:
- Frame Problem: Describes the difficulty of determining which aspects of the world change due to an action and which remain unchanged. An example is moving a block on a table. The frame problem questions how the system knows that the block remains intact after being placed on the table.
- Qualification Problem: Refers to the impossibility of listing all preconditions of an action, because the list is practically endless. Picking up a block, for example, requires that the block is not too heavy, that a gripping hand is present, and so on.
- Ramification Problem: Deals with the issue that actions can have unintended side effects. For instance, stacking a block on another may also affect the stack’s stability, which wasn’t explicitly modeled in the action.
Question: Explain the difference between “uninformed” and “informed” search and name two typical algorithms for each.
* Answer:
- Uninformed Search (also known as blind search) uses no problem-specific information about how far a state is from the goal. Examples include Breadth-First Search (BFS), which expands the search tree layer by layer, and Depth-First Search (DFS), which always expands the deepest unexpanded node first.
- Informed Search (also known as heuristic search) uses a heuristic (an estimate of cost) to evaluate the path to the goal state. Examples include: Greedy Best-First Search, which always chooses the node with the lowest estimated distance to the goal, and the A*-Algorithm, which combines the actual cost of the path so far and the estimated cost to the goal.
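A short Python sketch of A* with a priority queue ordered by f(n) = g(n) + h(n). The graph, step costs, and heuristic values are made up for illustration. The heuristic is admissible: it never overestimates the true remaining cost.

```python
import heapq

def a_star(start, goal, edges, h):
    """Expand the frontier node with the smallest f(n) = g(n) + h(n).
    `edges` maps a node to a list of (neighbour, step_cost)."""
    frontier = [(h(start), 0, start, [start])]   # entries: (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g[node]:
            continue                              # outdated entry: a cheaper path was found
        for neighbour, cost in edges.get(node, []):
            g_new = g + cost
            if g_new < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = g_new
                heapq.heappush(frontier, (g_new + h(neighbour), g_new, neighbour, path + [neighbour]))
    return None                                   # goal unreachable

EDGES = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)], "B": [("G", 1)]}
H = {"S": 4, "A": 3, "B": 1, "G": 0}
print(a_star("S", "G", EDGES, H.get))  # (['S', 'A', 'B', 'G'], 4)
```

On the same graph, Greedy Best-First Search (ordering by h alone) would expand B first and return S, B, G with cost 5. This shows why A* also takes the path cost so far into account.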
Question: Describe the fundamental components of a Constraint Satisfaction Problem (CSP) and provide an example.
* Answer: A CSP consists of three main components:
- Variables (X): A set of variables to be solved (e.g., x1, x2, x3).
- Domains (D): Each variable has an associated domain containing possible values that can be assigned to that variable (e.g., {true, false} for x1, {red, blue, green} for x2 and x3).
- Constraints (R): Constraints specify the permissible combinations of values that can be assigned to the variables (e.g., x2 ≠ x3 for the color variables above, or arithmetic constraints such as x1 ≤ x2 or x3 + x4 ≥ 4 * x5 + 2 * x6 over numeric variables).
- An example would be frequency assignment in a radio network, where each variable is a radio station, each domain is the available frequencies, and the constraint is that adjacent transmitters must use different frequencies.
Question: Explain the terms Node Consistency and Arc Consistency in the context of CSPs.
* Answer:
- Node Consistency: A single node (variable) is consistent if each value in its domain satisfies all unary constraints. To achieve node consistency, values that do not satisfy the unary constraints are removed from the domain.
- Arc Consistency: An arc (a relationship between two variables) is consistent if for every value in the domain of the first variable, there is a value in the domain of the second variable such that the binary constraint between these variables is satisfied. Arc consistency is not commutative and is used to reduce the domains of variables and remove inconsistent assignments.
Question: Describe the Simulated Annealing algorithm. What is the role of temperature and cooling schedule?
- Answer: Simulated Annealing is a local search algorithm that attempts to avoid the problem of local optima. Initially, it allows worse moves (i.e., objective function deterioration) to explore the search space (exploration). As the search progresses, the probability of accepting worse moves decreases (intensification) to focus on promising areas.
  - Temperature (T): The temperature variable controls the probability of accepting worse moves. At the start of the search, the temperature is high, allowing a high chance of accepting worse moves, promoting exploration. As the temperature decreases, so does the probability of accepting worse moves, promoting intensification of the search.
  - Cooling Schedule: Defines how the temperature decreases over time. A slower cooling schedule allows for longer exploration and increases runtime. Cooling schedules include linear stepwise and delayed linear stepwise cooling.
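A Python sketch of simulated annealing (maximisation), illustrating the temperature and cooling schedule described above. The linear schedule, the toy objective, and the ±1 neighbourhood are hypothetical choices. Following the usual textbook formulation, the sketch returns the current state once T reaches 0.

```python
import math
import random

def acceptance_probability(delta, T):
    """Improvements (delta > 0) are always accepted; worse moves with exp(delta / T)."""
    return 1.0 if delta > 0 else math.exp(delta / T)

def simulated_annealing(x, neighbour, f, schedule, rng=random):
    """Maximise f; stop when the cooling schedule reaches T <= 0."""
    t = 0
    while True:
        T = schedule(t)
        if T <= 0:
            return x
        candidate = neighbour(x, rng)        # uniform variant: one random neighbour
        if rng.random() < acceptance_probability(f(candidate) - f(x), T):
            x = candidate
        t += 1

def linear_schedule(t, T0=10.0, step=0.01):
    """Hypothetical linear cooling: T drops by `step` per iteration (1000 steps here)."""
    return T0 - step * t

rng = random.Random(0)
result = simulated_annealing(
    x=50,
    neighbour=lambda x, rng: x + rng.choice((-1, 1)),
    f=lambda x: -(x - 3) ** 2,
    schedule=linear_schedule,
    rng=rng,
)
print(acceptance_probability(-1, 10.0), acceptance_probability(-1, 0.5))  # hot vs. cold
print(result)
```

The same worsening step of -1 is accepted with a probability of about 0.9 at T = 10 but only about 0.14 at T = 0.5. This is the shift from exploration to intensification.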
Question: What are T-Norms and S-Norms in fuzzy logic, and give an example of each.
* Answer:
- T-Norms are generalizations of conjunction (logical “AND”) in fuzzy logic. They specify how the truth values of multiple fuzzy propositions are combined. Examples of T-Norms include the Minimum function (min(x, y)) and the Product function (x * y).
- S-Norms are generalizations of disjunction (logical “OR”) in fuzzy logic. They combine the truth values of multiple fuzzy propositions and are the complement to T-Norms. Examples are the Maximum function (max(x, y)) and the algebraic sum (x + y - (x * y)).
Question: Describe the difference between Supervised, Unsupervised, and Reinforcement Learning. Provide a typical application example for each.
* Answer:
- Supervised Learning: In supervised learning, an algorithm is given a dataset with input-output pairs (labels), and it learns to predict the output for new inputs. An example is spam email classification.
- Unsupervised Learning: Unsupervised learning uses a dataset without labels to find patterns or structures in the data. An example is clustering, where data is divided into groups of similar items.
- Reinforcement Learning: In reinforcement learning, an agent learns through trial and error and receives feedback (rewards) from the environment. The goal is to find an optimal policy to complete a task. An example is training a robot to learn movements.
Question: What is the main difference between a Decision Tree and a Random Forest?
* Answer: A Decision Tree is a single tree that makes decisions based on attribute values, but is susceptible to overfitting, meaning it can adapt too closely to training data and not generalize well to new data. A Random Forest is a collection of decision trees trained on different subsets of the training data and attributes. Each tree makes a classification decision, and the final classification is the majority decision from the individual trees. Random Forests are more robust to overfitting and improve the model’s generalization ability.
Question: Explain the basic functioning of a Perceptron and describe a problem that a simple Perceptron cannot solve.
- Answer: A Perceptron is a basic neural network that takes (often binary) inputs, multiplies them by weights, sums the weighted inputs, and passes the result through a step activation function to produce a binary output. The threshold determines whether the neuron ‘fires’ (outputs 1) or not.
  - The most well-known problem that a simple Perceptron cannot solve is the XOR problem. The XOR function is not linearly separable, meaning no single line can separate the true and false outputs in 2D space.
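The XOR limitation can be demonstrated with the classic perceptron learning rule in Python: w ← w + η (target - output) x, with the bias acting as a learned threshold. Zero initial weights, η = 1, and 20 epochs are assumed settings. AND is learned perfectly, whereas XOR cannot be, since no line separates its classes.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Perceptron learning rule with a step activation; bias b acts as the threshold."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            output = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            error = target - output
            w[0] += lr * error * x1
            w[1] += lr * error * x2
            b += lr * error
    return w, b

def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for name, data in (("AND", AND), ("XOR", XOR)):
    w, b = train_perceptron(data)
    correct = sum(predict(w, b, x) == t for x, t in data)
    print(name, f"{correct}/4 correct")   # AND reaches 4/4, XOR never does
```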
Question: Explain the term “Kernel Trick” in the context of Support Vector Machines (SVMs).
- Answer: The “Kernel Trick” is a method that allows SVMs to be applied to non-linearly separable data. Instead of mapping data points into a higher-dimensional space explicitly, a kernel function calculates the dot product between the mapped points in the higher-dimensional space implicitly. As a result, SVMs can now find complex non-linear decision surfaces, enabled by various kernels (linear, polynomial, Gaussian).
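A small numeric check of the kernel trick in Python. In 2D, the polynomial kernel k(x, z) = (x · z)² equals an ordinary dot product after the explicit feature map φ(x) = (x1², √2·x1·x2, x2²). The kernel, however, never has to build φ. The example vectors are arbitrary.

```python
import math

def phi(x):
    """Explicit feature map for the degree-2 homogeneous polynomial kernel in 2D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """k(x, z) = (x . z)^2, computed in the original 2D space."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, 4.0)
print(poly_kernel(x, z), dot(phi(x), phi(z)))  # both 121 (up to floating-point rounding)
```

For the Gaussian (RBF) kernel, the corresponding feature space is infinite-dimensional, so the implicit computation is the only practical option.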
Question: What is an Autoencoder, and what is it typically used for?
- Answer: An Autoencoder is a neural network consisting of an Encoder and a Decoder. The encoder compresses the input into a lower-dimensional representation (the “code”), and the decoder attempts to reconstruct the input from the code. Autoencoders are typically used for feature reduction, data compression, and anomaly detection. The goal is to minimize reconstruction error.
Question: Describe the main components of the Transformer architecture.
- Answer: The Transformer architecture consists of the following main components:
  - Encoder: The encoder comprises several identical layers, each containing a Multi-Head Self-Attention layer and a Fully Connected Feed-Forward layer. It processes the input sequence and generates a contextual representation.
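  - Decoder: The decoder has a similar structure to the encoder. However, its self-attention is masked so that each position can only attend to earlier positions, and it adds a Multi-Head Attention layer over the encoder’s output (encoder-decoder attention). It generates the output sequence step by step using the encoder’s output.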
  - Multi-Head Attention: This mechanism allows the model to recognize relationships between different parts of the input sequence. Multi-Head Attention computes attention over multiple parallel “heads,” allowing different relationships to be learned.
  - Positional Encoding: Self-attention treats its input as an unordered set, so the architecture itself carries no information about token order. Positional encodings (e.g., sinusoidal functions of the position) are therefore added to the token embeddings.
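A minimal NumPy sketch of one attention head, softmax(QKᵀ / √d_k) V, together with the optional decoder-style causal mask and sinusoidal positional encodings. As a simplification, Q = K = V = X is used directly, without the learned projection matrices. Multi-head attention would run several such heads on different learned projections and concatenate the results. The input data is random.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)        # shift for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V, causal=False):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if causal:
        # decoder-style mask: position i may only attend to positions j <= i
        future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores)
    return weights @ V, weights

def sinusoidal_positional_encoding(n_positions, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...); d_model even."""
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8)) + sinusoidal_positional_encoding(4, 8)   # 4 tokens, d_model = 8
out, weights = scaled_dot_product_attention(X, X, X, causal=True)
print(np.round(weights, 2))   # lower-triangular; each row sums to 1
```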
Question: What are “Fluents” in the context of the situation calculus and give an example.
- Answer: Fluents are predicates that can change over time or through actions. They describe the world’s state in a given situation. An example is onTable(C, S1), meaning that block C is on the table in situation S1.
Question: What is PDDL (Planning Domain Definition Language) and what information does a PDDL file contain?
- Answer: PDDL is a declarative language for representing planning problems. A PDDL file contains at least:
  - A Domain Definition, describing types and actions. The domain specifies possible actions, their preconditions, and effects on the world.
  - A Problem Definition, describing the initial state and the goal. The problem specifies the world’s initial state and the conditions that must be fulfilled to solve the problem.
Question: Explain the term “Discount Factor” (γ) in Markov Decision Processes (MDPs).
* Answer: The Discount Factor (γ) is a value between 0 and 1 that controls the importance of future rewards in an MDP.
- A γ close to 0 means only the immediate reward matters.
- A γ close to 1 means future rewards matter almost as much as immediate ones. With γ < 1 (and bounded rewards), the discount factor ensures that the sum of expected rewards over an infinite sequence of states converges.
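The effect of γ on the return G = r_0 + γ·r_1 + γ²·r_2 + … is easy to check numerically. The Python helper below and its reward sequences are illustrative. For a constant reward of 1 per step, the infinite sum is the geometric series 1 / (1 - γ).

```python
def discounted_return(rewards, gamma):
    """G = r_0 + gamma * r_1 + gamma^2 * r_2 + ..., accumulated backwards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

print(discounted_return([5, 100], gamma=0.0))     # 5.0: only the immediate reward counts
print(discounted_return([1] * 1000, gamma=0.9))   # close to 1 / (1 - 0.9) = 10
```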
Question: What is an ontology in the context of knowledge representation, and what role do upper ontologies play?
* Answer: An ontology is a formal representation of knowledge that describes concepts and relationships in a specific domain. It serves to represent knowledge structurally and explicitly.
- Upper ontologies are general frameworks that define fundamental concepts relevant to many different domains. They serve as a foundation to develop specific domain ontologies. Examples of upper ontologies include general concepts such as time, space, physical objects, etc.
Question: What is the difference between TBox and ABox in Description Logics?
* Answer:
- TBox (Terminological Box): The TBox contains general statements about concepts and relationships as General Concept Inclusions (GCIs). It defines the schema of the database and serves to restrict the interpretation space. Examples are: Student ⊑ Person (Student is a person) or Course ⊑ ¬Person (A course is not a person).
- ABox (Assertion Box): The ABox contains statements about specific individuals and their relationships. It corresponds to a database’s concrete data. Examples are Alice: Student (Alice is a student) or (Alice, CS600): teaches (Alice teaches CS600).
Question: Explain the meaning of “Mean Squared Error” as a loss function in training machine learning algorithms.
* Answer: “Mean Squared Error” (MSE) is a loss function that computes the average squared difference between predicted and actual values: MSE = (1/n) Σ (yᵢ − ŷᵢ)². It measures how “far away” the predictions of a perceptron or other machine-learning model are from the target values. Minimizing MSE fits the model to the training data; a low training MSE alone does not guarantee good performance on unseen data (overfitting).
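Minimizing MSE by gradient descent can be sketched in Python for a hypothetical one-dimensional linear model ŷ = w·x + b. The gradients are ∂MSE/∂w = (2/n) Σ (ŷᵢ − yᵢ) xᵢ and ∂MSE/∂b = (2/n) Σ (ŷᵢ − yᵢ). The data, learning rate, and step count are assumptions.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the linear model y_hat = w * x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_by_gradient_descent(xs, ys, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))   # dMSE/dw
        grad_b = 2.0 / n * sum(errors)                              # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]     # generated by y = 2x + 1
w, b = fit_by_gradient_descent(xs, ys)
print(round(w, 4), round(b, 4), mse(w, b, xs, ys))      # w and b approach 2 and 1
```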
Question: Describe the concept of the “Vanishing Gradient” problem in deep neural networks and provide a possible solution.
* Answer: The “Vanishing Gradient” problem occurs when gradients decrease in deep neural networks during backpropagation as they propagate from the output layer to the input layer. This means that weights of earlier layers learn only slowly or not at all, significantly impairing the network’s ability to learn complex patterns.
- A possible solution is using ReLU (Rectified Linear Unit) activation functions or residual connections (ResNet). ReLU has a constant gradient of 1 for positive pre-activations, which avoids the saturation of activation functions like the sigmoid, whose derivative is at most 0.25. Residual connections add a layer's input to its output (y = x + F(x)), giving the gradient a direct path back to earlier layers.
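The shrinking factor can be made concrete in Python. Backpropagation multiplies one local activation derivative per layer. The sketch assumes a 20-layer chain with all weights equal to 1, so only the activation derivatives matter. Even in the sigmoid's best case (every pre-activation is 0), the gradient shrinks by 0.25 per layer.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # largest value is 0.25, reached at z = 0

def relu_derivative(z):
    return 1.0 if z > 0 else 0.0      # 1 for every active unit

depth = 20                                          # assumed: 20 layers, weights of 1
sigmoid_factor = sigmoid_derivative(0.0) ** depth   # best case for sigmoid
relu_factor = relu_derivative(1.0) ** depth         # active ReLU path
print(f"sigmoid: {sigmoid_factor:.2e}, relu: {relu_factor}")  # sigmoid: 9.09e-13, relu: 1.0
```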


Important words

• Data Augmentation: A method that creates modified copies of existing data. This is one way to generate 100 more items from an existing dataset; for example, images can be slightly altered (rotated, zoomed, etc.) to create new variants.
• Generative Models: These models can produce new data that resembles the data they were trained on. If such a model is trained on a specific type of data, it could be used to generate 100 more items of that type.
• K-Means Clustering: This method partitions a dataset into k clusters of similar items by repeatedly assigning each point to the nearest centroid and moving each centroid to the mean of its points. When creating 100 more items, it could be used to check that the generated data falls into the same clusters (i.e., keeps the same style) as the existing data.
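To make the k-means description concrete, here is a sketch of Lloyd's algorithm, the standard way k-means is computed, for 2-D points. Assumptions: points are tuples, the centroids are initialised with k randomly chosen data points, and `seed` is a hypothetical parameter added for reproducibility. Because the result depends on this initialisation, k-means in general only reaches a local minimum of the squared error:

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on a list of 2-D points given as (x, y) tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # k distinct data points as initial centroids
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                xs, ys = zip(*cluster)
                new_centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centroids.append(centroids[i])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break                                   # converged: centroids no longer move
        centroids = new_centroids
    return centroids, clusters


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

On this toy data the two well-separated groups end up in separate clusters; on less clearly separated data, different seeds can give different clusterings.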
Additionally, various algorithms and methods addressed in the sources could be helpful in creating new content:
• Local Search: Algorithms like Hill-Climbing, Simulated Annealing, and Local Beam Search are used to search for solutions within a state space. These could be used to generate new items by appropriately defining the search space.
• Genetic Algorithms: These algorithms evolve a population of candidate solutions using selection, mutation, and crossover. This could also be relevant for creating 100 additional items, for instance by varying existing items.
• Decision Trees and Random Forests: These algorithms can be used for data classification. A model trained on the existing data could be used to categorize or select the 100 additional items.
• Support Vector Machines (SVM): SVMs are primarily binary classifiers (multi-class problems are handled via one-against-all or one-against-one) that separate data with a maximum-margin hyperplane. They could be used, for example, to decide what type of newly generated data should be produced.
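As an illustration of how mutation and crossover vary existing items, here is a toy genetic algorithm on the "OneMax" problem (maximise the number of 1s in a bit string). Fitness-proportional selection, one-point crossover, and bit-flip mutation are one common combination among several; all function names and parameter values are assumptions for this sketch:

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Child takes the prefix of one parent and the suffix of the other."""
    cut = rng.randrange(1, len(parent_a))          # cut point strictly inside the genome
    return parent_a[:cut] + parent_b[cut:]


def bit_flip_mutation(genome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in genome]


def next_generation(population, fitness, rng, mutation_rate=0.05):
    """Fitness-proportional selection, then crossover, then mutation."""
    weights = [fitness(ind) + 1e-9 for ind in population]   # avoid an all-zero weight vector
    children = []
    for _ in range(len(population)):
        a, b = rng.choices(population, weights=weights, k=2)
        child = one_point_crossover(a, b, rng)
        children.append(bit_flip_mutation(child, mutation_rate, rng))
    return children


# Toy run: 30 random 20-bit strings, fitness = number of 1s.
rng = random.Random(42)
population = [[rng.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(60):
    population = next_generation(population, sum, rng)
print(max(sum(ind) for ind in population))
```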

multiple choice questions from notebooklm 1

Question 1: Which of the following statements about search problems are correct?

a) A search problem is defined as a 4-tuple consisting of a search space, successor relation, initial state, and goal predicate.
b) The solution to a search problem is a single state that satisfies the goal.
c) A search algorithm takes a search problem as input and returns a path to the goal state or an indication of failure.
d) The cost of transitions is always a necessary component of the definition of a search problem.
e) Local search algorithms are always better suited than classical search algorithms when only a solution is of interest.

Question 2: Which of the following statements correctly describe planning systems?

a) Deductive planning, like situation calculus, represents actions as state transitions and uses logical inferences.
b) State-based planning, like STRIPS, uses operators with preconditions, add, and delete lists to represent actions.
c) The frame problem arises because actions change the world, and the system must explicitly carry all unchanged aspects of the state into the next situation.
d) The qualification problem arises because actions have a finite list of preconditions to be executed, which can lead to an endless list of exceptions.
e) In planning, one is always interested in the internal representation of states.

Question 3: Which of the following statements about Constraint Satisfaction Problems (CSPs) is/are correct?

a) The solution of a CSP is a consistent, but not necessarily complete, variable assignment.
b) A CSP is node-consistent if all values in the domains of the variables satisfy the binary constraints.
c) A CSP is arc-consistent with respect to a variable X and a variable Y if for every value in the domain of X there is a value in the domain of Y that satisfies the constraints.
d) Local search algorithms can find a solution for CSPs even if they are inconsistent.
e) Backtracking search is a naive approach to solving CSPs by simply trying all possible variable assignments.

Question 4: Which of the following statements about fuzzy logic is/are correct?

a) Fuzzy logic deals with uncertainty, while probabilistic logic models vagueness.
b) In fuzzy logic, the truth of an expression is represented by a truth value between 0 and 1.
c) Fuzzy sets have a characteristic function that defines the membership of an element to a set.
d) The conjunction of two fuzzy statements is calculated by applying a t-norm.
e) The disjunction of two fuzzy statements is calculated by applying an s-norm.

Question 5: Which of the following statements correctly describe the functioning of a perceptron?

a) A perceptron is a simple neural network that can learn non-linear functions.
b) A perceptron can be trained by adjusting its weights to improve its predictions.
c) A perceptron can solve the XOR problem without transforming the input space.
d) The decision boundary of a perceptron is a hyperplane in the input space.
e) The bias of a perceptron helps to define the classification threshold.

Question 6: Which of the following statements about Support Vector Machines (SVMs) is/are correct?

a) SVMs can only be used for linear classification problems.
b) SVMs use kernel functions to map the input space into a higher-dimensional space where linear separation is possible.
c) SVMs are not applicable for text categorization or face recognition.
d) The support vectors are the data points that are furthest from the decision boundary.
e) SVMs are supervised learning algorithms that can be used for classification.

Question 7: Which of the following statements about decision trees is/are correct?

a) Decision trees implicitly define logical statements that can be expressed as conjunctions of implications.
b) Entropy is used to measure the impurity of a dataset.
c) Information gain is used to determine which attribute is best suited for splitting the decision tree.
d) Decision trees are always more accurate than random forests.
e) Decision trees are only suitable for numerical data.

Question 8: Which of the following statements about neural networks and deep learning is/are correct?

a) Deep learning networks have less than two layers.
b) The vanishing gradient problem occurs when activation functions lead to a small derivative for large (absolute) pre-activations.
c) ReLU (Rectified Linear Unit) is an activation function that can solve the vanishing gradient problem due to its constant gradient for positive pre-activations.
d) Residual connections allow the main signal path to be forwarded directly and can help overcome the vanishing gradient problem.
e) Transformers use recurrent connections or memory cells.

My answers:

a b ⇐ a c
a b c ⇐ b c d
c d
b d e
a b e ⇐ b e
b d e ⇐ b e
a c ⇐ a b c
b d ⇐ b c d

Answers:

Question 1: a), c)
Question 2: b), c), d)
Question 3: c), d)
Question 4: b), d), e)
Question 5: b), d), e)
Question 6: b), e)
Question 7: a), b), c)
Question 8: b), c), d)
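Question 7 (b, c) in a few lines: entropy is 0 for a pure set and maximal when the classes are evenly mixed, and information gain is the drop in entropy achieved by splitting on an attribute. A sketch on a small made-up dataset (the attribute names are hypothetical):

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels (0 for a pure set)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(examples, attribute, target):
    """Entropy of `target` before the split minus the weighted entropy after it."""
    before = entropy([ex[target] for ex in examples])
    after = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        after += len(subset) / len(examples) * entropy(subset)
    return before - after


data = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rain", "windy": False, "play": "yes"},
    {"outlook": "rain", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
]
print(round(information_gain(data, "outlook", "play"), 3))  # ≈ 0.667
print(round(information_gain(data, "windy", "play"), 3))    # ≈ 0.082
```

ID3 would therefore split on `outlook` first, because it yields the larger information gain.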

multiple choice questions notebooklm 2

Question 1: Which of the following statements about local search algorithms are correct?

a) Local search begins with a random complete assignment and attempts to improve it iteratively.
b) Local search algorithms are suitable when the path to the solution is important.
c) Hill climbing selects the neighbor with the highest value at each step and can get stuck in local maxima.
d) Simulated annealing occasionally accepts deteriorations to escape local optima.
e) Local beam search tracks only a single path through the state space.

Question 2: Which of the following statements correctly describe classical search algorithms?

a) Breadth-first search (BFS) always finds the least costly path.
b) Depth-first search (DFS) is optimal when the search space is very large but has few branches.
c) Informed search algorithms do not use heuristics to guide the search process.
d) Uninformed search algorithms have no information about the cost of the path from the current state to the goal state.
e) A* is an informed search algorithm that uses a heuristic to make the search more efficient.

Question 3: Which of the following statements about the representation of time in AI is/are correct?

a) Allen’s interval algebra uses 13 basic relationships to describe the relationships between time intervals.
b) The composition table in Allen’s Tense Logic describes the possible relationships between time intervals based on other known relationships.
c) RCC-8 only describes the relationship between time intervals.
d) There is no way to represent the relationships ‘before’, ‘after’, or ‘overlapping’ with RCC-8.
e) RCC-5 summarizes some relationships of RCC-8.

Question 4: Which of the following statements about probabilistic logic is/are correct?

a) Probabilistic logic deals with vagueness using probabilities between 0 and 1.
b) The three Kolmogorov axioms are: non-negativity, normalization, and additivity.
c) The conditional probability P(A|B) is defined as P(A and B) * P(B).
d) The sum rule states that the sum of the probabilities of all possible outcomes is always 1.
e) The probability of an event is always less than 1.

Question 5: Which of the following statements about inductive bias is/are correct?

a) A bias-free learning system makes no assumptions.
b) Inductive learning is only sensible with prior assumptions (bias).
c) Overfitting can lead to poor generalization of a learning model on unknown data.
d) The goal of learning is to find a hypothesis that covers all positive examples.
e) Precision measures the “degree of completeness,” while recall measures the “degree of correctness.”

Question 6: Which of the following statements about k-means clustering is/are correct?

a) k-means clustering is a hierarchical clustering algorithm.
b) k-means clustering guarantees finding a global minimum of the squared error measure.
c) The number of clusters (k) is a parameter of the k-means algorithm.
d) The k-means algorithm is deterministic and does not depend on initialization.
e) The k-means algorithm is an example of an unsupervised learning method.

Question 7: Which of the following statements correctly describe how random forests work?

a) Random forests consist of a single decision tree trained on the training data.
b) Random forests use the concept of tree bagging to use different subsets of training data for training different trees.
c) Random forests are less prone to overfitting than deep decision trees.
d) Random forests always use information gain as a metric.
e) Random forests use a voting mechanism to determine the final classification from the predictions of the individual trees.

Question 8: Which of the following statements about transformers is/are correct?

a) Transformers use recurrent connections and memory cells with input, output, and forget gates.
b) Transformers model context using attention mechanisms.
c) The query, key, and value vectors are used to compute attention-weighted outputs.
d) Attention in early layers focuses on global dependencies.
e) Transformers use multi-head attention to consider various aspects of context simultaneously.

My answers:

a c d
a b e ⇐ d e
b c e ⇐ a b e
b c d ⇐ b d
b c d e ⇐ a b c d
a c e ⇐ c e
b c e
b c e

Answers:

Question 1: a), c), d)
Question 2: d), e)
Question 3: a), b), e)
Question 4: b), d)
Question 5: a), b), c), d)
Question 6: c), e)
Question 7: b), c), e)
Question 8: b), c), e)
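Option c) of Question 4 multiplies where it should divide: P(A|B) = P(A ∧ B) / P(B) for P(B) > 0. A worked check with exact fractions for one roll of a fair die (the events are just illustrations):

```python
from fractions import Fraction

OMEGA = range(1, 7)          # one roll of a fair six-sided die


def prob(event):
    """P(event) for a predicate over outcomes, under the uniform distribution."""
    hits = sum(1 for w in OMEGA if event(w))
    return Fraction(hits, len(OMEGA))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(lambda w: a(w) and b(w)) / p_b


def is_six(w):
    return w == 6


def is_even(w):
    return w % 2 == 0


print(cond_prob(is_six, is_even))                               # 1/3
print(prob(lambda w: is_six(w) and is_even(w)) * prob(is_even))  # 1/12 (the formula from option c)
```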

multiple choice questions notebooklm 3

Question 1: Which of the following statements about Constraint Satisfaction Problems (CSPs) are correct?

a) CSPs consist of a set of variables, which can take values from corresponding domains, and constraints, which define restrictions for these values.
b) Node Consistency ensures that the value of each variable is individually consistent with its constraints.
c) Arc Consistency ensures that for each variable and value in its domain, there exists a value for all other involved variables that is consistent with the constraints.
d) Local search is not suitable for solving CSPs.
e) Backtracking Search can be used to solve CSPs by assigning values to variables one at a time and backtracking when constraints are violated.

Question 2: Which of the following statements correctly describe planning problems?

a) The Frame Problem refers to the idea that everything in the world is always changing.
b) The Qualification Problem describes the difficulty of specifying all preconditions for an action.
c) The Ramification Problem refers to the unintended side effects of actions.
d) Situation Calculus is a form of deductive planning.
e) STRIPS is a state-based planning method that describes actions through pre-, add-, and del-lists.

Question 3: Which of the following statements about ontologies are correct?

a) Ontologies serve the representation of knowledge and consist of hierarchies of concepts.
b) Domain-specific ontologies refer to general concepts like time and beliefs.
c) General (Upper) Ontologies refer to specific areas of the world, such as biology or politics.
d) Ontologies can be created by importing categories and attributes from existing databases.
e) Whether a universal ontology is possible is a topic of current research.

Question 4: Which of the following statements about Fuzzy Logic is/are correct?

a) Fuzzy Logic deals with uncertainty by using membership functions.
b) Fuzzy sets use membership functions to describe the degree of belonging of an element to a set.
c) T-norms are operators for conjunction (AND) in fuzzy logic.
d) S-norms are operators for disjunction (OR) in fuzzy logic.
e) Fuzzy Logic operators are not commutative.

Question 5: Which of the following statements about Machine Learning (ML) are correct?

a) Supervised learning requires training data with correct outputs (labels).
b) Unsupervised learning uses training data with labels to recognize patterns.
c) Regularization is a method used to artificially reduce the capacity of a model.
d) Overfitting occurs when a model fits the training data too well and generalizes poorly to unseen data.
e) The number of training examples has no influence on the performance of a model.

Question 6: Which of the following statements about Decision Trees are correct?

a) Decision trees can only be used for classification problems.
b) Decision trees can implicitly define logical statements.
c) Information gain is used to select the best attributes to split.
d) Entropy is a measure of the purity of a set of examples.
e) Decision trees are less prone to overfitting than random forests.

Question 7: Which of the following statements about Support Vector Machines (SVMs) is/are correct?

a) SVMs are unsupervised learning models for classification tasks.
b) The goal of a linear SVM is to find an optimal hyperplane that separates the classes with a maximum distance (margin).
c) Kernel-SVMs can solve nonlinear problems by mapping the data into a higher-dimensional space.
d) SVMs can only be used for binary classification problems.
e) For multi-classification problems, either the one-against-all or the one-against-one method is used.

Question 8: Which of the following statements about Neural Networks is/are correct?

a) A perceptron is a simple form of an artificial neural network that can process binary inputs and produce a binary output.
b) A perceptron can solve complex nonlinear problems like XOR without additional measures.
c) The training of perceptrons is conducted by minimizing the mean squared error function.
d) The Lottery Ticket Hypothesis suggests that successful training requires a suitable combination of initial weights.
e) Overparameterization can help find better substructures during training.

Question 9: Which of the following statements about Probabilistic Planning are correct?

a) Probabilistic planning deals with certainty in the execution of actions.
b) Markov Decision Processes (MDPs) are a model for probabilistic planning.
c) In MDPs, an action is assigned to each state.
d) Transition probabilities indicate the probability of transitioning from one state to another.
e) In MDPs, there are no rewards or penalties for reaching a specific state.

Question 10: Which of the following statements about Local Search is/are correct?

a) Local search begins with an empty assignment of variables.
b) Hill-Climbing is a local search method that always moves to better states.
c) Simulated Annealing uses a temperature to decide whether to accept worse states.
d) Local Beam Search follows only a single path through the search space, unlike hill climbing.
e) Genetic algorithms use mutation and crossover to generate new populations.

Question 11: Which of the following statements about the Perceptron are correct?

a) The perceptron is a non-linear classification model.
b) The perceptron uses an activation function and a threshold.
c) The bias in a perceptron helps define the decision boundary.
d) The perceptron can learn linear but not nonlinear separations.
e) The perceptron can directly learn the XOR operator.

Question 12: Which of the following statements about t-norms and s-norms in Fuzzy Logic are correct?

a) t-norms are operations for disjunction (OR).
b) s-norms are operations for conjunction (AND).
c) A common t-norm is the minimum of the input values.
d) A common s-norm is the maximum of the input values.
e) t-norms and s-norms have no neutral elements.

Question 13: Which of the following statements about the representation of constraints are correct?

a) Constraints can be explicitly represented by a table of all possible input-output combinations.
b) Constraints can be implicitly represented by a function that returns a truth value for each input.
c) An explicit representation requires less storage space than an implicit representation.
d) An implicit representation is faster than an explicit representation because no computation is necessary in real-time.
e) The representation of constraints affects the efficiency of the algorithms.

Question 14: Which of the following statements about Hierarchical Clustering are correct?

a) Hierarchical clustering always reaches a global optimum.
b) Hierarchical clustering can be done top-down (divisive) or bottom-up (agglomerative).
c) Single-linkage clustering uses the minimum distance between two clusters.
d) Complete-linkage clustering uses the maximum distance between two clusters.
e) Hierarchical clustering does not require centroids, unlike k-means.

Question 15: Which of the following statements about the representation of knowledge in AI is/are correct?

a) Knowledge representation deals with how information can be expressed in a formal and structured manner.
b) The formal representation of knowledge does not allow for automatic inference.
c) There is a naive approach where the content of a system’s knowledge base is the only form of knowledge representation.
d) The formal representation of knowledge is important for enabling automated reasoning.
e) The field of knowledge representation has no influence on the development of AI.

Question 16: Which of the following statements about Description Logics (DLs) are correct?

a) DLs are a family of knowledge representation languages developed for describing categories.
b) DLs are a modern variant of FOL (First Order Logic).
c) A DL knowledge base consists of a TBox (terminology) and an ABox (assertions).
d) Tableau-based algorithms reduce subsumption to the check of unsatisfiability.
e) DLs have no Boolean operators.

Question 17: Which of the following statements about Transformers are correct?

a) Transformers do not use attention mechanisms to model contexts.
b) In transformers, position is not encoded.
c) Attention in later layers focuses on global dependencies.
d) Training with explanations helps smaller models improve.
e) Larger models are always better in terms of performance, even if not trained with explanations.

Question 18: Which of the following statements about Information Gain in decision trees is/are correct?

a) Information gain is used to pick the attribute that provides the least information.
b) Information gain is based on entropy.
c) Entropy measures the purity of a data set.
d) Higher information gain indicates a better split of the data.
e) Information gain is used for regression trees.

Question 19: Which of the following statements about Random Forests are correct?

a) Random forests consist of a single decision tree.
b) Random forests are less prone to overfitting than individual decision trees.
c) Tree Bagging is used to adopt different subsets of the training data to train various trees.
d) Random forests are always better than individual decision trees.
e) Random forests use a voting mechanism to determine the final classification.

Question 20: Which of the following statements about the training of neural networks are correct?

a) It is always better to train with less training data to avoid overfitting.
b) It is common to monitor the loss during validation and stop training when validation loss increases.
c) Overparameterization can lead to worse results.
d) Regularization can be used to prevent weights from becoming too large.
e) Removing parameters before training has no impact on training capability.

My answers:
121. a b c e
122. a b c d e ⇐ b c d e
123. a d e
124. a c d ⇐ b c d
125. a d ⇐ a c d
126. b c d
127. a b c ⇐ b c e
128. a b c ⇐ a c d e
129. a b c d ⇐ b c d
130. b c e
131. a b c d ⇐ b c d
132. d ⇐ c d
133. b ⇐ a b e
134. b c d e
135. a c d
136. a c e ⇐ a c d
137. d
138. b c e
139. b c e
140. b c d

Answers:

Question 1: a), b), c), e)
Question 2: b), c), d), e)
Question 3: a), d), e)
Question 4: b), c), d)
Question 5: a), c), d)
Question 6: b), c), d)
Question 7: b), c), e)
Question 8: a), c), d), e)
Question 9: b), c), d)
Question 10: b), c), e)
Question 11: b), c), d)
Question 12: c), d)
Question 13: a), b), e)
Question 14: b), c), d)
Question 15: a), c), d)
Question 16: a), c), d)
Question 17: c), d)
Question 18: b), c), d)
Question 19: b), c), e)
Question 20: b), d)
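Question 12 in code: minimum/maximum and product/probabilistic sum are two common t-norm/s-norm pairs. The checks in the example show their neutral elements (1 for t-norms, 0 for s-norms), which is why option e) is false, and that “A or not A” need not reach 1 in fuzzy logic. The function names are illustrative:

```python
def t_min(a, b):
    """Minimum t-norm (fuzzy AND); neutral element 1."""
    return min(a, b)


def s_max(a, b):
    """Maximum s-norm (fuzzy OR); neutral element 0; dual of t_min."""
    return max(a, b)


def t_prod(a, b):
    """Product t-norm (fuzzy AND); neutral element 1."""
    return a * b


def s_prob_sum(a, b):
    """Probabilistic sum (fuzzy OR); neutral element 0; dual of t_prod."""
    return a + b - a * b


def f_not(a):
    """Standard fuzzy negation."""
    return 1 - a


a, b = 0.25, 0.75
print(t_min(a, b), s_max(a, b))          # 0.25 0.75
print(t_prod(a, b), s_prob_sum(a, b))    # 0.1875 0.8125
print(s_max(a, f_not(a)))                # 0.75: "A or not A" is not fully true
```

Each s-norm is obtained from its t-norm via De Morgan: s(a, b) = 1 − t(1 − a, 1 − b).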

multiple choice questions from notebooklm 4

Which of the following statements about search problems are correct?

a) A search problem is defined as a 4-tuple that contains the search space, transition relation, initial state, and a termination criterion.
b) Search in complex environments always relies on classical search algorithms.
c) Local search algorithms are better suited when only a solution, and not the path to the solution, is of interest.
d) In problem-solving with search, goal setting is the first step.
e) A search algorithm always returns a solution.

Which of the following statements describe properties of hypotheses in machine learning?

a) A hypothesis is correct if it covers all negative examples.
b) A hypothesis is complete if it covers all positive examples.
c) Precision measures the degree of correctness and is calculated by TP/(TP+FP).
d) Sensitivity (Recall) measures the degree of completeness and is calculated by TP/(TP+FN).
e) A hypothesis that makes no errors is always the best fit.

Which of the following statements about constraint satisfaction problems (CSPs) are correct?

a) A solution to a CSP is a consistent and complete assignment.
b) Unary constraints relate to relationships between two variables.
c) Consistency checking is an optional post-processing step.
d) Node and arc consistency are sufficient to test the global consistency of a CSP.
e) A CSP is inconsistent if there is no solution.

Which of the following statements about planning problems are correct?

a) Deductive planning seeks a sequence of actions that transform a given state into a goal state.
b) The frame problem refers to the difficulty of tracking side effects of actions.
c) The qualification problem refers to the difficulty of defining all preconditions for an action.
d) State-based planning uses situation calculus.
e) In classical planning, one is always interested in the best path.

Which of the following statements about Markov Decision Processes (MDPs) are correct?

a) MDPs consider uncertainty but not long-term consequences.
b) A policy assigns an action to each state.
c) Transition probabilities describe the likelihood of reaching a particular subsequent state when an action is taken in a given state.
d) Planning reduces to computing an optimal strategy.
e) An MDP cannot be characterized by its transition function and rewards.

Which of the following statements about knowledge representation are correct?

a) Ontologies represent concepts as hierarchies.
b) General ontologies refer to specific parts of the world.
c) Description logics serve to describe definitions and properties of categories.
d) RCC-8 is a generalization of Allen’s temporal logic’s 13 interval relations.
e) Direction can be represented through RCC calculus.

Which of the following statements about fuzzy and probabilistic logic are correct?

a) Fuzzy logic deals with the indeterminacy of statements.
b) Probabilistic logic deals with the vagueness of statements.
c) The truth value of a conjunction in fuzzy logic is the minimum of the truth values of the conjuncts.
d) The truth value of a disjunction in fuzzy logic is the maximum of the truth values of the disjuncts.
e) Fuzzy logic uses the Kolmogorov axioms of probability.

Which of the following statements about perceptrons are correct?

a) A perceptron is a multi-layer network.
b) Perceptrons are capable of classifying non-linearly separable data.
c) The bias in a perceptron helps define the classification threshold.
d) The training phase of the perceptron optimizes the weights.
e) Perceptrons are not based on neural structures.

Which of the following statements about decision trees and random forests are correct?

a) Decision trees imply logical statements as conjunctions of implications.
b) Random forests train many decision trees on different attributes with different data sets.
c) Entropy is used to measure the purity of the subsets in the data.
d) The concept of tree bagging is not used in random forests.
e) Random forests are always better than decision trees.

Which of the following statements about Support Vector Machines (SVMs) are correct?

a) SVMs can only solve linear classification problems.
b) SVMs are based on a linear separator.
c) The goal of an SVM is to find an optimal hyperplane that maximizes the margin between the nearest positive and negative examples.
d) Kernels are used to map data into a higher-dimensional space.
e) SVMs cannot be used for multi-classification.

My answers:
141. a c d e ⇐ a c d search problems
142. b c d
143. a e
144. a c d
145. a b c d
146. a c d
147. a c d e
148. b c d
149. a b c
150. b c d

Answers:

Search problems: a), c), d)
Properties of hypotheses: b), c), d)
Constraint satisfaction problems: a), e)
Planning problems: a), c)
Markov Decision Processes: b), c), d)
Knowledge representation: a), c), d)
Fuzzy and probabilistic logic: c), d)
Perceptrons: c), d)
Decision trees and random forests: a), b), c)
Support Vector Machines (SVMs): b), c), d)
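The precision and recall formulas from the hypotheses question, applied to a small made-up example. A complete hypothesis (covers every positive example) has recall 1; a correct one (covers no negative example) has no false positives and therefore precision 1 whenever it predicts anything positive. The function names are illustrative:

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives (labels are 0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn


def precision(tp, fp):
    """Degree of correctness: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Degree of completeness (sensitivity): TP / (TP + FN)."""
    return tp / (tp + fn)


y_true = [1, 1, 1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
tp, fp, fn = confusion_counts(y_true, y_pred)
print(tp, fp, fn)                          # 3 1 2
print(precision(tp, fp), recall(tp, fn))   # 0.75 0.6
```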


multiple choice questions from notebooklm 5

10 advanced multiple-response questions, centred on fuzzy logic and knowledge representation but also touching on search, inductive bias, decision trees, and SVMs. The answers follow at the end.

Question 1: Which of the following statements about Fuzzy Logic are correct?

a) Fuzzy logic completely replaces classical logic by treating all statements as fuzzy.
b) In fuzzy logic, statements can take on a truth value between 0 and 1.
c) T-norms are used in fuzzy logic to model conjunction (AND).
d) Fuzzy logic deals with uncertainty, while probabilistic logic models vagueness.
e) S-norms are used to model disjunction (OR) in fuzzy logic.

Question 2: Which of the following statements correctly describe the relationship between fuzzy and probabilistic logic?

a) Both logics use probabilities to evaluate the truth of statements.
b) Probabilistic logic deals with vagueness using probabilities, while fuzzy logic handles uncertainty with membership functions.
c) Fuzzy logic is better suited to handling fuzzy predicates, whereas probabilistic logic deals with uncertain events.
d) Tautologies always have the value 1 in probabilistic logic, while this is not always the case in fuzzy logic.
e) Both logics extend classical logic and allow for partial truths.

Question 3: Which statements about ontologies are true?

a) Ontologies are always hierarchical, with general concepts at the top and specific concepts below.
b) Upper ontologies focus on very specific concepts and details.
c) Ontologies are used for the formal representation of knowledge, often in the form of terminologies.
d) Ontologies can be used in the linguistic domain.
e) Ontologies are only used in the field of Artificial Intelligence.

Question 4: Which of the following statements correctly describe Description Logics (DLs)?

a) DLs are a family of languages for knowledge representation used to describe categories and their properties.
b) DLs are a modern variant of KL-ONE and are based on classical predicate logic.
c) A DL knowledge base consists of a TBox (terminology) and an ABox (assertions).
d) Tableau-based algorithms use satisfiability checking to verify concept subsumption.
e) DLs are designed to make statements about objects, not about categories.

Question 5: Which of the following statements about tableau algorithms are true?

a) Tableau algorithms are used to check the satisfiability of concept descriptions in Description Logic.
b) A tableau algorithm tries to construct a model for an ABox to verify subsumption.
c) To check the subsumption of C by D, it must be checked whether C ⊓ ¬D is unsatisfiable.
d) Tableau-based algorithms can cover all inferences in DL.
e) Tableau algorithms reduce subsumption to checking whether a finite model exists.

Question 6: Which statements about the representation of time are correct?

a) Allen’s interval algebra defines 13 basic relations between time intervals, including ‘before’, ‘meets’, and ‘overlaps’.
b) RCC-8 is a good approach to describe the relations ‘before’, ‘after’, and ‘overlapping’.
c) The composition table in Allen’s Tense Logic predicts relations between time intervals based on other known relations.
d) RCC-5 summarizes some of the relations of RCC-8, including summarizing “disjoint” and “externally connected” to “discrete.”
e) Time can only be represented by exact measurements in AI.

Question 7: Which of the following statements about search algorithms are correct?

a) Local search algorithms are well-suited for finding the optimal path to a solution.
b) Classical search algorithms use an explicit representation of states, while local search algorithms operate only with a complete variable assignment.
c) Breadth-first search (BFS) is an uninformed search algorithm that always finds the shortest path, provided all steps have the same cost.
d) A* is an informed search algorithm that uses a heuristic to guide the search.
e) Depth-first search (DFS) is a good choice when the search space is very large and highly branching.

Question 8: What are the key properties of inductive bias in machine learning?

a) An inductive bias is a set of assumptions made by the learning algorithm to generalize from examples.
b) A learning system without bias can collect the given examples without generalizing.
c) Too high a bias can lead to overfitting and poor performance on unseen data.
d) The inductive bias has no effect on the learning ability of a learning system.
e) The inductive bias helps the learning system make deductive inferences.

Question 9: Which statements about decision trees and random forests are true?

a) Decision trees implicitly learn logical statements that are conjunctions of implications.
b) Decision trees are robust against irrelevant features.
c) Random forests form an ensemble of many decision trees that are randomly trained and whose results are then aggregated.
d) Tree bagging is a method to train decision trees on the entire dataset.
e) In random forests, features are randomly selected at each split to increase the diversity of individual trees.

Question 10: Which of the following statements about Support Vector Machines (SVMs) are correct?

a) SVMs attempt to find a hyperplane that separates the data in a higher-dimensional space.
b) Kernel functions are used to transform data into a higher-dimensional space without explicitly computing the transformation.
c) SVMs have an efficient learning procedure but can only learn linear functions.
d) The bias in SVMs shifts the classification terms by the same amount for all data points.
e) SVMs can only solve binary classification problems and must be decomposed into several binary classifiers for multi-class problems.

My answers (my choice ⇐ correct answer, where they differed):

a c e ⇐ b c e
c d e
a c d
c d e ⇐ a b c d
a b e ⇐ a b c e
a c d
b d ⇐ b c d
a b c e ⇐ a b
a c d e ⇐ a b c e
a b ⇐ a b d e

Answers:

Question 1: b), c), e)
Question 2: c), d), e)
Question 3: a), c), d)
Question 4: a), b), c), d)
Question 5: a), b), c), e)
Question 6: a), c), d)
Question 7: b), c), d)
Question 8: a), b)
Question 9: a), b), c), e)
Question 10: a), b), d), e)

Multiple-choice questions (NotebookLM) 6

Here are 12 additional multiple-choice questions in the style of the previous questions, covering various topics from the sources:

Question 1: Which of the following statements correctly describes local search algorithms?

a) Local search algorithms are concerned with the sequence of actions leading to a solution.
b) Local search algorithms are particularly suitable when interested in a path to the solution.
c) Hill-Climbing is a variant of Local Beam Search where multiple states are searched in parallel.
d) Local search can be suitable for optimization and constraint satisfaction problems.
e) In Simulated Annealing, the probability of accepting worse solutions increases over time.

Question 2: Which of the following statements about Constraint Satisfaction Problems (CSPs) is correct?

a) CSPs consist of variables, domains, and constraints, with the goal of violating all constraints.
b) Arc Consistency ensures that for every value of a variable, there is a compatible assignment for every other variable in the constraint.
c) An explicit representation of constraints is only suitable for constraints with infinite input possibilities.
d) Backtracking Search assigns values to variables one by one and changes the values of the last variable assignments when constraints are satisfied.
e) Frequency allocation for radio stations is not an example of CSP.

Question 3: Which of the following statements about planning problems is correct?

a) In planning, the focus is on the internal representation of states.
b) The Frame Problem describes that actions change the world, but many things remain unchanged.
c) In Deductive Planning, operators are described by three lists: PRE, ADD, DEL.
d) PDDL is a declarative language that does not represent knowledge.
e) Situation Calculus is an example of state-based planning.

Question 4: Which of the following statements about Markov Decision Processes (MDPs) is correct?

a) MDPs do not account for uncertainty.
b) A policy assigns an action to every possible state.
c) In policy iteration, policies are first improved and then evaluated.
d) The transition probability describes the reward.
e) Policies cannot be evaluated based on their expected rewards.

Question 5: Which of the following statements about ontologies is correct?

a) There is a generally accepted definition of ontology.
b) Upper ontologies are specific to a particular application area.
c) Directions can be represented by RCC calculi.
d) Ontologies cannot be created from text documents by Open Information Extraction.
e) Upper ontologies focus on general concepts that occur in many areas.

Question 6: Which of the following statements about Fuzzy Logic is correct?

a) Fuzzy Logic cannot represent partial truth.
b) In Fuzzy Logic, statements are either true or false.
c) Fuzzy Logics address uncertainty.
d) T-norms are used to represent disjunctions.
e) Fuzzy Logic uses membership functions to model vague statements.

Question 7: Which of the following statements about Probabilistic Logic is correct?

a) Probabilistic Logic cannot represent uncertainty.
b) Probabilistic Logic addresses partial truth (vagueness).
c) In Probabilistic Logic, statements are only true to a certain degree.
d) Probabilistic Logic has fewer logical properties than Fuzzy Logic.
e) Probability measures assign probabilities to events.

Question 8: Which of the following statements about Neural Networks is correct?

a) A network is only considered deep if it has more than three layers.
b) Two layers are sufficient for a universal approximator.
c) The number of parameters in pre-trained language models has a significant impact on performance.
d) The attention weights in later layers focus on local dependencies.
e) Smaller language models cannot benefit from output explanations.

Question 9: Which of the following statements about Machine Learning is correct?

a) Explicit models require little effort.
b) Direct training experience involves no feedback from the environment.
c) Unsupervised learning involves an experienced teacher.
d) In Reinforcement Learning, the goal is to find an optimal policy to achieve a goal by maximizing the expected future reward.
e) Inductive Logic Programming uses subsymbolic computations.

Question 10: Which of the following statements about Decision Trees is correct?

a) Decision trees are not robust against irrelevant features.
b) Deep decision trees do not tend to overfit.
c) Decision trees are “black box” models.
d) Regression trees use discrete values for the target variable.
e) Decision trees implicitly define logical statements.

Question 11: Which of the following statements about Random Forests is correct?

a) Random Forests train only one decision tree.
b) Random Forests do not use tree bagging.
c) Random Forests are less performant than decision trees because they are based on multiple trees.
d) Random Forests do not randomly select features at each split.
e) Random Forests use a majority vote of the individual decision trees for classification.

Question 12: Which of the following statements about Support Vector Machines (SVMs) is correct?

a) SVMs cannot solve nonlinear problems.
b) SVMs are not suitable for binary classification problems.
c) Kernel functions are not used to transform data into a higher-dimensional space.
d) SVMs look for a hyperplane that minimizes the distance between the nearest positive and negative examples.
e) For linearly separable problems, nonlinear kernels do not improve performance.

My answers (my choice ⇐ correct answer, where they differed):

d
b c ⇐ none
c e ⇐ a b
b
b ⇐ e
e
c e ⇐ e
a b c ⇐ b
d
d e ⇐ e
e
e

2026 Expansion — Deep / Exam-Trap Questions

Added 18/05/2026. Focus: harder math, KKT/Bellman/AC-3 details, common confusions, NeuroAI bridges. ⚠️ = known exam trap.

Local Search

Why is the Simulated Annealing acceptance probability exp(Δ/T) and not, say, T/|Δ|?

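The exponential (Boltzmann) form, with Δ < 0 for a downhill move, gives two essential properties: (1) Δ → 0 ⇒ P → 1 (tiny downhill moves are almost always accepted, at any T); (2) T → 0 ⇒ P → 0 (no downhill moves are accepted at low T). T/|Δ| is not even a probability: it exceeds 1 whenever |Δ| < T and diverges as Δ → 0. The form is borrowed directly from statistical mechanics. P depends only on the gap Δ, not on the absolute energy, and the ratio of acceptance probabilities for two downhill moves, exp((Δ₁ − Δ₂)/T), depends only on the difference of their gaps: exactly what we want.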
See: Simulated Annealing, Temperature in neighbour selection
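A minimal sketch of the acceptance rule, assuming maximization with Δ = value(neighbour) − value(current), so downhill moves have Δ < 0; the function name `accept_probability` is illustrative, not from the lecture.

```python
import math


def accept_probability(delta: float, temperature: float) -> float:
    """Acceptance probability for a move with value change `delta` (maximization).

    Uphill or sideways moves (delta >= 0) are always accepted; downhill moves
    are accepted with the Boltzmann probability exp(delta / T).
    """
    if delta >= 0:
        return 1.0
    if temperature <= 0:
        return 0.0  # frozen: no downhill moves at all
    return math.exp(delta / temperature)


print(accept_probability(-1e-6, 0.5))  # property (1): tiny downhill step, close to 1
print(accept_probability(-1.0, 0.01))  # property (2): low T, practically 0
# The ratio of two downhill acceptances depends only on the gap between the deltas:
print(accept_probability(-1.0, 2.0) / accept_probability(-3.0, 2.0), math.exp(2.0 / 2.0))
```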

In Local Beam Search with k=1, what do you get? With k = ∞?

k=1 → simple Hill Climbing (single current state, only its best neighbor kept). k = ∞ → Breadth-First Search over the neighborhood graph. k interpolates between exploitation (small k) and exploration (large k).

⚠️ Why is “Parallel Hill Climbing” NOT the same as Local Beam Search, even when both use k states?

Parallel HC = k independent searches, no information sharing. Local Beam Search pools all k·b successors (b = successors per state) and keeps the global best k → states converge on promising regions (“rich get richer”). Trade-off: the beam can collapse into one region; Stochastic Beam Search (successors chosen with probability increasing in their value) mitigates this.
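A minimal sketch of Local Beam Search, assuming maximization and caller-supplied `successors` and `value` functions (both hypothetical names). The pooling line is the difference from k parallel hill climbers; the 1-D toy problem is invented for illustration.

```python
import heapq
from typing import Callable, Hashable, Iterable, List


def local_beam_search(starts: List[Hashable],
                      successors: Callable[[Hashable], Iterable[Hashable]],
                      value: Callable[[Hashable], float],
                      max_steps: int = 1000) -> Hashable:
    """Local beam search with k = len(starts), maximizing `value`."""
    k = len(starts)
    beam = list(starts)
    for _ in range(max_steps):
        # Pool ALL successors of ALL k states (parallel HC would keep k separate lists).
        pool = {n for s in beam for n in successors(s)}
        if not pool:
            break
        new_beam = heapq.nlargest(k, pool, key=value)   # global top-k, best first
        if value(new_beam[0]) <= max(value(s) for s in beam):
            break                                       # no successor improves: stop
        beam = new_beam
    return max(beam, key=value)


def neighbours(x: int) -> List[int]:
    """Hypothetical 1-D toy problem: integer states 0..30, moves of +-1."""
    return [n for n in (x - 1, x + 1) if 0 <= n <= 30]


# Three starts; after two steps the beam has abandoned the region around x = 5
# and all k states sit near the peak at x = 20.
print(local_beam_search([0, 5, 25], neighbours, lambda x: -abs(x - 20)))  # 20
```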

A Genetic Algorithm has stalled — all chromosomes have near-identical fitness. Which mechanism do you adjust, and why?

Mutation rate ↑. Selection + crossover alone can’t introduce novelty once population diversity collapses (crossover of identical parents = identical children). Alternatives: re-inject random chromosomes (immigration), or briefly switch to fitness-antiproportional selection to preserve weak-but-different solutions.
See: Genetic Algorithms, Paper Review Genetic Algorithms

Why does Random-Restart Hill Climbing have P(success) → 1 as n → ∞ for finite state spaces, but Simulated Annealing requires an infinitely slow cooling schedule?

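RR-HC ≈ independent sampling: if one run reaches the global optimum with probability p > 0 (in a finite state space with uniformly random starts, p ≥ 1/|S|), then P(success after n restarts) = 1 − (1 − p)ⁿ → 1, with 1/p runs expected. SA’s guarantee is different: a sufficiently slow (logarithmic) cooling schedule keeps the Markov chain close to the Boltzmann distribution, which concentrates on the global optimum as T → 0. Practical SA cools faster than that → it can still get stuck.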

Constraint Satisfaction

What is the time complexity of AC-3, and why?

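O(c·d³), where c = number of binary constraints (arcs) and d = max domain size. Each arc (X_k, X_i) can be (re-)inserted into the queue at most d times: it is only re-added when a value is deleted from D(X_i), and X_i has at most d values to lose. Each REVISE of an arc checks every value of X_k against every value of X_i: O(d²). Total: O(c · d · d²) = O(c·d³). A deletion from D(X_i) re-enqueues all arcs (X_k, X_i) coming from X_i’s neighbours X_k.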
See: Constraint Satisfaction Problems
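An AC-3 sketch in the AIMA style, assuming constraints are stored as one predicate per directed arc; the names `revise`, `ac3` and the X < Y toy CSP are illustrative. The comments mark where the O(d²) per REVISE and the re-enqueueing from the complexity argument happen.

```python
from collections import deque
from typing import Callable, Dict, Hashable, List, Set, Tuple

Arc = Tuple[Hashable, Hashable]


def revise(domains: Dict[Hashable, Set], check: Callable, xi: Hashable, xj: Hashable) -> bool:
    """Delete values of xi that have no supporting value in xj. Cost O(d^2)."""
    removed = False
    for x in list(domains[xi]):
        if not any(check(x, y) for y in domains[xj]):
            domains[xi].discard(x)
            removed = True
    return removed


def ac3(domains: Dict[Hashable, Set], constraints: Dict[Arc, Callable]) -> bool:
    """Make every arc consistent (domains are modified in place).

    constraints[(xi, xj)] is a predicate check(value_of_xi, value_of_xj);
    both directions of each binary constraint must be present.
    Returns False as soon as some domain becomes empty (no solution).
    """
    incoming: Dict[Hashable, List[Hashable]] = {}
    for (xk, xi) in constraints:
        incoming.setdefault(xi, []).append(xk)      # arc (xk, xi) points into xi
    queue = deque(constraints)
    while queue:
        xi, xj = queue.popleft()
        if revise(domains, constraints[(xi, xj)], xi, xj):
            if not domains[xi]:
                return False
            # D(xi) shrank: every arc (xk, xi) may have lost support, so recheck it.
            for xk in incoming.get(xi, []):
                if xk != xj:
                    queue.append((xk, xi))
    return True


domains = {"X": {1, 2, 3}, "Y": {1, 2, 3}}
constraints = {("X", "Y"): lambda x, y: x < y, ("Y", "X"): lambda y, x: y > x}
print(ac3(domains, constraints), domains)   # True {'X': {1, 2}, 'Y': {2, 3}}
```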

⚠️ MRV says “pick variable with fewest legal values”; Degree says “pick variable with most constraints.” How can both be primary heuristics?

They aren’t — Degree is the tie-breaker when multiple variables share the same minimum domain size under MRV. Same underlying logic: fail fast by attacking the most constrained variable first.
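A sketch of MRV with Degree as tie-breaker, assuming the CSP is given as a neighbour map plus current (already forward-checked) domains; `select_variable` is a hypothetical name. The Australia map-colouring example shows Degree deciding when MRV ties.

```python
from typing import Dict, Hashable, List, Set


def select_variable(domains: Dict[Hashable, Set], neighbours: Dict[Hashable, List[Hashable]],
                    assigned: Set[Hashable]) -> Hashable:
    """MRV: fewest remaining legal values; ties broken by the Degree heuristic
    (most constraints on other *unassigned* variables)."""
    unassigned = [v for v in domains if v not in assigned]

    def key(v):
        degree = sum(1 for n in neighbours[v] if n not in assigned)
        return (len(domains[v]), -degree)   # small domain first, then high degree

    return min(unassigned, key=key)


australia = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
domains = {v: {"red", "green", "blue"} for v in australia}
print(select_variable(domains, australia, assigned=set()))   # SA: MRV ties, SA has degree 5

# After SA = red, forward checking removes red from SA's neighbours:
domains_fc = {v: (d - {"red"} if v in australia["SA"] else d) for v, d in domains.items()}
print(select_variable(domains_fc, australia, assigned={"SA"}))  # NT (ties Q, NSW; first kept)
```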

Do MRV and LCV ever recommend conflicting moves?

No — they operate on different choice points. MRV chooses which variable to assign next; LCV chooses which value to try first for that variable. Intuition: fail fast on variable selection (MRV), keep options open on value selection (LCV).

What are RCC-8 relations, and why are they CSP-relevant?

Region Connection Calculus: 8 jointly exhaustive and pairwise disjoint (JEPD) topological relations between regions (DC, EC, PO, EQ, TPP, NTPP, TPPi, NTPPi), so exactly one of them holds between any two regions. They yield a qualitative CSP: the variables are regions, the constraints are (disjunctions of) RCC-8 relations. Useful because most real spatial reasoning is qualitative, not metric.
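A sketch that computes the RCC-8 relation under a strong simplifying assumption: regions are closed 1-D intervals (a1, a2) with a1 < a2, not general 2-D regions. The function name `rcc8` is hypothetical; the eight examples show that every pair lands in exactly one relation (JEPD).

```python
from typing import Tuple

Interval = Tuple[float, float]


def rcc8(a: Interval, b: Interval) -> str:
    """RCC-8 relation between closed intervals a = (a1, a2), b = (b1, b2)."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1 or b2 < a1:
        return "DC"      # disconnected: no shared point
    if a2 == b1 or b2 == a1:
        return "EC"      # externally connected: only a boundary point shared
    if (a1, a2) == (b1, b2):
        return "EQ"
    if b1 <= a1 and a2 <= b2:                  # a is a proper part of b
        return "TPP" if (a1 == b1 or a2 == b2) else "NTPP"
    if a1 <= b1 and b2 <= a2:                  # b is a proper part of a
        return "TPPi" if (a1 == b1 or a2 == b2) else "NTPPi"
    return "PO"          # interiors overlap, neither contains the other


examples = [((0, 1), (2, 3)), ((0, 1), (1, 2)), ((0, 2), (1, 3)), ((0, 1), (0, 1)),
            ((0, 1), (0, 2)), ((1, 2), (0, 3)), ((0, 2), (0, 1)), ((0, 3), (1, 2))]
print([rcc8(a, b) for a, b in examples])
```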

Planning

⚠️ STRIPS uses ADD/DEL lists. Does this solve the Frame Problem?

Only syntactically. The STRIPS assumption (every fact not mentioned in ADD/DEL persists unchanged) removes the need for explicit frame axioms. But the underlying problem (knowing what doesn’t change in the real world) remains: it is shifted into the design of the action schemas, which must list every effect correctly.
See: Why an an AI struggles with Planning
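A minimal STRIPS progression sketch, assuming states are sets of ground facts written as strings; the blocks-world facts and the `Action`/`apply` names are hypypothetical illustrations. Color(A,Red) is mentioned by no action, yet it survives without any frame axiom.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Action:
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]   # the DEL list ("del" is a Python keyword)


def apply(state: FrozenSet[str], action: Action) -> FrozenSet[str]:
    """STRIPS progression: (state - DEL) | ADD, if all preconditions hold."""
    if not action.pre <= state:
        raise ValueError(f"{action.name}: preconditions not satisfied")
    # Everything not in DEL or ADD is copied over: no frame axioms needed.
    return (state - action.delete) | action.add


state = frozenset({"On(A,Table)", "On(B,Table)", "Clear(A)", "Clear(B)", "Color(A,Red)"})
stack_a_on_b = Action(
    name="Stack(A,B)",
    pre=frozenset({"On(A,Table)", "Clear(A)", "Clear(B)"}),
    add=frozenset({"On(A,B)"}),
    delete=frozenset({"On(A,Table)", "Clear(B)"}),
)
print(sorted(apply(state, stack_a_on_b)))
# ['Clear(A)', 'Color(A,Red)', 'On(A,B)', 'On(B,Table)']
```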

In Situation Calculus, how many frame axioms do n actions and m fluents require?

O(n·m) — one frame axiom per (action, fluent) pair that doesn’t change. This is the Frame Problem made explicit: even trivial actions force you to assert non-effects for every other fact.
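One explicit frame axiom in Situation Calculus notation, for a hypothetical fluent Color and action Move (illustrative, not from the lecture): moving any block leaves every block’s colour unchanged.

$$\forall x, c, y, z, s:\; \mathit{Holds}(\mathit{Color}(x, c), s) \rightarrow \mathit{Holds}(\mathit{Color}(x, c), \mathit{Result}(\mathit{Move}(y, z), s))$$

A domain with 10 actions and 50 fluents needs up to 10 · 50 = 500 axioms of this kind, minus the pairs where the action actually does change the fluent.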

State-space planning vs. plan-space planning — what’s the difference?

State-space: search in the space of world states (apply actions, get new states). Plan-space: search in the space of partial plans (add actions, orderings, causal links to a plan skeleton). Plan-space allows partial ordering of actions → least-commitment planning.

MDPs / Reinforcement Learning

What’s the difference between V^π(s) and V*(s)?

V^π(s) = value of state s under a specific policy π (expected discounted return when following π). V*(s) = optimal value = max_π V^π(s). Two corresponding Bellman equations: policy evaluation (for V^π) and Bellman optimality (for V*).
See: Markov Decision Process (MDP), Q-Function
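The two Bellman equations, written with transition model p(s′ | s, a), reward r(s, a, s′) and a stochastic policy π(a | s) (this notation is an assumption; the slides may write it differently):

$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[r(s, a, s') + \gamma\, V^{\pi}(s')\bigr]$$

$$V^{*}(s) = \max_{a} \sum_{s'} p(s' \mid s, a)\,\bigl[r(s, a, s') + \gamma\, V^{*}(s')\bigr]$$

The first is linear in V^π and can be solved as a linear system; the max makes the second nonlinear, which is why it is solved iteratively.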

⚠️ Value Iteration vs. Policy Iteration — when do you prefer which?

VI: iterate Bellman optimality update until V converges, extract greedy policy. Simple, but slow convergence when γ → 1. PI: alternate policy evaluation (compute V^π exactly via linear system) + policy improvement (greedy w.r.t. V^π). Converges in fewer iterations but each iteration is more expensive. Use PI for small state spaces; VI when iterations must be cheap.
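A NumPy sketch of both algorithms on a hypothetical 3-state chain (action 1 "advance" moves right with probability 0.9, action 0 mostly stays, state 2 is absorbing with reward 1 per step). PI evaluates each policy exactly with `np.linalg.solve`; VI only does cheap max-backups. Both should end with the same V* and greedy policy.

```python
import numpy as np

# P[a, s, s'] = transition probabilities, R[s, a] = expected immediate reward.
P = np.array([
    [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]],   # action 0: mostly stay
    [[0.1, 0.9, 0.0], [0.0, 0.1, 0.9], [0.0, 0.0, 1.0]],   # action 1: advance
])
R = np.array([[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]])
gamma = 0.9


def q_values(V):
    """Q[s, a] = R[s, a] + gamma * sum_s' P[a, s, s'] V[s']."""
    return R + gamma * np.einsum("ast,t->sa", P, V)


def value_iteration(tol=1e-10):
    V = np.zeros(P.shape[1])
    while True:
        V_new = q_values(V).max(axis=1)               # Bellman optimality backup
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, q_values(V_new).argmax(axis=1)
        V = V_new


def policy_iteration():
    n = P.shape[1]
    policy = np.zeros(n, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n)]                # row s = P[policy[s], s, :]
        R_pi = R[np.arange(n), policy]
        V = np.linalg.solve(np.eye(n) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        new_policy = q_values(V).argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return V, policy
        policy = new_policy


print(value_iteration())
print(policy_iteration())
```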

Why is γ NOT part of the MDP tuple (S, A, p, r)?

γ belongs to the objective (discounted return), not the MDP itself. The same MDP with different γ gives different optimal policies. Some textbooks include γ; Osnabrück’s MoAI does not.
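A tiny worked example of the claim, using a hypothetical one-decision MDP: "near" pays 1 now and ends; "far" pays 0 now and 2 one step later, then ends. So Q(near) = 1 and Q(far) = 2γ, and the optimal first action switches at γ = 0.5 although the MDP itself is unchanged.

```python
def best_first_action(gamma: float) -> str:
    """Greedy choice in the hypothetical near/far MDP for a given discount."""
    q_near = 1.0
    q_far = 0.0 + gamma * 2.0
    return "near" if q_near >= q_far else "far"


print(best_first_action(0.3), best_first_action(0.9))   # near far
```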

Knowledge Representation

TBox vs. ABox — give one reasoning task unique to each.

TBox: Subsumption (“does the TBox entail C ⊑ D?”) and Classification (build the concept hierarchy). ABox: Instance Check (“is individual a in concept C?”) and Realization (find most specific concepts for each individual). Consistency involves both.

⚠️ Open World vs. Closed World Assumption — what does DL use, and why does it bite on exam questions?

DL uses OWA: absence of info ≠ negation. STRIPS / databases use CWA. Consequence: in OWL, “no triple says Alice teaches CS500” does not entail “Alice does not teach CS500” — negation must be asserted explicitly. Students lose points on entailment questions every term because of this.

In ALC, what’s the difference between ∀R.C and ∃R.C, and why does ∀R.C give surprising results when R has no fillers?

∃R.C: at least one R-successor in C. ∀R.C: every R-successor is in C. Trap: if an individual has no R-successors, ∀R.C is vacuously true. So ∀hasChild.Doctor is true for everyone childless — counterintuitive in natural language, correct in classical logic.
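A sketch of the vacuous-truth trap, evaluated in one fixed finite interpretation (model checking with all children listed, not OWA entailment); the individuals are hypothetical. Under OWA, a reasoner would only conclude that fay has no children if the ABox says so, e.g. fay : ∀hasChild.⊥.

```python
# Hypothetical finite interpretation: hasChild successors and the extension of Doctor.
has_child = {"anna": ["ben"], "carl": ["dora", "emil"], "fay": []}
doctor = {"ben", "dora"}


def in_exists(x: str) -> bool:
    """x in ∃hasChild.Doctor: at least one child is a Doctor."""
    return any(child in doctor for child in has_child[x])


def in_forall(x: str) -> bool:
    """x in ∀hasChild.Doctor: every child is a Doctor (vacuously true with no children)."""
    return all(child in doctor for child in has_child[x])


for person in has_child:
    print(person, in_exists(person), in_forall(person))
```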

Vagueness & Uncertainty

⚠️ A statement has μ_A(x) = 0.7. Is this a probability?

No — it’s a degree of membership: x belongs to fuzzy set A to degree 0.7. Probabilities describe uncertainty about a crisp fact (“it will rain”). Fuzzy values describe partial truth about a vague predicate (“it is warm”). Different algebras (min/max vs. product/sum) and different semantics.

What axioms does a t-norm satisfy?

(1) Neutral element: T(x, 1) = x. (2) Commutativity. (3) Associativity. (4) Monotonicity: x ≤ x’ ⇒ T(x, y) ≤ T(x’, y). Examples: min(x, y), product x·y, Łukasiewicz max(0, x + y − 1).
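A sketch that spot-checks the four axioms on a finite grid (a sanity check, not a proof). The arithmetic mean is included as a hypothetical counterexample: it is commutative and monotone but has no neutral element 1, so it is not a t-norm.

```python
import itertools


def t_min(x, y):
    return min(x, y)


def t_product(x, y):
    return x * y


def t_lukasiewicz(x, y):
    return max(0.0, x + y - 1.0)


def t_average(x, y):          # NOT a t-norm: violates the neutral element axiom
    return (x + y) / 2


def looks_like_t_norm(T, grid, eps=1e-12) -> bool:
    """Check the t-norm axioms on all grid triples, with float tolerance eps."""
    for x, y, z in itertools.product(grid, repeat=3):
        if abs(T(x, 1.0) - x) > eps:                        # (1) neutral element
            return False
        if abs(T(x, y) - T(y, x)) > eps:                    # (2) commutativity
            return False
        if abs(T(x, T(y, z)) - T(T(x, y), z)) > eps:        # (3) associativity
            return False
        if x <= y and T(x, z) > T(y, z) + eps:              # (4) monotonicity
            return False
    return True


grid = [i / 10 for i in range(11)]
for T in (t_min, t_product, t_lukasiewicz, t_average):
    print(T.__name__, looks_like_t_norm(T, grid))
```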

Defuzzification: name two methods and one situation where they disagree.

Centroid (center of gravity of the output membership function); Mean of Maxima (average of x-values where μ is maximal). Disagree on asymmetric multi-peaked output sets: centroid is pulled by mass, MoM sees only the peak. Common in control systems.
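A NumPy sketch of both methods on a hypothetical aggregated output set: a tall, narrow peak at x = 2 plus a low, wide plateau on [5, 9]. Mean of Maxima only sees the peak; the centroid is dragged toward the plateau’s larger area.

```python
import numpy as np

x = np.linspace(0.0, 10.0, 1001)                          # output universe, step 0.01
peak = np.clip(1.0 - np.abs(x - 2.0) / 0.5, 0.0, None)    # tall, narrow triangle at x = 2
shoulder = np.where((x >= 5.0) & (x <= 9.0), 0.6, 0.0)    # low, wide plateau on 5..9
mu = np.maximum(peak, shoulder)                           # aggregated output set (max)


def centroid(x: np.ndarray, mu: np.ndarray) -> float:
    """Centre of gravity of the discretized membership function."""
    return float(np.sum(x * mu) / np.sum(mu))


def mean_of_maxima(x: np.ndarray, mu: np.ndarray, tol: float = 1e-9) -> float:
    """Average of the x-values where mu attains its maximum."""
    return float(np.mean(x[mu >= mu.max() - tol]))


print(centroid(x, mu), mean_of_maxima(x, mu))   # centroid pulled right (~6.1), MoM stays at 2
```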

Machine Learning

⚠️ Derive the bias-variance decomposition for squared error in one line.

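With y = f(x) + ε, E[ε] = 0, Var(ε) = σ², and the expectation taken over training sets and noise: E[(y − ŷ)²] = (f(x) − E[ŷ])² + E[(ŷ − E[ŷ])²] + σ² = Bias² + Variance + Irreducible noise. ↑ model complexity → ↓ bias, ↑ variance.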
See: Machine Learning
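The full derivation behind the one-liner, assuming y = f + ε with E[ε] = 0, E[ε²] = σ², and test noise ε independent of the learned ŷ (f written for f(x)):

$$
\begin{aligned}
\mathbb{E}[(y-\hat y)^2] &= \mathbb{E}[(f-\hat y+\varepsilon)^2] = \mathbb{E}[(f-\hat y)^2] + 2\,\mathbb{E}[\varepsilon]\,\mathbb{E}[f-\hat y] + \mathbb{E}[\varepsilon^2] = \mathbb{E}[(f-\hat y)^2] + \sigma^2 \\
\mathbb{E}[(f-\hat y)^2] &= \mathbb{E}\bigl[(f-\mathbb{E}[\hat y] + \mathbb{E}[\hat y]-\hat y)^2\bigr] = (f-\mathbb{E}[\hat y])^2 + 2\,(f-\mathbb{E}[\hat y])\,\underbrace{\mathbb{E}\bigl[\mathbb{E}[\hat y]-\hat y\bigr]}_{=\,0} + \mathbb{E}\bigl[(\hat y-\mathbb{E}[\hat y])^2\bigr]
\end{aligned}
$$

Both cross terms vanish, leaving Bias² + Variance + σ².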

In ID3, what happens when Information Gain selects an attribute with high cardinality (e.g., a unique ID)?

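Catastrophic overfitting: each value becomes its own pure leaf, so IG is maximal (it equals the full label entropy) but generalization is zero. Fix: Gain Ratio (divide IG by SplitInformation, the entropy of the attribute’s own value distribution), or CART-style binary splits (Gini impurity by itself is also biased toward many-valued attributes). Random Forest feature subsampling only partly mitigates.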
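A sketch on a hypothetical 8-row dataset: `row_id` is unique per row, `windy` is a genuinely informative attribute. Plain information gain prefers `row_id` (1.0 vs ≈ 0.549); the C4.5-style gain ratio prefers `windy` (≈ 0.575 vs 1/3).

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(items)
    return -sum(c / n * math.log2(c / n) for c in Counter(items).values())


def info_gain(values, labels):
    """ID3 information gain of splitting `labels` by attribute `values`."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """IG divided by SplitInformation (entropy of the attribute's own values)."""
    split_info = entropy(values)
    return info_gain(values, labels) / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
row_id = list(range(8))                                          # useless for new data
windy = [False, False, False, False, True, True, True, False]   # informative

print(info_gain(row_id, labels), info_gain(windy, labels))    # IG prefers row_id
print(gain_ratio(row_id, labels), gain_ratio(windy, labels))  # gain ratio prefers windy
```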

Why does Bagging reduce variance but not bias?

Averaging identically distributed estimators reduces variance (by 1/n if independent; less if correlated). The expected prediction equals the base model’s → bias unchanged. Random Forest further decorrelates trees via feature subsampling → more variance reduction.
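A sketch of the standard variance formula for an average of n identically distributed predictors with variance σ² and pairwise correlation ρ, ρσ² + (1 − ρ)σ²/n, plus a Monte Carlo check built from a hypothetical shared + individual noise model. Correlation sets a floor that more trees cannot remove; bias does not appear anywhere, because averaging leaves the mean unchanged.

```python
import numpy as np


def bagged_variance(sigma2: float, rho: float, n: int) -> float:
    """Variance of the mean of n predictors, each with variance sigma2, pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n


def simulate(rho: float, n: int, trials: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate with unit-variance predictors of pairwise correlation rho."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((trials, 1))      # common to all n predictors
    own = rng.standard_normal((trials, n))         # independent per predictor
    preds = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own   # Var = 1, Corr = rho
    return float(preds.mean(axis=1).var())


print(bagged_variance(1.0, 0.0, 100))   # independent: sigma2 / n
print(bagged_variance(1.0, 0.5, 100))   # correlated: stuck near rho * sigma2
print(simulate(0.3, 10), bagged_variance(1.0, 0.3, 10))
```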

What does Out-of-Bag error estimate, and why is it ~unbiased?

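Each bootstrap sample draws n points with replacement, so a given point is left out with probability (1 − 1/n)ⁿ ≈ 1/e ≈ 0.368: about 37% of points are out-of-bag for each tree (and about 63%, i.e. 1 − 1/e, are in-bag). For each point, predict using only the trees that didn’t see it → effectively a CV estimate without a separate split. Approximately unbiased because each prediction comes from models trained without that point (slightly pessimistic, since each OOB vote uses only about a third of the trees).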
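A quick numerical check of the out-of-bag fraction: the exact probability (1 − 1/n)ⁿ for growing n, its limit 1/e, and a simulated bootstrap sample (`oob_fraction` is a hypothetical helper).

```python
import math
import random


def oob_fraction(n: int, seed: int = 0) -> float:
    """Fraction of n points never drawn in one bootstrap sample of size n."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n) for _ in range(n)}
    return 1 - len(in_bag) / n


for n in (10, 100, 10_000):
    print(n, (1 - 1 / n) ** n)         # exact probability that a given point is left out
print("1/e =", 1 / math.e)             # the limit, about 0.368
print("simulated:", oob_fraction(100_000))
```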

SVM

⚠️ The SVM Lagrangian is L = ½‖w‖² − Σ αᵢ(yᵢ(⟨xᵢ,w⟩+b) − 1). Why minus, not plus?

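Primal problem: minimize ½‖w‖² subject to gᵢ(w, b) = yᵢ(⟨xᵢ,w⟩+b) − 1 ≥ 0. For constraints of the form gᵢ ≥ 0 with multipliers αᵢ ≥ 0, the standard Lagrangian is f − Σ αᵢ gᵢ: the max over α ≥ 0 of L equals ½‖w‖² when all constraints hold and +∞ when any gᵢ < 0, so min over (w, b) of max over α reproduces the constrained problem. With a plus sign (and αᵢ ≥ 0), violating a constraint would lower L instead of penalizing it, and the KKT conditions and the dual would come out wrong.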
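The min-max argument written out with the SVM Lagrangian from the question; the stationarity conditions below are what produce the dual:

$$
\max_{\alpha_i \ge 0}\; \Bigl[\tfrac12\lVert w\rVert^2 - \sum_i \alpha_i \bigl(y_i(\langle x_i, w\rangle + b) - 1\bigr)\Bigr] =
\begin{cases} \tfrac12\lVert w\rVert^2 & \text{if } y_i(\langle x_i, w\rangle + b) - 1 \ge 0 \text{ for all } i \\ +\infty & \text{otherwise} \end{cases}
$$

$$
\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_i \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_i \alpha_i y_i = 0
$$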

What do the slack variable ξᵢ and the C parameter do in soft-margin SVMs?

ξᵢ ≥ 0 allows point i to violate the margin (ξᵢ > 0) or be misclassified (ξᵢ > 1). Objective: ½‖w‖² + C · Σ ξᵢ. C large → few violations allowed → narrow margin, high variance, overfit risk. C small → wide margin, more bias, more robust.
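A NumPy sketch of the slack variables and the soft-margin objective for a fixed, hypothetical (w, b) and three toy points: one outside the margin (ξ = 0), one inside it (0 < ξ ≤ 1), one misclassified (ξ > 1). At the optimum, ξᵢ = max(0, 1 − yᵢ(⟨xᵢ, w⟩ + b)), the hinge loss.

```python
import numpy as np


def slacks(w, b, X, y):
    """Smallest xi_i >= 0 with y_i(<x_i, w> + b) >= 1 - xi_i (hinge loss)."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))


def soft_margin_objective(w, b, X, y, C):
    """1/2 ||w||^2 + C * sum of slacks: C trades margin width against violations."""
    return 0.5 * np.dot(w, w) + C * slacks(w, b, X, y).sum()


w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0], [0.5, 0.0], [-0.3, 0.0]])
y = np.array([1.0, 1.0, 1.0])
print(slacks(w, b, X, y))                                  # 0 (fine), 0.5 (in margin), 1.3 (wrong side)
print(soft_margin_objective(w, b, X, y, C=1.0), soft_margin_objective(w, b, X, y, C=10.0))
```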

State Mercer’s condition and what it guarantees.

A symmetric K(x, x’) is a valid kernel iff for every finite set {x₁, …, xₙ}, the Gram matrix K_ij = K(xᵢ, xⱼ) is positive semi-definite. Guarantees: K corresponds to a dot product in some Hilbert space → SVM optimization stays convex.
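A finite-sample version of the condition in NumPy, assuming random 3-D points: build the Gram matrix and test positive semi-definiteness via `np.linalg.eigvalsh`. Passing on one sample does not prove validity; failing on one sample disproves it. The negative squared distance fails because its Gram matrix has zero diagonal (trace 0) and negative off-diagonal entries, so it must have a negative eigenvalue.

```python
import numpy as np


def gram(kernel, X):
    """Gram matrix K_ij = kernel(x_i, x_j) for the rows of X."""
    return np.array([[kernel(a, b) for b in X] for a in X])


def is_psd(K, tol=1e-9):
    """Symmetric and all eigenvalues >= 0 (up to rounding)."""
    return bool(np.allclose(K, K.T) and np.linalg.eigvalsh(K).min() >= -tol)


def rbf(a, b, gamma=0.5):
    return np.exp(-gamma * np.sum((a - b) ** 2))


def neg_sq_dist(a, b):
    return -np.sum((a - b) ** 2)   # symmetric, but NOT a valid kernel


X = np.random.default_rng(0).standard_normal((20, 3))
print(is_psd(gram(rbf, X)), is_psd(gram(neg_sq_dist, X)))
```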

Neural Networks & Deep Learning

⚠️ Hopfield Networks store patterns at energy minima. Practical storage capacity for N neurons?

PDF says: “up to N target memories for N neurons” — use this on the exam. The realistic reliable-retrieval limit is ~0.138·N for random patterns (Hopfield 1982; Amit-Gutfreund-Sompolinsky 1985), before catastrophic interference sets in. Only cite the latter if asked for the precise number.
See: Hopfield Networks
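A sketch of a classical Hopfield network in NumPy, assuming random bipolar patterns, the Hebbian rule and asynchronous sign updates; N = 100 and 5 patterns (load 0.05, well below 0.138) are illustrative choices. A cue with 10% flipped bits should fall back into the stored attractor.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 5                                  # load P/N = 0.05
patterns = rng.choice([-1, 1], size=(P, N))    # random bipolar patterns

W = patterns.T @ patterns / N                  # Hebbian learning rule
np.fill_diagonal(W, 0.0)                       # no self-connections


def recall(state: np.ndarray, sweeps: int = 10) -> np.ndarray:
    """Asynchronous sign updates; energy E = -1/2 s^T W s never increases."""
    s = state.copy()
    for _ in range(sweeps):
        for i in rng.permutation(N):
            s[i] = 1 if W[i] @ s >= 0 else -1
    return s


cue = patterns[0].copy()
cue[rng.choice(N, size=10, replace=False)] *= -1   # corrupt 10% of the bits
overlap = recall(cue) @ patterns[0] / N            # 1.0 = perfect retrieval
print(overlap)
```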

What does the chain rule yield in backprop for a 2-layer MLP (input → W₁ → ReLU → W₂ → output)?

With z = W₁x, h = ReLU(z), output y = W₂h and δ_out = ∂L/∂y (e.g. δ_out = y − t for the squared error ½‖y − t‖²): ∂L/∂W₂ = δ_out · h^T. ∂L/∂W₁ = ((W₂^T · δ_out) ⊙ ReLU’(z)) · x^T. The δ propagates backward, multiplied by the local Jacobian at each layer.
See: Gradient Backpropagation
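A NumPy sketch of exactly these two gradients for input → W₁ → ReLU → W₂ → output, assuming squared-error loss and random toy shapes (3 inputs, 4 hidden units, 2 outputs), verified against central finite differences.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))          # input (column vector)
t = rng.standard_normal((2, 1))          # target
W1 = rng.standard_normal((4, 3))
W2 = rng.standard_normal((2, 4))


def forward(W1, W2):
    z = W1 @ x                           # hidden pre-activation
    h = np.maximum(z, 0.0)               # ReLU
    y = W2 @ h                           # linear output
    loss = 0.5 * np.sum((y - t) ** 2)    # squared error
    return z, h, y, loss


z, h, y, loss = forward(W1, W2)
delta_out = y - t                                   # dL/dy
grad_W2 = delta_out @ h.T
grad_W1 = ((W2.T @ delta_out) * (z > 0)) @ x.T      # ⊙ ReLU'(z), then outer product with x


def numeric_grad(which: int, eps: float = 1e-6) -> np.ndarray:
    """Central finite differences of the loss w.r.t. W1 (which=1) or W2 (which=2)."""
    W = W1 if which == 1 else W2
    G = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        Wp, Wm = W.copy(), W.copy()
        Wp[idx] += eps
        Wm[idx] -= eps
        args_p = (Wp, W2) if which == 1 else (W1, Wp)
        args_m = (Wm, W2) if which == 1 else (W1, Wm)
        G[idx] = (forward(*args_p)[3] - forward(*args_m)[3]) / (2 * eps)
    return G


print(np.allclose(grad_W1, numeric_grad(1), atol=1e-6), np.allclose(grad_W2, numeric_grad(2), atol=1e-6))
```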

Why does ReLU help with vanishing gradients but not exploding gradients?

ReLU’s gradient is 1 (for x > 0) or 0 — no shrinking through depth, so signals don’t vanish. But it doesn’t bound activation magnitude → activations can grow → gradients can explode. Fixes: gradient clipping, BatchNorm/LayerNorm, He initialization.

⚠️ Modern Hopfield Networks (Ramsauer et al., 2020) connect to which Transformer component, and how?

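The update rule of modern (continuous) Hopfield networks, ξ_new = X · softmax(β X^T ξ) with the stored patterns as columns of X, has the same form as Transformer attention: stored patterns ↔ keys (and values), state ξ ↔ query, β ↔ 1/√d_k, update = softmax-weighted sum = attention output. Self-attention can therefore be read as a single Hopfield retrieval step in a continuous memory with exponentially high capacity (in the Transformer, queries, keys and values are additionally learned linear projections).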
See: Hopfield Networks, Transformers, Attention is All You Need
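A NumPy sketch of the correspondence, assuming no learned projections (keys = values = stored patterns, query = current state) and β = 1/√d; the pattern sizes are illustrative. The Hopfield update and the attention output are the same numbers.

```python
import numpy as np


def softmax(v: np.ndarray) -> np.ndarray:
    e = np.exp(v - v.max())
    return e / e.sum()


rng = np.random.default_rng(0)
d, n = 8, 5
X = rng.standard_normal((d, n))                  # stored patterns as columns
xi = X[:, 2] + 0.3 * rng.standard_normal(d)      # noisy state near pattern 2

beta = 1 / np.sqrt(d)
hopfield_update = X @ softmax(beta * X.T @ xi)   # modern Hopfield update rule

# The same computation written as attention: one query, K = V = stored patterns (rows).
K = V = X.T
attention_out = softmax(xi @ K.T / np.sqrt(d)) @ V
print(np.allclose(hopfield_update, attention_out))   # True
```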

Transformers

Write the Scaled Dot-Product Attention formula and explain each term.

Attention(Q, K, V) = softmax(QK^T / √d_k) · V. Q = queries (n × d_k), K = keys (m × d_k), V = values (m × d_v). QK^T: similarity matrix. /√d_k: prevents softmax saturation in high dim. softmax: turns similarities into a distribution over m keys. · V: weighted sum of values.
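A NumPy sketch of the formula with the same shapes (Q: n × d_k, K: m × d_k, V: m × d_v). The optional boolean `mask` is an addition for illustration: a lower-triangular mask gives GPT-style causal attention, no mask gives BERT-style bidirectional attention.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V, mask=None):
    """Returns (output of shape (n, d_v), attention weights of shape (n, m))."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (n, m) scaled similarities
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)     # masked positions get zero weight
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # each row: distribution over m keys
    return weights @ V, weights


rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((2, 4)), rng.standard_normal((3, 4)), rng.standard_normal((3, 5))
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.sum(axis=1))                      # (2, 5), rows sum to 1

S = rng.standard_normal((3, 4))                      # self-attention with a causal mask
_, w_causal = scaled_dot_product_attention(S, S, S, mask=np.tril(np.ones((3, 3), dtype=bool)))
print(np.round(w_causal, 3))                         # zeros above the diagonal
```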

Why is positional encoding needed, and what makes sin/cos encoding special?

Self-attention is permutation-equivariant — without position info, “the dog bit the man” ≡ “the man bit the dog”. Sin/cos encoding (Vaswani 2017): PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(…). Special: encodes relative position linearly — PE(pos + k) is a linear function of PE(pos) → the model can learn relative offsets without seeing them in training. Learned positional embeddings don’t have this property.
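A NumPy sketch of the sin/cos encoding (assuming an even d_model) plus an explicit check of the linearity claim: for each offset k there is a block-diagonal rotation M_k, independent of pos, with PE(pos + k) = M_k · PE(pos). `shift_matrix` is a hypothetical helper built from the angle-addition formulas.

```python
import numpy as np


def positional_encoding(num_positions: int, d_model: int) -> np.ndarray:
    pos = np.arange(num_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / 10000 ** (2 * i / d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)       # PE(pos, 2i)
    pe[:, 1::2] = np.cos(angles)       # PE(pos, 2i+1)
    return pe


def shift_matrix(k: int, d_model: int) -> np.ndarray:
    """Block-diagonal M_k with PE(pos + k) = M_k @ PE(pos) for every pos."""
    M = np.zeros((d_model, d_model))
    for i in range(d_model // 2):
        w = 1 / 10000 ** (2 * i / d_model)
        c, s = np.cos(w * k), np.sin(w * k)
        # [sin(a+b), cos(a+b)] = [[cos b, sin b], [-sin b, cos b]] @ [sin a, cos a]
        M[2 * i:2 * i + 2, 2 * i:2 * i + 2] = [[c, s], [-s, c]]
    return M


pe = positional_encoding(50, 16)
print(np.allclose(pe[5:], pe[:-5] @ shift_matrix(5, 16).T))   # True: same M_5 for all positions
```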

BERT vs. GPT — core training-objective difference and one architectural difference.

Objective: BERT = Masked Language Modeling (predict hidden tokens given full context — bidirectional). GPT = Causal Language Modeling (predict next token given left context only — autoregressive). Architecture: BERT = encoder-only with bidirectional self-attention. GPT = decoder-only with causal masking (each token attends only to previous positions).
See: Transformers, Attention is All You Need

Sources

Created: 15-02-25 10:48
