Monday, October 7, 2013

Artificial Intelligence University

News: I'm converting this into a "page", since a "post" is more of a finished-piece genre. New updates will only go to the page version: http://lukstafi.blogspot.com/p/ai-university.html

Most of the links come from Video Lectures. The thesis is that the courses available online can form a solid education in AI. I have updated the list to provide a more balanced program, aiming at a "university replacement". Tentatively, one could go through four courses in a semester. I will add links to textbooks later.

  1. year:
    1. Introduction to Logic by Michael Genesereth
      1. General Game Playing by Michael Genesereth, also Michael Thielscher and Sam Schreiber -- at this point, take a quick tour by watching the (short and simple) lectures
    2. Probability Primer by mathematicalmonk
    3. Calculus: Single Variable by Robert Ghrist
    4. Algorithms, either one of:
      1. Algorithms: Design and Analysis, Part 2 by Tim Roughgarden
      2. Algorithms, Part 2 by Kevin Wayne and Robert Sedgewick
      3. Introduction to Algorithms by Charles Leiserson and Erik Demaine
    5. Functional Programming Principles in Scala by Martin Odersky
    6. Machine Learning by Andrew Ng
    7. Introduction to Cognitive Architectures seminar:
      1. Cognitive Architectures by Włodek Duch
      2. Clarion Tutorial, Clarion Part 2 by Michael Lynch
      3. The Soar Cognitive Architecture by Nate Derbinsky
      4. OpenCog by Ben Goertzel
      5. From Constructionist to Constructivist A.I. by Kristinn R. Thórisson
      6. Deconstructing Reinforcement Learning in Sigma, Modeling Two-Player Games in the Sigma Graphical Cognitive Architecture by Paul Rosenbloom
      7. Pursuing Artificial General Intelligence by Leveraging the Knowledge Capabilities of ACT-R by Alessandro Oltramari
      8. A Cognitive Architecture based on Dual Process Theory (perception vs. imagination) by Claes Strannegård
    8. Scientific Approaches to Consciousness by John F. Kihlstrom
  2. year:
    1. Probabilistic Graphical Models by Daphne Koller
    2. Course on Information Theory, Pattern Recognition, and Neural Networks by David MacKay
    3. Introduction to Modal Logic by Rajeev P. Goré
    4. Introduction to Databases by Jennifer Widom
    5. Learning From Data by Yaser Abu-Mostafa (Machine Learning with elements of Statistical Learning Theory)
    6. Linear Algebra by Gilbert Strang, also:
      1. Complex Analysis by Petra Bonfert-Taylor (optional)
      2. Differential Equations by Arthur Mattuck (optional)
      3. Introduction to Functional Analysis by Richard Melrose (optional)
      4. Nonlinear Dynamics I: Chaos by Daniel Rothman (optional)
      5. Differential Geometry by Paul Seidel (optional)
      • The optional math classes are meant to be picked up later as your time allows. You should at least have basic familiarity with: complex numbers; calculus and differential equations; linear operators: matrix representation in various bases, nullspaces, orthogonal complement.
    7. For a round number of courses, pick one more of the math courses above
    8. Introduction to Philosophy by Richard Brown
  3. year:
    1. Discrete Optimization by Pascal Van Hentenryck
    2. Artificial Intelligence Planning by Gerhard Wickler and Austin Tate
    3. Introduction to Formal Languages, Automata and Computational Complexity by Jeff Ullman
    4. Natural Language Processing, one of, or both:
      1. Dan Jurafsky and Christopher Manning
      2. Michael Collins
    5. Neural Networks by Geoffrey Hinton
      1. and Neural Networks class by Hugo Larochelle
    6. Either:
      1. Machine Learning (review and continuation) by Andrew Ng, or
      2. Introduction to Machine Learning by Alex Smola.
      • Skip parts that you are already confident you know.
    7. Linear Dynamical Systems by Stephen Boyd
    8. Computational Neuroscience by Rajesh P. N. Rao and Adrienne Fairhall
  4. year:
    1. Game Theory by Kevin Leyton-Brown, Matthew O. Jackson and Yoav Shoham (optional)
    2. General Game Playing by Michael Genesereth, also Michael Thielscher and Sam Schreiber -- at this point, treat it as a project course, build your own player using knowledge from other courses
    3. Convex Optimization by Stephen Boyd (optional)
    4. Reinforcement Learning -- the materials below are redundant with each other and with parts of Andrew Ng's course; find your own way through them:
      1. by Csaba Szepesvári,
      2. by Satinder Singh Baveja,
      3. Foundations of Machine Learning by Marcus Hutter,
      4. Richard Sutton's AGI 2010 Keynote Address, Part 2,
      5. GQ(lambda): A General Gradient Algorithm for Temporal-Difference Prediction Learning with Eligibility Traces by Hamid Reza Maei
    5. Abstract Algebra by Benedict Gross
    6. Overview of Automated Reasoning by Peter Baumgartner
    7. Type Theory Foundations and Proof Theory Foundations by Robert Harper and Frank Pfenning respectively
    8. Ethics and Moral Issues by Richard Brown
  5. year:
    1. Big Data, Large Scale Machine Learning by John Langford and Yann LeCun
    2. Graphical Models and Variational Methods by Christopher Bishop
    3. Statistical Learning Theory by John Shawe-Taylor and by Olivier Bousquet / newer variant of Olivier's
    4. Practical Statistical Relational Learning by Pedro Domingos
    5. Online Learning, Regret Minimization, and Game Theory by Avrim Blum
    6. Introduction to Category Theory by error792
    7. Computer Vision by Mubarak Shah
    8. Cognitive Architectures and Modeling Course -- perhaps some combination of these, but there is no good course online:
      1. Representations: Classes, Trajectories, Transitions and Architectures: GPS, SOAR, Subsumption, Society of Mind by Patrick H. Winston, as introduction
      2. The Society of Mind by Marvin Minsky
      3. Cognitive Science and Machine Learning Summer School videos
      4. Cognitive Modeling by John Anderson and T.A. Phil Pavlik
        1. Cognitive Modelling by Sharon Goldwater
      5. AGI 2011 Architectures Part 2 and other AGI Conference presentation videos

Thursday, August 8, 2013

Reading Gary Drescher "Made Up Minds"

The "schema mechanism" system developed in Gary Drescher's PhD project reminds me of Anticipation-Based Learning Classifier Systems, but it is more AGI / cognitive-architecture worthy because of its representation-building facilities. The book is very novel for its year, 1991, before modern RL theory became popular in AI. ABLCSes are a quite recent development in LCSes (although with an early publication in 1990). I have a (good) sentiment towards Learning Classifier Systems; they were my first encounter with a more cognitive form of AI, very long ago.

It would be cool to redo Gary Drescher's project, but using Rich Sutton's and Hamid Maei's recent results -- Gradient Temporal-Difference Algorithms with off-policy learning -- instead of schemas.
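To give a feel for the flavor of those gradient-TD methods, here is a minimal sketch of a linear TDC ("TD with gradient correction") update applied to a toy three-state chain. The function name, step sizes, and the toy problem are all illustrative choices of mine, not taken from the book or the papers.

```python
import numpy as np

def tdc_update(theta, w, phi, phi_next, reward, gamma=0.9,
               alpha=0.05, beta=0.1):
    """One TDC step with linear value estimates v(s) = theta . phi(s).
    The auxiliary weights w estimate the expected TD error per feature;
    the correction term keeps the theta update a stochastic gradient
    of the projected Bellman error."""
    delta = reward + gamma * theta @ phi_next - theta @ phi  # TD error
    theta = theta + alpha * (delta * phi - gamma * (w @ phi) * phi_next)
    w = w + beta * (delta - w @ phi) * phi
    return theta, w

# Toy example: a deterministic 3-state chain with one-hot features and
# reward 1 on reaching the terminal state; the learned values should
# approach the discounted returns [gamma^2, gamma, 1].
n = 3
theta, w = np.zeros(n), np.zeros(n)
features = np.eye(n)
for _ in range(2000):
    for s in range(n):
        s_next = s + 1
        r = 1.0 if s_next == n else 0.0
        phi_next = features[s_next] if s_next < n else np.zeros(n)
        theta, w = tdc_update(theta, w, features[s], phi_next, r)
```

With one-hot (tabular) features the fixed point is just the exact discounted values; the point of the gradient-correction term only shows up with genuine function approximation and off-policy sampling, which this toy omits.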

The principle behind Gary's project is constructivism, the opposite of nativism. Almost all structure of the world, even what Kant claimed to be necessarily given a priori -- the binding of experience into objects having features -- can be learned from input by a relatively simple mechanism. You might think that constructivism is therefore radically opposed not only to Chomsky, but also to David Deutsch's ideas I've been quoting recently -- his bashing of empiricism and of logical positivism. But consider this: the construction algorithm, to succeed, must have universal reach in David Deutsch's terms. And Gary Drescher accepts the criticisms against logico-empiricism: "Even the most rudimentary conceptions of the physical object cannot be defined by the schema mechanism as any function of the sensory primitives." Section 8.6 stresses "Why Non-naive Induction Must Be Built In", and the mechanism needs to solve similar problems with counterfactuals. The system uses counterfactuals for learning and concept invention. "The difficulty and obscurity of the concept of counterfactuals is, I suspect, a reason that its fundamental importance for learning systems has been late to be recognized, rather than a reason to consider it an implausible basis for learning."

The notes below focus on chapters 3 and 4, as these describe the mechanisms of the cognitive system; I pick a couple of nuggets from later chapters. Chapter 2 presents Piagetian theory, elaborating on the initial stages of development. It is worth reading in full.
  1. Conditions are conjunctions of items. Primitive items are sensory facts and synthetic items are beliefs.
  2. Primitive items are binary: On / Off, synthetic items are ternary: On / Off / Unknown.
  3. Objects and relations between objects are supposed to be stable configurations of schemas, synthetic items and composite actions.
  4. Schemas are identified by a triple: preconditions, action, postconditions.
  5. Primitive actions change the state of the world and of the agent.
  6. Composite actions are identified by a postcondition they achieve.
  7. Therefore a schema might be a refinement of the postcondition of a composite action under given preconditions.
  8. Accessible value in a state is the maximum value achievable along a reliable path from the state.
  9. Instrumental value is assigned to items along reliable paths to a goal, i.e. the items in a state get the state's accessible value as their instrumental value. Instrumental value is transient.
  10. Delegated value is assigned proportionally to (1) the difference of average accessible values of an item: when the item is On minus when the item is Off; and (2) the duration of the item being On. It is more permanent.
  11. An item that frequently has instrumental value only gains delegated value when it is not "readily accessible" -- readily accessible items have a difference in accessible values (between their On/Off states) close to zero.
  12. To avoid runaway propagation of delegated value through cycles, the value propagated is half of the delegated value.
  13. Attention via a hysteresis-habituation loop: recently selected schemas have more weight, but it decreases with consecutive selections.
  14. Reweighting to upsample schemas with rare actions.
  15. Promote actions with inverse effects (turning an item On/Off right after it was turned Off/On), as this heuristically leads to reliable schemas.
  16. Marginal attribution:
    1. Start from the bare schema {}-A->{}; add result items, in spinoff schemas, for items whose positive-transition (from Off to On) or negative-transition rates following action A are higher than their averages over all actions.
    2. Add (negated) items in contexts of spinoff schemas which (anti)correlate with validity of the schema, making the spinoff schema more reliable.
    3. All statistics count only unexplained transitions and correlations -- when the items aren't in results or contexts of existing valid schemas. (Statistics are reset after a spinoff accordingly.) Statistics are only collected by most specific schemas accounting for a situation. This increases sensitivity to regularities and reduces combinatorial explosion.
  17. Schema chaining requires that all items of the (primary) context of a following schema are provided by results of preceding schema. It's used for composite actions, etc.
  18. Schemas have associated extended context (and extended results). Extended contexts and results are mutable (evolve over time). Besides being a data structure for spinoff formation, extended context adds to the condition for activation of a schema. Schema chaining cannot rely on the corresponding spinoff schemas, because it often requires general primary contexts.
  19. For a composite action, besides conditions that need to hold initially, there can also be conditions that need to hold throughout the action.
  20. Synthetic items are designed to identify invariants when all apparent manifestations change or cease (compare Piagetian conservation phenomena).
  21. Keep track of local consistency: the probability that a schema will be valid right after it has been valid, and the expected duration of consistency: how long from the onset of validity to the first invalid activation. (Recall attention via hysteresis.)
  22. A synthetic item is a reifier of the validity conditions of an unreliable but locally consistent schema, called its host schema -- whose action is called the probing action and whose result is called the manifestation. I.e. it is the state item such that, when added to the context (precondition) of the schema, if the schema were activated, the action would bring the result.
    1. The intention is that the synthetic item captures a persistent feature, like presence of an object, while the remaining context items of the host schema capture transient features, like effector configuration.
  23. Learned verification conditions set the state of a synthetic item:
    1. Host schema trial: when the host schema is activated, On resp. Off if it succeeded resp. failed.
    2. Local consistency: the state remains as changed for at most a period of expected duration of host schema's local consistency (for On, local "inconsistency" for Off). Then revert to Unknown.
    3. Augmented context conditions: the extended context of the host schema (which collects evidence from spinoff schemas).
    4. Predictions: "If a synthetic item appears in the result of a reliable schema, and that schema is activated, then in the absence of any evidence to the contrary, the mechanism presumes that that schema succeeded".
  24. The above mechanism approximates a synthetic item, but the synthetic item is not coextensive with any function of cumulative inputs (i.e. of the input history).
    1. "The schema mechanism grounds its synthetic items in the reification of counter-factual assertions; the subsequent adaptation of its verification conditions is driven by that grounding."
  25. Composite action is created for each spinoff schema that has a novel result.
  26. "A composite action is considered to have been implicitly taken whenever its goal state becomes satisfied [...] Marginal attribution can thereby detect results caused by the goal state, even if the goal state obtains due to external events." Which together with hysteresis leads to imitation.
  27. Backward Broadcast mechanism and action controller learn proximity of schemas (results) to goal states (of composite actions). If reliable chains of schemas are found, they are incorporated into composite actions. The chains are also used for forward prediction.
  28. The action controller handles special cases: indeterministic actions (schemas with the same contexts and actions but various results), repetition, and on-the-fly repair (detecting schemas that make applicable some component of an interrupted action).
  29. Schema with composite action cannot spin off a schema with part of the composite action goal in the results.
  30. There is no "problem resolution" mechanism. Rather, some schemas hit a dead-end, and are taken over by schemas that capture more fruitful regularities.
  31. I haven't understood how inversely indexed representations (synthetic items) work. (par. 6.4.4)
  32. Note that synthetic items do not represent the identity of, for example, tactile-visual objects. This isn't bad, though, because the system's mistakes reproduce Piagetian errors at the corresponding developmental stages. Errors mean unreliable schemas, which leads to further development.
  33. Now on to more far-fetched stuff. "The new conception [learned abstract concept] reifies the set of circumstances under which a piece of one's computational machinery behaves a certain way."
  34. "Consciousness requires knowledge (and hence representation) of one's own mental experiences as such; the schema mechanism does not come close to demonstrating such knowledge."
  35. Unimplemented mechanism: subactivation. "To subactivate an applicable schema is essentially to simulate taking its action, by forcing its result items into a simulated-On state (or, if negated, a simulated-Off state)." The simulated states are entirely distinct from actual states and all mechanisms are duplicated for them. But statistics are shared, and spinoff schemas are created "for real".
    1. Simulations are serial but parallel chaining search will cache the knowledge.
  36. Unimplemented mechanism: explicitly represent inverse actions to make them available for subactivation (i.e. simulation).
  37. Override generalizations: when a derived (by simulation) schema's prediction is wrong because a direct schema from which it was derived is overridden, the derived schema should be overridden too, without penalty for the derivation. A new schema will be created to capture this exception.
    1. "The suggestion is that deductive-override machinery may permit the schema mechanism to escape the fallacy of naive induction. The key is to regard the conflict between a reasonable generalization and an absurd but always-confirmed generalization as just another conflict between generalizations expressed at different levels of description."
  38. "The reason [not to build in] a variable-matching implementation of generalizations, is just that there is no apparent way to support such an implementation without abandoning the constructivist working hypothesis by including domain-specific build-in structure. [...] Perhaps the system itself could be designed to devise explicit structured representations to support variablized generalizations. [I]f virtual generalization fails, devising such machinery may be vital to the schema mechanism."
  39. "A schema's extended context is essentially a connectionist network solving a classifier problem."
  40. Unimplemented: clustering (i.e. hierarchical modeling); "having coordinated coarse- and fine-grained spaces mitigates the combinatorics of showing the path from one fine-grained place to another, because the path can be represented as a coarse segment to get in the right vicinity, followed by a fine-tuning segment."
  41. Unimplemented: garbage collection of schemas that do not contribute to goal achievement, do not spawn new spinoffs, or are seldom activated -- perhaps even of those that are activated, when opportunities to recreate them are more frequent than opportunities to activate them.
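To make notes 4 and 16 above a bit more concrete, here is a toy sketch of schemas and of the first step of marginal attribution: spinning off {}-A->{item} schemas when an item's transition rate right after action A clearly exceeds its base rate. The class names, counters, thresholds, and the "grasp"/"holding" example are all invented for illustration; Drescher's actual statistics (note 16.3, counting only unexplained transitions) are more subtle.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Schema:
    """A schema is identified by (context, action, result); context and
    result are frozensets of (item, state) pairs (note 4 above)."""
    context: frozenset
    action: str
    result: frozenset

class MarginalAttribution:
    """Toy version of the first step of spinoff creation (note 16.1):
    per action, count how often each item turns On right after the
    action, compare with the item's base transition rate over all
    steps, and spin off {}-A->{item} when the action-specific rate is
    clearly higher."""
    def __init__(self):
        self.action_counts = defaultdict(int)
        self.transition_counts = defaultdict(int)  # keyed (action, item)
        self.base_transitions = defaultdict(int)   # keyed item
        self.total_steps = 0

    def observe(self, action, turned_on):
        """Record one time step: the action taken and the set of items
        that transitioned from Off to On right after it."""
        self.total_steps += 1
        self.action_counts[action] += 1
        for item in turned_on:
            self.base_transitions[item] += 1
            self.transition_counts[(action, item)] += 1

    def spinoffs(self, min_ratio=2.0, min_count=5):
        """Return bare-context spinoff schemas for (action, item) pairs
        whose transition rate exceeds min_ratio times the base rate."""
        out = []
        for (action, item), c in self.transition_counts.items():
            if c < min_count:
                continue
            rate = c / self.action_counts[action]
            base = self.base_transitions[item] / self.total_steps
            if base > 0 and rate / base >= min_ratio:
                out.append(Schema(frozenset(), action,
                                  frozenset([(item, 'On')])))
        return out

# Tiny simulation: "grasp" reliably turns the hypothetical item
# "holding" On; "wait" never does.
ma = MarginalAttribution()
for _ in range(20):
    ma.observe('grasp', {'holding'})
for _ in range(80):
    ma.observe('wait', set())
```

The next step in the book -- adding (negated) context items that make the spinoff reliable (note 16.2) -- would extend this with per-schema extended-context statistics, omitted here.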

Friday, February 22, 2013

Dump of my recent "The Hard Problem" comments

  • Ascription of consciousness is a matter of degree, i.e. an entity can be slightly conscious or very conscious, with a lower bound of no consciousness but probably without an upper bound. Consciousness is a complex process, a complex of processes of consciousness. For the most part, processes of consciousness are about representation and action (some like to call it intentionality). A necessary condition for a given degree of consciousness is a given degree of complexity of adequate representation, including representation of actions (i.e. bidirectional causality between processes of consciousness and the remaining processes of the environment). Consciousness is relative to the environment of which it is conscious, i.e. you judge the reality of a consciousness by how you judge the reality of its environment (e.g. a parallel universe, a formally stated complete description of a universe, ...). Sufficient conditions for a given degree of consciousness are the satisfaction, to a given degree, of constraints of the kind described in "Being No One" by Thomas Metzinger. Consciousness is not fundamental to all life, it does not necessitate biological life, and it is not epiphenomenal. Current computers don't have mental processes (AFAICT).
  • I mean by it that single-cell organisms do not have consciousness, and that an "epsilon" of consciousness only appears in animals with a central nervous system. On the other hand, a consciousness can, for example, run on a set of integrated silicon-based chips created by humans, and be sent to a planet composed only of inorganic materials.
  • The notion of potentiality is important for this view. Mental processes are processes that are potentially conscious. Dreams and mental imagery are conscious in so far as they are potentially representative (or pertain to such). Also, I don't mind using the word "presentation" rather than "representation" in "transparent" cases -- where we don't have knowledge of our experiences, but rather have knowledge of the experienced things by having the experiences.
  • [Epiphenomenalism] is logically impossible (once you define consciousness somehow), but it is conceivable. Imagine that you stare at a webcam attached to someone else's head. Now imagine that a brick is falling on that person. You instinctively duck to avoid the brick. As it turns out, that person ducked the brick in the same manner. Suddenly you feel you are the person who sees the world through the webcam (due to the accidental correlation of your intention with the behavior). Epiphenomenalism would be very similar to this scenario, only with a systematic explanation of the "accidental correlation" via the claim that intention is just a manner of perceiving the onset of an action.
  • In this analogy there is the causally active consciousness of the person with the webcam, and the epiphenomenal consciousness of the person watching, insofar as the webcam transmission is concerned. [...] a prototype/analogy for a single experience where the causal link from the experience to its physical carrier is broken. To build a full-fledged epiphenomenalism you need to limit all experiences to such a kind.
  • Consciousness can be the primary reality of the universe in diametrically different ways. (1a) It can be an entirely different substance from material stuff -- the abandoned dualism. (1b) It can be an aggregation of an aspect of physical stuff that has a "propensity for subjectivity", for example "quantum collapsibility" or some other spookiness. (1c) It can just be some particular biological material processes but not others (the sense of "primary reality of consciousness" here is that we cannot "explain it away" structurally/functionally, but could at best duplicate these processes).
  • (2a) Epiphenomenalism is just a very narrow case of consciousness arising from material processes, one where it does not have causal effects (it only has causes). There are at least two other possibilities: (2b) consciousness is a particular coarse-graining of material processes, the effects of those processes are the effects of consciousness (with emphasis on the structure of the coarse-graining, not on the particular processes as in 1c); (2c) consciousness is a particular functional structure of material processes among other processes, the functional structure exhibiting so called downward causation on the processes that exhibit it.
  • "The neurological explanation of all measurable events related to 'orange' involves numbers and theories and chemistry, no need for the perception known as 'orange'." Well put. This is also known as "the problem of qualia", although I prefer calling it "the hard problem" because the notion of qualia is not really that clear-cut. For example "Normal listeners can discriminate about 1,400 steps of pitch difference across the audible frequency range, but they can recognize these steps as examples of only about 80 different pitches." I can discriminate two shades of orange when I see both nearby, without being able to give them names because when separated in time they just seem the same to me.
  • Perhaps a good way to start people thinking about this is to go back to B. Russell's "Analysis of Mind" (1921). First, realize that when you note that you see an orange patch, you do not learn that you have an orange experience, you learn that you have an experience of orange. "If there is a subject, it can have a relation to the patch of colour, namely, the sort of relation which we might call awareness. In that case the sensation, as a mental event, will consist of awareness of the colour, while the colour itself will remain wholly physical, and may be called the sense-datum, to distinguish it from the sensation. The subject, however, appears to be a logical fiction, like mathematical points and instants. It is introduced, not because observation reveals it, but because it is linguistically convenient and apparently demanded by grammar. [...] But when we [dispense with the subject, we cannot distinguish the sensation from the quality of the property.] Accordingly the sensation that we have when we see a patch of colour simply is that patch of colour, an actual constituent of the physical world, and part of what physics is concerned with [...]" In "My Philosophical Development", from which I quote, Russell continues, "It became possible to think that what the physiologist regards as matter in the brain is actually composed of thoughts and feelings, and that the difference between mind and matter is merely one of arrangement." This is close to the view that John Searle has. (My allegiance is more with Searle's opponents, though, with the representational functionalism of, for example, Daniel Dennett.)
  • Some philosophers are obsessed with the idea of “intrinsic existence”. We say that consciousness is (where “is” and “arises” have the same meaning) the structure of complex processes of representation and action. They say, structure is a relation among elements, and structure/relations do not have intrinsic existence because they are what is preserved between the existing stuff and arbitrary representations of the organization of the stuff, given an instruction of interpretation. Therefore they see our claim that consciousness is a particular structure of processes as equating something that has independent existence, with something that is a result of analysis/interpretation.
  • What we mean is functional structure, that consciousness is the structure of causal interaction within a system and across the system and the environment. They ridicule the idea that consciousness floats out of the brain sustained by causal links with pieces of environment. They say that besides, it doesn’t save function from being “in the eye of the functionalist”, that functions are ascribed rather than independently existing.