1.2.1: Expanded Outline for URIECA Modules 4 and 5 - Biology

Module 4: Protein Expression and Isolation of DNA

Sessions 1 and 2: This week you will express the H396P Abl kinase domain. You will also isolate wild type (wt) Abl plasmid DNA for subsequent mutagenesis.

Session 1

  • Complete laboratory check-in.
  • Autoclave LB for bacterial protein expression (TAs).
  • Use sterile technique to transfer LB aliquots into three cell culture tubes.

Session 1 - following day (~10 minutes of lab)

  • Select a colony of bacteria containing plasmids for the H396P Abl kinase domain and Yop phosphatase (supplied by your TA) and inoculate 5 mL of LB/ kanamycin (kan) / streptomycin (strep).
  • Select a colony of bacteria containing the Abl kinase domain plasmid and inoculate two 6-mL aliquots of LB/ kan.

Session 2

  • Inoculate 500 mL of LB/ kan/ strep with your overnight H396P Abl/ Yop bacterial culture. Induce protein expression.
  • Isolate the Abl plasmid DNA from the two 6-mL overnight cultures.
  • Quantify the Abl plasmid DNA concentration by absorption at 260 nm (a conversion sketch follows this list).
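For the 260 nm reading, a minimal conversion sketch is given below. It assumes the conventional factor of about 50 µg/mL per absorbance unit for double-stranded DNA and a 1 cm path length; the function name, dilution factor, and example reading are illustrative and are not part of the protocol.

```python
# Hypothetical helper for estimating dsDNA concentration from an A260 reading.
# Assumes the standard ~50 ug/mL per absorbance unit conversion for
# double-stranded DNA and a 1-cm path length; values are illustrative only.

def dsdna_concentration_ng_per_ul(a260: float, dilution_factor: float = 1.0) -> float:
    """Return an estimated dsDNA concentration in ng/uL."""
    UG_PER_ML_PER_A260 = 50.0           # standard dsDNA conversion factor
    ug_per_ml = a260 * UG_PER_ML_PER_A260 * dilution_factor
    return ug_per_ml                    # 1 ug/mL == 1 ng/uL

if __name__ == "__main__":
    # e.g., A260 = 0.210 measured on a 1:10 dilution of the miniprep
    print(f"{dsdna_concentration_ng_per_ul(0.210, 10):.0f} ng/uL")
```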

Session 2 - following day (< 1 hour of lab)

  • Harvest cells by centrifugation. Record the pellet weight and store at -20 °C.

Sessions 3 and 4: In these sessions you will verify that the plasmid DNA you isolated contains a construct of the expected size for the Abl kinase domain. You will then design primers for subsequent site-directed mutagenesis. In preparation for purifying the H396P Abl kinase domain, you will prepare all the necessary buffers for the lysis and purification. You will also prepare a standard curve for future protein quantification.

Session 3

  • Digest your isolated wt Abl DNA with XhoI/NdeI restriction enzymes.
  • Analyze your digestion with an agarose gel and check for the ~6,000 bp insert.
  • Select a mutant Abl kinase domain that you would like to prepare. Design your primers to create the mutant DNA (a primer Tm sketch follows this list). Primer proposals will be handed in at the beginning of Session 4.
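As a rough aid for primer design, the sketch below estimates a mutagenic primer's melting temperature with the formula commonly quoted for QuikChange-style primers, Tm = 81.5 + 0.41(%GC) - 675/N - %mismatch. Treat it as a sanity check under that assumption: the helper name and example sequence are hypothetical, and your TA's design guidelines take precedence.

```python
# Minimal sketch (not an official tool): estimate primer Tm with the formula
# commonly quoted for QuikChange-style mutagenic primers:
#   Tm = 81.5 + 0.41*(%GC) - 675/N - %mismatch
# The example primer below is hypothetical.

def quikchange_tm(primer: str, percent_mismatch: float) -> float:
    primer = primer.upper()
    n = len(primer)
    gc = 100.0 * sum(base in "GC" for base in primer) / n
    return 81.5 + 0.41 * gc - 675.0 / n - percent_mismatch

if __name__ == "__main__":
    example = "GATCTGAGCTCCCGGGTACCAAGCTTGATC"   # hypothetical 30-mer
    # one mismatched base out of 30 is ~3.3% mismatch
    print(f"Estimated Tm: {quikchange_tm(example, 100.0 / 30):.1f} deg C")
```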

Session 4

  • Prepare and pH-adjust the lysis buffer, Ni-affinity column buffers, the dialysis stock buffer solution, and the protein gel buffers and solutions.
  • Prepare the order form for your primers.
  • Prepare bovine serum albumin (BSA) dilutions and create a standard curve for the Bio-Rad protein quantification assay (a curve-fitting sketch follows this list).
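For the standard curve, a minimal fitting sketch is shown below. It assumes a Bradford-type Bio-Rad assay read near 595 nm and a linear response over the BSA range used; the concentrations and absorbances are invented placeholders, so substitute your own measurements.

```python
# Minimal sketch, assuming a Bradford-style (Bio-Rad) assay read at 595 nm:
# fit a line to BSA standards and invert it to estimate an unknown sample.
# The standard concentrations and absorbances below are invented placeholders.
import numpy as np

bsa_ug_per_ml = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])       # standards
a595          = np.array([0.00, 0.11, 0.22, 0.32, 0.44, 0.55])  # measured

slope, intercept = np.polyfit(bsa_ug_per_ml, a595, 1)            # linear fit

def protein_ug_per_ml(absorbance: float) -> float:
    """Invert the standard curve to estimate protein concentration."""
    return (absorbance - intercept) / slope

print(f"Unknown at A595 = 0.30 -> {protein_ug_per_ml(0.30):.1f} ug/mL")
```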

Sessions 5 and 6: In these sessions you will isolate the H396P Abl kinase domain using its amino-terminal hexahistidine tag. You will prepare an SDS-PAGE gel for analyzing your protein elutions.

Session 5 (4 hours of lab)

  • Lyse your H396P Abl/Yop cell pellet.
  • Isolate the H396P Abl kinase domain by hexahistidine-tag affinity purification.
  • Combine the column elutions that contain detectable protein by UV/Vis. Dialyze the combined fractions to remove the imidazole.

Session 5 - following day (~10 minutes of lab)

  • Change the dialysis buffer.

Session 6 (2 hours of lab)

  • Pour an SDS-PAGE gel for use in Session 7.
  • Prepare your pre- and post-induction samples and Ni-NTA elutions for the SDS-PAGE gel analysis.

Sessions 7 and 8: In these sessions you will analyze the purified H396P Abl kinase domain by SDS-PAGE gel electrophoresis. You will determine the concentration of the expressed protein after purification and dialysis.

Session 7 (you may combine Sessions 7 and 8 into a single session)

  • Run and stain the SDS-PAGE gel. Take a picture of the gel for your report.

Session 8

  • Concentrate your dialyzed protein.
  • Use the Bio-Rad quantification assay to determine the protein concentration of the H396P Abl kinase domain after purification and after dialysis.

Module 5: DNA Mutagenesis and Kinase Activity Assays

Sessions 9 and 10: You will perform site-directed mutagenesis to construct the DNA for a mutant Abl kinase domain with a single base pair substitution. You will transform cells for subsequent isolation of your mutant DNA.

Session 9 (2 hours of lab)

  • Prepare your primers for the DNA mutagenesis.
  • Set up and run the PCR reaction for the mutant DNA with your primers.

Session 9 - following day (< 10 minutes of lab)

  • Remove your PCR reaction from the thermal cycler and store at 4 °C.

Session 10 (4 hours of lab)

  • Set up the DpnI digestion of the QuikChange DNA.
  • Pour LB/agar plates.
  • Transform cells with your mutant DNA, and plate the transformed cells.

Session 10 - following day (~10 minutes)

  • Select 3 colonies from the plate and inoculate 3 separate 3-mL solutions of LB/ kan.

Sessions 11 and 12: In these sessions you will isolate your mutant DNA and send off samples for DNA sequencing. You will prepare buffers for the coupled phosphorylation assays that will be carried out in Sessions 13 and 14.

Session 11

  • Isolate the DNA from the selected colonies and quantify the DNA concentration.
  • Prepare the DNA for sequencing and design sequencing primers.

Session 12

  • Prepare the buffers and solutions for the coupled phosphorylation assay.

Sessions 13 and 14: In these sessions you will analyze the activity of the (commercially available) wild type (wt) Abl and your purified H396P Abl mutant using a coupled phosphorylation assay. You will then probe for inhibition of the wt and H396P Abl kinase domains in the presence of Gleevec and other potential Abl inhibitors.

Sessions 13 and 14

  • Use the coupled phosphorylation assay to probe for wt Abl kinase activity in the absence of an inhibitor, in the presence of Gleevec, and in the presence of an alternative small-molecule Abl inhibitor.
  • Use the coupled phosphorylation assay to probe for H396P Abl kinase activity in the absence of an inhibitor, in the presence of Gleevec, and in the presence of an alternative small-molecule Abl inhibitor.

Session 15: In the final lab session you will discuss the class results from the inhibition assays and use a structure-viewing program to analyze the active site of Abl and a selected Abl mutant. You will also analyze the results from DNA sequencing to determine if your mutagenesis was successful.

  • Analyze your sequencing data from the site-directed mutagenesis. Print out a copy of the DNA analysis for your final report.
  • Use the PyMOL structure-viewing program to view Abl crystal structures, and complete the structure-viewing worksheet.

Journal Club presentations will take place during lecture periods at a time TBA.


Human brain

The human brain is the central organ of the human nervous system, and with the spinal cord makes up the central nervous system. The brain consists of the cerebrum, the brainstem and the cerebellum. It controls most of the activities of the body, processing, integrating, and coordinating the information it receives from the sense organs, and making decisions as to the instructions sent to the rest of the body. The brain is contained in, and protected by, the skull bones of the head.

The cerebrum, the largest part of the human brain, consists of two cerebral hemispheres. Each hemisphere has an inner core composed of white matter, and an outer surface – the cerebral cortex – composed of grey matter. The cortex has an outer layer, the neocortex, and an inner allocortex. The neocortex is made up of six neuronal layers, while the allocortex has three or four. Each hemisphere is conventionally divided into four lobes – the frontal, temporal, parietal, and occipital lobes. The frontal lobe is associated with executive functions including self-control, planning, reasoning, and abstract thought, while the occipital lobe is dedicated to vision. Within each lobe, cortical areas are associated with specific functions, such as the sensory, motor and association regions. Although the left and right hemispheres are broadly similar in shape and function, some functions are associated with one side, such as language in the left and visual-spatial ability in the right. The hemispheres are connected by commissural nerve tracts, the largest being the corpus callosum.

The cerebrum is connected by the brainstem to the spinal cord. The brainstem consists of the midbrain, the pons, and the medulla oblongata. The cerebellum is connected to the brainstem by pairs of tracts. Within the cerebrum is the ventricular system, consisting of four interconnected ventricles in which cerebrospinal fluid is produced and circulated. Underneath the cerebral cortex are several important structures, including the thalamus, the epithalamus, the pineal gland, the hypothalamus, the pituitary gland, and the subthalamus; the limbic structures, including the amygdala and the hippocampus; the claustrum; the various nuclei of the basal ganglia; the basal forebrain structures; and the three circumventricular organs. The cells of the brain include neurons and supportive glial cells. There are more than 86 billion neurons in the brain, and a more or less equal number of other cells. Brain activity is made possible by the interconnections of neurons and their release of neurotransmitters in response to nerve impulses. Neurons connect to form neural pathways, neural circuits, and elaborate network systems. The whole circuitry is driven by the process of neurotransmission.

The brain is protected by the skull, suspended in cerebrospinal fluid, and isolated from the bloodstream by the blood–brain barrier. However, the brain is still susceptible to damage, disease, and infection. Damage can be caused by trauma, or a loss of blood supply known as a stroke. The brain is susceptible to degenerative disorders, such as Parkinson's disease, dementias including Alzheimer's disease, and multiple sclerosis. Psychiatric conditions, including schizophrenia and clinical depression, are thought to be associated with brain dysfunctions. The brain can also be the site of tumours, both benign and malignant; these mostly originate from other sites in the body.

The study of the anatomy of the brain is neuroanatomy, while the study of its function is neuroscience. Numerous techniques are used to study the brain. Specimens from other animals, which may be examined microscopically, have traditionally provided much information. Medical imaging technologies such as functional neuroimaging, and electroencephalography (EEG) recordings are important in studying the brain. The medical history of people with brain injury has provided insight into the function of each part of the brain. Brain research has evolved over time, with philosophical, experimental, and theoretical phases. An emerging phase may be to simulate brain activity. [3]

In culture, the philosophy of mind has for centuries attempted to address the question of the nature of consciousness and the mind-body problem. The pseudoscience of phrenology attempted to localise personality attributes to regions of the cortex in the 19th century. In science fiction, brain transplants are imagined in tales such as the 1942 Donovan's Brain.


Background

Gene co-expression networks constructed from gene expression microarray data capture the relationships between transcripts [1–7]. From the point of view of individual genes ('from below'), modules are groups of highly interconnected genes that may form a biological pathway. From the point of view of systems biology ('from above'), functional modules bridge the gap between individual genes and emergent global properties [8–10]. Here we view modules as basic system components (i.e., nodes of a network) and describe their relationships using network language. We find that co-expression modules may form a biologically meaningful meta-network that reveals a higher-order organization of the transcriptome. We refer to modules in a meta-network of modules as meta-modules.

Our analysis can be viewed as a network reduction scheme that reduces a gene co-expression network involving thousands of genes to an orders-of-magnitude smaller meta-network involving module representatives (one eigengene per module). We refer to the resulting network as an eigengene network. Using eigengene networks, we will show that the information captured by co-expression modules is far richer than a catalogue of module membership.

As a motivating example, consider the comparison between gene co-expression networks in human and chimpanzee brains. Using gene expression microarray data corresponding to different brain regions, Oldham et al. [11] found relatively large modules that are preserved between human and chimpanzee brains. Only one human brain module (corresponding to genes expressed in the cortex) was not preserved in chimpanzee brains. The original analysis focused on human modules and assessed their preservation in a corresponding chimpanzee co-expression network. We refer to such an analysis as a standard marginal module analysis since it simply determines whether a set of modules can be found in another network. Here we pursue a more comprehensive analysis that not only quantifies module preservation but also determines inter-modular preservation. We refer to modules that are preserved among data sets as consensus modules. In our applications, we show that two consensus modules may be highly related to each other in one data set but unrelated in another. Inter-modular relationships are biologically interesting because changes in pathway dependencies may reflect biological perturbations.

In this work we present methods a) for finding consensus modules across multiple networks, b) for describing the relationship between consensus modules (eigengene networks), and c) for assessing whether the relationship between consensus modules is preserved across different networks (differential eigengene network analysis).
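To make the eigengene idea concrete, here is a minimal sketch (not the authors' code): following the usual WGCNA-style definition, each module is summarized by the first principal component of its standardized expression submatrix, and the module representatives are then correlated to form an eigengene network. The `expr` matrix (samples × genes) and the `modules` assignment below are toy placeholders.

```python
# Minimal sketch of the eigengene idea (not the authors' code): summarize each
# module by the first principal component of its (standardized) expression
# submatrix, then relate modules by correlating their eigengenes.
# `expr` is assumed to be a samples x genes array; `modules` maps module names
# to lists of gene column indices. Both are hypothetical placeholders.
import numpy as np

def module_eigengene(expr: np.ndarray, gene_idx: list[int]) -> np.ndarray:
    sub = expr[:, gene_idx]
    sub = (sub - sub.mean(axis=0)) / sub.std(axis=0)      # standardize genes
    u, s, vt = np.linalg.svd(sub, full_matrices=False)
    return u[:, 0]                                        # first left singular vector

def eigengene_network(expr: np.ndarray, modules: dict[str, list[int]]) -> np.ndarray:
    eigengenes = np.column_stack([module_eigengene(expr, idx) for idx in modules.values()])
    return np.corrcoef(eigengenes, rowvar=False)          # module-module correlations

# toy data: 20 samples, 100 genes, two made-up modules
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 100))
modules = {"blue": list(range(0, 40)), "brown": list(range(40, 100))}
print(eigengene_network(expr, modules))
```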


Education is our most effective tool to reduce poverty, address racism, and sustain economic advancement for all Virginians. The Commonwealth is committed to ensuring that students and families in Virginia, regardless of their race, economic status, or the languages they speak at home, feel welcomed in their schools. Visit VirginiaIsForLearners.virginia.gov/EdEquityVA to learn more about the commitment to ensure that the Commonwealth's public education system is positioned to achieve equitable academic outcomes for all students.

Virginia's School Quality Profiles provide information about accreditation, student achievement, college and career readiness, program completion, school safety, teacher quality and other topics of interest to parents and the general public. School Quality Profile reports are available for schools, school divisions and for the commonwealth.


Geometric series

The geometric series a + ar + ar^2 + ar^3 + ··· is written in expanded form. [1] Every coefficient in the geometric series is the same. In contrast, the power series written as a_0 + a_1 r + a_2 r^2 + a_3 r^3 + ··· in expanded form has coefficients a_i that can vary from term to term. In other words, the geometric series is a special case of the power series. The first term of a geometric series in expanded form is the coefficient a of that geometric series.

In addition to the expanded form of the geometric series, there is a generator form [1] of the geometric series written as

Σ_{k=0}^{∞} a r^k,

and a closed form of the geometric series written as

a / (1 - r) within the range |r| < 1.

The derivation of the closed form from the expanded form is shown in this article's Sum section. The derivation requires that all the coefficients of the series be the same (coefficient a) in order to take advantage of self-similarity and to reduce the infinite number of additions and power operations in the expanded form to the single subtraction and single division in the closed form. However even without that derivation, the result can be confirmed with long division: a divided by (1 - r) results in a + ar + ar^2 + ar^3 + ···, which is the expanded form of the geometric series.

Typically a geometric series is thought of as a sum of numbers a + ar + ar^2 + ar^3 + ··· but can also be thought of as a sum of functions a + ar + ar^2 + ar^3 + ··· that converges to the function a / (1 - r) within the range |r| < 1. The adjacent image shows the contribution each of the first nine terms (i.e., functions) make to the function a / (1 - r) within the range |r| < 1 when a = 1. Changing even one of the coefficients to something other than coefficient a would (in addition to changing the geometric series to a power series) change the resulting sum of functions to some function other than a / (1 - r) within the range |r| < 1. As an aside, a particularly useful change to the coefficients is defined by the Taylor series, which describes how to change the coefficients so that the sum of functions converges to any user selected, sufficiently smooth function within a range.

The geometric series a + ar + ar^2 + ar^3 + ··· is an infinite series defined by just two parameters: coefficient a and common ratio r. Common ratio r is the ratio of any term with the previous term in the series. Or equivalently, common ratio r is the term multiplier used to calculate the next term in the series. The following table shows several geometric series:

a r Example series
4 10 4 + 40 + 400 + 4000 + 40,000 + ···
3 1 3 + 3 + 3 + 3 + 3 + ···
1 2/3 1 + 2/3 + 4/9 + 8/27 + 16/81 + ···
1/2 1/2 1/2 + 1/4 + 1/8 + 1/16 + 1/32 + ···
9 1/3 9 + 3 + 1 + 1/3 + 1/9 + ···
7 1/10 7 + 0.7 + 0.07 + 0.007 + 0.0007 + ···
1 −1/2 1 − 1/2 + 1/4 − 1/8 + 1/16 − 1/32 + ···
3 −1 3 − 3 + 3 − 3 + 3 − ···

The convergence of the geometric series depends on the value of the common ratio r:

  • If |r| < 1, the terms of the series approach zero in the limit (becoming smaller and smaller in magnitude), and the series converges to the sum a / (1 - r).
  • If |r| = 1, the series does not converge. When r = 1, all of the terms of the series are the same and the sum grows without bound. When r = −1, the terms take two values alternately (for example, 2, −2, 2, −2, 2, ···). The sum of the terms oscillates between two values (for example, 2, 0, 2, 0, 2, ···). This is a different type of divergence. See for example Grandi's series: 1 − 1 + 1 − 1 + ···.
  • If |r| > 1, the terms of the series become larger and larger in magnitude. The sum of the terms also gets larger and larger, and the series does not converge to a sum. (The series diverges.)

The rate of convergence also depends on the value of the common ratio r. Specifically, the rate of convergence gets slower as r approaches 1 or −1. For example, the geometric series with a = 1 is 1 + r + r^2 + r^3 + ··· and converges to 1 / (1 - r) when |r| < 1. However, the number of terms needed to converge approaches infinity as r approaches 1 because a / (1 - r) approaches infinity and each term of the series is less than or equal to one. In contrast, as r approaches −1 the sum of the first several terms of the geometric series starts to converge to 1/2 but slightly flips up or down depending on whether the most recently added term has a power of r that is even or odd. That flipping behavior near r = −1 is illustrated in the adjacent image showing the first 11 terms of the geometric series with a = 1 and |r| < 1.
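A small numeric illustration of this slowdown (not taken from the article): the sketch below counts how many terms of the a = 1 series are needed before the partial sum is within 1% of 1/(1 - r), for common ratios approaching +1 and −1. The tolerance and the sample ratios are arbitrary choices.

```python
# Count how many terms of the geometric series with a = 1 are needed before the
# partial sum is within a tolerance E of the limit 1/(1 - r).

def terms_needed(r: float, tol: float = 0.01, max_terms: int = 10_000_000) -> int:
    limit = 1.0 / (1.0 - r)
    partial, term = 0.0, 1.0
    for n in range(1, max_terms + 1):
        partial += term
        if abs(limit - partial) <= tol * abs(limit):   # within E (relative error)
            return n
        term *= r
    raise RuntimeError("did not converge within max_terms")

for r in (0.5, 0.9, 0.99, -0.5, -0.9, -0.99):
    print(f"r = {r:+.2f}: {terms_needed(r)} terms to get within 1% of 1/(1-r)")
```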

The common ratio r and the coefficient a also define the geometric progression, which is a list of the terms of the geometric series but without the additions. Therefore the geometric series a + ar + ar^2 + ar^3 + ··· has the geometric progression (also called the geometric sequence) a, ar, ar^2, ar^3, ··· The geometric progression - as simple as it is - models a surprising number of natural phenomena,

  • from some of the largest observations such as the expansion of the universe where the common ratio r is defined by Hubble's constant,
  • to some of the smallest observations such as the decay of radioactive carbon-14 atoms where the common ratio r is defined by the half-life of carbon-14.

As an aside, the common ratio r can be a complex number such as |r|e^(iθ), where |r| is the vector's magnitude (or length) and θ is the vector's angle (or orientation) in the complex plane. With a common ratio |r|e^(iθ), the expanded form of the geometric series is a + a|r|e^(iθ) + a|r|^2 e^(i2θ) + a|r|^3 e^(i3θ) + ··· Modeling the angle θ as linearly increasing over time at the rate of some angular frequency ω_0 (in other words, making the substitution θ = ω_0 t), the expanded form of the geometric series becomes a + a|r|e^(iω_0 t) + a|r|^2 e^(i2ω_0 t) + a|r|^3 e^(i3ω_0 t) + ···, where the first term is a vector of length a not rotating at all, and all the other terms are vectors of different lengths rotating at harmonics of the fundamental angular frequency ω_0. The constraint |r| < 1 is enough to coordinate this infinite number of vectors of different lengths all rotating at different speeds into tracing a circle, as shown in the adjacent video. Similar to how the Taylor series describes how to change the coefficients so the series converges to a user selected sufficiently smooth function within a range, the Fourier series describes how to change the coefficients (which can also be complex numbers in order to specify the initial angles of vectors) so the series converges to a user selected periodic function.

Closed-form formula

The closed form of the partial sum of the first n + 1 terms is

s = a + ar + ar^2 + ··· + ar^n = a(1 - r^(n+1)) / (1 - r),

where r is the common ratio. One can derive that closed-form formula for the partial sum, s, by subtracting out the many self-similar terms as follows: [2] [3] [4]

s - rs = (a + ar + ar^2 + ··· + ar^n) - (ar + ar^2 + ··· + ar^(n+1)) = a - ar^(n+1), so s(1 - r) = a(1 - r^(n+1)) and s = a(1 - r^(n+1)) / (1 - r).

As n approaches infinity, the absolute value of r must be less than one for the series to converge. The sum then becomes

a + ar + ar^2 + ar^3 + ··· = a / (1 - r), for |r| < 1.

When a = 1, this can be simplified to

1 + r + r^2 + r^3 + ··· = 1 / (1 - r).

The formula also holds for complex r, with the corresponding restriction that the modulus of r is strictly less than one.

As an aside, the question of whether an infinite series converges is fundamentally a question about the distance between two values: given enough terms, does the value of the partial sum get arbitrarily close to the value it is approaching? In the above derivation of the closed form of the geometric series, the interpretation of the distance between two values is the distance between their locations on the number line. That is the most common interpretation of distance between two values. However the p-adic metric, which has become a critical notion in modern number theory, offers a definition of distance such that the geometric series 1 + 2 + 4 + 8 + ··· with a = 1 and r = 2 actually does converge to a / (1 - r) = 1 / (1 - 2) = -1 even though r is outside the typical convergence range |r| < 1.

Proof of convergence

We can prove that the geometric series converges using the sum formula for a geometric progression: the partial sum of the first n + 1 terms is s = a(1 - r^(n+1)) / (1 - r), and for |r| < 1 the factor r^(n+1) tends to zero as n grows, so s tends to a / (1 - r).

Convergence of geometric series can also be demonstrated by rewriting the series as an equivalent telescoping series. Consider the function,

Rate of convergence

As shown in the above proofs, the closed form of the geometric series partial sum up to and including the n-th power of r is a(1 - r^(n+1)) / (1 - r) for any value of r, and the closed form of the geometric series is the full sum a / (1 - r) within the range |r| < 1.

If the common ratio is within the range 0 < r < 1, then the partial sum a(1 - r^(n+1)) / (1 - r) increases with each added term and eventually gets within some small error ratio, E, of the full sum a / (1 - r). Solving for n at that error threshold, r^(n+1) = E gives n + 1 = ln(E) / ln(r) as the number of terms needed.

If the common ratio is within the range -1 < r < 0, then the geometric series is an alternating series but can be converted into the form of a non-alternating geometric series by combining pairs of terms and then analyzing the rate of convergence using the same approach as shown for the common ratio range 0 < r < 1. Specifically, the partial sum

s = a + ar + ar^2 + ar^3 + ar^4 + ar^5 + ··· + ar^(n-1) + ar^n within the range -1 < r < 0

is equivalent to

s = a - ap + ap^2 - ap^3 + ap^4 - ap^5 + ··· + ap^(n-1) - ap^n, with an n that is odd and with the substitution p = -r, so 0 < p < 1,

s = (a - ap) + (ap^2 - ap^3) + (ap^4 - ap^5) + ··· + (ap^(n-1) - ap^n), with adjacent and differently signed terms paired together,

s = a(1 - p) + a(1 - p)p^2 + a(1 - p)p^4 + ··· + a(1 - p)p^(n-1), with a(1 - p) factored out of each term,

s = a(1 - p) + a(1 - p)p^2 + a(1 - p)p^4 + ··· + a(1 - p)p^(2m), with the substitution m = (n - 1) / 2, which is an integer given the constraint that n is odd,

which is now in the form of the first m terms of a geometric series with coefficient a(1 - p) and with common ratio p^2. Therefore the closed form of the partial sum is a(1 - p)(1 - p^(2(m+1))) / (1 - p^2), which increases with each added term and eventually gets within some small error ratio, E, of the full sum a(1 - p) / (1 - p^2). As before, solving for m at that error threshold, p^(2(m+1)) = E gives m + 1 = ln(E) / (2 ln(p)),

where 0 < p < 1 or equivalently -1 < r < 0, and the m + 1 result is the number of partial-sum pairs of terms needed to get within a(1 - p)E / (1 - p^2) of the full sum a(1 - p) / (1 - p^2). For example, to get within 1% of the full sum a(1 - p) / (1 - p^2) at p = 0.1 (or equivalently r = -0.1), only 1 (= ln(E) / (2 ln(p)) = ln(0.01) / (2 ln(0.1))) pair of terms of the partial sum is needed. However at p = 0.9 (or equivalently r = -0.9), 22 (= ln(0.01) / (2 ln(0.9))) pairs of terms of the partial sum are needed to get within 1% of the full sum a(1 - p) / (1 - p^2). Comparing the rate of convergence for positive and negative values of r, n + 1 (the number of terms required to reach the error threshold for some positive r) is always twice as large as m + 1 (the number of term pairs required to reach the error threshold for the negative of that r), but the m + 1 refers to term pairs instead of single terms. Therefore, the rate of convergence is symmetric about r = 0, which can be a surprise given the asymmetry of a / (1 - r). One perspective that helps explain this rate of convergence symmetry is that on the r > 0 side each added term of the partial sum makes a finite contribution to the infinite sum at r = 1 while on the r < 0 side each added term makes a finite contribution to the infinite slope at r = -1.

As an aside, this type of rate of convergence analysis is particularly useful when calculating the number of Taylor series terms needed to adequately approximate some user-selected sufficiently-smooth function or when calculating the number of Fourier series terms needed to adequately approximate some user-selected periodic function.

Zeno of Elea (c. 495 – c. 430 BC)

2,500 years ago, Greek mathematicians had a problem with walking from one place to another. Physically, they were able to walk as well as we do today, perhaps better. Logically, however, they thought [5] that an infinitely long list of numbers greater than zero summed to infinity. Therefore, it was a paradox when Zeno of Elea pointed out that in order to walk from one place to another, you first have to walk half the distance, and then you have to walk half the remaining distance, and then you have to walk half of that remaining distance, and you continue halving the remaining distances an infinite number of times because no matter how small the remaining distance is you still have to walk the first half of it. Thus, Zeno of Elea transformed a short distance into an infinitely long list of halved remaining distances, all of which are greater than zero. And that was the problem: how can a distance be short when measured directly and also infinite when summed over its infinite list of halved remainders? The paradox revealed something was wrong with the assumption that an infinitely long list of numbers greater than zero summed to infinity.

Euclid of Alexandria (c. 300 BC)

Euclid's Elements of Geometry [6] Book IX, Proposition 35, proof (of the proposition in adjacent diagram's caption):

Let AA', BC, DD', EF be any multitude whatsoever of continuously proportional numbers, beginning from the least AA'. And let BG and FH, each equal to AA', have been subtracted from BC and EF. I say that as GC is to AA', so EH is to AA', BC, DD'.

For let FK be made equal to BC, and FL to DD'. And since FK is equal to BC, of which FH is equal to BG, the remainder HK is thus equal to the remainder GC. And since as EF is to DD', so DD' to BC, and BC to AA' [Prop. 7.13], and DD' equal to FL, and BC to FK, and AA' to FH, thus as EF is to FL, so LF to FK, and FK to FH. By separation, as EL to LF, so LK to FK, and KH to FH [Props. 7.11, 7.13]. And thus as one of the leading is to one of the following, so (the sum of) all of the leading to (the sum of) all of the following [Prop. 7.12]. Thus, as KH is to FH, so EL, LK, KH to LF, FK, HF. And KH equal to CG, and FH to AA', and LF, FK, HF to DD', BC, AA'. Thus, as CG is to AA', so EH to DD', BC, AA'. Thus, as the excess of the second is to the first, so is the excess of the last is to all those before it. The very thing it was required to show.

The terseness of Euclid's propositions and proofs may have been a necessity. As is, the Elements of Geometry is over 500 pages of propositions and proofs. Making copies of this popular textbook was labor intensive given that the printing press was not invented until 1440. And the book's popularity lasted a long time: as stated in the cited introduction to an English translation, Elements of Geometry "has the distinction of being the world's oldest continuously used mathematical textbook." So being very terse was being very practical. The proof of Proposition 35 in Book IX could have been even more compact if Euclid could have somehow avoided explicitly equating lengths of specific line segments from different terms in the series. For example, the contemporary notation for geometric series (i.e., a + ar + ar^2 + ar^3 + ··· + ar^n) does not label specific portions of terms that are equal to each other.

Also in the cited introduction the editor comments,

Most of the theorems appearing in the Elements were not discovered by Euclid himself, but were the work of earlier Greek mathematicians such as Pythagoras (and his school), Hippocrates of Chios, Theaetetus of Athens, and Eudoxus of Cnidos. However, Euclid is generally credited with arranging these theorems in a logical manner, so as to demonstrate (admittedly, not always with the rigour demanded by modern mathematics) that they necessarily follow from five simple axioms. Euclid is also credited with devising a number of particularly ingenious proofs of previously discovered theorems (e.g., Theorem 48 in Book 1).

To help translate the proposition and proof into a form that uses current notation, a couple of modifications are in the diagram. First, the four horizontal line lengths representing the values of the first four terms of a geometric series are now labeled a, ar, ar^2, ar^3 in the diagram's left margin. Second, new labels A' and D' are now on the first and third lines so that all the diagram's line segment names consistently specify the segment's starting point and ending point.

Here is a phrase by phrase interpretation of the proposition:

Proposition in contemporary notation
"If there is any multitude whatsoever of continually proportional numbers" Taking the first n+1 terms of a geometric series Sn = a + ar + ar 2 + ar 3 + . + ar n
"and equal to the first is subtracted from the second and the last" and subtracting a from ar and ar n
"then as the excess of the second to the first, so the excess of the last will be to all those before it." then (ar-a) / a = (ar n -a) / (a + ar + ar 2 + ar 3 + . + ar n-1 ) = (ar n -a) / Sn-1, which can be rearranged to the more familiar form Sn-1 = a(r n -1) / (r-1).

Similarly, here is a sentence by sentence interpretation of the proof:

Proof in contemporary notation
"Let AA', BC, DD', EF be any multitude whatsoever of continuously proportional numbers, beginning from the least AA'." Consider the first n+1 terms of a geometric series Sn = a + ar + ar 2 + ar 3 + . + ar n for the case r>1 and n=3.
"And let BG and FH, each equal to AA', have been subtracted from BC and EF." Subtract a from ar and ar 3 .
"I say that as GC is to AA', so EH is to AA', BC, DD'." I say that (ar-a) / a = (ar 3 -a) / (a + ar + ar 2 ).
"For let FK be made equal to BC, and FL to DD'."
"And since FK is equal to BC, of which FH is equal to BG, the remainder HK is thus equal to the remainder GC."
"And since as EF is to DD', so DD' to BC, and BC to AA' [Prop. 7.13], and DD' equal to FL, and BC to FK, and AA' to FH, thus as EF is to FL, so LF to FK, and FK to FH."
"By separation, as EL to LF, so LK to FK, and KH to FH [Props. 7.11, 7.13]." By separation, (ar 3 -ar 2 ) / ar 2 = (ar 2 -ar) / ar = (ar-a) / a = r-1.
"And thus as one of the leading is to one of the following, so (the sum of) all of the leading to (the sum of) all of the following [Prop. 7.12]." The sum of those numerators and the sum of those denominators form the same proportion: ((ar 3 -ar 2 ) + (ar 2 -ar) + (ar-a)) / (ar 2 + ar + a) = r-1.
"And thus as one of the leading is to one of the following, so (the sum of) all of the leading to (the sum of) all of the following [Prop. 7.12]." And this sum of equal proportions can be extended beyond (ar 3 -ar 2 ) / ar 2 to include all the proportions up to (ar n -ar n-1 ) / ar n-1 .
"Thus, as KH is to FH, so EL, LK, KH to LF, FK, HF."
"And KH equal to CG, and FH to AA', and LF, FK, HF to DD', BC, AA'."
"Thus, as CG is to AA', so EH to DD', BC, AA'."
"Thus, as the excess of the second is to the first, so is the excess of the last is to all those before it." Thus, (ar-a) / a = (ar 3 -a) / S2. Or more generally, (ar-a) / a = (ar n -a) / Sn-1, which can be rearranged in the more common form Sn-1 = a(r n -1) / (r-1).
"The very thing it was required to show." Q.E.D.

Archimedes of Syracuse (c. 287 – c. 212 BC)

Archimedes used the sum of a geometric series to compute the area enclosed by a parabola and a straight line. His method was to dissect the area into an infinite number of triangles.

Archimedes' Theorem states that the total area under the parabola is 4/3 of the area of the blue triangle.

Archimedes determined that each green triangle has 1/8 the area of the blue triangle, each yellow triangle has 1/8 the area of a green triangle, and so forth.

Assuming that the blue triangle has area 1, the total area is an infinite sum:

1 + 2(1/8) + 4(1/8)^2 + 8(1/8)^3 + ···

The first term represents the area of the blue triangle, the second term the areas of the two green triangles, the third term the areas of the four yellow triangles, and so on. Simplifying the fractions gives

1 + 1/4 + 1/16 + 1/64 + ···

This is a geometric series with common ratio 1/4, and the fractional part is equal to

1/4 + 1/16 + 1/64 + ··· = (1/4) / (1 - 1/4) = 1/3,

so the total area is 1 + 1/3 = 4/3, in agreement with Archimedes' theorem above.
This computation uses the method of exhaustion, an early version of integration. Using calculus, the same area could be found by a definite integral.

Nicole Oresme (c. 1323 – 1382)

Among his insights into infinite series, in addition to his elegantly simple proof of the divergence of the harmonic series, Nicole Oresme [7] proved that the series 1/2 + 2/4 + 3/8 + 4/16 + 5/32 + 6/64 + 7/128 + ··· converges to 2. His diagram for his geometric proof, similar to the adjacent diagram, shows a two-dimensional geometric series. The first dimension is horizontal, in the bottom row showing the geometric series S = 1/2 + 1/4 + 1/8 + 1/16 + ···, which is the geometric series with coefficient a = 1/2 and common ratio r = 1/2 that converges to S = a / (1 - r) = (1/2) / (1 - 1/2) = 1. The second dimension is vertical, where the bottom row is a new coefficient a_T equal to S and each subsequent row above it is scaled by the same common ratio r = 1/2, making another geometric series T = 1 + 1/2 + 1/4 + 1/8 + ···, which is the geometric series with coefficient a_T = S = 1 and common ratio r = 1/2 that converges to T = a_T / (1 - r) = S / (1 - r) = a / (1 - r) / (1 - r) = (1/2) / (1 - 1/2) / (1 - 1/2) = 2.

Although difficult to visualize beyond three dimensions, Oresme's insight generalizes to any dimension d. Using the sum of the (d−1)-dimensional geometric series as the coefficient a in the d-dimensional geometric series results in a d-dimensional geometric series converging to S_d / a = 1 / (1 - r)^d within the range |r| < 1. Pascal's triangle and long division reveal the coefficients of these multi-dimensional geometric series, where the closed form is valid only within the range |r| < 1.
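As a quick check of the claim about Pascal's triangle (a sketch, not from the source), the code below generates the coefficients of 1/(1 - r)^d by repeatedly taking cumulative sums of the all-ones coefficient sequence of 1/(1 - r), which is the discrete "integration" mentioned in the next paragraph, and confirms that they match the binomial coefficients C(k + d - 1, d - 1).

```python
# Numeric check: coefficients of 1/(1-r)^d obtained by repeated cumulative sums
# of the coefficients of 1/(1-r) agree with the Pascal's-triangle values
# C(k + d - 1, d - 1).
from math import comb
import itertools

def coeffs_of_inverse_power(d: int, n_terms: int) -> list[int]:
    coeffs = [1] * n_terms                                # coefficients of 1/(1-r)
    for _ in range(d - 1):
        coeffs = list(itertools.accumulate(coeffs))       # discrete "integration"
    return coeffs

d, n_terms = 3, 8
generated = coeffs_of_inverse_power(d, n_terms)
expected = [comb(k + d - 1, d - 1) for k in range(n_terms)]
print(generated)    # [1, 3, 6, 10, 15, 21, 28, 36]
assert generated == expected
```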


Note that as an alternative to long division, it is also possible to calculate the coefficients of the d-dimensional geometric series by integrating the coefficients of dimension d−1. This mapping from division by 1-r in the power series sum domain to integration in the power series coefficient domain is a discrete form of the mapping performed by the Laplace transform. MIT Professor Arthur Mattuck shows how to derive the Laplace transform from the power series in this lecture video, [8] where the power series is a mapping between discrete coefficients and a sum and the Laplace transform is a mapping between continuous weights and an integral.

The closed forms of S_d / a are related to but not equal to the derivatives of S = f(r) = 1 / (1 - r). As shown in the following table, the relationship is S_(k+1) / a = f^(k)(r) / k!, where f^(k)(r) denotes the k-th derivative of f(r) = 1 / (1 - r) and the closed form is valid only within the range |r| < 1.

Repeating decimals

A repeating decimal can be thought of as a geometric series whose common ratio is a power of 1/10. For example:

0.7777… = 7/10 + 7/100 + 7/1000 + 7/10000 + ···,

which is a geometric series with coefficient a = 7/10 and common ratio r = 1/10. The formula for the sum of a geometric series can be used to convert the decimal to a fraction:

0.7777… = a / (1 - r) = (7/10) / (1 - 1/10) = 7/9.

The formula works not only for a single repeating figure, but also for a repeating group of figures. For example:

0.123123123… = 123/1000 + 123/1000000 + 123/1000000000 + ··· = (123/1000) / (1 - 1/1000) = 123/999 = 41/333.

Note that every repeating decimal can be conveniently simplified as follows: a repeating decimal with repeat length n is equal to the quotient of the repeating part (as an integer) and 10^n - 1.
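The rule is easy to verify with exact rational arithmetic; the short sketch below does so for a one-digit and a three-digit repeating block. The helper name is purely illustrative.

```python
# Verify the repeating-decimal rule using exact rational arithmetic.
# repeating_to_fraction("123") stands for 0.123123123...
from fractions import Fraction

def repeating_to_fraction(repeating_digits: str) -> Fraction:
    n = len(repeating_digits)
    return Fraction(int(repeating_digits), 10**n - 1)

print(repeating_to_fraction("7"))     # 7/9
print(repeating_to_fraction("123"))   # 41/333 (i.e., 123/999 reduced)
```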

Economics

In economics, geometric series are used to represent the present value of an annuity (a sum of money to be paid in regular intervals).

For example, suppose that a payment of $100 will be made to the owner of the annuity once per year (at the end of the year) in perpetuity. Receiving $100 a year from now is worth less than an immediate $100, because one cannot invest the money until one receives it. In particular, the present value of $100 one year in the future is $100 / (1 + I ), where I is the yearly interest rate.

Similarly, a payment of $100 two years in the future has a present value of $100 / (1 + I)^2, because two years of interest are forgone. Therefore, the present value of receiving $100 per year in perpetuity is the infinite series:

$100 / (1 + I) + $100 / (1 + I)^2 + $100 / (1 + I)^3 + ···,

which is a geometric series with coefficient a = $100 / (1 + I) and common ratio r = 1 / (1 + I), and which sums to a / (1 - r) = $100 / I.

This sort of calculation is used to compute the APR of a loan (such as a mortgage loan). It can also be used to estimate the present value of expected stock dividends, or the terminal value of a security.
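As a worked illustration of the payment stream described above, the sketch below prices a perpetuity with the closed form payment/I and a finite annuity with the partial geometric sum; the payment amount, rate, and horizon are made-up examples.

```python
# Minimal sketch: present value of a level payment stream via the geometric
# series sum. Payment, rate, and horizon below are made-up examples.

def present_value_perpetuity(payment: float, rate: float) -> float:
    """PV of `payment` received at the end of every year forever: a/(1-r) = payment/rate."""
    return payment / rate

def present_value_annuity(payment: float, rate: float, years: int) -> float:
    """PV of `payment` received for a finite number of years: partial geometric sum."""
    r = 1.0 / (1.0 + rate)
    a = payment * r
    return a * (1.0 - r**years) / (1.0 - r)

print(present_value_perpetuity(100.0, 0.05))    # 2000.0
print(present_value_annuity(100.0, 0.05, 30))   # ~1537.2
```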

Fractal geometry

In the study of fractals, geometric series often arise as the perimeter, area, or volume of a self-similar figure.

For example, the area inside the Koch snowflake can be described as the union of infinitely many equilateral triangles (see figure). Each side of the green triangle is exactly 1/3 the size of a side of the large blue triangle, and therefore has exactly 1/9 the area. Similarly, each yellow triangle has 1/9 the area of a green triangle, and so forth. Taking the blue triangle as a unit of area, the total area of the snowflake is

1 + 3(1/9) + 12(1/9)^2 + 48(1/9)^3 + ···

The first term of this series represents the area of the blue triangle, the second term the total area of the three green triangles, the third term the total area of the twelve yellow triangles, and so forth. Excluding the initial 1, this series is geometric with constant ratio r = 4/9. The first term of the geometric series is a = 3(1/9) = 1/3, so the sum is

a / (1 - r) = (1/3) / (1 - 4/9) = 3/5,

and the total area of the snowflake is 1 + 3/5 = 8/5 times the area of the original blue triangle.


CL-20-Based Cocrystal Energetic Materials: Simulation, Preparation and Performance

1 μm (Figure 12a). The mean particle size of the discrete coformers substantially decreased after 10 min of milling (Figure 12b), with no conversion to the cocrystalline material. Plate-like crystal particles with dimensions less than 500 nm started to appear in the specimen milled for 20 min, as indicated by the arrow in Figure 12c. The plate-like particles were assigned to the 2CL-20·HMX cocrystal because (1) the 2CL-20·HMX cocrystal is known to have a plate-like morphology, (2) the appearance of these particles and the diffraction peaks of the 2CL-20·HMX cocrystal in the XRD pattern occurred at the same time, and (3) more plate-like particles were observed in the specimens upon further milling (Figure 12d and e, respectively). The observation of these relatively large cocrystal particles seems to contradict the intensive collisions between the grinding media and particles, and between the particles themselves, that occur during the milling process. It is possible that some growth occurs during the drying of the sampled specimens.


Genetics of Epilepsy

Berge A. Minassian , in Progress in Brain Research , 2014

5 Neuronopathic Gaucher Disease

Glucocerebrosidase removes glucose from the glycolipid glucosylceramide in lysosomes. Comparatively mild deficiencies of this enzyme lead to accumulations of glucosylceramide that most cells can in some fashion deal with, possibly including by extrusion. Mononuclear phagocytes play a crucial role in degrading these materials, and when they cannot, they become filled with them (and become Gaucher cells), which infiltrate the spleen, causing hepatosplenomegaly, and the bone marrow, leading to marrow failure (Type I Gaucher). Type II does involve the nervous system and leads to acute neurologic disease with infantile or early-childhood lethality. It is Type III that is of interest here, namely, presenting with nonacute neurologic disease, and sometimes with a typical PME, complete with occipital seizures and progressive myoclonus and dementia. Hepatosplenomegaly may be moderate, but should be the telltale sign for this diagnosis. Gaucher cells in bone marrow aspirates may be rare and should not exclude the disease.


Accessible Science Resources

Find out what's new and learn more about resources related to accessible science.

SOFIA, which is a jumbo jet fitted with an infrared telescope, recently flew through the "central flash" of Pluto. This is the area where researchers can get the most complete sense of Pluto’s skies and record how much of Pluto’s atmosphere absorbed the starlight.

TVI Jeff Killebrew recently flew on SOFIA and has been sharing his experiences with students and teachers.

Archived Hubble Hangout video on 3D printing and technology in astronomy education and research. The 3D Astronomy Project at the Space Telescope Science Institute and NASA Goddard have created innovative education materials and 3D models of astronomical objects using Hubble data.

This guide describes 28 of the most common birds in North America, with a recording of their voices; it includes a listing of different habitats and the birds common to each.

R.G. Baldwin shares ideas about making physics concepts accessible to students who are blind or visually impaired. He is a Professor of Computer Information Technology at Austin Community College in Austin, TX and is interested in helping students with visual impairments overcome barriers when studying science. The materials on this site are intended to supplement an introductory Physics class in high school or college. Topics include:

The site also includes information about creating tactile graphics and links to using supplemental materials, such as a graphing board, protractor, etc.

Students who are blind should not be excluded from physics courses because of inaccessible textbooks. The modules in this collection present physics concepts in a format that students with visual impairments can read using accessibility tools, such as an audio screen reader and an electronic line-by-line braille display. These modules are intended to supplement and not to replace the physics textbook.

This interactive website is full of practical ideas for hands-on lessons, resources, materials, and more. Subscribe to the blog, ask questions, and share your ideas with an online community of practice of educators interested in making science accessible to students with visual impairments.

This section of the interactive website includes information about products and instructional materials for teaching science to students with visual impairments.

In this webcast, Perkins science teacher Kate Fraser outlines teaching strategies and adaptations to make science lessons and activities accessible to students who are visually impaired. Find even more resources at the Perkins Accessible Science website.

"The AccessSTEM website is a space where K-12 teachers, postsecondary educators, and employers learn to make classroom and employment opportunities in science, technology, engineering and mathematics (STEM) accessible to individuals with disabilities, and share promising practices."

Adapted Curriculum Enhancement (ACE) seeks to provide research-based educational products and services for students with visual impairments. Their website offers some hands-on activities on topics such as:

  • Spongy Universe
  • Dynamic Universe
  • Tracing Origins Thought Experiments
  • Feel the Impact

The materials include tactile files, audio files, and student text.

Acorn Naturalists offers a host of 3-D models appropriate for students with visual impairment.

The following is from the Acorn Naturalists website:

This site is filled with ideas and tools for planning your programs, enriching your curriculum, and getting children interested in science and nature. Acorn Naturalists' resources are designed to add depth, zest, and inspiration to your programs. This year, our 26th, we've added hundreds of new products, including replicas, hands-on activity kits, field equipment, interpretive tools, curricula, and other engaging resources.

Acorn Naturalists was started over a quarter century ago by two educators interested in developing and distributing high quality resources for the trail and classroom. The founders are still involved on a daily basis, working closely with a dedicated and informed staff.

Evolving Universe and Feel the Impact are NASA astronomy modules adapted for students with visual impairments. Both include alternate student texts and tactile graphics cards. The SEE Project develops "Braille / tactile … space science activities and observing programs that actively engage blind and visually impaired students from elementary grades through introductory college level in space science."

Adapting Science for Students with Visual Impairments is a handbook from American Printing House for the Blind (APH) for classroom teachers and teachers of students who are blind or visually impaired (TVIs). It provides suggestions for making science accessible to students with vision loss. It includes:

  • Advance Preparation Checklist alerting the teacher to orientation and safety issues
  • Skills Checklist readying the student for laboratory and classroom activities

The Adapting Science checklists are also available as a free download at: http://www.aph.org/manuals/index.html

OSHA standards and procedures for protecting the eyes in the workplace.

This Tactile Astronomy section of Amazing Space features a library of selected Hubble images that can be printed in a tactile format. It opened in celebration of Hubble's 20th anniversary, in April 2010. Images can be easily printed on microcapsule paper and "puffed" in a thermal fuser. These specially prepared PDFs include braille headings and embedded text for screen readers that describes the featured celestial object and what astronomers are learning about it, optimized for low-vision users and screen readers. Tactile Astronomy also includes a "special projects" section that currently features the limited-edition Tactile Carina Nebula booklet, a 17-by-11-inch color image embossed with lines, slashes, and other markings that correspond to objects within the nebula's fantasy landscape of bubbles, valleys, mountains, and pillars.

If you have questions about our tactile Hubble images or the Tactile Astronomy site, please send an e-mail to: [email protected]

For itinerant TVIs who are adapting chemistry for a braille student, these guidelines will be invaluable. The 20-page document includes:

  • Basic Guidance on When to Switch
  • UEB Rule for Use of Opening and Closing Nemeth Indicators
  • Additional Guidelines
  • Formatting

The guidelines are available for free download in PDF or BRF format on the BANA (Braille Authority of North America) website.

Maylene Bird shares teaching tips on cells, microscopes, diagrams and models, dissecting, and measuring.

This page has links to various biology lessons, a list of errors and omissions in the Holt Biology Book (2004), and diagrams that can be downloaded to accompany the text.

This site features two newspaper articles about "small but significant breakthroughs" in science education for students who are blind: Camp Eureka, a natural history camp in Montana and a dissection class at Colorado Center for the Blind.


Supplementary Figure 1 Scores in Sub-challenge 1.

(a) Overall scores of the 42 module identification methods applied in Sub-challenge 1 at four different FDR cutoffs (10%, 5%, 2.5%, and 1% FDR). For explanation see legend of Fig. 2b, which shows the scores at 5% FDR (the predefined cutoff used for the challenge ranking). The top-performing method (K1) ranks first at all four cutoffs. The consensus prediction achieves the top score at 10% and 5% FDR, but not at the more stringent cutoffs. (b) Average number of trait-associated modules across the 42 methods for each of the six networks. The largest numbers of trait modules are found in the two protein-protein interaction (PPI) networks and the co-expression network. Related to Fig. 2d, which shows the average number of trait modules relative to network size.

Supplementary Figure 2 Pairwise similarity of module predictions from different methods.

(a) Pairwise similarity of module predictions from different methods in Sub-challenge 1, averaged over all networks. Similarity was computed based on whether the same genes were clustered together by the two methods. Specifically, a prediction vector P_mk was defined for every method m and network k, specifying for every pair of genes whether they were co-clustered in the same module (Methods). The prediction vectors P_mk of method m for the six networks (k = 1, 2, …, 6) were then concatenated, forming a single vector P_m representing the module predictions of that method for all six networks. A corresponding distance matrix between the 42 methods was computed as described in Methods (Equation 1) and hierarchically clustered using Ward's method. The annotation row and column show the method type. The top five methods (1-5) and the consensus (C) are highlighted. The top methods did not converge to similar module predictions (they are not all grouped together in the hierarchical clustering). Related to Fig. 3, which shows similarity of module predictions from individual networks. (b) Comparison of trait-associated modules identified by all challenge methods. Pie-charts show the percentage of trait modules that show overlap with at least one trait module from a different method in the same network (top) and in different networks (bottom). We distinguish between strong overlap, sub-modules, weak but statistically significant overlap, and insignificant overlap (Methods).
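A rough sketch of the co-clustering encoding described in panel (a), not the authors' implementation: each module prediction is turned into a binary vector over gene pairs, and two predictions are compared through these vectors. The toy gene lists and the cosine-style distance used here are illustrative stand-ins for the paper's Equation 1.

```python
# Rough sketch (not the authors' code): encode a module prediction as a binary
# vector over gene pairs ("were genes i and j placed in the same module?") and
# compare two predictions through these vectors. Inputs are toy placeholders.
from itertools import combinations
import numpy as np

def coclustering_vector(modules: dict[str, list[str]], genes: list[str]) -> np.ndarray:
    gene_to_module = {g: m for m, members in modules.items() for g in members}
    pairs = combinations(genes, 2)
    return np.array([gene_to_module.get(i) is not None and
                     gene_to_module.get(i) == gene_to_module.get(j)
                     for i, j in pairs], dtype=float)

genes = [f"g{i}" for i in range(6)]
pred_a = {"m1": ["g0", "g1", "g2"], "m2": ["g3", "g4"]}          # toy prediction 1
pred_b = {"x": ["g0", "g1"], "y": ["g2", "g3", "g4", "g5"]}      # toy prediction 2

va = coclustering_vector(pred_a, genes)
vb = coclustering_vector(pred_b, genes)
cosine_similarity = (va @ vb) / (np.linalg.norm(va) * np.linalg.norm(vb))
print(f"distance between the two predictions: {1.0 - cosine_similarity:.3f}")
```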

Supplementary Figure 3 Optimal module granularity is method- and network-specific.

All panels show results for single-network module identification methods (Sub-challenge 1). (a) Average module size versus score for each of the 42 methods. The x-axis shows the average module size of a given method across the six networks. The y-axis shows the overall score of the method. Top teams (highlighted) produced modules of varying size, i.e., they did not converge to a similar module size during the leaderboard round. There is no significant correlation between module size and score (p-value = 0.13 using two-sided Pearson's correlation test), i.e., the scoring metric did not generally favor either small or large modules. Rather, when optimizing parameters during the leaderboard round, teams converged to very different granularities that led to the best performance for their specific methods. (b) Average number of modules versus score for each method. The x-axis shows the average number of submitted modules across networks for a given method, and the y-axis shows the corresponding score. The top five teams (highlighted) submitted a variable number of modules (between 103 and 470 modules, on average, per network). There is no significant correlation between the number of submitted modules and the obtained score (p-value = 0.99 using two-sided Pearson's correlation test), i.e., the scoring metric was not biased to generally favor either a small or high number of submitted modules. (c) Comparison of module sizes between networks and method types. For each network, boxplots show the distribution of average module sizes for kernel clustering (n = 6 methods), modularity optimization (n = 10 methods), random-walk-based (n = 10 methods), and hybrid methods (n = 7 methods; the remaining categories are not shown because they comprise only three methods each). Note that teams tuned the resolution (average module size) of their method during the leaderboard round. The variation in module size between different method categories and networks suggests that the optimal resolution is method- and network-specific. For example, teams using random-walk-based methods tended to choose a higher resolution (smaller average module size) than teams using kernel clustering or modularity optimization methods. On average, modules were smallest in the signaling network and largest in the co-expression network. (d) Module size versus trait-association p-value for individual modules from all methods and networks. For all n = 84,798 modules, the module size (x-axis) is plotted against the -log10 of the minimum Pascal p-value across all GWASs (y-axis). Color shows the density of points. By design, Pascal p-values are not confounded by module size [23], which is confirmed here (the regression line, shown in red, is flat; see also Supplementary Fig. 4).

Supplementary Figure 4 Module granularity of random predictions does not correlate with score.

The panels show the average number of trait-associated modules for 17 random modularizations of the networks (i.e., networks were decomposed into random modules of the given sizes). Results are shown both for Bonferroni (orange) and Benjamini-Hochberg (blue) corrected p-values at a significance level of 0.05. The difference between the two panels is the background gene set used for the Pascal module enrichment test (see Methods). (a) The complete set of all annotated genes is used as background to compute module enrichment (the UCSC known genes). This is an incorrect choice for the background because module genes are drawn from the network genes, which are a subset of all known genes. As expected, this incorrect choice of a background set leads to a higher number of trait-associated random modules than in Panel b, in particular for large modules. (b) The set of all genes in a given network is used as background to compute module enrichment. This is the approach that was employed for the challenge scoring. Aside from very small modules of size 3, the module size does not affect the number of trait-associated random modules, i.e., our scoring methodology is not biased towards a specific module size (see also Supplementary Fig. 3d).

Supplementary Figure 5 Scores in Sub-challenge 2.

(a) Final scores of multi-network module identification methods in Sub-challenge 2 at four different FDR cutoffs (10%, 5%, 2.5%, and 1% FDR). For explanation see legend of Fig. 3e, which shows the scores at 5% FDR (the predefined cutoff used for the challenge ranking). Ranks are indicated for the top five teams (ties are broken according to robustness analysis described in Panel b). The multi-network consensus prediction (red) achieves the top score at each FDR cutoff. Interestingly, the performance of methods integrating all five networks (dark blue) seems to drop substantially at the more stringent FDR thresholds. For example, the second and third ranking methods at both 5% and 10% FDR, which integrated all five networks, performed poorly at the 2.5% and 1% FDR thresholds (see second and third row from the top). This suggests that not only the absolute number of trait-associated modules, but also their quality in terms of association strength could not be improved by considering multiple networks. As mentioned in the Discussion, the challenge networks may not have been sufficiently related for multi-network methods to reveal meaningful modules spanning several networks. Indeed, the similarity between our networks in terms of edge overlap was small (Supplementary Fig. 6). Of note, there is an important conceptual difference between the multi-network methods that teams applied (blue) and the multi-network consensus prediction (red). While the former performed modularization on blended or multi-layer networks, the latter integrated the single-network module predictions obtained from each individual network (see Supplementary Fig. 7). Results thus suggest that our multi-network consensus approach is better suited than multi-layer module identification methods when network similarity is low. Exploring the performance of these different approaches when applied to networks of varying similarity is a promising avenue for future work. (b) Robustness of the overall ranking in Sub-challenge 2 was evaluated by subsampling the GWAS set used for evaluation 1,000 times. For each method, the resulting distribution of ranks is shown as a boxplot (using the 5% FDR cutoff for scoring). Related to Fig. 2c, which shows the same analysis for Sub-challenge 1. The difference between the top single-network module prediction and the top multi-network module predictions is not significant when sub-sampling the GWASs (Bayes factor < 3, see Methods section “Robustness analysis of challenge ranking”).

Supplementary Figure 6 Pairwise similarity of challenge networks.

Pairwise similarity of challenge networks. The upper triangle of the matrix shows the percent of shared links (the Jaccard index multiplied by 100) and the lower triangle shows the fold-enrichment of shared links compared to the expected number of shared links at random. The two protein-protein interaction networks are the two most similar networks, yet they have only 8% shared edges. Of note, a recent study has found similarly low overlap between protein-protein interaction networks from different sources, suggesting that these molecular maps are still far from complete (ref. 60). 60. Huang, J. K. et al. Systematic Evaluation of Molecular Networks for Discovery of Disease Genes. Cell Syst. 6, 484–495.e5 (2018).
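For concreteness, a minimal sketch of the two similarity measures, computed on toy edge sets, is shown below; the formula used here for the expected number of shared links is a simplifying assumption (both networks defined on the same gene set):

```python
# Sketch of the two similarity measures shown in the matrix, computed for a pair
# of toy networks given as undirected edge sets. The random expectation of
# shared links is approximated as |E1|*|E2| / C(n, 2), assuming both networks
# are defined on the same n genes (a simplification for this sketch).
from math import comb

def edge_set(edges):
    return {tuple(sorted(e)) for e in edges}   # undirected: (a, b) == (b, a)

net1 = edge_set([("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")])
net2 = edge_set([("A", "B"), ("B", "C"), ("B", "D")])

genes = {g for e in net1 | net2 for g in e}
shared = net1 & net2

# Upper triangle of the matrix: percent shared links (Jaccard index x 100)
jaccard_pct = 100 * len(shared) / len(net1 | net2)

# Lower triangle: fold-enrichment of shared links over the random expectation
expected_shared = len(net1) * len(net2) / comb(len(genes), 2)
fold_enrichment = len(shared) / expected_shared

print(f"Jaccard x 100: {jaccard_pct:.1f}, fold-enrichment: {fold_enrichment:.1f}")
```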

Supplementary Figure 7 Consensus Module Predictions.

(a) Schematic of the approach used to generate single-network consensus module predictions for Sub-challenge 1. For each network, module predictions from the top 50% of teams were integrated in a consensus matrix C, where each element cij gives the fraction of teams that clustered genes i and j together in the same module in the given network (performance as the percentage of considered teams is varied is shown in (c)). The overall score from the leaderboard round was used to select the top 50% of teams, i.e., the same set of teams was used for each network. The consensus matrix of each network was then clustered using the top-performing module identification method of the challenge (method K1; see Methods). (b) The approach used to generate multi-network consensus module predictions for Sub-challenge 2 was exactly the same as for single-network predictions, except that team submissions from all networks were integrated in the consensus matrix C. In other words, as input we still used the single-network predictions of the top 50% of teams from Sub-challenge 1, but instead of forming a consensus matrix for each network, a single cross-network consensus matrix was formed. This cross-network consensus matrix was then clustered using method K1 as described above (see Methods). (c) Scores of the single-network consensus predictions as the percentage of integrated teams is varied. We considered the top 25%, 50%, 75% and 100% of teams, as well as the top eight (19%) teams (these are the teams that ranked 2nd, or tied with the team that ranked 2nd, at any of the considered FDR cutoffs). (d) Performance of different methods to construct the consensus matrix C. In addition to the basic approach described above (Standard), two more sophisticated approaches to construct the consensus matrix were evaluated (Normalized and SML). In each case, the same set of team submissions was integrated (top 50%) and method K1 was applied to cluster the resulting consensus matrix. The first alternative (Normalized) is similar to the basic method but further assumes that appearing together in a smaller cluster is stronger evidence that a pair of genes is associated than appearing together in a larger cluster. Thus, each cluster’s contribution to the consensus matrix was normalized by the size of the cluster. Furthermore, we normalized the ij-entry of the consensus matrix by the number of methods that assigned gene i to a cluster, thus taking the presence of background genes into account. We found that the consensus still achieved the top score with these normalizations, but there was no improvement compared to the basic approach. The second method is a very different approach called Spectral Meta Learner (SML; ref. 56). SML is an unsupervised ensemble method designed for two-class classification problems. Briefly, it takes a matrix of predictions P, where each row corresponds to a different sample being classified and each column corresponds to a different method. Accordingly, each matrix element Pij is the class (0 or 1) assigned to sample i by method j. Under the assumption of conditional independence of methods given class labels, SML can estimate the balanced accuracy of each classifier in a totally unsupervised manner using only the prediction matrix P. The algorithm then uses this information to construct an ensemble classifier in which the contribution of each classifier is proportional to its estimated performance (balanced accuracy).
The module identification problem is unsupervised by nature, and we applied the SML algorithm as an alternative way of constructing consensus modules. For each method m and network k, we created a prediction vector Pmk with one entry per gene pair, i.e., of size Nk × Nk, where Nk is the number of genes in network k, defined as follows: Pmk(i, j) = 1 if method m puts genes i and j in the same module, and Pmk(i, j) = 0 otherwise. For each network, we constructed the prediction matrix P with each column Pm defined as above. We then provided this matrix as input to the SML algorithm. The SML algorithm outputs a consensus matrix, which assigns a weight to each pair of genes. We found that SML did not perform well in the context of this challenge, likely because the underlying assumption of SML is that top-performing methods converge to similar predictions, which was not the case here (see Fig. 3 and Supplementary Fig. 2).
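For concreteness, a minimal sketch of the standard consensus-matrix construction and of the 0/1 co-clustering vectors used as SML input is shown below; the gene names, team submissions and module labels are toy placeholders, and the clustering of C with method K1 is not reproduced:

```python
# Minimal sketch (toy data) of the standard consensus-matrix construction and of
# the 0/1 co-clustering vectors used as SML input. Team submissions are given as
# dicts mapping gene -> module label; background genes are simply absent.
import numpy as np

def consensus_matrix(submissions, genes):
    """C[i, j] = fraction of teams that put genes i and j in the same module."""
    idx = {g: k for k, g in enumerate(genes)}
    c = np.zeros((len(genes), len(genes)))
    for modules in submissions:                       # one dict per team
        for label in set(modules.values()):
            members = [idx[g] for g, m in modules.items() if m == label and g in idx]
            for a in members:
                for b in members:
                    if a != b:
                        c[a, b] += 1
    return c / len(submissions)

def prediction_column(modules, genes):
    """Vectorized gene-pair co-clustering matrix Pmk for one method (SML input column)."""
    same = [[1 if g in modules and h in modules and modules[g] == modules[h] else 0
             for h in genes] for g in genes]
    return np.array(same).ravel()

# Toy usage: two teams, four genes
genes = ["G1", "G2", "G3", "G4"]
team1 = {"G1": "m1", "G2": "m1", "G3": "m2", "G4": "m2"}
team2 = {"G1": "m1", "G2": "m1", "G3": "m1", "G4": "m2"}
C = consensus_matrix([team1, team2], genes)
P = np.column_stack([prediction_column(t, genes) for t in [team1, team2]])
print(C[0, 1], C[0, 2])   # 1.0 (always co-clustered) and 0.5
print(P.shape)            # (16, 2): one row per gene pair, one column per team
```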

Supplementary Figure 8 Number of distinct trait-associated modules recovered by top methods.

Number of distinct trait-associated modules recovered by the top K methods. Here, we did not form consensus modules. Instead, given the top K methods, we considered the set of all individual modules predicted by these methods and scored them with the same pipeline as used for the challenge submissions. We then evaluated how many “distinct” trait-associated modules were recovered by these methods. Distinct modules were defined as modules that do not show any significant overlap with each other. Overlap between pairs of modules was evaluated using the hypergeometric distribution and called significant at 5% FDR (Benjamini-Hochberg adjusted p-value < 0.05). From the set of trait-associated modules discovered by the top K methods, we thus derived the subset of distinct trait-associated modules (when several modules overlapped significantly, only the module with the most significant GWAS p-value was retained). Although the resulting scores (number of distinct trait-associated modules) cannot be directly compared with the challenge scores (because module predictions had to be strictly non-overlapping in the challenge), it is instructive to see how many distinct trait modules can be recovered when applying multiple methods. The stacked bars (colors) further show how many of the distinct trait modules are contributed by each method category. The number of distinct trait modules does not increase monotonically as more methods are added because larger sets of modules also increase the multiple-testing burden of the GWAS scoring. The top four methods together discover 78 distinct trait-associated modules; relatively little is gained by adding a higher number of methods.
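A hedged sketch of how such distinct modules could be derived is given below; the gene-universe size, the input modules and the GWAS p-values are placeholders, and the actual pipeline is described in the Methods:

```python
# Hedged sketch: test every pair of modules for significant overlap
# (hypergeometric test, BH-corrected) and, among overlapping modules, keep only
# the one with the most significant GWAS p-value. All inputs are placeholders.
from scipy.stats import hypergeom
from statsmodels.stats.multitest import multipletests

def overlap_pvalue(mod_a, mod_b, n_universe):
    k = len(mod_a & mod_b)
    return hypergeom.sf(k - 1, n_universe, len(mod_a), len(mod_b))

def distinct_modules(modules, gwas_pvals, n_universe, fdr=0.05):
    """modules: list of gene sets; gwas_pvals: matching GWAS p-values."""
    pairs = [(i, j) for i in range(len(modules)) for j in range(i + 1, len(modules))]
    pvals = [overlap_pvalue(modules[i], modules[j], n_universe) for i, j in pairs]
    significant = multipletests(pvals, alpha=fdr, method="fdr_bh")[0] if pairs else []
    discarded = set()
    for (i, j), sig in zip(pairs, significant):
        if sig:   # keep the module with the more significant GWAS p-value
            discarded.add(j if gwas_pvals[i] <= gwas_pvals[j] else i)
    return [m for k, m in enumerate(modules) if k not in discarded]

# Toy usage: the first two modules overlap, so only one of them is kept
mods = [{"A", "B", "C", "D"}, {"A", "B", "C", "E"}, {"X", "Y", "Z"}]
print(len(distinct_modules(mods, [1e-6, 1e-4, 1e-5], n_universe=12000)))  # 2
```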

Supplementary Figure 9 Functional Enrichment for Example Modules.

Enrichment p-values for mouse mutant phenotypes, Reactome pathways and GO biological processes are shown for four example modules discussed in the main text. P-values were computed using the non-central hypergeometric distribution and adjusted using the Bonferroni method (Methods). Results for the remaining trait-associated modules from the consensus analysis in the STRING protein-protein interaction network are shown in Supplementary Fig. 12 and Supplementary Table 4. Functional enrichment analysis for additional pathway databases and modules from all methods and networks are available on the challenge website. (a) Module associated with height described in Fig. 5 (n = 25 genes). (b) Module associated with rheumatoid arthritis described in Fig. 6a (n = 25 genes). (c) Module associated with inflammatory bowel disease described in Fig. 6b (n = 42 genes). (d) Module associated with myocardial infarction described in Fig. 6c (n = 36 genes).

Supplementary Figure 10 Enrichment of trait-associated modules in curated gene sets from recent studies.

Enrichment of trait-associated modules in six curated gene sets from three recent studies. The first two gene sets were taken from Marouli et al. 32 and correspond to genes comprising height-associated ExomeChip variants (n = 475 genes) and genes known to be involved in skeletal growth disorders (n = 266 genes), respectively. The third gene set was taken from de Lange et al. 61 and corresponds to genes causing monogenic immunodeficiency disorders (n = 316 genes). Lastly, three gene sets relevant for type 2 diabetes (T2D) were taken from Fuchsberger et al. 62 and correspond to genes in literature-curated pathways that are believed to be linked to T2D (we distinguished between genes in cytokine signalling pathways [n = 384 genes] and other pathways [n = 390 genes]) and genes causing monogenic diabetes (n = 81 genes). We then considered corresponding GWAS traits in our hold-out set, namely height, all immune-related disorders, and T2D. We tested all modules associated with these GWAS traits for enrichment in these six external gene sets. Enrichment was tested using the hypergeometric distribution and p-values were adjusted to control FDR using the Benjamini-Hochberg method. The heatmap shows for each GWAS (row) the fraction of trait-associated modules that significantly overlap with a given gene set (column). It can be seen that modules associated with a given trait predominantly overlap the external gene sets that are expected to be relevant for that trait. 61. de Lange, K. M. et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat. Genet. 49, 256–261 (2017). 62. Fuchsberger, C. et al. The genetic architecture of type 2 diabetes. Nature 536, 41–47 (2016).
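A minimal sketch of how the heatmap fractions could be computed is shown below; the modules, curated gene sets and gene-universe size are toy placeholders:

```python
# Sketch of how the heatmap values could be computed: for each GWAS trait, the
# fraction of its trait-associated modules that significantly overlap a given
# curated gene set (hypergeometric test, BH-corrected across all tests).
from scipy.stats import hypergeom
from statsmodels.stats.multitest import multipletests

def heatmap_fractions(trait_modules, curated_sets, n_universe, fdr=0.05):
    """trait_modules: {trait: [gene sets]}; curated_sets: {name: gene set}."""
    pvals, keys = [], []
    for trait, modules in trait_modules.items():
        for module in modules:
            for name, gene_set in curated_sets.items():
                k = len(module & gene_set)
                pvals.append(hypergeom.sf(k - 1, n_universe, len(gene_set), len(module)))
                keys.append((trait, name))
    reject = multipletests(pvals, alpha=fdr, method="fdr_bh")[0]
    return {
        (trait, name): float(
            sum(r for (t, s), r in zip(keys, reject) if (t, s) == (trait, name))
            / len(trait_modules[trait])
        )
        for trait in trait_modules for name in curated_sets
    }

# Toy usage: one trait with two modules, one curated gene set
fractions = heatmap_fractions(
    {"height": [{"A", "B", "C"}, {"X", "Y", "Z"}]},
    {"ExomeChip height genes": {"A", "B", "C", "D"}},
    n_universe=15000,
)
print(fractions)   # fraction for ('height', 'ExomeChip height genes') is 0.5
```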

Supplementary Figure 11 Support of trait-module genes in higher-powered GWASs.

Trait-associated modules comprise many genes that show only borderline or no signal in the corresponding GWAS (called “candidate trait genes”). To assess whether modules correctly prioritized candidate trait genes, we considered eight traits for which older (lower-powered) and more recent (higher-powered) GWAS datasets were available in our holdout set. This allowed us to evaluate how well trait-associated modules and candidate trait genes predicted using the lower-powered GWAS datasets were supported in the higher-powered GWAS datasets. (a) Pairs of older (lower-powered) and more recent (higher-powered) GWASs used for the evaluation of module-based gene prioritization. The first column gives the trait, and the second and third columns give the corresponding GWASs. The bar plot shows the percentage of trait-associated modules from the first GWAS that are also trait-associated modules in the second GWAS. At the bottom, the expected percentage of confirmed modules at random is shown (i.e., assuming the trait-associated modules in the second GWAS were randomly selected from the set of predicted modules). (b) Height-associated module from Fig. 5 as an illustrative example (n = 25 genes). The module shows modest association with height in the lower-powered GWAS (FDR-corrected Pascal p-value = 0.04, see Methods). Color indicates GWAS gene scores. The signal is driven by three genes from different loci with significant scores (pink), while the remaining genes (grey) are predicted to be involved in height because of their module membership. (c) The module from (b) is supported in the higher-powered GWAS (q-value = 0.005). 45% of candidate trait genes (grey in (b)) are confirmed (pink). (d) Since high-powered GWASs typically result in many trait-associated genes, even random modules would have some genes “confirmed”. It is thus important to evaluate whether more candidate trait genes are confirmed than expected. Here we show support of candidate trait genes across the eight traits listed in (a). The lower-powered GWASs were used to predict candidate trait genes, defined as genes that: (i) are within a trait-associated module in the lower-powered GWAS, (ii) have a high (non-significant) gene p-value (p > 5E-4, i.e., two orders of magnitude above the genome-wide significance threshold of 5E-6; cf. grey genes in (b)), and (iii) are located more than one megabase away from the nearest significant locus of the corresponding GWAS. Gene p-values were computed using Pascal as described above. Finally, the Pascal p-value of all candidate trait genes was evaluated for the higher-powered GWAS (n = 2,254 genes considering trait modules from all methods). Since there is a genome-wide tendency for p-values to become more significant in higher-powered GWAS data (ref. 38), Pascal p-values were also evaluated for a background gene set (all genes that meet the two conditions (ii, iii) but do not belong to trait-associated modules of the lower-powered GWAS). The plot shows the cumulative distribution of gene scores in the higher-powered GWASs for candidate trait genes (red line) and genes in the background set (grey line). A substantial fraction of module genes that show no signal and are located far from any significant locus in the lower-powered GWAS are subsequently confirmed by the higher-powered GWAS.
(e, f) Since candidate trait genes (i.e., genes satisfying the three conditions (i–iii) described above) could still have lower p-values than genes in the background set (i.e., genes satisfying the two conditions (ii, iii)), we repeated the same analysis with higher gene p-value thresholds for condition (ii): p-value > 5E-3 (n = 2,185 genes) (e) and p-value > 5E-2 (n = 1,969 genes) (f). For this range, the “discovery” gene-score p-values in the candidate set and the background set are much more similar. Although some confounding may remain, the same trend as in (d) is observed, indicating that the result is robust. This suggests that modules are predictive of trait-associated genes and could potentially be used to prioritize candidate genes for follow-up studies.
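A hedged sketch of the candidate-gene selection and the cumulative-distribution comparison in panels (d–f) is shown below; the data frame and its column names are illustrative assumptions, and the actual gene p-values come from Pascal:

```python
# Hedged sketch of the candidate-gene analysis in panels (d-f). Gene scores,
# module membership and distances are placeholder columns of a pandas DataFrame;
# the distance filter refers to significant loci of the lower-powered GWAS.
import numpy as np
import pandas as pd

def split_candidates(genes: pd.DataFrame, p_threshold=5e-4, min_dist_bp=1_000_000):
    """Expected columns: 'p_low' (lower-powered GWAS gene p-value), 'p_high'
    (higher-powered GWAS gene p-value), 'in_trait_module' (bool),
    'dist_to_sig_locus' (bp to nearest significant locus, lower-powered GWAS)."""
    far_and_weak = (genes["p_low"] > p_threshold) & (genes["dist_to_sig_locus"] > min_dist_bp)
    candidates = genes[far_and_weak & genes["in_trait_module"]]     # conditions (i)-(iii)
    background = genes[far_and_weak & ~genes["in_trait_module"]]    # conditions (ii)-(iii) only
    return candidates, background

def ecdf(values):
    # cumulative distribution of gene scores, as plotted in panel (d)
    x = np.sort(np.asarray(values))
    return x, np.arange(1, len(x) + 1) / len(x)

# Usage (with a suitably prepared gene_table DataFrame):
#   candidates, background = split_candidates(gene_table)
#   x_c, y_c = ecdf(-np.log10(candidates["p_high"]))
#   x_b, y_b = ecdf(-np.log10(background["p_high"]))
```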

Supplementary Figure 12 Overview of Consensus Trait-modules in the STRING Network.

Overview of all 21 trait-associated consensus modules in the STRING protein-protein interaction network. The first three columns give the module ID, the trait type, and the specific GWAS trait that the module is associated with. We tested all modules for enrichment in GO annotations, mouse mutant phenotypes, and other pathway databases using the non-central hypergeometric test (Methods). The putative function of each module based on this enrichment analysis is summarized in the fourth column (see Figs. 5 and 6, Supplementary Fig. 9, and Supplementary Table 4 for details). Two thirds of the modules have functions that correspond to core pathways underlying the respective traits, while the remaining modules correspond either to generic pathways that play a role in diverse traits or to pathways without an established connection to the considered trait or disease. Only pathways with a well-established link to the trait were considered core pathways. Generic pathways, such as cell-cycle-related or epigenetic pathways, were not considered core pathways because they are relevant for many traits and tissues, making them more difficult to target therapeutically. For example, modules 77 and 109 are both associated with schizophrenia and comprise pathways related to epigenetic gene silencing and nucleosome organization, respectively. Although there is evidence that epigenetic mechanisms may play a role in schizophrenia, we considered this to be a generic pathway.

Supplementary Figure 13 Modules Associated with IgA Nephropathy.

The top ten enriched GO biological processes, Reactome pathways and mouse mutant phenotypes are shown for two IgA nephropathy (IgAN)-associated modules. P-values were computed using the non-central hypergeometric distribution (Methods). (a) IgAN-associated module identified using the consensus analysis in the InWeb protein-protein interaction network (n = 19 genes). The module comprises immune-related NF-κB signaling pathways. Enriched mouse mutant phenotypes for module gene homologs include perturbed immunoglobulin levels (IgM and IgG1). The module implicates in particular the NF-κB subunit REL as a candidate gene. The REL locus does not reach genome-wide significance in current GWASs for IgAN but is known to be associated with other immune disorders such as rheumatoid arthritis. (b) IgAN-associated module identified by the best-performing method (K1) in the InWeb protein-protein interaction network (n = 12 genes). Besides finding complement factors that are known to play a role in the disease (CFB and C4A), the module implicates novel candidate genes such as the chemokine Platelet Factor 4 Variant 1 (PF4V1) from a sub-threshold locus, and is enriched for the coagulation cascade, a process known to be involved in kidney disease (ref. 62). The top two enriched mouse mutant phenotypes are precisely “abnormal blood coagulation” and “glomerulonephritis”. 62. Madhusudhan, T., Kerlin, B. A. & Isermann, B. The emerging role of coagulation proteases in kidney disease. Nat. Rev. Nephrol. 12, 94–109 (2016).

