The president of the school board announced, “Our district will begin an intense war on mediocrity. Within five years, our foremost goal is for all our students to move above average on the Comprehensive Tests of Basic Skills…” Over the last decade, about half the students in your district have been above average. … Why is the board president’s statement unrealistic?
It is not statistically possible for all students to score above average in your district.
Recent trends and current trends all suggest that the use of testing and assessment in the schools is likely to…
Joan, a third-grade teacher, decides to complete her lesson on “Math Facts-Addition” and move on to “Math Facts-Subtraction.” This is an example of a(n) ___________ decision.
According to the authors, test misuse and abuse are most likely to occur whenever…
Test users are unaware of the factors that influence test scores.
As a teacher the main advantage of knowing the lay language of validity and reliability is to …
communicate test results to the parents
This type of test has uniform administration and scoring procedures…
____________ decision is usually made based on standardized test data.
Ms. Palmer’s main motive for suggesting that Ms. Wilson include essay items was…
to test students’ higher order thinking
For the upcoming test, Ms. Palmer will create…
Criterion-Referenced Test (CRT)
Who determines whether students with disabilities take the same annual academic assessment required under NCLB?
Individualized Education Program (IEP) team
The major distinction between testing and assessment is that assessment is a process that is…
broader than testing
Which is the most appropriate marking system to communicate students’ grades with the parents?
Letter grades A, B, C, D, E, F
Which of the following is NOT a factor that can affect a test’s usefulness?
Increased use of portfolio and performance assessment techniques in the schools has been suggested to…
reduce pressure to test only for facts and specific skills.
Which of the following best describes the current status of performance-based assessment?
A currently popular idea that awaits further debate and development.
What is Ms. Palmer’s main reasoning for utilizing the Promotion and Graduation Test (PGA)?
For the students to graduate.
Both IDEIA and NCLB represent examples of
educational reform initiatives
The first thing that Ms. Wilson should have done prior to writing the test items for the upcoming exam is to…
Assessment that is intended to inform day to day instructional decision making in the classroom is referred to as
A teacher has a new foreign student and wishes to obtain an estimate of the student’s math skills. What kind of test should she use?
What is Ms. Palmer’s reasoning for creating the criterion referenced test in the mid-semester?
She wants to test students’ more specific knowledge of the topics covered.
Robert, a 9th grader, has just been told, “The reason you’re having so much trouble with division is that you have never mastered multiplication. The Math Basic Skills Tests have indicated this quite clearly. We are going to provide you with instruction in multiplication immediately.” Robert’s teacher made what kind of decision relative to his skill level?
You receive a packet in the mail from Dr. Smith, a psychologist in private practice, who is completing an evaluation of a child suspected of having learning disabilities. It contains a number of checklists and forms and will take 45-60 minutes to complete. The special ed supervisor tells you Dr. Smith will be presenting his findings at the child’s special education eligibility meeting at the end of the month. What should you do?
As a result of the recent passage of the 2004 Reauthorization of the Individuals with Disabilities Education Act (IDEA-04) the involvement of general education teachers in the education, testing and assessment of special education students is expected to…
When selecting a test, which of the following factors should be considered?
The cultural/linguistic background of those being tested.
Why is it important to specify what we want to measure before we begin to test?
To avoid interpretive errors
Teachers are most frequently involved in what kind of decision?
One of the main advantages of the use of curriculum-based measurements (CBM) is that they…
can be used for progress monitoring of both regular and special education students.
How we measure in the classroom should be determined by…
what it is that we want to measure.
The main purpose of testing in education is to
provide objective achievement data
Why did Ms. Palmer suggest that Ms. Wilson write down the objectives?
For progress monitoring, it is important that multiple CBM data points are used to inform decision making to compensate for the
poor reliability of single CBM probes
The term used for reassigning a principal and all or part of existing school staff due to unacceptable school performance is
In a three tier RTI model, compared to Tier 3 instruction, Tier 2 instruction is typically…
less intensive and less specialized
One of the advantages of the standard protocol approach to RTI is
training teachers to implement this approach is easier
Which of the following best describes response to intervention (RTI)?
An integrated assessment, intervention and decision-making system
According to the three-tier RTI model, about what percentage of students receive Tier 2 instruction?
This type of high-stakes test is designed to focus on broad national goals and objectives
Universal screening is intended to
assess all students to identify those at educational risk
The No Child Left Behind (NCLB) Act requires public schools to assess all students at the end of grades
3 through 8
Progress monitoring involves the use of
frequent brief assessments to monitor student responsiveness to instruction
According to the authors, basing important decisions only on high-stakes test scores is controversial because these scores are
A colleague insists that RTI and CBM are the same. You state, “That is not correct,
CBM is one of the key RTI assessment components.”
The discrepancy between the “aim line” and the “trend line” is used to
A primary disadvantage of the problem-solving RTI model is
implementation integrity is difficult
Which of the following statements is true about the No Child Left Behind Act (NCLB) and state high-stakes testing programs?
In high-stakes testing, alignment is considered to be evidence of an achievement test’s
The intent of high stakes testing is to use educational tests and measurements to
make decisions of prominent educational, financial or social impact
A fellow teacher says, “I’ve been teaching for 30 years, and we have always used end-of-year norm-referenced tests to make educational decisions. There is simply no need for us to start using CBM on a weekly basis.” Your reply is
they are not as sensitive to day-to-day changes in learning as CBM is
According to the text, what can a teacher do to prepare and help students prepare for high-stakes testing?
keep discussions of the test simple
According to the authors, arguments against high stakes testing include all but which of the following
When RTI is fully implemented it applies to
both regular and special education personnel
According to the authors, RTI holds promise to
improve early identification and intervention for struggling readers
In a three tier RTI model, in what setting is Tier 1 instruction typically delivered?
The regular classroom
You have been asked by your principal to provide input to a district-level committee that can inform your state’s high-stakes testing program. During the committee meeting, you are told that the state plans to track test performance in order to make comparisons in scores over time. Based on this information, which of the 12 AERA conditions should you make sure the state is meeting?
It is possible for a student to pass an NCLB required test in one state and yet fail an NCLB required test in another state because
states are free to establish different cutoff (i.e., passing) scores.
According to the text, one of the challenges facing RTI is the lack of
culturally responsive measures and research-based interventions
Mr. Reacher wants to know if his class is achieving at, above, or below an average rate. A(n) ___________ test will help him get this information.
Identify the learning activity from the following list
Watch the Evening News tonight
Which of the following phrases best describes the process of analysis?
Breakdown into constituent parts
In a criterion referenced test, we are interested in
whether a pupil has achieved mastery of the skill
Grade equivalent scores compare a student’s performance to
the performance of others
Criterion-referenced tests are used to determine
which objectives a student has acquired competence in
The average difficulty level of an item for a norm referenced test is appropriate if
half the pupils miss the item
A technique that helps the teacher write test items at different levels of the cognitive taxonomy is
The term to describe the extent to which test items match a teacher’s instructional objective is
The LEAST important purpose for preparing educational objectives among the following is
to match test items on standardized achievement tests, so that a course of study prepares students for the exam
Which of the following is a suggestion to more objectively score essay items?
Grade each person’s responses anonymously
The use of “none of these” as an option in a multiple-choice item is only appropriate when
the options provide absolutely correct or incorrect answers
Ownership refers to the perception that the portfolio contains
what the students want
Which of the following types of items is best adapted to evaluating student knowledge of numerous technical terms?
What is the rationale behind specification of testing constraints in performance assessments?
Which of the following is a poor suggestion for grading an essay test?
Grade one complete paper before going to the next
In which of the following situations would the pictorial form of multiple choice items be most useful?
Interpreting a histogram
One prominent limitation of essay tests is
the difficulty of analyzing the subparts of the correct response
Which of the following item formats is easiest to construct?
To test a student’s ability to use higher mental processes of logical reasoning and critical thinking a(n) __________ test should be used?
The word objective when used to describe a type of test item, refers to the
Rubrics may best be considered to be
Which of the following is a good suggestion to follow in writing true-false items?
Avoid using statements taken directly from the text.
Which of the following items is poorly written?
Discuss the causes of the Civil War in depth. You will have 30 minutes.
The number of points a scale contains should
change according to the number of variations in the behavior that can be observed.
If the same knowledge were tested by the same number of items of different types, which type would the fewest number of pupils be expected to answer correctly by chance?
A short-answer test.
For which of the following objectives would an essay test be most clearly superior to an objective test? To appraise the examinee’s
ability to assemble and organize his ideas on a topic
How many variations in student behavior are being observed when a checklist is used?
In an extended essay, your detailed scoring criteria should at least assign weights to organization, process and
Which of the following item types is least subject to guessing?
For checklist items in which the “no opportunity to observe” box is checked, the behavior in question is coded
receive no rating
For which of the following levels of learning outcomes is the restricted response essay question more appropriate?
The use of scoring criteria in grading essay items consists of
rating or scoring each component of a pupil’s response.
On an open-book exam, a teacher selects a quotation from the text and asks her students to use the text to prepare an analysis of the quotation. This procedure is referred to as a
The stem of the multiple-choice item presents the pupil with a problem, whereas the choices represent
According to the authors, which objective item formats are most suitable for use in interpretive exercises?
Unintentional clues to the correct response in objective test items can best be reduced by
If you assigned a 1st draft a score of 7, a 2nd draft a score of 8, and a final draft a score of 9, and weighted the three products 20%, 30%, and 50%, respectively, what numerical score would you assign for this sample of work?
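The arithmetic behind this item can be checked with a short sketch. It assumes the weights are applied as simple percentages of each draft's score; the function name `weighted_score` is illustrative and not taken from the source.

```python
def weighted_score(scores, weights_pct):
    """Combine scores using percentage weights that must sum to 100."""
    if sum(weights_pct) != 100:
        raise ValueError("weights must sum to 100 percent")
    # Multiply each score by its weight, then convert from percent.
    return sum(s * w for s, w in zip(scores, weights_pct)) / 100

# First draft 7 (20%), second draft 8 (30%), final draft 9 (50%)
result = weighted_score([7, 8, 9], [20, 30, 50])
print(result)  # 8.3
```

Under that assumption the composite works out to 1.4 + 2.4 + 4.5 = 8.3.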
The best way in which to view portfolios among other competing forms of assessment would be to see the portfolio as
a means of capturing authentic behavior in a life-like environment.
The first step in developing a performance test is
Which of the following is NOT a recommendation to follow in writing multiple choice items?
What type of format provides greatest content sampling in the shortest testing time?
Objective tests are generally more reliable than essay tests because objective tests
Among the most important traits to include when developing a scoring mechanism for the entire portfolio are
organization, thoroughness, and growth or progress
For what type of examination would one expect to find the lowest interscorer reliability?
Which of the following would be considered dispositions?
Which of the following is a shortcoming of performance assessments?
None of the above is a shortcoming
Which of the following best reflects a performance assessment?
Developing a class presentation utilizing more than one medium that demonstrates the steps followed in completing the presentation.
If a teacher asks 10 essay questions in a two hour exam, how many points should be allocated to each item?
as many as the scoring criteria suggest
In rating the portfolio as a whole, which among the following would you want to emphasize most?
Growth in skill or performance
Which of the following best describes the current status of performance based assessment? It is…
being advocated as a fairer and more relevant alternative to conventional testing.
Which one of the types of items listed below is easiest to score?
Which one of the following is an advantage of short answer type items?
The probability of guessing correctly is reduced.
Assume a set of 90 normally distributed test scores has a mean of 70 and a standard deviation of 10. If this value is zero, everyone received the same score.
For making predictions, a test that yields a large negative correlation is…
as useful as one with the same sized positive correlation.
A person whose z-score was zero would have a raw score equal to…
If a pupil was informed that his z-score was -.6 in a distribution where the mean = 70 and the standard deviation = 10, his raw score would be…
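A z-score is converted back to a raw score with X = mean + z × SD. The sketch below applies that relation to the values stated in the item; the helper name `raw_from_z` is hypothetical.

```python
def raw_from_z(z, mean, sd):
    """Convert a z-score back to a raw score: X = mean + z * SD."""
    return mean + z * sd

raw = raw_from_z(-0.6, mean=70, sd=10)
print(round(raw, 2))           # 64.0
print(raw_from_z(0, 70, 10))   # 70 (a z-score of zero sits at the mean)
```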
Is always the same as the 50th percentile
the item refers to the median
Assume that a set of 90 normally distributed test scores has a mean of 70 and a standard deviation of 10. A T-score of 75 would equal a raw score of
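T-scores use a mean of 50 and a standard deviation of 10, so T = 50 + 10z. A minimal sketch of the conversion, using hypothetical helper names, for the distribution described in the item (mean 70, SD 10):

```python
def z_from_t(t):
    """T-scores have mean 50 and SD 10, so z = (T - 50) / 10."""
    return (t - 50) / 10

def raw_from_t(t, mean, sd):
    """Go from a T-score to a raw score via the z-score."""
    return mean + z_from_t(t) * sd

print(raw_from_t(75, mean=70, sd=10))  # 95.0
print(raw_from_t(0, mean=70, sd=10))   # 20.0
print(z_from_t(35))                    # -1.5
```

The same two helpers cover the other T-score conversions that appear in this set.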
What is the median for the following distribution of scores? 5,8,3,5,7,9,5,8,3
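The median is the middle value once the scores are ordered. A quick check with Python's standard library, using the scores from the item:

```python
from statistics import median

scores = [5, 8, 3, 5, 7, 9, 5, 8, 3]

# Sorting makes the middle (fifth of nine) value easy to see.
print(sorted(scores))   # [3, 3, 5, 5, 5, 7, 8, 8, 9]
print(median(scores))   # 5
```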
What is the variance of the scores 3, 6, and 9?
the square root of 6
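A quick check of the arithmetic, assuming the population formula (dividing by N): the squared deviations from the mean of 6 are 9, 0, and 9, so the variance is 18 / 3 = 6, and the standard deviation is the square root of that value. The sample formula (dividing by N - 1) would give a variance of 9 instead.

```python
from statistics import pstdev, pvariance, variance

scores = [3, 6, 9]

pop_var = pvariance(scores)   # population variance: 18 / 3 = 6
samp_var = variance(scores)   # sample variance: 18 / 2 = 9
pop_sd = pstdev(scores)       # population SD: square root of 6

print(pop_var, samp_var, round(pop_sd, 3))
```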
A distribution with more than one most frequent score is
A research worker gave a scholastic aptitude test to a sample of eighth graders. Then he correlated the aptitude test scores with the chronological ages of the subjects. He found a correlation of -.42. How should this result be interpreted?
In a negatively skewed distribution, this value is located at the lower end.
the item refers to the mean
Assume that a set of 90 normally distributed test scores has a mean of 70 and a standard deviation of 10. Approximately what percentage of cases falls between a z-score of +2 and a z-score of +3?
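The share of a normal distribution between two z-scores is the difference between the cumulative proportions at those points. A small sketch using the standard library's `NormalDist`:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

# Proportion of cases between z = +2 and z = +3
area = std_normal.cdf(3) - std_normal.cdf(2)
print(f"{area:.2%}")  # 2.14%
```

Rounded, this is about 2 percent of cases.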
In high school, a teacher gave two sections of a class the same arithmetic test. The results were as follows: Section I: Mean 45, SD 6.5; Section II: Mean 45, SD 3.1. Which of the following conclusions is correct?
Section I is more variable than Section II.
What percentage of students in a distribution falls between the first and third quartiles?
The distance between -.50 and +.50 in z-score units represents how many stanines?
The coefficient of determination is used to
compare the relative strength of coefficients
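The coefficient of determination is the square of the correlation coefficient and gives the proportion of variance the two variables share. The sketch below uses two illustrative correlations (.80 and .40, not taken from the source) to show why squaring helps when comparing coefficients.

```python
def coefficient_of_determination(r):
    """Proportion of shared variance: r squared."""
    return r ** 2

strong = coefficient_of_determination(0.80)
weak = coefficient_of_determination(0.40)

print(round(strong, 2), round(weak, 2))  # 0.64 0.16
print(round(strong / weak, 1))           # 4.0
```

A correlation of .80 is twice the size of .40 but accounts for four times as much shared variance.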
90 normally distributed test scores; mean of 70; SD of 10. A T-score of zero (0) would be equivalent to a raw score of
The result of computing a linear correlation coefficient when a somewhat curvilinear relationship exists will be a(n)
underestimation of the relationship
The score that is duplicated (repeated) the largest number of times in a set of scores is the
The distribution below is (distribution has the hump on the left side)
In a normal distribution, approximately what percentage of scores lies between T-scores of 40-80?
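Because T-scores have a mean of 50 and a standard deviation of 10, T = 40 corresponds to z = -1 and T = 80 to z = +3. A sketch of the resulting area, modelling the T-scale directly with `NormalDist`:

```python
from statistics import NormalDist

t_scale = NormalDist(mu=50, sigma=10)  # the T-score scale

# Proportion of scores between T = 40 (z = -1) and T = 80 (z = +3)
share = t_scale.cdf(80) - t_scale.cdf(40)
print(f"{share:.0%}")  # 84%
```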
Given the following distribution of scores, which value most accurately represents the median? 83,71,98,83,83,93,87,83,64,83
An r-value of +0.82 can be interpreted to mean that
the two variables tend to be closely related
The median is a point on the scale that divides the
number of scores in halves
In a frequency distribution of 290 scores, the mean is 99 and the median is 86. One would expect this distribution to be
Assume that a set of 90 normally distributed test scores has a mean of 70 and a standard deviation of 10. A percentile of 50 would equal a raw score on the test of
Which of the following correlation coefficients indicates the highest degree of linear relationship?
If the correlation between piano playing ability and proficiency in weight lifting is negative, we would know that if John is a poor piano player, he would likely be
a good weight lifter
One of the disadvantages of a grouped frequency distribution is
that it loses information about individual scores
Given the following distribution of scores, which value most accurately represents the median? 7,3,8,7,7,10
The most dependable derived unit for measuring how far a score varies from the mean for the group is called the
A pupil’s reading achievement score is one standard deviation above the national norm. How could you best interpret this finding to her parents?
She is above average in reading
An individual reported a correlation of 1.25 between form A and form B of an intelligence test. From this coefficient one would conclude that
a mistake has been made in computing the correlation coefficient
Eliminating scores near the middle of a distribution cannot affect this statistic
What would the correlation likely be in the following diagram? x & y scatterplot where the scores are scattered in an upward direction to the right?
To compute a correlation coefficient between traits A and B, one must have
measures of traits A and B on each subject in one group
A T-score of 35 would be the same as a z-score of
The optimal number of intervals in a grouped frequency distribution is
dependent on the way the data are distributed
A percentile rank of 16 is how many standard deviations below the mean in a normal distribution?
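In a normal distribution, the z-score for a given percentile rank comes from the inverse of the cumulative distribution. A brief check using `NormalDist.inv_cdf`:

```python
from statistics import NormalDist

std_normal = NormalDist()

# z-score below which 16 percent of cases fall
z = std_normal.inv_cdf(0.16)
print(round(z, 2))  # -0.99

# Going the other way: proportion of cases below z = -1
below_minus_one = std_normal.cdf(-1)
print(f"{below_minus_one:.1%}")  # 15.9%
```

So a percentile rank of 16 sits roughly one standard deviation below the mean.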
May have more than one value in the same distribution
Which of the following fall(s) at a z-score of 0.0 in a normal distribution?
Five students received scores of 10, 12, 14, 16, and 28. The mean of these scores is
none of the above
If a pupil ranks ninth in a class of 30, his percentile rank is
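One common convention defines percentile rank as the percentage of the group scoring below the pupil. The sketch below assumes that convention; other definitions (for example, counting half of the pupil's own position) give slightly different values. The function name is hypothetical.

```python
def percentile_rank(rank_from_top, class_size):
    """Percentage of the class ranked below the pupil (one common convention)."""
    below = class_size - rank_from_top
    return 100 * below / class_size

# Ninth in a class of 30 leaves 21 pupils ranked below.
print(percentile_rank(9, 30))  # 70.0
```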
In a normal distribution, what percentage of scores will fall between -1 standard deviation and +1 standard deviation?
Given (picture of a bell curve leaning towards the right with three lines ABC)
If this rather stable value is zero, everyone received the same score
From a measurement point of view, what is the major objection to reporting grades by intervals of one percentage point rather than by five letter grades (A, B, C, D, F)?
The unit is too small to be judged accurately
There is no value in correcting for guessing on a teacher-made test where pupils are instructed to
answer every item
Which item appears best for a criterion-referenced test?
Chart with Item no.s, pre-test, post-test and difference. 1st three no.s are 84, 86, +2%
What is the discrimination index of this item? Chart with A, B*, D, D, E
Which of the following is indicated by item 2?
Chart with Item 1 and Item 2 and ABCDE. C* is in the chart of Item 2
The first decision made by the test constructor is determining the
purpose of the test
The procedures for item analysis of a norm-referenced test require that both the high group and the low group
contain the same number of pupils
One of the questions on an achievement test yields the following item analysis data…
Upper/Lower Chart with AB*CDE
What is item 2’s difficulty level? Chart with Item 1 and 2 with ABC*D?
Which item is probably too difficult? Chart with %Passing and pre-test, post -test etc.
The item discrimination index is used in test construction to ensure that there is
maximum variation in performance between individuals.
The major reason why care should be taken to ensure that performance assessment instruments should be given adequate front-end or back-end weight for a marking period is that they
require a substantial time and effort commitment
Which is the disadvantage of comparisons based on aptitude?
marks tend to unjustly reward low aptitude students
Which of the following best describes current practice in weighting and combining components of a composite mark? Most teachers…
fail to equate before they weight
Which if any of the following distractors should be replaced?
What is Item 1’s difficulty level?
One of the disadvantages of using a pass/fail grading system is that
it makes interpretation of passing grades difficult
Which of the following would require the greatest amount of teacher time?
Basing marks on student improvement
All other things being equal, which of the following marking practices results in the most reliable grades?
The reliability of grades may be improved by
using five to 15 categories
One of the questions on an achievement test yields the following item analysis data….Chart with upper and lower where B*
(b) + (c)
What is item 2’s discrimination index?
The factor that most seriously limits the value of grades for certification is that grades
are not comparable between classes and schools
Record keeping is most time consuming when grades are based on comparison with
Which item should probably be eliminated?
Combining achievement and attitude (or effort) in a single mark is poor practice because
the mark will be difficult to interpret
What is the difficulty level of this item? B* 14 upper 10 lower?
Which of the following is a recommendation to help prepare students for a test?
equalize the advantages between test-wise and non test-wise students
Each mark in a course should be assigned and interpreted as
a measure of the level of achievement
What is meant by “a given test item is highly discriminating?”
More high-scoring students than low-scoring students answer it correctly
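Item difficulty and discrimination for upper/lower group data are commonly computed as p = (U + L) / total pupils and D = (U - L) / pupils per group. The sketch below uses counts of 14 (upper) and 10 (lower) choosing the keyed answer; the group size of 20 is a hypothetical value, since the charts are not reproduced here.

```python
def item_difficulty(upper_correct, lower_correct, group_size):
    """Proportion of all pupils (both groups) answering correctly."""
    return (upper_correct + lower_correct) / (2 * group_size)

def item_discrimination(upper_correct, lower_correct, group_size):
    """Difference between groups, relative to the size of one group."""
    return (upper_correct - lower_correct) / group_size

# Hypothetical: 20 pupils in each group
p = item_difficulty(14, 10, group_size=20)
d = item_discrimination(14, 10, group_size=20)
print(p, d)  # 0.6 0.2
```

A positive D means more high-scoring than low-scoring pupils answered the item correctly.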
Which distractor on Item 1 needs revision or elimination? D* B has 4 and 1
Research has indicated that the Blankety-Blank Aptitude Test yields a validity coefficient of about .50 when used to select applicants for algebra classes. Your school used the Blankety-Blank as a selection device by eliminating the lowest-scoring half of the students. Considering only those students who have been admitted into algebra classes, what validity coefficient would you expect for them?
The type of validity that is most appropriate for aptitude tests is
Administering a test in the morning, rather than the afternoon, will cause the reliability of the test to
Which of the following will decrease the standard error of measurement? Decreasing
the standard deviation
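The usual classical formula for the standard error of measurement is SEM = SD × sqrt(1 - r), where r is the reliability coefficient. The sketch below shows how a smaller standard deviation lowers the SEM; the values (SD of 10 or 5, reliability of .96) are illustrative assumptions.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

sem_wide = standard_error_of_measurement(sd=10, reliability=0.96)
sem_narrow = standard_error_of_measurement(sd=5, reliability=0.96)

print(round(sem_wide, 2))    # 2.0
print(round(sem_narrow, 2))  # 1.0
```

With reliability held constant, halving the standard deviation halves the SEM.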
A test with a small standard error of measurement relative to the total variance is apt to yield scores which are
An admissions director correlates scores from the SAT with 3rd year GPA. This is an example of
This reliability coefficient is usually higher over a short time interval than over a long one.
Which type of validity coefficient is most appropriately used for selection purposes?
The new IQ test you have devised is administered to a gifted class. Its results are then correlated with end of year grades. Compared with the correlation that would be obtained if it were correlated with grades from regular class students, this correlation would be
The answer is not curvilinear***
The following table lists the scores obtained from five different tests by Seymour and Butch….Gamma highlighted
Is most important for achievement tests such as those constructed by teachers
If a thermometer measured the temperature in an oven as 400 degrees F five days in a row when the temperature was actually 397 degrees F, this measuring instrument would be considered quite
reliable but not valid
The following table lists the scores obtained from five different tests by Seymour and Butch. Delta highlighted
The five tests represent IQ tests…For test Beta, the chances are 95 out of 100 that students who have true scores of 100 will receive scores on the test between
Decreasing the time interval between predictor and criterion measures
increases the validity coefficient
A teacher has just computed the reliability of a test she’s made after a single administration. What kind of reliability did she compute?
Requires a time interval for its determination
For which of the following tests would one be most exclusively interested in PREDICTIVE validity?
Comparing test items with objectives refers to which type of validity
Comparing a newly formed anxiety scale with an existing anxiety scale yields this type of validity coefficient
The biggest obstacle to determining a test’s predictive validity is
The purpose for interpreting a test score as a band of scores, rather than a specific value, is to
To build reliability into a test, it is desirable to
write items of various difficulty levels
Which of the following examples is NOT a method of building reliability into a test?
Can be established by correlating odd with even-numbered items
The correlation between test scores and a criterion is a measure of
The following table lists the scores obtained from five different tests by Seymour and Butch. Epsilon highlighted
The five tests represent IQ tests. Which test demonstrates the strongest correlation with socioeconomic status?
On which of the following (fictitious) tests is content validity most appropriate?
The Beta Achievement Test
The following table lists the scores obtained from five different tests by Seymour and Butch. Alpha highlighted..
You administer the Quick and Dirty personality test on January 16, 1984 and March 1, 1984 to the same group of subjects and correlate the results…estimate of
When two intelligence tests are given to an adult a year or two apart, the difference in individual’s IQ’s or percentile ranks from one test to the other is most likely due to
error of measurement
The index which estimates the extent to which a score obtained on a test approximates the individual’s true score on the test is the
standard error of measurement
Can be established by comparing items with objectives
For speeded tests, the split-halves procedure of determining reliability will usually yield estimates that are
You have devised a new measure called the PITSS Procrastination Inventory for Teachers in Secondary School….
Erroneously adding 5 points to each score on a test will cause its reliability coefficient to
do nothing – the reliability coefficient stays the same
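A reliability coefficient is a correlation, and a correlation depends only on how scores deviate from their own mean, so adding a constant to every score leaves it unchanged. The sketch below illustrates this with hypothetical scores on two forms of a test and a hand-written Pearson correlation.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation computed from deviations about each mean."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5

# Hypothetical scores for five pupils on two forms of the same test
form_a = [12, 15, 9, 20, 17]
form_b = [11, 16, 10, 19, 15]

# Erroneously add 5 points to every score on form A
form_a_shifted = [score + 5 for score in form_a]

r_original = pearson_r(form_a, form_b)
r_shifted = pearson_r(form_a_shifted, form_b)
print(round(r_original, 4) == round(r_shifted, 4))  # True
```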
Which type of validity coefficient is most important for personality tests?
Is most important for aptitude tests?
Which type of validity coefficient is most appropriately used for selection purposes?
Given the following information on a test: M= 70, SD = 10, SEmeas = 2. If a student made a score of 81, we would be 68% certain that his true score would be in the range from…
79 to 83
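A score band is built by adding and subtracting multiples of the standard error of measurement from the obtained score: roughly 68% confidence for ±1 SEM and roughly 95% for ±2 SEM. A minimal sketch with a hypothetical helper name:

```python
def score_band(obtained, sem, n_sems=1):
    """Return (low, high) for obtained score plus or minus n_sems * SEM."""
    return obtained - n_sems * sem, obtained + n_sems * sem

print(score_band(81, 2))            # (79, 83)  about 68% confidence
print(score_band(81, 2, n_sems=2))  # (77, 85)  about 95% confidence
print(score_band(75, 5))            # (70, 80)  about 68% confidence
```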
An individual’s score on an achievement test is 75. The standard error of measurement for the test is reported to be five points. What are the chances that the individual’s TRUE SCORE is between 70 and 80?
The number of items on the predictor is cut in half. This
decreases the validity coefficient
What size correlation would be expected between a Stanford-Binet given at age 8 and one at age 15?
Which of the following represents the highest degree of relative performance?
A T-score of 72
If the instructions for administering and scoring a standardized achievement test are NOT followed rigidly when it is administered to pupils
reliability will be affected to an unknown extent
A teacher gives a final examination to see if students have met his course objectives. This calls for the use of
a teacher-made test
One advantage of individual tests of scholastic aptitude is that they provide opportunities for
clinical insights regarding a pupil’s typical behavior
It is always advisable to interpret test scores
Projective techniques are, as a group, distinguished from other personality measures in their use of
Which of the following need NOT be considered in interpreting standardized test scores
As contrasted with various separate achievement tests in the same subjects and obtained from different test publishers, a survey achievement battery yields derived scores for the same school subjects which are more likely to be
With the passage of IDEIA, the severe aptitude-achievement discrepancy approach to Specific Learning Disability (SLD) eligibility decision-making was