The author defines a formal protection model named k-anonymity to
address the re-identification problem, points out three possible
attacks against k-anonymity, and provides a set of policies which are
used to thwart the attacks. The author also analyzes existing work in
the statistical and security communities, showing that none of it
provides an effective solution.

This paper is neatly organized, enabling readers to grasp the main
idea more easily. The introduction explains the re-identification
problem and the basic idea behind k-anonymity, which is to increase
the number of candidates for linking. The main body then introduces
the k-anonymity model and its accompanying policies, with each section
beginning with a summary paragraph. Sufficient examples accompany the
definitions, lemmas, and policies, making the paper more readable.

One of the strengths of this paper is that it identifies a problem and
provides a novel solution. However, does k-anonymity work in practice?
The paper would be more convincing if the author provided sufficient
experimental results as evidence. In addition, the accompanying
policies are heuristic, and there might exist other kinds of attacks
that the three policies cannot thwart. These problems should be
addressed in future work. Another possibility is extending k-anonymity
to more complex data models.

=============================================================================

 What are the contributions of the paper?

This paper tries to develop an approach that allows revealing data
relevant enough for statistical purposes yet insufficient to violate
privacy.  It gives a guarantee of a certain user-chosen level of
anonymity in the revealed data.  The paper aims to provide a model for
understanding, evaluating, and constructing computational systems that
control inferences from the revealed data.

 What is the quality of the presentation?

The general structure of the paper is excellent. The flow of ideas is
clear and logical.

 What are the strengths of the paper?

Good quality of presentation and style (although there are some
informalities like "Let me ...").  An innovative approach that might
lead to a breakthrough in the area.  The major assumption, namely that
quasi-identifiers can be accurately identified, is explicitly stated,
although the phrase "it is believed that these attributes can be
easily identified by the data holder" is strange unless we assume that
some general external guidelines on this would be developed.  An
analysis of possible attacks against k-anonymity is given.

 What are its weaknesses?

The first part of the paper (introduction / motivation + background)
is too long for a non-introductory paper in the field.  Some notions,
such as the ones from the relational database area, are redundant.
The formal definitions look cumbersome and redundant as well, while
there is sometimes a lack of intuition.  The conclusion is extremely
scarce.  No future directions of work are proposed, as the author does
not seem to consider the topic promising enough.  No attention is paid
to maintaining the k-anonymity invariant while the information in the
database is modified.  The approach may drop some information needed
for statistical purposes while anonymizing the data.  The source of
the information may implicitly reveal additional info.  E.g., a
database of the patients in a specific hospital gives a lot of
locality info even without providing IDs.

 What is some possible future work?

Consider the situation where different entities reveal information
with different levels of k-anonymity.  Consider how the k-anonymity
invariant can be maintained efficiently while adding new records to or
removing old records from the database.  Develop a framework or
guidelines for implementing this idea in practice (the idea makes
sense only if it is established as strict guidelines for developing
data-collection systems).

 Some ideas that came to my mind while reading the paper.

A one-time linkage of sensitive information is very possible and very
hard to disguise. Personal info does not change too often, so linkage
at a rate of once every several years may do a lot of harm to personal
privacy.

=============================================================================

What are the contributions of the paper?

The paper has the following main contributions:

1. It draws attention to potential privacy violations caused by data
holders releasing person-specific data.

2. It presents a formal protection model named k-anonymity which can
prevent re-identification of the released data.

3. It introduces a set of accompanying policies that can prevent
re-identification attacks on data released under the k-anonymity
model.
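The requirement behind contribution 2 is easy to state operationally:
a released table is k-anonymous over a quasi-identifier if every
combination of quasi-identifier values occurs at least k times. A
minimal sketch (hypothetical code and data, not from the paper):

```python
from collections import Counter

def is_k_anonymous(table, quasi_identifier, k):
    """Check that every combination of quasi-identifier values
    appears at least k times in the released table."""
    counts = Counter(tuple(row[a] for a in quasi_identifier) for row in table)
    return all(c >= k for c in counts.values())

# Hypothetical released rows; the quasi-identifier is (zip, sex).
rows = [
    {"zip": "0213*", "sex": "person", "diagnosis": "flu"},
    {"zip": "0213*", "sex": "person", "diagnosis": "cold"},
    {"zip": "0214*", "sex": "person", "diagnosis": "flu"},
    {"zip": "0214*", "sex": "person", "diagnosis": "flu"},
]
print(is_k_anonymous(rows, ["zip", "sex"], k=2))  # True
print(is_k_anonymous(rows, ["zip", "sex"], k=3))  # False
```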

------------------------------------------------------------------------------------------
What is the quality of the presentation?

The quality of the paper presentation is good for the following
reasons:

1. The structure of the paper is very well organized, presenting the
sequence of topics in a logical way:

  problem investigation --> solutions --> solution evaluation -->
  improvements

2. The concepts and model are clearly and precisely presented using
formal definitions, tables, and examples, which makes them easier to
understand.

------------------------------------------------------------------------------------------
What are the strengths of the paper?

1. It is a fine paper for both technical and non-technical audiences:
it uses both high-level material (such as concrete real-life examples)
and technical material (such as formal definitions and symbols) to
present the problems and solutions.

2. The paper presents the concept and use of the k-anonymity model not
only very clearly but also very precisely by discussing the following:
- assumptions and constraints - exceptional cases - further
improvements

------------------------------------------------------------------------------------------
What are its weaknesses?

It would be more convincing if the paper demonstrated the benefits of
the k-anonymity model through direct comparisons between the model and
other solutions in areas such as attack vulnerability and differences
in data structure.

It would be more precise if the paper used formal mathematical proofs
to convince technical audiences that re-identification is indeed
infeasible when the k-anonymity model is used under general
conditions. The mathematical proofs may include cost and complexity
analysis of the possible re-identification attacks.
------------------------------------------------------------------------------------------
What is some possible future work?

1. Other types of possible attacks against the k-anonymity model could
be examined, and possible solutions that prevent these attacks could
be investigated.

2. An algorithm needs to be developed to implement the k-anonymity
model, and its performance (such as running cost and complexity) needs
to be evaluated when it does the following: - filtering data when data
is generated - comparing data when the number of occurrences is
checked - sorting data when preventing possible re-identification
attacks

=============================================================================

Contribution: Sometimes private and confidential information can be
inadvertently disclosed when the holder of data from different sources
can draw inferences by joining the data from all sources.  The
author's model attempts to create circumstances where such inferences
cannot be drawn because a data record is linked to at least k other
persons.

Quality: I liked her style of presentation.  The author's writing is
clear, concise, and intuitive.

Strengths: The author presents a very interesting and
difficult-to-solve problem in the area of personal privacy.  She
presents not only the strengths of her model, but admits to the
possible attacks which can be made against it.  The author discusses
the possible defences against those attacks.

Weaknesses: The use of the first-person singular has given the paper a
"folksy" tone.  The paper didn't sound as professional as it could
have sounded.  Although she is the sole author, it is unlikely that
she did the work without any other assistance.  The formula on page 7,
definition 2, needs further clarification.  It's a trifle difficult to
understand.

Future Work: Applying the k-anonymity model to real data, as opposed
to the toy examples in the paper, would be the next step in testing
the author's model to see if it works in practice.

=============================================================================

Summary/Contributions: - Main contribution is the k-anonymity
requirement, a way of ensuring that released person-specific data
cannot be associated with an individual. The focus is on protecting
the identity of the people who are the subjects of the data - Four
possible attacks are presented

Strengths & Quality of Presentation: - Identifies a very real problem,
and offers a solution - Inferences are an easily overlooked attack
vector; this paper increases awareness - Running examples make it
easier to follow, even for non-database people.

Weaknesses/Future Possible Work:

- very weak security guarantee/argument; "this paper seeks to
primarily protect against known attacks", i.e. the four attacks the
author was able to think of.  Future work must strengthen this
argument and, if possible, provide some sort of guarantee.

- The Assumption.  Without some assumption, this problem is likely
intractable.  However, imagine the optimal case, where the data holder
has complete knowledge of all the external data available at the time
of release.  She releases the data, then two weeks later an external
source is released that compromises her release. Even in the optimal
case under this assumption, there is little hope of the desired
outcome.  Future work must relax and remove assumptions if this is to
be practical.  This may be a "show-stopper" since the data holder has
no control over external information.

- The paper assumes a proper quasi-identifier is identified; how
difficult is this problem? Is this a reasonable assumption?  I think
that determining the quasi-identifier poses a significant challenge,
and may not even be possible.  This assumption is necessary for the
paper to proceed, but the approach is not practical until this problem
is solved.  I suppose the data holder could always take the complete
set of attributes as the quasi-identifier.

- The formal definition of quasi-identifier is obfuscated by the use
of the poorly defined functions f_c and f_g.  The meaning only becomes
clear when considered together with the example.

- It is o.k. to consider only the case of protecting someone's
identity, but it would have been nice to know how else this could be
applied.

- The idea of "adding noise to the data while still maintaining some
statistical invariant" was dismissed with only one small reason.
Could this be used in conjunction with k-anonymity?  In cases where
k-anonymity is not possible (say only (k-2)-anonymity is possible),
could noise be used to reach k-anonymity?

- The "Lemma" on page 9 is merely a restatement of Definition 3.  It
doesn't get used to prove anything later in the paper either.  It
should be omitted.  Also, examples 3 and 4 are highly redundant.

=============================================================================

In this paper the author does an excellent job of introducing the idea
of k-anonymity, not only defining it precisely but clarifying it with
table-based examples that require no previous background knowledge of
the various types of "joins" considered in database query
languages. However, I believe that the concept of quasi-identifiers
was not well explained, and the importance of identifying them
precisely was not done justice. Specifically, the author assumes that
all quasi-identifiers will be known to the person disclosing the
information, which I believe is a very unrealistic assumption.

Lastly, I think that the attacks against k-anonymity described in the
paper are perhaps the tip of the iceberg. Various other attacks on
k-anonymity are presented in the literature. A good example is the
paper ["l-diversity: Privacy beyond k-anonymity", Machanavajjhala,
Gehrke, Kifer], which illustrates with the "homogeneity attack" and
the "background knowledge attack" that k-anonymity does not imply
privacy.

=============================================================================

The author develops an information quasi-suppression technique for
quasi-identifiers in this quasi-academic paper. The paper addresses
the issue of releasing massive amounts of person-specific data while
maintaining individual privacy; making the subjects of the data
anonymous is thus the goal. k-anonymity, a protection model devised by
the author, manages to reach this goal while retaining data
usefulness. The author explains the idea in fluent English but with a
lack of clear perception. The first question that comes to mind is how
the algorithm works in terms of modifying the sequences of values
corresponding to the attributes in quasi-identifiers so that
k-anonymity is achieved. What is the trade-off between the
modification of the data (in order to satisfy k-anonymity) and the
usefulness of the data? I personally feel that properly recognizing
quasi-identifiers in a reasonable amount of time is of great
importance. However, the paper does little to elaborate on this
aspect. Could this be possible future work, or has someone already
come up with a suggestion? One neat thing about the paper is that the
author foresaw possible attacks against k-anonymity and provided
corresponding solutions.

=============================================================================

What are the contributions of the paper?

-Overview of other models of anonymity.  -Overview of possible attacks
on anonymity from released data.  -Proposes a new model to provide
k-anonymity when releasing data.  -Proposes possible attacks on the
model and how they could be addressed.


What is the quality of the presentation?  -Very clear. Simple and
right to the point.


What are the strengths of the paper?  -As mentioned in the points
above: very simple to understand, right to the point.


What are its weaknesses?

 I found a slight problem with the lack of statistics shown.  It would
 be great to have some results backing up the idea before the
 conclusion.

 Also, about the first example stating that 87% of the population can
 be identified by linking medical data and voter data: it seems like a
 very astonishing result at first. But given that zip code, birth
 date, and sex were all in the list, the result doesn't seem too
 alarming.


What is some possible future work?

-More empirical results.  -Tests of how much the level of anonymity is
lowered with different quasi-identifiers released in real-world data.
-In general, implement the idea, test on more real-life data, and
publish those results.

=============================================================================

Q: What are the contributions of the paper? A: The contribution of
this paper is that it proposes the k-anonymity model, which can be
used by private data holders to mitigate re-identification attacks on
released information without undermining the usefulness of the
released data. In summary, altering the released information so that
it maps to many possible people can effectively throttle this kind of
attack: "The greater the number of candidates provided, the more
ambiguous the linking, and therefore, the more anonymous the data."

Q: What is the quality of the presentation? A: The paper has a
well-organized overall structure. It first analyzes the current
attacks on released information. It then proposes and elaborates the
k-anonymity model. Finally, the paper ends with some examples of
potential attacks on the proposed k-anonymity model and proposed
solutions to each of the attacks.

Q: What are the strengths of the paper? A: This paper is good in its
analysis of current potential re-identification attacks on released
information. Also, the presentation of the k-anonymity model uses
mathematical notation, which is very formal and accurate. Finally, the
use of examples in this paper to illustrate some concepts is very
effective.

Q: What are its weaknesses? A: This paper fails to mention the steps
of how to convert a private data table that is not k-anonymous into a
k-anonymous table.

Q: What is some possible future work? A: More work may be put into
research on how to design an efficient procedure to create a
k-anonymous table from a non-k-anonymous private table.
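One naive way such a conversion could work, sketched here as
hypothetical code (this is not the author's procedure, and it assumes
a single quasi-identifying zip attribute), is to blank trailing digits
one level at a time until every value occurs at least k times:

```python
from collections import Counter

def generalize_zip(table, k, attr="zip"):
    """Blank trailing digits of `attr`, one level at a time, until
    each value occurs at least k times or is fully suppressed."""
    table = [dict(row) for row in table]       # work on a copy
    width = max(len(row[attr]) for row in table)
    level = 0
    while level < width:
        counts = Counter(row[attr] for row in table)
        if all(c >= k for c in counts.values()):
            break                              # k-anonymous on attr
        level += 1
        for row in table:                      # blank one more digit
            row[attr] = row[attr][: width - level] + "*" * level
    return table

rows = [{"zip": z} for z in ["02138", "02139", "02141", "02142"]]
out = generalize_zip(rows, k=2)
print(sorted(row["zip"] for row in out))  # ['0213*', '0213*', '0214*', '0214*']
```

Real generalization works over domain hierarchies for every
quasi-identifier attribute at once, so finding a minimal
generalization is considerably harder than this single-column loop
suggests.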

=============================================================================

   * What are the contributions of the paper?

It gives a formal method (k-anonymity) that protects privacy against
inference from linking to known external sources.  This method can
guard against re-identifying individuals if the data holder follows
the policies (mentioned in the attack examples) when releasing the
data.

   * What is the quality of the paper?

The paper explains the definition of k-anonymity deeply and
accurately. It uses enough examples to describe k-anonymity, which
demonstrate the quality and accuracy of this new method. Also, it
gives examples of different existing attacks on the method and gives
solutions to each of them.

   * What are the strengths of the paper?

The structure of this paper is well formed. It gives brief
descriptions of existing work (statistical databases and multi-level
databases) and lists their weaknesses, then presents the k-anonymity
method.  In this way, it highlights the importance and success of the
method.

   * What are its weaknesses?

In the abstract, the paper mentions that there is “a set of
accompanying policies for deployment”.  But I did not find them. I
think the author should list them in a separate section.

A weakness of this method is that it requires k occurrences of the
same sequence of data in the table. In many situations, the data are
not naturally duplicated.

The paper did not mention any future work.

   * What is some possible future work?

1) As the author says in the paper, “The greater the number of
candidates provided, the more ambiguous the linking and therefore, the
more anonymous the data.” Can we find another way, based on
k-anonymity, that does not require a huge number of candidates and
still gives the same quality of privacy protection? Maybe we can
combine it with multi-level databases.

2) The whole approach rests on the assumption that the data holder can
accurately identify the quasi-identifier.  Can we improve k-anonymity
when we weaken the assumption so that the holder knows only part of
the quasi-identifier?

=============================================================================

What are the contributions of the paper?

Person-specific data becomes more and more valuable to industry and
academic research. Privacy protection while releasing such data
becomes an important concern. In the author's 1998 paper [1], she
proposed a model named k-anonymity and associated policies to ensure
that the information corresponding to one person cannot be
distinguished from that of at least k-1 other individuals in the same
release. In this paper, the author discusses several potential attacks
against k-anonymity and new policies to defeat such attacks.

What is the quality of the presentation?

I think the quality of the presentation is OK. The whole paper is easy
to understand. The problem is well motivated. The examples are
especially good at demonstrating the problems. However, I feel this is
not a serious paper, in that its contribution is not so clear and the
conclusion is too short to cover the main points of the paper.

What are the strengths of the paper?

I really like section 4, which discusses several potential attacks
against k-anonymity and possible solutions.

What are its weaknesses?

The paper itself is not self-contained, in that it says little about
the actual algorithms and policies that should be used to generate
released data satisfying the k-anonymity property. The author comments
that "this paper significantly amends and substantially expands the
earlier paper...", but to me, the only obvious extension is the
discussion of possible attacks against k-anonymity, which by itself
may or may not constitute a sufficient contribution for a new paper.
At least for the first three sections, I see no fundamental difference
from the author's 1998 paper. I am also curious why the author's 1998
paper ended up unpublished.

What is some possible future work?

The degree of protection of k-anonymity really depends on the correct
selection of quasi-identifiers, which in turn depends on the data
holder's ability to identify those attributes that could leak personal
identity information, and on the information that the receiver of the
released data already has. Rules or policies to identify these
quasi-identifiers may be worth future research. k-anonymity works by
making all information that could uniquely identify a person fuzzy,
while the techniques used in statistical databases usually add noise
to the information while maintaining a statistical invariant. In
general, the former approach is strong against known attacks while the
latter is better against unknown attacks. It might be possible to
combine these two approaches in some way, to provide better protection
against both known and unknown attacks with less data distortion.
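The contrast between the two approaches can be illustrated with a toy
sketch of the statistical-database side (hypothetical code, not from
the paper): perturb each value with zero-sum noise so that an
aggregate invariant, here the column mean, is preserved exactly while
the individual values become fuzzy.

```python
import random

def add_noise_preserving_mean(values, scale=1.0, seed=0):
    """Perturb each value with noise, then recentre the noise so it
    sums to zero and the column mean (a statistical invariant) holds."""
    rng = random.Random(seed)
    noise = [rng.uniform(-scale, scale) for _ in values]
    shift = sum(noise) / len(noise)           # recentre to zero-sum noise
    return [v + n - shift for v, n in zip(values, noise)]

ages = [34, 29, 41, 36]
noisy = add_noise_preserving_mean(ages)
print(abs(sum(ages) / len(ages) - sum(noisy) / len(noisy)) < 1e-9)  # True
```

Unlike generalization, every released value here is slightly wrong,
which is exactly the kind of person-level distortion that motivates
the k-anonymity approach instead.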


Reference:

[1] P. Samarati and L. Sweeney, "Protecting privacy when disclosing
information: k-anonymity and its enforcement through generalization
and suppression", submitted to IEEE Security and Privacy 1998,
unpublished.

=============================================================================


What are the contributions of the paper? It provides a model for
protecting privacy that improves on contemporary methods. We no longer
need to insert noisy data or change the structure of the database to
improve the level of data-privacy protection. The paper also shows a
concept for using the original data frame to improve protection.
 
What is the quality of the presentation? It’s good. The presentation
of the model includes definitions, explanations, and some examples,
all of which make it easy to understand the meaning and the author’s
intention.

What are the strengths of the paper? Since there is only a model and
no practical demonstration, the strengths of the paper are limited.
However, the concept is, in my view, good.

What are its weaknesses? 1. It takes too long to get to the main
topic.  2. There is no mathematical method to prove the feasibility
and efficiency of the model.  3. There is no consideration of how to
make use of the model.

What is some possible future work? The most important thing is to
improve the model so that it is more suitable for protecting privacy.

=============================================================================

This paper introduces a method of releasing a table of person-specific
anonymized data in such a way that it is not possible to determine
which entries correspond to a specific person. Studies have shown that
87% of the US population can likely be identified given only a table
containing just zip code, birth date, and gender, a set of data which
may appear to grant a sufficient amount of anonymization. Tables
released using the author's scheme guarantee that any given entry
cannot be distinguished from at least k others in the table, thus
allowing control over the degree to which data can be linked to a
specific person.

The author provides several examples of information releases that were
thought to be sufficiently anonymized but in fact can be linked using
other publicly available information. In one specific case, the author
combined a voter list with medical data released by a hospital to
re-identify the entry corresponding to the Governor of Massachusetts.
Other examples include releases of the same government document
censored by different departments.  These examples clearly illustrate
the motivation for researching the proper anonymization of data. A
summary of previous work in the field, and summaries of potential
attacks against k-anonymity and their solutions, are also given.

While k-anonymity provides a nice theoretical model, a real-world
implementation is made difficult by an assumption about what external
data is available. To effectively implement k-anonymity, one must be
aware of which portions of the private data are publicly available and
could be used to identify subjects in the anonymized table. In
practice, this is extremely difficult, and requires information
release policies (which are not addressed in the paper) to be in
effect. In addition, information released using k-anonymity may be
compromised if other identifying factors are made public in the
future.  k-anonymity also cannot account for data that can be inferred
from other sources. For example, if medical diagnoses were released
under 2-anonymity, then an entry may be linked to exactly 2
identifiable persons. If the diagnosis was obesity, this could
possibly be used to link the data to a specific person based on
physical characteristics. Future work in the area should address these
problems.

=============================================================================


The paper entitled "k-Anonymity: A Model For Protecting Privacy" by
Latanya Sweeney is an intriguing and well-organized report.  It
discusses a model for protecting the privacy of person-specific,
field-structured data obtained from data holders such as hospitals and
banks.  Sweeney presents examples that show how anyone with access to
public information lists, such as voter lists, can link to sensitive
and private data, including medical records.  This leaves the reader
more interested in the problem that this paper explores, since it
seems to be a real-life issue that should be solved.
 
The introduction is well presented, as it is clear and easy to
understand.  The example of finding out some private information about
the past Governor of Massachusetts catches the reader's attention and
encourages him/her to read on with interest.  Sweeney researches this
problem well, as she cites much past and current work on the problem.
 
Sweeney also touches on the related area of multi-level databases
where the primary technique used to control the flow of sensitive
information is suppression.  This strengthens the purpose of why a new
method is needed for this area of research since suppression can
reduce the quality of the data and perhaps "rendering the data
practically useless" for the purposes of research.  The author also
discusses that computer security is not sufficient for this particular
problem of privacy since she argues that we must be aware of what
values will constitute a possible leak of information.  This once
again strengthens the purpose for this type of model for protecting
privacy.
 
This paper is filled with several definitions followed by helpful
examples, which make it easy to understand and easy to follow.  The
author also sets up the paper so that the actual definition of
k-anonymity appears about 3/4 of the way in.  This allows the reader
to fully understand all of the background information before reaching
the main topic.  k-anonymity is thus explained well and is easy to
understand at that point.
 
In my opinion, once the k-Anonymity related attacks were described,
the paper weakened in substance.  I was left with many questions and
also felt that the examples of the attacks could be better.
 
I was left with the following questions/comments:

 - I am a little concerned with why the values of certain properties,
 in the examples with tables, were changed to more general values
 (such as from "male" or "female" to "human").  Does this not degrade
 the information, which would lead to the same argument as for
 suppressing data?
 
 - The examples used showed simple tables.  Larger and more
 complicated tables lead to more complex solutions.  Is this model
 scalable?
 
 - This model still leaves some vulnerability in that k individuals
 may be pointed out.  This means that a small k is not good.  What
 would be a "normal" sized k?  Is this good enough?
 
 - It seems like there is still a lot more work needed in this
 research area.
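The first question above, whether generalizing values such as "male" /
"female" to a broader category degrades the information, can be made
concrete with a simple precision-loss measure over a generalization
hierarchy. This is a hypothetical sketch (the `HIERARCHIES` table and
the metric are illustrative assumptions, not definitions from the
paper):

```python
# Hypothetical generalization hierarchies: index 0 is the most
# specific level, the last index is the fully generalized root.
HIERARCHIES = {
    "sex": [["male", "female"], ["person"]],
    "zip": [["02139"], ["0213*"], ["021**"]],
}

def level_of(attr, value):
    """Return the hierarchy level at which a released value sits
    (0 means the value was left untouched)."""
    for lvl, values in enumerate(HIERARCHIES[attr]):
        if value in values:
            return lvl
    raise ValueError(f"unknown value {value!r} for {attr}")

def precision_loss(attr, value):
    """Fraction of the hierarchy height climbed: 0.0 keeps full
    detail, 1.0 means generalization all the way to the root."""
    height = len(HIERARCHIES[attr]) - 1
    return level_of(attr, value) / height

print(precision_loss("sex", "person"))  # 1.0: all sex detail lost
print(precision_loss("zip", "0213*"))   # 0.5: one of two zip levels lost
```

So each generalization step does discard detail; a metric like this
would let a data holder compare candidate tables and pick one that
reaches k-anonymity with the least total loss.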

=============================================================================

  What are the contributions of the paper?  Many large databases
  contain confidential data that can be of great use to
  humanity. Naturally, privacy is important; even more so in the
  context of medical data, where there are privacy laws and doctors
  can be legally liable for breaches of confidentiality. Such doctors
  would thus be less enthusiastic about releasing useful information
  into the public domain without the express consent of patients, and
  obtaining individual consent would certainly be time consuming.
  This paper presents a means to provide large amounts of useful
  information in a way that ensures it can be released without
  violating privacy. Specifically, information is presented in such a
  way that any identifiable attribute, or set of attributes that can
  be cross-linked to existing public data for identification (called a
  quasi-identifier), matches at least k other entries in the data
  set. k is a parameter that controls the degree to which the data is
  private: an adversary cross-linking a database of personal names to
  the released information is assured at best a 1-in-k chance of
  guessing correctly.

     What is the quality of the presentation?

* Figure 3 is poorly placed. Since all the figures use the same
  example to illustrate different aspects, the fact that table three
  appears in section three confuses the reader about its significance,
  since it blends nicely with the other sections; it is not until the
  bottom of the page, three paragraphs later, that figure three is
  referenced. Also, in figure 3 the construction of GT1 and GT2 from
  PT is not mentioned, that is, erasing race and replacing it with
  person, and especially wiping the last digit of the zip code; this
  could be illustrated with bold type over the changed component.
  Additionally, the data in the race column (white, black, asian,
  person) is not uniform: some races are identified by a colour and
  others by their anthropological title.

* Figures 2, 3, and 4 all represent the same sort of data, that is,
  tuples in a database, but they all have different formatting: 2 has
  a bottom/right shadow and inter-column and inter-row lines, and the
  header is very lightly shaded; 3 is without inter-row or
  inter-column lines, and the header is unshaded; 4 has inter-row and
  inter-column lines with darkly shaded headers. It would have been
  wise to use the same standard for all these similar tables.

* Figure 1 illustrates the union of two sets. However, the
  non-intersection text is light and harder to read; it would have
  been nicer to bold the intersection instead of fading the
  non-intersection.

* The horizontal rule at the top of page 14 is likely not needed.

     What are the strengths of the paper?

* Identifies that there is already a problem with the means by which
* confidential data is released into the public domain: anyone could
* take a list of zip/birthday/gender information and cross-link it
* to the public voter registry to re-identify 87% of Americans.
* Since medical information is already released with this
* information, the paper illustrates an existing major leak in
* privacy, lending urgency to the problem it discusses. It models a
* real-world application of privacy laws that constrain work done
* for the good of humanity. The model has each row correspond to a
* single person, with some collection of attributes forming a means
* to identify them and other attributes holding the sensitive data.
* The implementation of the model is useful. Often some of the data
* collected does not influence the intended data mining; for
* instance, birth year (i.e. age) may be useful to scientists, but
* the exact birthday (i.e. zodiac sign) is only useful to
* astrologists. Such data can then be removed, increasing the number
* of people who share similar characteristics. Therefore, the
* solution it presents is useful; moreover, not all columns need to
* be removed, merely that some entries must have their specificity
* reduced to ensure that they have more matches.
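The effect described above, that coarsening or dropping an attribute
enlarges the pool of people matching any given record, can be sketched
in a few lines of Python. The records and attribute columns below are
invented for illustration, not taken from the paper:

```python
from collections import Counter

# Hypothetical records: (birth_year, birthday, zip, gender).
records = [
    (1970, "03-14", "53715", "F"),
    (1970, "07-02", "53715", "F"),
    (1971, "11-30", "53703", "M"),
    (1971, "01-05", "53703", "M"),
]

def min_class_size(rows, cols):
    """Smallest equivalence class when matching only on columns `cols`."""
    counts = Counter(tuple(r[c] for c in cols) for r in rows)
    return min(counts.values())

# Matching on all four attributes, every person is unique (1-anonymous).
print(min_class_size(records, [0, 1, 2, 3]))  # -> 1
# Dropping the exact birthday doubles each class (2-anonymous).
print(min_class_size(records, [0, 2, 3]))     # -> 2
```

The smallest class size is exactly the k for which the projection is
k-anonymous, which is why removing the astrology-only column helps.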

     What are its weaknesses?

* Does not mention that some data can be reduced to a "wild card"
* (however, it does illustrate this in an example in figure 3, where
* an entire column is reduced to a base wild card). For instance, by
* setting four entries' race to "person", when matching only on race
* we know that there will be at least 4-anonymity in the results,
* because the "person" entries will match both categories. By taking
* only k results, removing all the identifiers, and setting them all
* to wild cards, we already have a lower bound of k-anonymity.

* The solution it suggests to ensure that data is sufficiently
* duplicated is to remove information from some columns, such as
* removing race information for some participants, or removing
* zip-code information. This may remove useful data from the
* analysis instead of ensuring that the sample size for each match
* is naturally greater than or equal to k.

* When a dataset is released into the public domain, the method of
* removing data from some of the quasi-identifier columns is fixed,
* and all future updates or releases are constrained by this
* original release. Already-released or poorly planned releases thus
* impose lifelong constraints, yet the paper offers no suggestions
* on how such data columns should be constrained.

* It does not mention the following weakness: a set of
* quasi-identifier values may match at least k records in the data
* table. However, if all k records have the same associated data
* (e.g. for medical history they all have the same symptoms and
* diseases), then, if you know someone is on the list, you can match
* their partial record to find k people who all have the same
* diseases and conclude that the person you were matching has the
* disease: an information leak.
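This leak, where a k-anonymous class is unanimous on its sensitive
value, is easy to test for mechanically. A minimal sketch follows; the
release, quasi-identifier layout, and attribute values are all
invented toy data:

```python
from collections import defaultdict

# Hypothetical 2-anonymous release: (zip, age_range) is the
# quasi-identifier, and the last field is the sensitive attribute.
release = [
    ("537**", "20-29", "flu"),
    ("537**", "20-29", "flu"),       # both matches share one disease
    ("530**", "30-39", "flu"),
    ("530**", "30-39", "diabetes"),  # this class is safe
]

def leaking_classes(rows):
    """Classes whose matching records all carry the same sensitive value."""
    classes = defaultdict(set)
    for *qi, sensitive in rows:
        classes[tuple(qi)].add(sensitive)
    return [qi for qi, values in classes.items() if len(values) == 1]

print(leaking_classes(release))  # -> [('537**', '20-29')]
```

An attacker who knows a neighbour matches ('537**', '20-29') learns
the diagnosis despite 2-anonymity holding on the quasi-identifier.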

* No algorithm was presented to create a k-anonymous table from a
* data set and a list of quasi-identifiers.

    What is some possible future work?

* Perhaps there is a method to transform the data into another
* dataset that retains all statistical information, but whose
* inverse transformation is computationally hard to compute. This
* way the privacy is guaranteed by mathematically hard problems
* instead of by removing certain aspects of the data. It would also
* allow updating without worry of cross-linking with the old set,
* though the difference of the two transformed sets may leak data.
* Such a transformation may involve probability, where a partial
* unit of one disease is spread over a variety of people such that
* in the end it all weights and sums to the same result, yet an
* observer cannot determine what specifically each person has even
* if they were to link their names.

* Each value has levels of specificity; for instance, to specify an
* address one could give the country, province, city, street, etc.
* Perhaps an algorithm could look at the number of people that match
* a quasi-identifier at one level, and if the projection is below
* the threshold k, the data generator would bump them up to the next
* level of generality. There could be different nodes in a tree that
* branch into different specific values at lower levels, and each
* tuple that is set to a higher node must be considered as a
* candidate for any of the nodes beneath it. An algorithm that sets
* tuple data to the appropriate level can thus ensure that a minimal
* amount of reduction in specificity occurs when setting the levels
* of the quasi-identifiers in each tuple.
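A minimal version of such an algorithm, using a single global
generalization level over an invented zip-code hierarchy rather than
the per-tuple tree levels suggested above, might look like this
(values and hierarchy are hypothetical):

```python
from collections import Counter

def generalize(zipcode, level):
    """One hypothetical hierarchy: each level wipes one more trailing digit."""
    return zipcode[: len(zipcode) - level] + "*" * level

def anonymize(zips, k):
    """Raise all values to the least general level that gives k-anonymity."""
    for level in range(len(zips[0]) + 1):
        rounded = [generalize(z, level) for z in zips]
        if min(Counter(rounded).values()) >= k:
            return rounded, level
    # k exceeds the table size: even full suppression cannot satisfy it.
    return None, None

print(anonymize(["53715", "53710", "53706", "53703"], 2))
# -> (['5371*', '5371*', '5370*', '5370*'], 1)
```

A per-tuple variant would instead walk each record up its own branch
of the tree only as far as needed, reducing specificity less overall.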

* Perhaps there is a way to provide the information in a manner that
* permits people to analyse components of it without compromising
* privacy. Such a system would interpret individual requests and
* limit the output to that which would not compromise privacy with
* regard to the other requests that have been made. Such a system
* may allow people to run analysis methods on the entire data
* system; the system would return only results instead of raw data,
* and the results are ensured to preserve privacy. This way the
* person who wishes to examine the data provides an algorithm to run
* upon the data, and the data server executes it after ensuring it
* is not a data leak.
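One classical instance of such a system is query-set-size control:
the server refuses any aggregate whose query set is smaller than a
threshold. The sketch below uses an invented dataset, attribute names,
and threshold; and, as the review notes, combinations of requests
(tracker attacks) can still leak, so this is only a first line of
defence:

```python
# Hypothetical query server applying query-set-size control.
K = 3
data = [
    {"age": 34, "zip": "53715", "hiv": 0},
    {"age": 35, "zip": "53715", "hiv": 1},
    {"age": 36, "zip": "53703", "hiv": 0},
    {"age": 41, "zip": "53703", "hiv": 0},
]

def count_query(predicate):
    """Answer an aggregate only when at least K records match."""
    matches = [row for row in data if predicate(row)]
    if len(matches) < K:
        return None  # refuse: answering could single out individuals
    return sum(row["hiv"] for row in matches)

print(count_query(lambda r: r["age"] > 30))        # 4 matches -> 1
print(count_query(lambda r: r["zip"] == "53715"))  # 2 matches -> refused
```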

* A method to revoke data that has been provided, or to change the
* methodology of release in case of poorly designed release patterns
* that lack the usefulness required of the full system.

=============================================================================

This paper presents the k-anonymity protection model, which alters
the released information so that it maps to many possible people (at
least k) in order to thwart linking attacks. The author has also
explored attacks against this method and provided ways in which these
attacks can be thwarted.

The presentation of this paper is not very good. The author spends
half of the paper on background and related research before getting
to the "real stuff". Moreover, the presentation is sometimes unclear;
for example, the symbol PT appears before its definition, which may
confuse readers. No possible future work is mentioned in the paper,
and I think that in order to show that the introduced model is really
good, the author should compare it more thoroughly with other known
methods.

The model presented in this paper can do a good job of preserving
privacy: the larger k is, the better privacy is preserved. The author
has done a good job exploring related attacks and providing ways in
which they can be thwarted. These efforts help make the model more
applicable in practice. However, if k is too small, little privacy
can be preserved. This does not mean that the larger k is, the better
the model is: too much anonymity may make the data useless. In fact,
even when k is very small, the usability of the data can be badly
damaged by this model, because the data that is most useful may not
be involved in the "QI" and thus may not be released according to
this model. As future work, a survey could be done to find out what a
proper k is in most people's eyes, and to measure the usability of
this model.

=============================================================================

The main contribution of this paper is a model for protecting
personal information when data holders release data. More precisely,
the model makes the information of an individual contained in the
released data indistinguishable from that of k-1 other individuals
who also have data in the release. Another interesting contribution
of this paper is the set of attacks that can be performed against
k-anonymity (the unsorted matching attack, the complementary release
attack, and the temporal attack). In general, the paper is very well
presented, but the author should be careful when using abbreviations.
For example, on page 8 the term PT is used without mentioning what it
means. I would also suggest that the author be more detailed when
defining a "quasi-identifier", because this definition is used
intensively in the rest of the paper. Presenting several examples was
also helpful, improving the understanding of the paper.

The main strength of this paper is that it presents a comprehensive
study of privacy protection in data releases. If k-anonymity is used
in a real-world application, it is also possible to prevent some
attacks by taking this paper as a reference. In my opinion, a
weakness of this paper is that it should have considered a real-world
(and big) database and shown how to use k-anonymity to prevent data
disclosure. This way, the author could show how to create a search
space of size k, so that it is hard (or even infeasible) to find a
person from his/her data.

Finally, some future work that I would consider is how to efficiently
identify Quasi-identifiers in a huge database. In small tables this
can be easily done, just by inspecting the table. However, when the
database is quite big, one would need some methods and tools to
perform this efficiently and still guarantee that such a set of
Quasi-identifiers will not lead to a data disclosure. Another
interesting piece of future work would be to determine how big k must
be so that it
becomes harder (or even infeasible) to find someone from the released
data. From this, two questions arise (and should be answered in a
future work): Would it be necessary to adopt safety margins? Would it
be interesting to consider levels of anonymity?

=============================================================================

Due to an exponential growth in the number/variety of data collections
containing person-specific information and the demand to ultimately
release this data for research purposes, there is a strong need to
protect the privacy of the individuals involved. This paper presents a
very simple yet effective model to address this problem without
compromising the usefulness of the data itself.

The author very systematically proceeds by first identifying the
issue, providing a concrete example of how an individual can be
re-identified from the supposedly anonymous data by linking it with
easily available external data sources. She then establishes the
scope/need of the work by laying out similar work being done in other
areas, briefly pointing out their shortcomings. Next the author moves
on to present the actual ‘k-anonymity’ model providing the
definitions needed to comprehend the work along the way. Finally,
three possible attacks on the model and ways of getting around them
are mentioned.

The model is very intuitive and easily understandable. The paper
proceeds in more or less plain English without much complex jargon,
providing examples at each step for further clarification. But at
times the author has made deliberate attempts to keep the discussion
as simple as possible, which leaves an impression of incompleteness
on the reader's mind. While listing the risks and difficulties
involved on the part of the data holder, the author refers to
contracts and policies that can provide complementary protection but
does not provide any example of what they might look like. The
examples provided are very trivial and limited to protecting a
person's identity. Furthermore, the author has made extensive
assumptions that make the solution work only for known attacks;
also, identifying the quasi-identifier itself in real-world data can
pose serious challenges.

Future extensions to this work could include making the model more
robust, with fewer assumptions made on the part of the data holder.
For example, ‘k-anonymity’ works only under the assumption that
the data holder is able to predict with reasonable accuracy the other
external data sources, and the attributes present in them, that could
potentially be linked with the released data to re-identify sensitive
information. What if this assumption did not hold? Another way to
improve the model might be to identify more attacks that can be
launched against it and then incorporate measures to prevent them.

=============================================================================

# What are the contributions of the paper?  The paper contributes a
# model for protecting privacy when sharing or exchanging personal
# data between agents, organizations, etc. It also contributes a
# framework for working on algorithms and systems that release
# information without revealing properties of the entities that are
# to be protected.  What is the quality of the presentation?  The
# presentation is clear and well-organized.  What are the strengths
# of the paper?  The k-anonymity model is straightforward and
# effective. The paper figures out some good attacks on the
# k-anonymity model and also suggests appropriate solutions to those
# attacks.  What are its weaknesses?  The paper claims that the
# k-anonymity model prevents individuals from being re-identified
# while the data remains practically useful; however, the usefulness
# of the data really depends on the context, so in a specific
# situation the k-anonymity model might destroy the usefulness of
# the data.  The k-anonymity model is not proven totally secure, and
# there may still exist other attacks on it.  What is some possible
# future work?  We might want to know the trade-off in practice
# between the k-anonymity model and the usefulness of the data that
# adheres to the model.  The k-anonymity model prevents only
# data-linking attacks; we might need to develop a different model,
# or a model based on k-anonymity, to prevent more kinds of attacks.
# We might also want to know whether it is possible to apply the
# k-anonymity model in computer security in general to protect
# personal privacy.

=============================================================================

*  What are the contributions of the paper?  This paper addresses
*  the problem of releasing person-specific data for scientific
*  research without compromising the privacy of the individuals who
*  are the subjects of the released data. It provides a formal
*  protection model, k-anonymity, in which the information of a
*  subject cannot be distinguished from that of k-1 other subjects
*  in a release. It also deals with some known attacks on
*  anonymizing systems, specifically against the introduced
*  k-anonymity model.

* What is the quality of the presentation?  The paper is informative
* and explores the background of the problem with real-life
* examples. Figures are added to give visual information, but Figure
* 1, "Linking to re-identify data", is unnecessary. Also, a few
* rudimentary definitions, such as that of attributes, could have
* been avoided.

* What are the strengths of the paper?

It provides a formal way to achieve a balance between data release
and privacy concerns, avoiding making the released data useless while
at the same time providing a user-defined anonymity level.

* What are its weaknesses?  Implementation techniques for
* k-anonymity are missing. Also, only a few known attacks are
* considered. With increasing values of k, the technique might
* become worse than the older results which the author refers to. A
* practical example of a k-anonymity system could have strengthened
* the claim.

* What is some possible future work?  A practical implementation of
* the system to strengthen the claim is necessary. Also, automatic
* identification of quasi-identifiers would be a useful direction.

=============================================================================

What are the contributions of the paper?

The paper's primary contribution is the k-anonymity method for
anonymizing sensitive, publicly-released data. It provides a formal
description of the method and examples of its use. It also enumerates
several possible attacks that could be used to infer sensitive data
from k-anonymous releases and provides additional procedures to follow
to ensure that these attacks cannot be used. In addition, the possible
attacks identified could be useful as a basis for evaluating other
protection models.

What is the quality of the presentation?

The presentation of the information in the paper is of high
quality. The motivation, background and the method itself are all
explained clearly. In particular, the method is described both in
precise mathematical notation, useful for theoretical analysis, and in
ordinary language. Concrete examples are used to good effect
throughout the paper, making it all the more easy to understand the
definition and applications of the model.

What are the strengths of the paper?

The method provided by the paper is simple but effective, and can
allow the release of data that is reasonably anonymous but still
useful for analysis. It is presented clearly and convincingly. The use
of examples, for motivation and demonstration, is particularly
effective for explaining the model.

What are its weaknesses?

While the k-anonymity method is theoretically sound, there are a few
assumptions made in the paper that may reduce its effectiveness in
practical situations. In particular, the quasi-identifier, all the
attributes in the private information that could be used for linking
with external data, must be identified by the data holder, which may
not always be possible or practical.

What is some possible future work?

Since the anonymity guarantees provided by the model rely on the
identification of the quasi-identifier, future work could explore
methods for making this task easier and more practical for the data
holders. Similarly, since the method presented protects against known
attacks, more work could be done to identify potential avenues for
attacks. Also, research could be carried out on real data released
using
this model to determine whether it is effective in practice by
searching for unanticipated linkage attacks.

=============================================================================

The paper proposes a formal protection model, k-anonymity, for
constructing and evaluating systems in which private data is
protected. The framework provides a guarantee on the anonymity of the
data when the stated assumptions are satisfied. The paper also
addresses realistic attacks that such a model cannot protect against.

The presentation is clear and concise, except that the table naming
gets a little confusing in Section 4.

The assumption of being able to identify the quasi-identifier may be
too strong in practice, as each data source may not have enough
knowledge about all other available data to accurately identify all
quasi-identifiers. As mentioned in the paper, the model still
exhibits certain degrees of vulnerability. As future work, it would
be interesting to extend the current model to capture characteristics
of the attacks mentioned, so that we can provide a better guarantee
on the anonymity of private data.

=============================================================================

The author proposes the k-anonymity model to protect individuals from
being re-identified when a data holder, such as a hospital or an
insurance company, releases data for legitimate purposes. First, the
author defines the quasi-identifier, the set of attributes available
to attackers, and requires that each sequence of quasi-identifier
values appear in at least k records of the release. Lastly, the
author describes different attack scenarios against k-anonymity.

The author briefly discusses previous work and then provides a formal
framework for constructing systems under the k-anonymity model. After
that, the author gives several attacks against k-anonymity. This
seems to me to be a high-quality presentation. One of the weaknesses
I can see is that the author addresses only tabular privacy
protection.