Skip to content

Commit

Permalink
Merge pull request #25 from vgherard/dev
Browse files Browse the repository at this point in the history
Fix for CRAN resubmission v0.1.5
  • Loading branch information
vgherard authored Apr 17, 2023
2 parents e235810 + 911ac65 commit 33af34f
Show file tree
Hide file tree
Showing 7 changed files with 66 additions and 60 deletions.
1 change: 1 addition & 0 deletions .Rbuildignore
Original file line number Diff line number Diff line change
Expand Up @@ -20,3 +20,4 @@ Doxyfile
^CRAN-RELEASE$
^LICENSE$
^\.scribblr$
^CRAN-SUBMISSION$
6 changes: 3 additions & 3 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,21 +1,21 @@
Package: kgrams
Title: Classical k-gram Language Models
Version: 0.1.4
Version: 0.1.5
Authors@R:
person(given = "Valerio",
family = "Gherardi",
role = c("aut", "cre"),
email = "[email protected]",
comment = c(ORCID = "0000-0002-8215-3013"))
Description:
Tools for training and evaluating k-gram language models in R,
Training and evaluating k-gram language models in R,
supporting several probability smoothing techniques,
perplexity computations, random text generation and more.
License: GPL (>= 3)
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE, roclets = c ("namespace", "rd"))
RoxygenNote: 7.2.1
RoxygenNote: 7.2.3
LinkingTo:
Rcpp, RcppProgress
Imports:
Expand Down
6 changes: 6 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,9 @@
# kgrams 0.1.5

* Removed "Tools for..." at the beginning of package DESCRIPTION, as per CRAN's
request.
* Simplified examples in `?kgram_freqs`.

# kgrams 0.1.4

* Updated `R` requirements `3.5 -> 4.0`.
Expand Down
29 changes: 2 additions & 27 deletions R/kgram_freqs.R
Original file line number Diff line number Diff line change
Expand Up @@ -157,38 +157,13 @@
#' # Build a k-gram frequency table from a file connection
#'
#' \dontrun{
#' f <- kgram_freqs(file("myfile.txt"), 3)
#' f <- kgram_freqs(file("my_text_file.txt"), 3)
#' }
#'
#'
#' # Build a k-gram frequency table from an URL connection
#' \dontrun{
#' ### Shakespeare's "Much Ado About Nothing" (entire play)
#' con <- url("http://shakespeare.mit.edu/much_ado/full.html")
#'
#' # Apply some basic preprocessing
#' .preprocess <- function(x) {
#' # Remove character names and locations (boldfaced in original html)
#' x <- gsub("<b>[A-z]+</b>", "", x)
#' # Remove other html tags
#' x <- gsub("<[^>]+>||<[^>]+$||^[^>]+>$", "", x)
#' # Apply standard preprocessing including lower-case
#' x <- kgrams::preprocess(x)
#' return(x)
#' }
#'
#' .tknz_sent <- function(x) {
#' # Tokenize sentences keeping Shakespeare's punctuation
#' x <- kgrams::tknz_sent(x, keep_first = TRUE)
#' # Remove empty sentences
#' x <- x[x != ""]
#' return(x)
#' }
#'
#' f <- kgram_freqs(con, 3, .preprocess, .tknz_sent, batch_size = 1000)
#' summary(f)
#'
#' query(f, c("leonato", "thy", "smartphones")) # c(145, 52, 0)
#' f <- kgram_freqs(url("http://my.website/my_text_file.txt"), 3)
#' }
#' @name kgram_freqs
NULL
Expand Down
53 changes: 51 additions & 2 deletions cran-comments.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
## kgrams v0.1.4
## kgrams v0.1.5

`kgrams` v0.1.2 got archived because the former vignette used an online data
source which became unavailable. This online source has been replaced by local
Expand All @@ -10,4 +10,53 @@ R CMD CHECK produces a Note:
All declared Imports should be used.

This note is spurious, as the imported package is used. It is only called from
C++ source code, which may be the reason behind this false positive.
C++ source code, which may be the reason behind this false positive.

---

### Follow-up to CRAN review

#### Comment 1

The Title field should be in title case. Current version is:
'Classical k-gram Language Models'
In title case that is:
'Classical k-Gram Language Models'

"k-gram" is a mathematical term, as such it should be considered as a
single word. In particular, "gram" does not need to be capitalized.

#### Comment 2

Please omit the redundant "Tools for" at the beginning of your description.

Done.

#### Comment 3

If there are references describing the methods in your package, please add these in the description field of your DESCRIPTION file in the form
authors (year) <doi:...>
authors (year) <arXiv:...>
authors (year, ISBN:...)
or if those are not available: <https:...>
with no space after 'doi:', 'arXiv:', 'https:' and angle brackets for auto-linking.
(If you want to add a title as well please put it in quotes: "Title")

The package implements a manifold of mathematical methods for language models
(that can, to some extent, be considered "classical" literature). These are
properly referenced throughout the package documentation.
Documenting them in the DESCRIPTION would require citing tens of articles (some
of them published in the beginning of XX-th century), which I think is beside
the point.

#### Comment 4

\dontrun{} should only be used if the example really cannot be executed (e.g. because of missing additional software, missing API keys, ...) by the user. That's why wrapping examples in \dontrun{} adds the comment ("# Not run:") as a warning for the user.
Does not seem necessary.
Please unwrap the examples if they are executable in < 5 sec, or replace \dontrun{} with \donttest{}.

The only two examples that are left wrapped in the \dontrun{} command
cannot be run with 100% confidence - because they reference a dummy
"my_text_file.txt" local or online resource, which does not exist. I believe
this is the most transparent way to document (abstractly) the relevant features
here.
29 changes: 2 additions & 27 deletions man/kgram_freqs.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion man/kgrams-package.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

0 comments on commit 33af34f

Please sign in to comment.