Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SuffixTree segfaults #274

Open
jilljenn opened this issue Oct 18, 2024 · 7 comments
Open

SuffixTree segfaults #274

jilljenn opened this issue Oct 18, 2024 · 7 comments

Comments

@jilljenn
Copy link

Any test like

#include <bits/stdc++.h>
using namespace std;

#define rep(i, a, b) for(int i = a; i < (b); ++i)
#define all(x) begin(x), end(x)
#define sz(x) (int)(x).size()
typedef long long ll;
typedef pair<int, int> pii;
typedef vector<int> vi;

/**
 * Author: Unknown
 * Date: 2017-05-15
 * Source: https://e-maxx.ru/algo/ukkonen
 * Description: Ukkonen's algorithm for online suffix tree construction.
 *  Each node contains indices [l, r) into the string, and a list of child nodes.
 *  Suffixes are given by traversals of this tree, joining [l, r) substrings.
 *  The root is 0 (has l = -1, r = 0), non-existent children are -1.
 *  To get a complete tree, append a dummy symbol -- otherwise it may contain
 *  an incomplete path (still useful for substring matching, though).
 * Time: $O(26N)$
 * Status: stress-tested a bit
 */

struct SuffixTree {
	enum { N = 20000, ALPHA = 26 }; // N ~ 2*maxlen+10
	int toi(char c) { return c - 'a'; }
	string a; // v = cur node, q = cur position
	int t[N][ALPHA],l[N],r[N],p[N],s[N],v=0,q=0,m=2;

	void ukkadd(int i, int c) { suff:
		if (r[v]<=q) {
			if (t[v][c]==-1) { t[v][c]=m;  l[m]=i;
				p[m++]=v; v=s[v]; q=r[v];  goto suff; }
			v=t[v][c]; q=l[v];
		}
		if (q==-1 || c==toi(a[q])) q++; else {
			l[m+1]=i;  p[m+1]=m;  l[m]=l[v];  r[m]=q;
			p[m]=p[v];  t[m][c]=m+1;  t[m][toi(a[q])]=v;
			l[v]=q;  p[v]=m;  t[p[m]][toi(a[l[m]])]=m;
			v=s[p[m]];  q=l[m];
			while (q<r[m]) { v=t[v][toi(a[q])];  q+=r[v]-l[v]; }
			if (q==r[m])  s[m]=v;  else s[m]=m+2;
			q=r[v]-(q-r[m]);  m+=2;  goto suff;
		}
	}

	SuffixTree(string a) : a(a) {
		fill(r,r+N,sz(a));
		memset(s, 0, N);
		memset(t, 0, N);
		fill(t[1],t[1]+ALPHA,0);
		s[0] = 1; l[0] = l[1] = -1; r[0] = r[1] = p[0] = p[1] = 0;
		rep(i,0,sz(a)) ukkadd(i, toi(a[i]));
	}

	// example: find longest common substring (uses ALPHA = 28)
	pii best;
	int lcs(int node, int i1, int i2, int olen) {
		if (l[node] <= i1 && i1 < r[node]) return 1;
		if (l[node] <= i2 && i2 < r[node]) return 2;
		int mask = 0, len = node ? olen + (r[node] - l[node]) : 0;
		rep(c,0,ALPHA) if (t[node][c] != -1)
			mask |= lcs(t[node][c], i1, i2, len);
		if (mask == 3)
			best = max(best, {len, r[node] - len});
		return mask;
	}
	
    auto LCS(string s, string t) {
		SuffixTree st = SuffixTree(s + "$" + t + "#");
		st.lcs(0, sz(s), sz(s) + 1 + sz(t), 0);
		return st.best;
	}
};

int main(void) {

    std::string s = "abcde#cdefg$";
    auto st = SuffixTree(s);

    return 0;
}

Results in segmentation fault.

@tyilo
Copy link
Contributor

tyilo commented Oct 18, 2024

Your ALPHA is wrong. Your suffix tree can only store a-z, not # and $.

@jilljenn
Copy link
Author

Thanks but even with this it was segfaulting.

int main(void) {

    std::string s = "abcde";
    std::string t = "cdefg";

    SuffixTree *st = new SuffixTree(s + "{" + t + "}");
    
    st->lcs(0, sz(s), sz(s) + 1 + sz(t), 0);
    auto result = st->best;
    cout << st->best.first << " " << st->best.second << endl;

    return 0;
}

This works.

@jilljenn
Copy link
Author

I think the default value allocates too much for most systems?

@jilljenn
Copy link
Author

Probably the problem is:
https://github.com/kth-competitive-programming/kactl/blob/main/content/strings/SuffixTree.h#L61
Default ALPHA should be at least 28 maybe?

@bjorn-martinsson
Copy link
Contributor

Your ALPHA is wrong. Your suffix tree can only store a-z, not # and $.

Thanks but even with this it was segfaulting.

int main(void) {

    std::string s = "abcde";
    std::string t = "cdefg";

    SuffixTree *st = new SuffixTree(s + "{" + t + "}");
   ...

This string "abcde{cdefg}" contains { and } which are not characters in a-z...

That said, the documentation for this algorithm is confusing. There is a comment about ALPHA=28 in the code. But I don't know when/why you need to set ALPHA=28.

@jilljenn
Copy link
Author

Sorry I can explain :) As I edited the code too much.

		SuffixTree st(s + (char)('z' + 1) + t + (char)('z' + 2));

This requires ALPHA = 28. But this could be the default.
My example with { and } required ALPHA = 30.
I think one problem was the array t that is by default 200010 × 26 which may be too much for the stack?

@tyilo
Copy link
Contributor

tyilo commented Oct 19, 2024

Issues with your code:

  • ALPHA should be 28
  • You have changed LCS to use $ and # instead of { ('z' + 1) and | ('z' + 2)
  • You have changed the memset calls: (They are both using the wrong size and you set the values of t to 0 instead of -1).
-		memset(s, 0, sizeof s);
-		memset(t, -1, sizeof t);
+		memset(s, 0, N);
+		memset(t, 0, N);

You can run ulimit -s unlimited on Linux to increase the stack size. Most online judges also does this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants